
Eliminate Phase and simplify Conditions #7856

Closed
bgrant0607 opened this issue May 6, 2015 · 53 comments
Labels
area/api Indicates an issue on api area. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture.
@bgrant0607 (Member)

Forked from #6951.

This is also discussed in #1899 (comment) and elsewhere.

Users and developers apparently think of phases as states in a state machine, regardless of how much we try to dissuade them:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/api-conventions.md#spec-and-status

This interpretation conflicts with the declarative, level-based, observation-driven design, and it limits evolution. In particular, the phase should be derivable by observing other state and shouldn't require durable persistence for correctness.

In my experience, the danger to system extensibility is significant, and results from assumptions baked into clients.

Phases aren't themselves properties or conditions of their objects, but clients inevitably infer properties from them.

For example, a client might try to determine whether a pod is active, or whether it has reached a permanent, terminal state, such as with a switch statement, as follows:

switch pod.Status.Phase {
case api.PodPending:
    // not active yet
    ...
case api.PodRunning:
    // pod active
    ...
case api.PodSucceeded:
    fallthrough
case api.PodFailed:
    // terminated and won't re-activate
    ...
case api.PodUnknown:
    // WTF!
    ...
default:
    glog.Fatalf("unexpected phase: %s", pod.Status.Phase)
}

However, let's say someone wanted to add a new Phase where the pod would also be active (containers running/restarting, etc.), as in #6951. That would mean that every client/component that made decisions based on the phase would need to also then consider the new phase.

Enums aren't extensible. Every addition is a breaking, non-backward-compatible API change. We could create a new major API version for every such addition, but that would be very expensive and disruptive. At some point, it will just become too painful to add new phases, and then we'll be left with an unprincipled distinction between phases and non-phases. Creating all imaginable phases up front would not only create a lot of complexity prematurely, but would also almost certainly not be correct.

Conditions are more extensible since the addition of new conditions doesn't (shouldn't) invalidate decisions based on existing conditions, and are also better suited to reflect conditions/properties that may toggle back and forth and/or that may not be mutually exclusive.

Rather than distributing logic that combines multiple conditions or other properties for common inferences, such as whether the scheduler may schedule new pods to a node, we should expose those properties/conditions explicitly, for similar reasons as why we explicitly return fields containing default values.

Proposal:

  1. Add LastTransitionTime, Reason, and Message to PodCondition (similar to NodeCondition)
  2. Add new Condition types (e.g., PodCondition and NodeCondition), as needed. At minimum, we need to be able to identify whether the pod has reached a terminal condition or not. We could add a Terminated condition for that. The Reason could distinguish Succeeded from failed -- any reason other than Succeeded would be considered a failure. We have at least a couple dozen such failure reasons internally.
  3. Add Conditions to whatever other objects currently have Phase but not Conditions
  4. Eliminate *.Phase and PodStatus.Message from the v1 API

cc @smarterclayton @derekwaynecarr @thockin @timothysc @davidopp @markturansky

@bgrant0607 bgrant0607 added priority/backlog Higher priority than priority/awaiting-more-evidence. area/api Indicates an issue on api area. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels May 6, 2015
@dchen1107 (Member)

This is part of 1.0 though.

@bgrant0607 (Member, Author)

Not part of 1.0, you mean.

@lavalamp (Member) commented May 6, 2015

Add LastTransitionTime, Reason, and Message to PodCondition (similar to NodeCondition)

Just thinking out loud: could we auto-populate this from events filed about the object?

@derekwaynecarr (Member)

@lavalamp I thought from prior discussions we did not want to use events as a reliable messaging system to derive status

@lavalamp (Member) commented May 7, 2015

@derekwaynecarr I think it's a bit different if we're doing it in the master as opposed to joining events & objects externally.

@davidopp (Member) commented May 7, 2015

I've mentioned my opinion on this before, but to recap...

I don't have a problem with what is being suggested here, but I think we should recognize the tradeoff. Everything I've ever read about job management, or heard from users, suggests that users are going to reason about the state of their pods (and jobs) as being part of a state machine. So, we have a choice: we can make the state machine explicit, or make the user synthesize one from the Conditions, or do something in between (e.g. provide the user with a library that synthesizes states from the Conditions, or have kubectl do it).

Brian described good arguments for pushing this onto the user. The counter-argument is that we're infrastructure providers, and the way we provide value is by deciding what abstractions will be useful for users, and more generally tackling these kinds of difficult problems instead of pushing them onto the user. If we're completely un-opinionated about everything (how to reason about your job state, how to deal with workers of a batch job, etc.), then we're not providing as much value as we could.

I think the sweet spot might be the layered approach, i.e. the server works the way Brian described, and then we build a library that synthesizes states (and use that library inside kubectl to print states, as well as allowing users to use it). But I definitely feel that users will want a state machine abstraction.

@thockin (Member) commented May 7, 2015

I know I am contrarian here, but I don't think we will ever prevent users from thinking of things in terms of state machines - it's a natural mental model. And I don't think that's bad.

I agree that enumerated states are not extensible, but I still argue that there exists an abstract state machine that every such object naturally goes through that is vague enough to never be extended (or very very rarely) but still provides value and the mental model that users are seeking. It can be observed, not persisted, and probably simply derives from some combination of Conditions. As such it is not strictly required that we even call it out, but it will emerge! If we're clever we can make the state machine similar or even identical across many objects.

Anyway, orthogonal Conditions is really what we agreed on months ago. What's not clear to me is how someone is supposed to know what the ABSENCE of a Condition means. Pulling from the load-balancer conversation - we now have async external LB assignment and it was suggested that maybe a Service should be not-ready until Endpoints is non-empty. So how do I denote that Service.Status.Phase is not-ready? Do I put a Pending condition on it that EndpointsController has to clear? But there are two things that both need to be true to be not-not-ready (LB and endpoints). Or do I put a Ready condition that EndpointsController has to set? But there are still two things that need to be true. Or do we put 2 orthogonal Conditions? How does a user determine from 2 orthogonal Conditions that a Service is "ready"? Didn't we just define a "state" that users have to know about a priori?


@lavalamp (Member) commented May 7, 2015

Since we're enumerating cons, let me add one additional reason to retain some sort of pre-cached state in status: it makes it easy for clients to set a fieldSelector and watch for their desired state.

@thockin (Member) commented May 7, 2015

Quick writeup of something Leo (another Borg person) put on the table.

Instead of something like a single field "Phase" being one-of (Pending, Running, Succeeded, Failed), turn it 90 degrees. Make 4 fields "Pending", "Running", "Succeeded", "Failed", each of which is of type "outlook" (his word). An outlook is one-of (Yes, Eventually, Never).

The nugget of clever here is that mostly people want an answer to a question - "is this pod running?" It's much easier to comprehend this three-state answer than to encode a state-diagram into your code. It's also easier to extend. Suppose, hypothetically, we want to add a state for "scheduled but not yet acknowledged by kubelet". With Phase/State we have to update everyone who knows the diagram. With this new idea, you just add one field. Any code that was waiting for a pod to be running still works just fine.

I suppose this could be modeled in Conditions. E.g. the scheduler patches Status to include a "scheduled" condition, the apiserver realizes this is a significant moment and adds a "scheduled-not-acked" condition. Kubelet eventually patches Status to include an "acked" condition and the apiserver clears "scheduled-not-acked". The outlook approach is strictly more informational because it indicates future possibilities - "will this pod ever be running?" - which makes life easier for users...


@bgrant0607 (Member, Author)

A "state" abstraction pushes the decision logic onto the user, as shown by my switch statement.

@bgrant0607 (Member, Author)

@thockin Yes, the 4 fields is similar to my conditions proposal, just more ad hoc and with less information.

@bgrant0607 (Member, Author)

I proposed more possible Conditions, analogous to Leo's proposal, here: #6951 (comment)

@bgrant0607 (Member, Author)

@lavalamp We need to extend field selectors to be able to select on conditions. That's a subset of #1362.

@thockin (Member) commented May 7, 2015

I meant that these would work WITH conditions, not instead of, in which case they offer more information. Conditions could be added by any entity (are they purely additive or can one entity set and another clear?) and would carry richer details. The outlooks would be curated and small in number, for programmatic consumption, derived (primarily?) from Conditions.


@bgrant0607 (Member, Author)

Conditions aren't annotations. They can't be added by just any entity. They should be curated.

@bgrant0607 (Member, Author)

And conditions are for programmatic consumption. For instance, the Ready condition is used to remove pods from Endpoints.

@thockin (Member) commented May 7, 2015

Have you worked out the mapping between Unknown, Pending, Running, Failed, Succeeded and conditions?

I still don't feel like I understand how to handle the case of a Service being Pending while LB and Endpoints setup are happening.


@bgrant0607 (Member, Author)

Running would be replaced by the Active condition (True/False): #7868 (comment).

Above I proposed a Terminated condition. The Reason associated with that condition would indicate Succeeded vs. Failed, as described above.

Pending could be derived from not Active and not Terminated for older API versions. I'm not sure we need a more explicit condition for it, since most scenarios check for Active or Terminated (or their complements), though I suggested Initialized (if we explicitly registered initializers) and Instantiated (essentially !Pending) here: #6951 (comment)

I'm also not sure we need Unknown. That would just be the desired condition (whichever that is) is not present. We shouldn't ever be returning Unknown currently, I don't think.

I'll look at the LB case again.

@bgrant0607 (Member, Author)

Why can't the service just have a Ready condition? Because no controller is responsible for updating its status currently? Pods have multiple preconditions from multiple components: they must be scheduled, and their images must be pulled.

@lavalamp (Member) commented May 7, 2015

Can we get a "deleting" (or deleted) condition? Can we get it for all objects?

@bgrant0607 (Member, Author)

Deleting would be analogous to Vacating. I'm not opposed to a condition for that, though as discussed on #6951 it could be inferred uniformly for all objects from metadata.

Whether we want to make at least certain conditions uniform for all objects is another issue. As discussed in #1899 (comment), #6487, and elsewhere, orchestrators on top of Kubernetes want certain uniform conditions, such as fully initialized (so the resource can be read), fully active (so creation can be considered successful and/or dependent entities can be created), and terminated (so entities they depend on can be deleted and/or entities dependent on them can be GCed).

@bgrant0607 (Member, Author)

An example where the current PodPhase Running was found to be confusing: #7868.

An example where "Pending" was confusing, or at least insufficiently informative on its own: openshift/origin#2048.

@thockin (Member) commented May 8, 2015

Running would be replaced by the Active condition (True/False)

I assume that "Active" is amalgamated from some number of other Conditions (scheduled, installed, ...)? As a consumer of this API, some of the things I will want to do are "wait until pod is running" or "wait until pod is terminated". If we publish a small set of very core conditions, these questions could be answered by blocking until the conditions exist. But this is sort of a trap - blocking until a pod is Active is wrong - it might never become active. I need to block until it is Active or Terminated. This is a place where outlooks are a better API. I can tell just by looking at the outlook field for Active whether I should keep waiting.

I've come around to getting rid of Phase, but I want to comprehend the details of the replacement.

@Lawouach

Hello,

I was following #7856 (comment) and I wondered if conditions were still meant to be removed at some point, or could we bet on them as a durable public interface to be consumed from?

Thanks,

@lavalamp (Member)

Conditions are not going to be removed.

I think the best interface around this is a two-step process:

  • The ground truth is set as conditions by the components that are nearby, e.g. kubelet sets "disk pressure = true".
  • The set of conditions is summarized into phases (or secondary conditions) for consumption by general controllers. E.g., if there is disk pressure, that can be aggregated into conditionSummary.schedulable: false. The process doing this needs to know all possible ground truth statements; the act of summarizing them isolates the rest of the cluster from needing to know this. You can think of this as a potential place to insert policy opinions, as well.

The latter step is not implemented consistently anywhere.

@vllry (Contributor) commented Jun 18, 2019

@lavalamp what are the next steps?

@clux commented Aug 1, 2019

If .status.conditions is the right solution to use, how should controller writers proceed when designing and updating conditions?

Just simple stuff like trying to PATCH an individual .status.conditions entry seems impossible due to the vector structure:

  • merge patch won't work (merging a vector would blat other entries)
  • json patch doesn't support this either AFAICT (cannot replace an array entry by field name) - see RFC 6902
  • strategic merge patch support does not look like it's even planned for custom resources kubernetes#58414

@lavalamp (Member) commented Aug 1, 2019

@clux The plan is to use serverside apply. (hopefully beta in this upcoming release)

@vllry regularizing our current API would be a monumental undertaking :/ so, I am not sure if there is a practical next step other than "make new things do it the right way, wait for the mythical v2 to fix existing things" :/

@lavalamp (Member)

I am closing this issue:

  • We will never be able to eliminate phase, certainly not in Kubernetes v1.
  • We recently merged new condition guidance.
  • It's been well over a year since the last comment.
