Closed
Description
We have a PodStatus type, which is a string, with a few values defined. People will start programming to those strings. We should define a state machine that governs what causes state changes. We did this internally and it was a great help in reasoning about what happens.
If nobody balks, I can port over the internal ideas into a design doc for discussion.
Activity
dchen1107 commented on Sep 3, 2014
Yes, I totally support this. When I implemented restart policy support in #1147 and thought about other features, such as garbage collection, events, etc., I couldn't help wishing we had a state machine defined, and I feel that several states are missing from our PodStatus, for example an Error / Failure state. I even started to code a state machine based on our internal design for restart. :-)
smarterclayton commented on Sep 3, 2014
+1
lavalamp commented on Sep 3, 2014
@thockin There was some discussion about this when I added the binding object.
The summary is, we will have only the three states we have now. Additional states will come in the form of clarification of those main three states, in another field (PodStateReason or some such).
We're trying to avoid a monolithic state machine like the one we have internally. This approach will make it easier to have states that not all components need to understand.
@bgrant0607
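To make that shape concrete, here is a rough sketch of a coarse state plus a clarifying reason field; the type and field names are illustrative, not the actual API:

```go
// Illustrative sketch only, not the actual API types: a small, fixed set
// of coarse states plus an open-ended reason that clarifies them.
type PodState string

const (
	PodWaiting    PodState = "Waiting"
	PodRunning    PodState = "Running"
	PodTerminated PodState = "Terminated"
)

type PodStateInfo struct {
	State PodState
	// Reason ("PodStateReason or some such") clarifies the coarse state
	// without growing the set of states every component must understand.
	Reason string
}
```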
bgrant0607 commented on Sep 3, 2014
The discussion @lavalamp mentioned is here: #557 (comment)
The reasons why a pod might not be running need to be extensible, as must the reasons why a pod was terminated.
thockin commented on Sep 4, 2014
Hmm, I'm not very satisfied with just those 3 states.
Internally (on the node side) we defined a fairly simple state machine in which each state very clearly defined what assumptions can and can not be made about an entity. We found it very useful when writing and reading code and docs, once you understood the states.
ACCEPTED: The request to create the entity has been accepted, and a placeholder for it was created. The entity does not yet have presence in actual machine state.
PREPARING: Machine state is actively being changed to instantiate the entity.
ACTIVE: The entity is ready to be used.
SUCCEEDED: The entity has completed. No extra effort to remove machine state is being made.
FAILED: The entity has failed. No extra effort to remove machine state is being made.
CLEANING: Machine state is actively being changed to remove the entity.
DEFUNCT: An error was encountered while CLEANING. No extra effort to remove machine state is being made.
CLEANED: The entity no longer has any presence in actual machine state.
We could probably flatten ACCEPTED and PREPARING. These states should be enough to make decisions about workflows, to tell users unambiguously what is happening, and to allow code to manage sub-objects. We use this state machine for the equivalent of pods and for the pieces that make them up (e.g. volumes, containers, etc).
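For concreteness, here is a rough sketch of how such a state set and its legal transitions could be written down in Go; the transition table is my reading of the descriptions above, not something taken from the internal code:

```go
// Illustrative sketch of the state machine described above; not the
// internal implementation.
type EntityState string

const (
	StateAccepted  EntityState = "ACCEPTED"
	StatePreparing EntityState = "PREPARING"
	StateActive    EntityState = "ACTIVE"
	StateSucceeded EntityState = "SUCCEEDED"
	StateFailed    EntityState = "FAILED"
	StateCleaning  EntityState = "CLEANING"
	StateDefunct   EntityState = "DEFUNCT"
	StateCleaned   EntityState = "CLEANED"
)

// validTransitions encodes which state changes are legal, so code can
// reason about what may happen next from any given state. CLEANED and
// DEFUNCT are terminal. This table is an assumption based on the prose.
var validTransitions = map[EntityState][]EntityState{
	StateAccepted:  {StatePreparing},
	StatePreparing: {StateActive, StateFailed},
	StateActive:    {StateSucceeded, StateFailed},
	StateSucceeded: {StateCleaning},
	StateFailed:    {StateCleaning},
	StateCleaning:  {StateCleaned, StateDefunct},
}

func canTransition(from, to EntityState) bool {
	for _, next := range validTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}
```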
lavalamp commented on Sep 4, 2014
So in the k8s model, we'd have
state/reason:
waiting/ACCEPTED
waiting/PREPARING
running/ACTIVE
terminated/SUCCEEDED
terminated/FAILED
terminated/CLEANING
terminated/DEFUNCT
terminated/CLEANED
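The same mapping expressed as data, as a sketch (all names are illustrative):

```go
// Sketch: each fine-grained reason rolls up into exactly one of the three
// coarse k8s states, following the list above.
var coarseStateForReason = map[string]string{
	"ACCEPTED":  "waiting",
	"PREPARING": "waiting",
	"ACTIVE":    "running",
	"SUCCEEDED": "terminated",
	"FAILED":    "terminated",
	"CLEANING":  "terminated",
	"DEFUNCT":   "terminated",
	"CLEANED":   "terminated",
}
```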
@bgrant0607 how are we supposed to model containers that are being restarted? Substate under running?
thockin commented on Sep 4, 2014
Well, my point was that the distinct states provide useful information that is obscured by lumping them all together. I'd call restarting (e.g. after a crash) "ACTIVE".
bgrant0607 commented on Sep 4, 2014
Is this just internal Kubelet state, or would it be exposed in Kubelet and/or apiserver APIs?
I'm skeptical of the utility of such fine-grain states in a flat state space, and I think it's a big mistake to expose such a thing in the API. The probability that we'll need to add new states in the future as we add functionality and components is 100% and the likelihood that it would be feasible to extend the set of states is 0%. Every piece of code that looked at the state field would almost certainly need to be changed. The problem is that states within a flat state space have no inherent, inferable semantics, and that resulting code that looks at such states is brittle.
Is a new state a new reason for not-yet-running, a new sort-of-running, or a new termination reason? What about restarting? Different forms of suspended? Waiting for activation by an external workflow, by an auto-sizer filling in resource values, or by a gang scheduler scheduling another pod? How about LOST? Does deliberately stopped count as FAILED? What is the state of a durable pod while the machine is down or unreachable? Is ACCEPTED admitted by the apiserver, or by the Kubelet? What state is the pod in during execution of PreStart, PreStop, or PostStop lifecycle hooks? While failing liveness and/or readiness probes?
Simple states have value to humans who want a high-level understanding of what's happening. The states are of questionable value to our components, however.
I advocate that additional useful information should be conveyed by other fields, such as some kind of one-word reason field and some kind of more verbose message (e.g., "image foo/bar download from registry xyz.io failed"), other fields specific to the activity (e.g., assigned host in the case of scheduling), the time the state was last reported/collected, etc.
I'm fine with renaming RUNNING to ACTIVE.
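As a sketch of that shape (a hypothetical struct; these are not actual or proposed field names):

```go
import "time"

// Hypothetical sketch of "coarse state plus open-ended detail"; not a
// proposed API.
type PodStatusDetail struct {
	State        string    // one of a small, fixed set: "waiting", "running", "terminated"
	Reason       string    // one-word, machine-oriented cause, e.g. "ImagePullFailed"; open-ended
	Message      string    // verbose, human-oriented detail, e.g. "image foo/bar download from registry xyz.io failed"
	Host         string    // example of an activity-specific field (assigned host after scheduling)
	LastReported time.Time // when this state was last reported/collected
}
```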
thockin commented on Sep 4, 2014
I completely agree that a fine-grained state machine is a terrible API. The state machine described here is intentionally coarse: each state defines what you can assume about the object. It does not carry reasons or details for state changes, just very blanket statements about what is safe to assume and what is not.
Good questions, though many of them apply equally to what we call PodStatus today. I'll answer these with the caveat that the states described here were node-centric, so they may need to be adjusted for cluster-centric use.
To the first part, none of the above are represented (they are all "reasons", not states).
I'd say "lost" (we expected a pod to be present but it was not) is FAILED.
Stopped is interesting - at the node level we call it FAILED, I think, but that could be distinct. We only recently split ENDED into SUCCEEDED and FAILED, and maybe we shouldn't have.
If the machine disappears for longer than $timeout, pods become FAILED.
ACCEPTED means the apiserver has ACKed the transaction.
PreStart is new (we did not have events yet) and I would say PREPARING (because the main application has not started).
PreStop is ACTIVE (the main application is in control).
PostStop is "ENDED" (either SUCCEEDED or FAILED) because the main application can not be assumed to be alive.
Failing probes is still ACTIVE (the main application is in control).
With simple semantics, you can know whether, for example, it is reasonable to wait for cleanup to complete or whether it is in trouble (historically there are a lot of things that can go wrong during cleanup).
"Image download failed" - will it be retried or not? Relying on strings to carry information like that in a consistent way is horrible and fragile.
I'm willing to invest more time in this, thinking and discussing how cluster semantics change this model (which I literally cut and pasted from internal docs on the 'let), but only if we really think there's a modicum of utility here. I find the current 3 states to be overly broad, vaguely defined, and almost useless.
This is not about names :) [If anything, RUNNING is better, since it follows the ING and ED distinction, but some of the objects we describe with this state machine are not actually "running" so much as "present". Hence ACTIVE.]
bgrant0607 commented on Sep 4, 2014
@thockin I'm supportive of adding more fine-grained reasons, but they shouldn't be considered an exhaustive, finite enumerated list. We WILL need to add more fine distinctions in the future. I'm also fine with keeping the state internally and not exposing it in the APIs.
I could easily imagine inserting a stage before ACCEPTED, for instance for pods pending an external admission control check.
Treating deliberate stops as failures would be misinterpreted by many tools/systems. And, as described in #137, there isn't a bounded number of reasons why a pod might be deliberately stopped.
My point about the lifecycle hooks was that there will be components that want to know about them, and that the Kubelet itself will need to keep track of whether they have been executed or not. Ditto with failing probes, unresponsive hosts, and several other examples. So, how does one decide what to elevate to states and what not to elevate as states? I suggest we don't elevate fine distinctions to top-level status, but remain open-ended and flexible about reporting the finer distinctions.
image load failed: Not intended to be an API. I intended it to be the cause of a failure (not retried at that level). Maybe it was a bad example since it could relate to retry logic, as discussed in #941 and #1088.
thockin commented on Sep 6, 2014
IMO, reasons should be orthogonal to state, if we expose state this way.
I'd argue against it - the new state does not offer any semantic value to a user, does it? If it has not been ACCEPTED into the system, should it even be queryable for state?
Yeah, I could see STOPPED as a peer of SUCCEEDED and FAILED.
I think I agree, if I understand. These states are not fine distinctions; they simply define coarse semantic differences which can be used to understand what can and can not be assumed. They are intentionally vague about details, so as to leave any implementation lots of room. If, internally, kubelet or apiserver wanted to track finer states, that is perfectly valid, as long as those internal states each map to exactly one external state.
For example, during task setup internally, we have a number of states as we install packages and start the process. But those all report externally as PREPARING. Ditto for teardown and SUCCEEDED/FAILED.
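A sketch of that constraint, with made-up internal sub-state names (nothing here is from the internal code):

```go
// Sketch: kubelet-internal sub-states (made-up names) each map to exactly
// one externally reported state, so internal detail can grow without
// changing what clients see.
var externalStateFor = map[string]string{
	// setup sub-states
	"installing-packages": "PREPARING",
	"starting-process":    "PREPARING",
	// teardown sub-states
	"stopping-process": "CLEANING",
	"removing-volumes": "CLEANING",
}

func reportedState(internal string) string {
	if ext, ok := externalStateFor[internal]; ok {
		return ext
	}
	return internal // states with no finer internal breakdown report themselves
}
```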
bgrant0607 commented on Sep 8, 2014
@thockin "reasons should be orthogonal to state": You mean that the clients shouldn't also need to look at state in order to interpret reason? I agree with that.
State before ACCEPTED: Accepted by apiserver or by kubelet? An example of where it would be useful to expose a pre-accepted state would be in order to await action by a budget-checking service. Anyway, I don't want to rathole on this. It was just one example.
STOPPED: STOPPED, SUCCEEDED, and FAILED are all flavors of TERMINATED -- no longer active. We should capture and report more detailed termination reasons, exit codes, etc.
While I can see why Kubelet might want to keep track of them internally, I can't imagine cases where CLEANING, DEFUNCT, and CLEANED are going to be significant to API clients or even to cluster-level management components. This is why I consider them "fine distinctions". I didn't say that reasons/subStates/stateDetails could have no semantics, only that not every component would need to understand them, since they'd have the option of looking at the coarse states instead.
ACCEPTED and PREPARING are similar -- there's also another way a client could distinguish these states, which is by whether the host info (host, hostIP, etc.) is initialized or not.
With respect to what a client can assume: The apiserver will necessarily have stale information, and the client will have even-more-stale information. If the client thought, for example, that the state were PREPARING, does that mean that the pod is not yet running? Of course not.
I propose we keep high-level states, waiting (or pending), active (was running), and terminated, and push anything else into Reason.
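Under that proposal, a client would branch only on the coarse state and treat Reason as informational, so new reasons can appear without breaking existing code; a minimal sketch:

```go
// Sketch of client-side handling under the three-state proposal: new
// reasons can appear without this switch needing to change.
func describe(state, reason string) string {
	switch state {
	case "waiting":
		return "not yet running: " + reason
	case "active":
		return "running"
	case "terminated":
		return "no longer running: " + reason
	default:
		return "unknown state " + state + ": " + reason
	}
}
```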
What I currently see in type Status seems pretty reasonable: