
Proposal: add pod lifecycle event generator for kubelet #12802

Merged
merged 1 commit into kubernetes:master from yujuhong:kubelet_proposal on Feb 2, 2016

Conversation

yujuhong
Contributor

No description provided.

@k8s-bot

k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit 96d3e5462e5d292c76efcea9fad174790323a0f1.

@k8s-bot

k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit bf40ff2d4d6f8ded560161ead6027118e18204a8.

@brendandburns
Contributor

cc @brendandburns

@k8s-bot

k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit 00fcbb013a76f4b689c11fce236be0cf61337134.

@k8s-bot

k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit 6d21bdcaba1b5b3d2b90f20af23ee3b931924a06.

@k8s-bot

k8s-bot commented Aug 17, 2015

GCE e2e build/test failed for commit 763719084c5c777e746b5afaea51ffe87594681d.

@k8s-bot

k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit 919dbf60516ebbd831a1a0d38e2dc99bce63d345.

- Probing: Now that pod workers only wake up on events, we need to rethink how liveness/readiness probing should be implemented. One solution is to create a separate probing component that can be plugged into PLEG to generate corresponding pod lifecycle events.

Contributor

This is cool. I like the idea of a plugin to present the probe outcomes as events.
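
A rough sketch of what such a probing plugin could look like. The `ProbeResult` type, `probeAdapter`, and the "ContainerUnhealthy" event type are illustrative assumptions, not part of the proposal or the kubelet API:

```go
package pleg

// ProbeResult is an assumed shape for the outcome of a single
// liveness/readiness probe run; it is not a real kubelet type.
type ProbeResult struct {
	PodID       string
	ContainerID string
	Healthy     bool
}

// PodLifecycleEvent mirrors the event shape discussed in this proposal;
// the field names here are illustrative.
type PodLifecycleEvent struct {
	PodID string
	Type  string // e.g. "ContainerUnhealthy" (a hypothetical event type)
	Data  interface{}
}

// probeAdapter converts probe outcomes into lifecycle events so that pod
// workers can be woken up through the same channel as other PLEG events.
type probeAdapter struct {
	events chan<- *PodLifecycleEvent
}

// OnProbeResult emits an event only for unhealthy results, since a healthy
// probe does not require the pod worker to do anything.
func (a *probeAdapter) OnProbeResult(r ProbeResult) {
	if r.Healthy {
		return
	}
	a.events <- &PodLifecycleEvent{
		PodID: r.PodID,
		Type:  "ContainerUnhealthy",
		Data:  r.ContainerID,
	}
}
```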

@ironcladlou
Contributor

This would also enhance the apparent responsiveness of UIs in some cases (openshift/origin#3781). As just one example in OpenShift, a deployment's outcome is tied closely to the status of a Pod, and by proxy its containers; but no matter how fast the deployment container returns, the apparent running time can be inflated artificially by up to 10 seconds depending on where we land in the pod polling.


### Bootstrapping/restart

Upon restart, PLEG needs to perform a reslit to retrieve all container

Member

Please indicate that this is the kubelet restart case here.

Member

s/reslit/relist

Member

Docker restart is another case we have to detect and handle here too. You need to detect when the event stream disconnects. When docker is starting up, it might send you tons of events for old containers.
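
A minimal sketch of how a relist-plus-watch loop could cover both the kubelet-restart and docker-restart cases; the `containerEventWatcher` interface and its `Relist`/`Watch` methods are assumptions for illustration, not an existing kubelet or docker API:

```go
package pleg

import (
	"log"
	"time"
)

// containerEventWatcher is an assumed, minimal interface over whatever
// backs the event stream (docker events, cadvisor, ...).
type containerEventWatcher interface {
	// Relist returns the IDs of all containers currently known to the runtime.
	Relist() ([]string, error)
	// Watch delivers raw events to ch and returns when the stream disconnects.
	Watch(ch chan<- string) error
}

// run relists on startup (the kubelet-restart case), then tails the event
// stream; if the stream disconnects (e.g. a docker restart), it relists
// again before resubscribing, so any flood of stale events for old
// containers can be reconciled against a fresh container list.
func run(w containerEventWatcher, events chan<- string) {
	for {
		containers, err := w.Relist()
		if err != nil {
			log.Printf("relist failed: %v; retrying", err)
			time.Sleep(time.Second)
			continue
		}
		log.Printf("relist found %d containers", len(containers))

		if err := w.Watch(events); err != nil {
			log.Printf("event stream disconnected: %v; relisting", err)
		}
	}
}
```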

@dchen1107
Member

@yujuhong On the way back home, I realized that cadvisor only supports 4 types of events: containerCreation, containerDeletion, OOM, and OOMKilled, and none of them are the key events you want or sufficient for you to make any decision:

  • containerCreation doesn't mean the container actually started.
  • containerDeletion is only fired when container GC removes a dead container.
  • The only termination event you have is OOMKilled, fired when a container reaches its cgroup's memory limit.
  • When a system OOM kills a random process, you won't have container information. This is a known issue in cAdvisor, and I don't think it is easy to fix.
  • For other failures that cause a container to terminate, no event is generated.

From the above, you can see that we pretty much cannot rely on cAdvisor's event stream at all, and I don't think there is an easy way to add those missing events to cadvisor. Here is the code that defines the events:
https://github.com/google/cadvisor/blob/2a022a4a74f4f994c1ae7684de7655aaa0a898ba/info/v1/container.go#L521

Let's just stick with docker events for now. For the rkt case, we could rely on the relist operation until they have event stream support.

cc/ @kubernetes/goog-node
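
To make the gap concrete, here is a hedged sketch of what a mapping from these four cadvisor event types to pod lifecycle events might look like. The string literals and the `toPodLifecycleEvent` helper are illustrative assumptions; the authoritative event definitions are in the cadvisor file linked above:

```go
package pleg

// These literals correspond to the four cadvisor event types listed above;
// the authoritative constants live in the linked cadvisor source, so treat
// these as an approximation for illustration only.
const (
	cadvisorContainerCreation = "containerCreation"
	cadvisorContainerDeletion = "containerDeletion"
	cadvisorOOM               = "oom"
	cadvisorOOMKill           = "oomKill"
)

// toPodLifecycleEvent shows how lossy the mapping is: creation/deletion only
// roughly correspond to container start/stop, and OOM kills cover just one of
// the many ways a container can die, which is the gap discussed above.
func toPodLifecycleEvent(cadvisorEventType string) (eventType string, ok bool) {
	switch cadvisorEventType {
	case cadvisorContainerCreation:
		return "ContainerStarted", true
	case cadvisorContainerDeletion:
		return "ContainerStopped", true
	case cadvisorOOMKill:
		return "ContainerStopped", true
	default:
		// A plain OOM (no kill) or any other failure has no usable
		// pod lifecycle equivalent here.
		return "", false
	}
}
```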

@dchen1107 added the sig/node label on Aug 20, 2015
@thereallukl

Adding such a strong dependency on cadvisor to provide lifecycle information might block adding new container technologies in the future (one might want to add, for example, lxc or qemu), and making those drivers depend on something that is just a metering component does not sound very good.

That being said, I prefer option 1 as the source of PLEG events (each runtime provides events by itself).

@vishh
Contributor

vishh commented Aug 20, 2015

I believe the intention is to abstract out container runtimes using the cadvisor interface. Since cadvisor is already trying to handle container runtimes as first-class entities, it seems natural to use it as an abstraction layer.
+1 to @dchen1107's arguments against using raw cadvisor events. Maybe we can extend cadvisor to support docker events?

@yujuhong
Contributor Author

This would also enhance the apparent responsiveness of UIs in some cases (openshift/origin#3781). As just one example in OpenShift, a deployment's outcome is tied closely to the status of a Pod, and by proxy its containers; but no matter how fast the deployment container returns, the apparent running time can be inflated artificially by up to 10 seconds depending on where we land in the pod polling.

This should definitely improve the response latency provided that kubelet is not already overwhelmed.

@dchen1107
Member

Maybe we can extend cadvisor to support docker events?

I am not sure this is a good idea, though I could be convinced. Docker is one implementation of a container runtime; rkt is another. To extend cadvisor to support docker events, we would have to complicate cadvisor to make it composable too.

@vishh
Contributor

vishh commented Aug 20, 2015

We will have to abstract out events from multiple container runtimes in kubelet anyways. I assume your argument is whether cAdvisor is the right abstraction layer?


@yujuhong
Contributor Author

Some clarifications since we (me, @vishh and @dchen1107) discussed this offline :-)

@yujuhong On the way back home, I realized that cadvisor only supports 4 types of events: containerCreation, containerDeletion, OOM, and OOMKilled, and none of them are the key events you want or sufficient for you to make any decision:

containerCreation doesn't mean the container actually started.

containerCreation means that we tried to start a container. It may fail to start for other reasons (a bad image, etc.), but we'll receive a containerDeletion in those cases.

@vishh brought up a valid point that this would make kubelet interpret "container starts" differently than it does today. I agree with that, but personally I think this does not affect how kubelet works internally and is not that critical.

containerDeletion is only fired when container GC removes a dead container.
The only termination event you have is OOMKilled, fired when a container reaches its cgroup's memory limit.
When a system OOM kills a random process, you won't have container information. This is a known issue in cAdvisor, and I don't think it is easy to fix.
For other failures that cause a container to terminate, no event is generated.

The conclusion from the discussion is that cAdvisor would send out a Deletion event when a container terminates, because its cgroups are cleaned up once pid 1 exits (thanks @vishh for pointing that out). I can verify this since I have tested my prototype by running docker stop on containers; kubelet reacts to that immediately.

From the above, you can see that we pretty much cannot rely on cAdvisor's event stream at all, and I don't think there is an easy way to add those missing events to cadvisor. Here is the code that defines the events:
https://github.com/google/cadvisor/blob/2a022a4a74f4f994c1ae7684de7655aaa0a898ba/info/v1/container.go#L521

Let's just stick with docker events for now. For the rkt case, we could rely on the relist operation until they have event stream support.

@yujuhong
Contributor Author

We will have to abstract out events from multiple container runtimes in kubelet anyways. I assume your argument is whether cAdvisor is the right abstraction layer?

I agree that we need an abstraction for events across different container runtimes.

cadvisor provides the bare-minimum event stream for kubelet to work, and we can move faster with it for now. That's why I chose cadvisor as a starting point. I think it's worth giving it a try since it should be easy to swap out the container event watcher.

That said, if the consensus is to use docker events, I'd be okay with that.

@yifan-gu
Contributor

I think we need to define the event stream interface in kubelet. Whether we implement it using cadvisor or docker event stream should not matter.

ContainerStopped PodLifeCycleEventType = "ContainerStopped"
NetworkSetupCompleted PodLifeCycleEventType = "NetworkSetupCompleted"
NetworkFailed PodLifeCycleEventType = "NetworkFailed"
)

Regarding issue #10288, I think you should add another high-level PodLifeCycleEventType, "ContainerDied"; otherwise we will have the same problem, even with your new system.
Considering what events docker is capable of firing (cf. https://docs.docker.com/reference/api/docker_remote_api_v1.17/#monitor-docker-s-events), this PodLifeCycleEventType won't be inexact.

Contributor Author

@ejemba this list is not complete, but it includes all the types we use for now. We can augment this whenever we need :)
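
A sketch of what such a runtime-agnostic event interface might look like, combining the types visible in the diff excerpt above with the "ContainerDied" type suggested in this thread. The `PodLifecycleEventGenerator` interface and field names are assumptions for illustration, not the final design:

```go
package pleg

// PodLifeCycleEventType enumerates runtime-agnostic events. This list is a
// sketch rather than the final set; it can be augmented as needed.
type PodLifeCycleEventType string

const (
	ContainerStarted      PodLifeCycleEventType = "ContainerStarted"
	ContainerStopped      PodLifeCycleEventType = "ContainerStopped"
	ContainerDied         PodLifeCycleEventType = "ContainerDied"
	NetworkSetupCompleted PodLifeCycleEventType = "NetworkSetupCompleted"
	NetworkFailed         PodLifeCycleEventType = "NetworkFailed"
)

// PodLifecycleEvent is the payload a generator would hand to pod workers.
type PodLifecycleEvent struct {
	// PodID identifies the pod the event concerns.
	PodID string
	Type  PodLifeCycleEventType
	// Data carries runtime-specific detail, e.g. a container ID.
	Data interface{}
}

// PodLifecycleEventGenerator is the interface the kubelet would consume; a
// docker-events-backed or cadvisor-backed implementation (or a pure
// relist-based one, as discussed for rkt) can sit behind it without the
// rest of the kubelet caring which one is in use.
type PodLifecycleEventGenerator interface {
	Start()
	Watch() chan *PodLifecycleEvent
}
```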

@zhengguoyong
Contributor

👍

@dchen1107
Member

@yujuhong Can you update this based on our current roadmap, design, and implementation?

@k8s-bot

k8s-bot commented Dec 2, 2015

GCE e2e test build/test passed for commit 919dbf60516ebbd831a1a0d38e2dc99bce63d345.

@k8s-bot

k8s-bot commented Jan 12, 2016

GCE e2e build/test failed for commit 4faba2a3957276992366ecf02969ff57cb7f87af.

@yujuhong force-pushed the kubelet_proposal branch 2 times, most recently from 501ced3 to f3f3d6f, on January 12, 2016 at 18:16
@yujuhong
Contributor Author

@yujuhong Can you update this based on our current roadmap, design, and implementation?

@dchen1107, I updated the proposal. Please take a look, thanks!

@k8s-bot

k8s-bot commented Jan 12, 2016

GCE e2e test build/test passed for commit 501ced30a43a23e34227b5868affce0447d01a80.

@k8s-bot

k8s-bot commented Jan 12, 2016

GCE e2e test build/test passed for commit b8b532b.

@eparis
Contributor

eparis commented Feb 1, 2016

@k8s-bot test this issue: #IGNORE

Tests have been pending for 24 hours

@k8s-bot

k8s-bot commented Feb 2, 2016

GCE e2e test build/test passed for commit b8b532b.

@dchen1107
Member

LGTM

@dchen1107 added the lgtm label on Feb 2, 2016
@k8s-github-robot

Automatic merge from submit-queue

k8s-github-robot pushed a commit that referenced this pull request Feb 2, 2016
@k8s-github-robot merged commit 635cf58 into kubernetes:master on Feb 2, 2016
@yujuhong deleted the kubelet_proposal branch on February 23, 2016 at 19:46
xingzhou pushed a commit to xingzhou/kubernetes that referenced this pull request Dec 15, 2016
Labels
kind/design Categorizes issue or PR as related to design.
lgtm "Looks good to me", indicates that a PR is ready to be merged.
sig/node Categorizes an issue or PR as relevant to SIG Node.
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.