Proposal: add pod lifecycle event generator for kubelet #12802
Conversation
GCE e2e build/test passed for commit 96d3e5462e5d292c76efcea9fad174790323a0f1.
Force-pushed from bf40ff2 to 00fcbb0.
GCE e2e build/test passed for commit bf40ff2d4d6f8ded560161ead6027118e18204a8.
GCE e2e build/test passed for commit 00fcbb013a76f4b689c11fce236be0cf61337134.
Force-pushed from 00fcbb0 to 6d21bdc.
GCE e2e build/test passed for commit 6d21bdcaba1b5b3d2b90f20af23ee3b931924a06.
Force-pushed from 6d21bdc to 7637190.
Force-pushed from 7637190 to 919dbf6.
GCE e2e build/test failed for commit 763719084c5c777e746b5afaea51ffe87594681d.
GCE e2e build/test passed for commit 919dbf60516ebbd831a1a0d38e2dc99bce63d345.
- Probing: Now that pod workers only wake up on events, we need to rethink how liveness/readiness probing should be implemented. One solution is to create a separate probing component that can be plugged in to PLEG to generate corresponding pod lifecycle events.
This is cool. I like the idea of a plugin to present the probe outcomes as events.
This would also enhance the apparent responsiveness of UIs in some cases (openshift/origin#3781). As just one example in OpenShift, a deployment's outcome is tied closely to the status of a Pod, and by proxy its containers; but no matter how fast the deployment container returns, the apparent running time can be inflated artificially by up to 10 seconds depending on where we land in the pod polling.
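A rough sketch of the plugin idea, with hypothetical names throughout (none of these come from the proposal): a prober pushes its results into the same event channel PLEG feeds to pod workers, so a failed probe wakes the worker immediately instead of waiting out the polling period.

```go
package pleg

// Hypothetical sketch of a probing plugin: probe results are converted into
// pod lifecycle events and pushed onto the channel PLEG already serves, so
// pod workers wake up on probe transitions instead of polling on a timer.

type PodLifeCycleEventType string

// ContainerUnhealthy is an illustrative event type, not one from the PR.
const ContainerUnhealthy PodLifeCycleEventType = "ContainerUnhealthy"

type PodLifecycleEvent struct {
	PodID string
	Type  PodLifeCycleEventType
	Data  interface{} // e.g. the container ID
}

type probeResult struct {
	podID, containerID string
	healthy            bool
}

// proberToPLEG forwards probe failures as pod lifecycle events.
func proberToPLEG(results <-chan probeResult, events chan<- PodLifecycleEvent) {
	for r := range results {
		if !r.healthy {
			events <- PodLifecycleEvent{PodID: r.podID, Type: ContainerUnhealthy, Data: r.containerID}
		}
	}
}
```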
### Bootstrapping/restart
Upon restart, PLEG needs to perform a reslit to retrieve all container |
Please indicate that this is the kubelet restart case here.
s/reslit/relist
Docker restart is another case we have to detect and handle here too. You need to detect when the event stream is disconnected. When docker is starting up, it might send you tons of events for old containers.
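A minimal sketch of how a PLEG loop might cover both the kubelet-restart and docker-restart cases, assuming hypothetical relist and watchEvents helpers: relist first on every (re)connect, so that stale events replayed by docker at startup are harmless.

```go
package pleg

import "time"

const retryInterval = 5 * time.Second // illustrative retry period

// pleg, relist, and watchEvents are hypothetical stand-ins for this sketch.
type pleg struct{}

// relist rebuilds container state from a full listing of the runtime.
func (g *pleg) relist() error { return nil }

// watchEvents consumes the runtime event stream and returns when it disconnects.
func (g *pleg) watchEvents() {}

// run treats kubelet restart and docker restart the same way: always relist
// before (re)trusting the event stream, so replayed events for old
// containers cannot corrupt state.
func (g *pleg) run(stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		default:
		}
		if err := g.relist(); err != nil {
			time.Sleep(retryInterval)
			continue
		}
		g.watchEvents() // blocks until the stream disconnects, then loop
	}
}
```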
@yujuhong On the way back home, I realized that cadvisor only supports 4 types of events: containerCreation, containerDeletion, OOM, and OOMKilled, and none of them is the key event you want, or sufficient for you to make any decision:
From the above, you can see that we pretty much cannot rely on cAdvisor's event stream at all, and I don't think there is an easy way to add those missing events to cadvisor. Here is the code defining the events: Let's just stick with docker events for now. For the rkt case, we could invoke the relist operation all the time until they have event stream support. cc/ @kubernetes/goog-node
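For illustration, a docker-events-based PLEG source might reduce to a small translation like the one below; the event-type names and the mapping are assumptions for the sketch, not the PR's actual code.

```go
package pleg

type PodLifeCycleEventType string

const (
	ContainerStarted PodLifeCycleEventType = "ContainerStarted" // assumed name
	ContainerStopped PodLifeCycleEventType = "ContainerStopped"
)

// dockerStatusToPLEG translates raw docker event statuses (per the remote
// API docs) into pod lifecycle event types; the mapping is illustrative.
func dockerStatusToPLEG(status string) (PodLifeCycleEventType, bool) {
	switch status {
	case "start":
		return ContainerStarted, true
	case "die", "stop", "kill":
		return ContainerStopped, true
	default:
		return "", false // e.g. "pause", "export": no pod lifecycle meaning here
	}
}
```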
Adding such a strong dependency on cadvisor to provide lifecycle information might block adding new container technologies in the future (one might want to add, for example, lxc or qemu), and a dependency for those drivers on something that is just a metering component does not sound very good. That being said, I prefer option 1 as the source of PLEG events (each runtime provides events by itself).
I believe the intention is to abstract out container runtimes using the cadvisor interface. Since cadvisor is already trying to handle container runtimes as first class entities, it seems natural to use it as an abstraction layer.
This should definitely improve the response latency provided that kubelet is not already overwhelmed.
I am not sure this is a good idea, though I could be convinced. Docker is one implementation of a container runtime; rkt is another. To extend cadvisor to support docker events, we would have to complicate cadvisor to make it composable too.
We will have to abstract out events from multiple container runtimes.
Some clarifications since we (me, @vishh and @dchen1107) discussed this offline :-)
@vishh brought up a valid point that this would make kubelet interpret "container starts" differently from today. I agree with that, but personally I think this does not affect how kubelet works internally, and is not that critical.
The conclusion from the discussion is that cAdvisor would send out a Deletion event if a container terminates, because cgroups are cleaned up upon pid 1 exiting (thanks @vishh for pointing that out). I can verify this, since I have tested it with my prototype.
I agree that we need an abstraction for events across different container runtimes. cadvisor provides the bare-minimum event stream for kubelet to work, and we can move faster with it for now. That's why I chose cadvisor as a starting point. I think it's worth giving it a try, since it should be easy to swap out the container event watcher. That said, if the consensus is to use docker events, I'd be okay with that.
I think we need to define the event stream interface in kubelet. Whether we implement it using cadvisor or the docker event stream should not matter.
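A sketch of the kind of interface being suggested, with purely illustrative names (ContainerEventWatcher is not an API from this PR):

```go
package kubelet

// ContainerEvent is a runtime-agnostic container lifecycle event.
type ContainerEvent struct {
	ContainerID string
	Type        string // e.g. "started", "stopped"
}

// ContainerEventWatcher abstracts the event source so kubelet does not care
// whether events come from cadvisor, the docker event stream, or a relist
// poller (the rkt fallback discussed above).
type ContainerEventWatcher interface {
	// Watch returns a channel of events. The channel is closed when the
	// underlying stream disconnects; callers should then relist and re-Watch.
	Watch() (<-chan ContainerEvent, error)
	Close() error
}
```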
```go
	ContainerStopped      PodLifeCycleEventType = "ContainerStopped"
	NetworkSetupCompleted PodLifeCycleEventType = "NetworkSetupCompleted"
	NetworkFailed         PodLifeCycleEventType = "NetworkFailed"
)
```
Regarding issue #10288, I think you should add another high-level PodLifeCycleEventType, "ContainerDied"; otherwise we will have the same problem even with your new system. Considering what events docker is capable of firing (cf. https://docs.docker.com/reference/api/docker_remote_api_v1.17/#monitor-docker-s-events), this PodLifeCycleEventType won't be inexact.
@ejemba this list is not complete, but it includes all the types we use for now. We can augment it whenever we need to :)
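For illustration only, the augmented list might look like the following; ContainerStarted is assumed (the quoted diff begins mid-list), and ContainerDied is the addition proposed above.

```go
package pleg

type PodLifeCycleEventType string

const (
	ContainerStarted      PodLifeCycleEventType = "ContainerStarted" // assumed; not shown in the quoted diff
	ContainerStopped      PodLifeCycleEventType = "ContainerStopped"
	ContainerDied         PodLifeCycleEventType = "ContainerDied" // the addition proposed in this thread
	NetworkSetupCompleted PodLifeCycleEventType = "NetworkSetupCompleted"
	NetworkFailed         PodLifeCycleEventType = "NetworkFailed"
)
```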
👍
@yujuhong Can you update this based on our current roadmap, design, and implementation?
GCE e2e test build/test passed for commit 919dbf60516ebbd831a1a0d38e2dc99bce63d345.
Force-pushed from 919dbf6 to 4faba2a.
GCE e2e build/test failed for commit 4faba2a3957276992366ecf02969ff57cb7f87af.
Force-pushed from 501ced3 to f3f3d6f.
Force-pushed from f3f3d6f to b8b532b.
@dchen1107, I updated the proposal. Please take a look, thanks!
GCE e2e test build/test passed for commit 501ced30a43a23e34227b5868affce0447d01a80.
GCE e2e test build/test passed for commit b8b532b.
@k8s-bot test this issue: #IGNORE Tests have been pending for 24 hours
GCE e2e test build/test passed for commit b8b532b.
LGTM
Automatic merge from submit-queue
Auto commit by PR queue bot