Evaluate the benefits of adopting docker event stream #16831
Comments
OK, will do! This is indeed necessary for future work. :)
+1 for starting with the Generic PLEG.
Now that we've added both a generic PLEG and a cache, we should re-evaluate the benefit of adopting the docker event stream and whether we should prioritize it.
Here are some benchmark results for this:
I am familiar with the docker side of events. With a little guidance, I can help do the work in the kube code.
/cc @rrati fyi to keep on the radar.
@MHBauer We are redefining the container runtime interface now in #22964. /cc @yujuhong
@MHBauer that's great. As @Random-Liu pointed out, the event stream will improve performance and resource usage for kubelet, but since 1) not all runtimes support this and 2) the benefits weren't significant enough, we didn't prioritize switching to the event stream. The problem is a little bit more complicated than simply interfacing with the docker event stream. If you are interested, here is some background. Kubelet has a Pod Lifecycle Event Generator (PLEG) that is compatible with all runtimes. PLEG does basically two things: 1) it relists all containers periodically, diffs against the previous snapshot, and generates pod lifecycle events from the changes; and 2) it keeps the pod status cache up to date so the rest of kubelet doesn't have to query the runtime directly.
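For illustration only, here is a minimal Go sketch of that relisting idea; the Runtime interface, StatusCache, and string-typed events are made-up stand-ins for this thread, not kubelet's actual types:

```go
package pleg

import (
	"fmt"
	"sync"
	"time"
)

// Runtime is a hypothetical stand-in for the runtime-agnostic
// interface PLEG talks to; it only needs to list containers.
type Runtime interface {
	// ListContainers returns a snapshot of container ID -> state.
	ListContainers() (map[string]string, error)
}

// StatusCache pairs the latest snapshot with a freshness timestamp
// so pod workers can tell how stale their view of the world is.
type StatusCache struct {
	mu        sync.RWMutex
	states    map[string]string
	timestamp time.Time
}

func (c *StatusCache) Update(states map[string]string, ts time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.states, c.timestamp = states, ts
}

// RunPLEG relists every period, diffs against the previous snapshot
// to generate lifecycle events, and stamps the cache with the relist
// time so workers can block until the cache is fresh enough.
// (A full implementation would also handle containers that disappear.)
func RunPLEG(rt Runtime, cache *StatusCache, period time.Duration, events chan<- string) {
	old := map[string]string{}
	for range time.Tick(period) {
		current, err := rt.ListContainers()
		if err != nil {
			continue // keep the old snapshot; retry on the next tick
		}
		for id, state := range current {
			if old[id] != state {
				events <- fmt.Sprintf("container %s: %q -> %q", id, old[id], state)
			}
		}
		cache.Update(current, time.Now())
		old = current
	}
}
```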
Kubelet has a per-pod goroutine (aka a pod worker) checking a pod periodically. Every time a worker syncs a pod, it gets the current status from the pod status cache to avoid hitting docker directly. Before the next sync can start, the worker will need to wait until all the side effects (i.e., events) from the previous sync have been observed by PLEG and the in-memory cache has been updated accordingly. This is important because otherwise the worker may create the same container again! To enforce this rule, PLEG records a timestamp on each relisting, to indicate how fresh the cache is. The worker will block until the cache is newer than its last sync. If we want to switch to the docker event stream today, we'll need to 1) translate docker events into pod lifecycle events and update the cache on every event, and 2) keep relisting periodically (possibly at a lower frequency) so the cache timestamp keeps advancing.
(2) is required because if kubelet doesn't receive any events over a period of time, it wouldn't know whether there are truly no events or the events are simply delayed. This is not the case for relisting, where each relist returns a complete, timestamped snapshot of the current state.
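To make the two requirements concrete, here is a minimal Go sketch of an event-driven loop using the Docker Go client (github.com/docker/docker/client); the relist period, the handleEvent/relist helpers, and the overall structure are illustrative assumptions, not kubelet's actual design:

```go
package eventpleg

import (
	"context"
	"log"
	"time"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/events"
	"github.com/docker/docker/api/types/filters"
	"github.com/docker/docker/client"
)

// Run consumes the docker event stream (requirement 1) but still
// relists periodically (requirement 2) so the cache timestamp keeps
// advancing even when no events arrive.
func Run(ctx context.Context, relistPeriod time.Duration) error {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		return err
	}

	// Only container events matter for pod lifecycle.
	f := filters.NewArgs()
	f.Add("type", "container")
	msgs, errs := cli.Events(ctx, types.EventsOptions{Filters: f})

	ticker := time.NewTicker(relistPeriod)
	defer ticker.Stop()

	for {
		select {
		case m := <-msgs:
			// Translate the docker event into a pod lifecycle event
			// and update the cache immediately.
			handleEvent(m)
		case err := <-errs:
			// The stream broke; a real implementation would relist
			// and resubscribe. Bail out to keep the sketch short.
			return err
		case <-ticker.C:
			// Low-frequency relist: lets workers distinguish
			// "no events" from "events delayed or lost".
			relist(ctx, cli)
		}
	}
}

// handleEvent is a hypothetical hook; kubelet would map the docker
// event to a pod lifecycle event here.
func handleEvent(m events.Message) {
	log.Printf("container %s: %s", m.Actor.ID, m.Action)
}

// relist takes a full snapshot, resynchronizing the cache and
// advancing its freshness timestamp.
func relist(ctx context.Context, cli *client.Client) {
	containers, err := cli.ContainerList(ctx, types.ContainerListOptions{All: true})
	if err != nil {
		log.Printf("relist failed: %v", err)
		return
	}
	log.Printf("relisted %d containers", len(containers))
}
```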
This issue can be adapted to evaluate whether we need to support an event stream in CRI. It is very low priority right now unless the need can be identified clearly.
In #12540, the docker event stream was proposed as part of the pod lifecycle event generator (PLEG) to reduce the number of docker operations.
However, even without adopting the docker event stream, we can still implement a PLEG solely by periodically relisting the containers. The generic PLEG in #13571 is an example.
This would still improve the average resource usage of kubelet and docker because only one goroutine queries docker at a high frequency (as opposed to all pod workers). The drawback, on the other hand, is that kubelet's reaction time to container events would be equal to or greater than the relist period. We should try to understand the limits of pure relisting by running some micro-benchmarks against docker, so that we know how much benefit the container event stream can bring us; a starting-point sketch follows the lists below.
Metrics:
Parameters to vary:
The next step would be benchmarking the docker event stream.
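As a starting point for the relisting micro-benchmark described above, here is a minimal sketch using the Docker Go client; the sample count, the one-second relist period, and the percentile reporting are assumptions for illustration, not agreed-upon parameters:

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"time"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

// Measures the latency of listing all containers (the core cost of a
// relist-based PLEG). Vary the number of containers on the host and
// the relist period to see how pure relisting scales.
func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	const samples = 100
	latencies := make([]time.Duration, 0, samples)
	for i := 0; i < samples; i++ {
		start := time.Now()
		if _, err := cli.ContainerList(ctx, types.ContainerListOptions{All: true}); err != nil {
			panic(err)
		}
		latencies = append(latencies, time.Since(start))
		time.Sleep(time.Second) // relist period under test
	}

	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
	fmt.Printf("p50=%v p99=%v\n", latencies[samples/2], latencies[samples*99/100])
}
```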
/cc @kubernetes/goog-node