Vertical pod auto-sizer #10782
@jszczepkowski Do you have any issues open already on this?
I don't have any issue for it yet.
We are planning to divide vertical pod autoscaling into three implementation steps. We are going to deliver the first of them, Setting Initial Resources, for version 1.1.

Setting Initial Resources
Setting initial resources will be implemented as an admission plugin. It will try to estimate and set values for memory/CPU requests for containers within each pod if they were not given by the user. (The plugin will not set limits, to avoid OOM killing.) We will additionally annotate container metrics with the image name. Usage for a given image will be aggregated (it is not yet decided how and by whom), and the initial resources plugin will set requests based on the aggregation.

Reactive vertical autoscaling by deployment update
We will add a new object, the vertical pod autoscaler, which will work at the deployment level. When specifying a pod template in a deployment, the user will have the option to set an enable_vertical_autoscaler flag. If the auto flag is given, the vertical pod autoscaler will monitor the resource usage of the pod's containers and change their resource requirements by updating the pod template in the deployment object. So the deployment will act as the actuator of the autoscaler. Note that the user can both specify requirements for a pod and turn on the auto flag for it; in that case the requirements given by the user will be treated only as initial values and may be overwritten by the autoscaler.

Reactive vertical autoscaling by in-place update
We have an initial idea for a more complicated autoscaler which will not be bound to the deployment object, but will work at the pod level and will actuate resource requirements by in-place update of the pod. Before the update, such an autoscaler will first need to consult the scheduler to check whether the new resources for the pod will fit and whether an in-place update is feasible. The answer given by the scheduler will not be 100% reliable: the pod may still be killed by the kubelet after the in-place update due to lack of resources.
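To make the first two steps concrete, here is a minimal Deployment sketch; the annotation name and the injected values are hypothetical illustrations, not an agreed-on API:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  annotations:
    # Hypothetical opt-in flag for step 2; the real field/annotation name was not decided here.
    autoscaling.alpha.kubernetes.io/enable-vertical-autoscaler: "true"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example.com/web:1.0
        # No resources given by the user. Under step 1 (Setting Initial Resources),
        # the admission plugin would estimate and inject requests, for example:
        #   resources:
        #     requests:
        #       cpu: 100m
        #       memory: 128Mi
        # Limits are deliberately left unset to avoid OOM killing.
```

Under step 2, the autoscaler would later patch the requests in this pod template, and the normal Deployment rollout machinery would replace the running pods with resized ones.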
Metrics aggregation needs some top-level issue to track - I'm not aware of one, but we'd like to see it usable from several angles: the UI, and tracking other container metrics from related systems (load balancers).
cc me
It is important that there be feedback when the predictions are wrong. In particular, I think it is important that a Pod which is over its request (due to an incorrect initial prediction) is much more likely to be killed than some other pod which is under its request. That way, a malfunction of the "Setting Initial Resources" system appears to affect specific pods rather than random pods; failures spread across random pods would be very difficult to diagnose. One way to do that is to make the kill probability in a system OOM situation proportional to the amount over request. @AnanyaKumar @vishh does the current implementation have that property? @dchen1107 pointed out that it is bad to have a system OOM in the first place, so there are a couple of things we might do here.
TL;DR: can we please set the limit to 2x the predicted request for v1.1?
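As a concrete illustration of the 2x proposal (the numbers are hypothetical), a predicted request of 100m CPU / 128Mi memory would be written into the container as:

```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    # Conservative cap at twice the predicted request, so a bad prediction
    # hits this container's own limit instead of triggering a system OOM.
    cpu: 200m
    memory: 256Mi
```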
@erictune: Yes. The kernel will prefer to kill containers that exceed their request in an OOM scenario. +1 for starting with a conservative estimate.
@erictune I see your point of view and I agree that setting limits would solve your case. On the other hand, I can imagine situations where it causes problems rather than solving them, especially when the estimate is wrong and the user observes an unexpected kill of their container. So we need high confidence before setting limits, and we can't guarantee it from the beginning. I think everyone agrees that setting the request should improve the overall experience, which may not be true for setting limits. Long term we definitely want to set both, but I would set only the request in the first version (which may be a version other than v1.1), gather some feedback from users, and then eventually add setting limits once we have the algorithm tuned. @vishh How about having two containers that exceed their request: which one will be killed? The one that exceeds its request 'more', or a random one?
As per the current kubelet QoS policy, all processes exceeding their request will be equally likely to be killed by the OOM killer.
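For context, the QoS treatment referred to above follows from how requests and limits are set on a pod's containers; a minimal sketch with illustrative values:

```yaml
# Pod A's container: requests == limits, so the pod is "Guaranteed" QoS
# and is the last in line for the OOM killer.
resources:
  requests: {cpu: 500m, memory: 256Mi}
  limits:   {cpu: 500m, memory: 256Mi}
---
# Pod B's container: requests only (the "set only Request first" approach
# discussed above), so the pod is "Burstable" QoS; once it uses more than
# its request it becomes a preferred victim under node memory pressure.
resources:
  requests: {cpu: 500m, memory: 256Mi}
```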
By 'more' do you mean a relative or an absolute value?
@piosz: I updated my original comment. Does it make sense now?
If we pursue a vertical autosizer that requires kicking a deployment, how hard is it to take that requirement back? For example, I would think many of our users would prefer a solution that did not require a re-deploy and instead could re-size existing pods.
How would a vertical autosizer that did restarts work with a Deployment, exactly? Can resizing happen concurrently with a new image rollout? If the user wants to roll back the image change, and there was an intervening resource change, what happens? Can I end up with four ReplicaSets (the cross product of two image versions and old/new resource advice)? Are these competing for the Deployment's revision history?
An autosizer blowing out my deployment revision budget is not ok :)
That said, I think most Java apps would need to restart to take advantage.
@derekwaynecarr I don't think anyone is proposing "a vertical autosizer that requires kicking a deployment" in the long run -- instead I think @fgrzadkowski is saying that the first version would work that way because it's simpler and isn't blocked on in-place resource update in kubelet. Our plan for the next step of PodDisruptionBudget is to allow it to specify a disruption rate, not just a max-number-simultaneously-down. So you could imagine attaching a max disruption rate PDB to your Deployment, that the vertical autoscaler would respect (i.e. it would not exceed the specified rate when it does resource updates that require killing the container). I think @erictune is asking a good question. I was surprised that @fgrzadkowski said vertical autoscaling would create a new Deployment. IIRC in Borg we use an API that is distinct from collection update (i.e. does not create a new future collection), to handle vertical autoscaling, so that it doesn't interfere with any user-initiated update that might be ongoing at the same time.
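A sketch of that idea using the current PodDisruptionBudget API, with the rate-based field shown only as a commented-out hypothetical (no such field exists; the name is invented here):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  selector:
    matchLabels:
      app: web
  # Existing semantics: never allow more than one selected pod to be down at a time.
  maxUnavailable: 1
  # Hypothetical extension discussed above: also cap how fast the vertical
  # autoscaler may disrupt pods while applying new resource values.
  # maxDisruptionRate: "1 pod per 10m"   # not a real field
```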
@davidopp I didn't suggest creating a new deployment. I only suggested changing requirements via the existing deployment. @erictune I think those are great questions! I don't have concrete answers - it should be covered in a proposal/design doc. However, I recall a conversation with @bgrant0607 some time ago that a Deployment could potentially have multiple rollouts in-flight on different vertices. With regard to limiting how quickly we would roll it out, I agree with @davidopp that it should be solved by PDB. @derekwaynecarr I imagine that initially the validation would always expect that the target object would be a Deployment.
Some quick comments:
@jszczepkowski You mentioned that "the admission plugin will try to estimate and set values for memory/CPU requests for containers within each pod if they were not given by the user." Can you please elaborate on how the admission plugin does the estimation? Does it estimate based on historical data of similar jobs, based on some profiling result, or something else? Thanks.
@bitbyteshort
@jszczepkowski thanks for pointing me to the latest proposal.
FYI: I've put together a blog post aiming at raising awareness and introducing our demonstrators.
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Can we try this VPA feature in the K8s 1.8 release?
Not really. None of the work has really landed yet (except for @mhausenblas's PoC).
/remove-lifecycle stale
VPA is in alpha in https://github.com/kubernetes/autoscaler
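For reference, a minimal VerticalPodAutoscaler object from that repository looks roughly like the following; check the autoscaler repo for the exact API version and field names currently supported:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    # "Off" only produces recommendations; "Auto" lets VPA apply them by
    # evicting and recreating pods (there is no in-place resize here).
    updateMode: "Auto"
```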
We should create a vertical auto-sizer. A vertical auto-sizer sets the compute resource limits and requests for pods which do not have them set, and periodically adjusts them based on demand signals. It does not directly deal with replication controllers, services, or nodes.