
AWS: We should consider updating/reusing cluster-autoscaler to support AWS #11935

Closed
justinsb opened this issue Jul 28, 2015 · 58 comments
Labels
priority/backlog Higher priority than priority/awaiting-more-evidence. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@justinsb
Member

We have an autoscaling group for the minions; we should consider enabling auto-scaling based on, e.g., CPU or a custom metric we publish.

@erictune
Member

A group of us have been discussing node autoscaling this week, including @bgrant0607 @vmarmol @davidopp @jszczepkowski @piosz @gmarek @mwielgus @wojtek-t (probably forgetting some people)

@erictune
Member

One thing we talked about was maybe layering the system like this:

  • Pod horizontal autoscaler scales up pod count using CPU as a signal, and maybe later custom metrics, such as http request rate, http latency, etc.
  • Node (horizontal) autoscaler adds nodes when pods are pending due to the scheduler not being able to find a place in the cluster for the pod (failed PodFitsResources check in scheduler). This assumes that pods set reasonable CPU and memory limits.
    So the Node autoscaler wouldn't look at CPU directly; it hears about it indirectly through pods being pending (see the sketch below).
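
A rough sketch of that layering as a control loop, with all names (NodeGroup, countUnschedulablePods, runNodeAutoscaler) illustrative rather than actual Kubernetes APIs:

```go
// Illustrative sketch of the proposed node-autoscaler layer; all names here
// are hypothetical, not real Kubernetes APIs.
package sketch

import "time"

// NodeGroup abstracts a cloud instance group (an AWS ASG or a GCE MIG).
type NodeGroup interface {
	Size() (int, error)
	SetSize(n int) error
}

// countUnschedulablePods would ask the API server how many pods are pending
// because the scheduler failed the PodFitsResources check; stubbed out here.
func countUnschedulablePods() int { return 0 }

func runNodeAutoscaler(group NodeGroup) {
	for range time.Tick(30 * time.Second) {
		if countUnschedulablePods() == 0 {
			continue // enough capacity; nothing to do
		}
		size, err := group.Size()
		if err != nil {
			continue
		}
		// Add one node; a real implementation would estimate how many nodes
		// the pending pods need and respect a configured maximum.
		_ = group.SetSize(size + 1)
	}
}
```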

@justinsb
Member Author

That makes a lot of sense to me. I would love to be involved in any discussions.

AWS auto-scaling groups (and I believe Google MIGs via autoscalers) allow for a quick-and-dirty version of this. Your approach is infinitely better, though I suspect it will take a little longer!

The fact that the scheduler will avoid overloading the cluster makes auto-scaling externally much less useful, so we would be in custom metric territory. Even then, I think that having the master node manage the instances will be a much better experience.

Maybe we could promote this interface out of pkg/cloudprovider/aws (currently used only for e2e tests):
https://github.com/GoogleCloudPlatform/kubernetes/blob/8d5a6b063c68b50e9e2e481c04c4cfec4fa57bde/pkg/cloudprovider/aws/aws.go#L147-L154
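
For reference, that interface is roughly the following shape (paraphrased from the linked revision, so treat the exact names as approximate):

```go
// Paraphrased from pkg/cloudprovider/aws at the linked revision; exact
// names and signatures are approximate.
type InstanceGroups interface {
	// ResizeInstanceGroup sets the instance group to a fixed size.
	ResizeInstanceGroup(instanceGroupName string, size int) error
	// DescribeInstanceGroup queries the cloud provider for the group's state.
	DescribeInstanceGroup(instanceGroupName string) (InstanceGroupInfo, error)
}
```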

@justinsb justinsb self-assigned this Jul 30, 2015
@erictune erictune added the sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. label Aug 5, 2015
@justinsb justinsb removed their assignment Aug 12, 2015
@mbforbes mbforbes added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Aug 16, 2015
@ecowan

ecowan commented Nov 19, 2015

Hi everyone, I too am very interested in seeing progress on this front. I would really appreciate it if someone could point me to any resources / pull requests that have been done. Thanks!

@satheessh

+1

@rafaljanicki

+1

@valery-zhurbenko

+1

@piosz
Member

piosz commented Feb 15, 2016

If anyone would like to integrate Kubernetes with an AWS autoscaler, I'm happy to share our experience integrating Kubernetes with the GCE autoscaler.

cc @fgrzadkowski @mwielgus

@piosz piosz added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Feb 15, 2016
@sstarcher

@piosz I would be interested in hearing about your experience with the Kubernetes GCE autoscaler, and I may be interested in helping with this feature.

@miguelfrde
Contributor

@piosz I would be interested in hearing about your experience and helping with this feature as well.

@jimmycuadra
Contributor

@piosz Yes, please! Very interested in this.

@dengshuan

Is there a schedule for this feature? Or is there any more detailed discussion about it?

@mwielgus
Contributor

For 1.3 we have a plan to revisit cluster autoscaling in Kubernetes and make it more user-friendly. At this moment we are discussing our 1.3 priorities and project assignments internally at Google. We will let you know once we reach agreement on the scope of the improvements that Google can deliver and on the integration plans for other cloud providers (we will definitely need community help there).

@sstarcher

Our AWS scaling strategy for Kubernetes currently has 3 parts:

  • Add instances when pods are Pending (a rough sketch of the ASG call follows this list)
  • Remove instances that are not running pods
  • A change to the scheduler to pack our load instead of spreading it
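
For the first part, a minimal sketch of bumping an ASG's desired capacity with aws-sdk-go; the group name "k8s-minions" is a placeholder, and credentials are assumed to come from the default chain:

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := autoscaling.New(sess)

	// Raise the desired capacity; a real controller would read the current
	// size first and clamp to the group's configured maximum.
	_, err := svc.SetDesiredCapacity(&autoscaling.SetDesiredCapacityInput{
		AutoScalingGroupName: aws.String("k8s-minions"), // placeholder name
		DesiredCapacity:      aws.Int64(4),
		HonorCooldown:        aws.Bool(true),
	})
	if err != nil {
		log.Fatal(err)
	}
}
```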

@justinsb justinsb added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 1, 2016
@justinsb justinsb added this to the v1.3 milestone Apr 1, 2016
@apobbati

@pbitty Have you made any progress on this issue? I'd like to help any way I can.

@philk

philk commented Aug 24, 2016

#1377 might be what you're looking for


@bjoernhaeuser

@philk I think the mentioned PR is not what we are looking for. Is there a typo or something similar?

@andrewsykim
Member

kubernetes-retired/contrib#1311 is probably the link you are looking for. It references other PRs that have been opened regarding the cluster autoscaler for AWS.

@philk

philk commented Aug 24, 2016

Oh, yeah, I was on mobile and didn't realize which repo I was in. kubernetes-retired/contrib#1377 was what I meant. (Though #1311 above is useful too.)

@aliakhtar

What's the status of this feature? I came across this blog: http://blog.kubernetes.io/2016/07/autoscaling-in-kubernetes.html which said AWS autoscaling would be coming in 1.3. The current stable version is 1.3.6, but I can't find any info on this.

The AWS getting started doc says the max / desired instances in the AWS auto scaling group can be set, but do new AWS instances auto-register themselves?

@fgrzadkowski
Contributor

This blog post was released after 1.3 and said that AWS support would be ready soon. AFAIK that's already the case.

@mwielgus Can you please verify? Are there instructions for how to set it up? Have we released the image? Does it require Kubernetes 1.4, or is it just a matter of starting a different add-on?

@andrewsykim
Member

There's a README here. I don't think an official image was made so you would have to fork the contrib repo and build/push the image yourself for now.

@btdlin

btdlin commented Sep 26, 2016

New to the thread, trying to set up auto-scaling with k8s on AWS. Is this supported now? I checked the README, but I'm not sure exactly what needs to be done to build/push the image. Any update would be really appreciated. Thanks.

@andrewsykim
Member

@btdlin you have to build your own Docker image at whatever revision added cluster autoscaler support for AWS and push it to your own registry. If you don't want to do that, my company has published a public image for our own use cases which has AWS support: wattpad/cluster-autoscaler:v1.1.

@jimmycuadra
Contributor

Is there going to be an official image for the autoscaler? Why make people build it for themselves?

@btdlin

btdlin commented Sep 26, 2016

Thanks @andrewsykim. Looks like v1.4 was just released a few hours ago; do we know if the AWS autoscaler is included in v1.4?

@andrewsykim
Member

@jimmycuadra yes, I believe there is an official Docker image already; we just didn't know if the published one supported AWS as a cloud provider, so we built our own.

@fgrzadkowski
Contributor

@mwielgus Can we make sure that the cluster autoscaler image is released to an official repo? And I think we should close this issue now, as we support AWS :)

@danbeaulieu

@fgrzadkowski Hi, I am very much interested in this feature but I find the lack of documentation to be an issue. The README leaves a bit to be desired.

  • How are instances scaled in? I.e., is there any rhyme or reason to which instance is picked?
  • Is it possible to have heterogeneous instance types in the cluster?
  • What metrics can I use to scale on? CPU usage? Container count? Etc.

I am a heavy AWS user but new to Kubernetes if that helps understand the audience.

@jimmycuadra
Contributor

Once there is an official image for it, let's make sure the docs for the autoscaler mention where it is!

@fgrzadkowski
Contributor

We already have a PR in flight for better documentation - kubernetes-retired/contrib#1731

@mwielgus I think that to improve documentation we will also need:

@mwielgus Can we close this issue as fixed?

@andyxning
Member

@erictune @sstarcher Does monitoring Pending pods mean that we can use the InsufficientCPU or InsufficientMemory events to get the same result, adding new nodes to the cluster based on those events?

These two event types are emitted when pods cannot be scheduled because the required resources (CPU/memory) cannot be fulfilled.

@fgrzadkowski
Contributor

Quick comment: events were not designed to be an API that other components depend on. That's why we added the Scheduled pod condition with reason Unschedulable.

@andyxning
Member

andyxning commented Nov 22, 2016

@fgrzadkowski IIUC, you mean that events are not reliable and are not designed to be depended on for usage like this. The most reliable way is to query the pod info and check the Scheduled condition of the pod status.

After reading the source code, it seems that the scheduler emits a FailedScheduling event before updating the pod status.

@fgrzadkowski
Contributor

The scheduler will emit events, but they are not considered part of the API for other components.

Yes, you should just check the pod condition, which is part of PodStatus.
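
A minimal sketch of that check using present-day client-go (clientset construction omitted; the function name is illustrative):

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// unschedulablePods returns pods whose PodScheduled condition is False with
// reason Unschedulable, i.e. the scheduler could not find a node for them.
func unschedulablePods(ctx context.Context, client kubernetes.Interface) ([]corev1.Pod, error) {
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		// Only pods that have not been bound to a node yet.
		FieldSelector: "status.phase=Pending",
	})
	if err != nil {
		return nil, err
	}
	var out []corev1.Pod
	for _, pod := range pods.Items {
		for _, cond := range pod.Status.Conditions {
			if cond.Type == corev1.PodScheduled &&
				cond.Status == corev1.ConditionFalse &&
				cond.Reason == corev1.PodReasonUnschedulable {
				out = append(out, pod)
			}
		}
	}
	return out, nil
}
```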

@motymichaely

Hey team, is there a k8s version in which this feature is aimed to be released? Any suggestions for implementing this with an AWS ASG + custom metrics?

@mwielgus
Contributor

The current version of Cluster Autoscaler (0.4.0) supports AWS ASGs. Closing the issue.

@mwielgus
Contributor

BTW, Cluster Autoscaler is not driven by metrics but rather by the real need for a new node because some pods cannot be scheduled.
