
Remove need for the apiserver to contact kubelet for current container state #156

Closed
jjhuff opened this issue Jun 18, 2014 · 34 comments
Labels: area/kubelet, area/nodecontroller, priority/backlog, sig/node, sig/scalability

Comments

@jjhuff (Contributor) commented Jun 18, 2014

While the kubelet is certainly the source of truth for what is running on a particular host, it would be nice to have it push that information to the apiserver on a regular basis (and on state changes) rather than force the apiserver to ask.

Reasons:

  • In some deployment scenarios, the apiserver might not have direct network access to individual kubelets.
  • It'd give the apiserver (and its clients) access to a reasonably current view of the world without needing to poll each kubelet. This would be handy for improved replication/placement/auto-scale algorithms.
  • The update could include other info, like statistics for both the host and containers.
  • It opens the door to auto-registering hosts.

The kubelet could publish this to etcd directly, but I think it'd be good to aim for fewer dependencies on etcd rather than more. Thoughts on that?
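As a sketch of what that push might look like on the kubelet side -- assuming a hypothetical /nodes/&lt;host&gt;/status endpoint and payload shape, since no such API exists yet:

```go
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"time"
)

type ContainerState struct {
	Name   string `json:"name"`
	Status string `json:"status"` // e.g. "running", "exited"
}

type NodeStatus struct {
	Host       string           `json:"host"`
	Containers []ContainerState `json:"containers"`
}

// pushStatus reports this node's container state to the apiserver.
func pushStatus(apiserver string, status NodeStatus) error {
	body, err := json.Marshal(status)
	if err != nil {
		return err
	}
	// Hypothetical endpoint: the apiserver would record this as the
	// node's last-known state instead of polling the kubelet for it.
	resp, err := http.Post(apiserver+"/nodes/"+status.Host+"/status",
		"application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	resp.Body.Close()
	return nil
}

func main() {
	// Push on an interval; a real kubelet would also push on state changes.
	for range time.Tick(10 * time.Second) {
		status := NodeStatus{Host: "node-1"} // container list gathered from the runtime
		if err := pushStatus("http://apiserver:8080", status); err != nil {
			log.Printf("status push failed: %v", err)
		}
	}
}
```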

@brendandburns (Contributor)

I think we'd rather not have the Kubelet calling back into the master, since it can result in massive fan-in storms of messages.

However, I can definitely see the value in caching the information inside the apiserver. So what do you think about having the apiserver periodically poll all Kubelets for information and cache it locally? I think that would satisfy all of the needs you enumerated, while still letting the master control the flow of information.

What do you think?
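To make the shape of that concrete, here is a minimal poll-and-cache sketch, assuming the apiserver knows the node list; 10250 is the kubelet's port, but the /podInfo path and cache layout are illustrative assumptions:

```go
package main

import (
	"io"
	"net/http"
	"sync"
	"time"
)

// statusCache holds the last payload scraped from each kubelet.
type statusCache struct {
	mu   sync.RWMutex
	data map[string][]byte // host -> last raw status payload
}

// poll fans out to every known kubelet and refreshes the cache.
func (c *statusCache) poll(hosts []string) {
	for _, h := range hosts {
		resp, err := http.Get("http://" + h + ":10250/podInfo")
		if err != nil {
			continue // node unreachable; keep serving the stale entry
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			continue
		}
		c.mu.Lock()
		c.data[h] = body
		c.mu.Unlock()
	}
}

func main() {
	cache := &statusCache{data: map[string][]byte{}}
	hosts := []string{"node-1", "node-2"}
	for range time.Tick(30 * time.Second) {
		cache.poll(hosts)
	}
}
```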

@jbeda (Contributor) commented Jun 18, 2014

In the past I've suggested a strategy where the master (or some fellow-traveller server process) scrapes the nodes regularly and writes the results back to the master/etcd. We'd then return how stale results are in our API.

Further, there'd be an API option to ask for "up to the second" results, which would trigger a synchronous call out to the node.
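A sketch of how that staleness could surface in the API, assuming each cache entry records when it was scraped, with a hypothetical ?fresh=true query flag for the synchronous path:

```go
package main

import (
	"encoding/json"
	"net/http"
	"time"
)

// cacheEntry records when a node's status was last scraped, so the API
// can report how stale a cached answer is.
type cacheEntry struct {
	Payload   json.RawMessage
	ScrapedAt time.Time
}

type statusReply struct {
	Payload    json.RawMessage `json:"payload"`
	AgeSeconds float64         `json:"ageSeconds"` // 0 means fetched synchronously
}

func statusHandler(cache map[string]cacheEntry, fetchLive func(host string) (json.RawMessage, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		host := r.URL.Query().Get("host")
		// ?fresh=true asks for "up to the second" results: a synchronous
		// call out to the node instead of the cached copy.
		if r.URL.Query().Get("fresh") == "true" {
			payload, err := fetchLive(host)
			if err != nil {
				http.Error(w, err.Error(), http.StatusBadGateway)
				return
			}
			json.NewEncoder(w).Encode(statusReply{Payload: payload})
			return
		}
		entry := cache[host]
		json.NewEncoder(w).Encode(statusReply{
			Payload:    entry.Payload,
			AgeSeconds: time.Since(entry.ScrapedAt).Seconds(),
		})
	}
}

func main() {
	cache := map[string]cacheEntry{} // filled by the background scraper
	live := func(host string) (json.RawMessage, error) { return json.RawMessage(`{}`), nil }
	http.HandleFunc("/nodeStatus", statusHandler(cache, live))
	http.ListenAndServe(":8080", nil)
}
```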

@jjhuff (Contributor, Author) commented Jun 18, 2014

I think that any apiserver->kubelet polling will have problems in some deployments. For example:

  • Using a mix of internal machines and GCE. Without a fair amount of work, neither has full access to the other's network -- each has to talk via NAT.
  • Running the master on, say, App Engine. That'd be super handy for reducing common-mode failures.

I hear ya on the fan-in problem. That problem already exists to some extent with the HTTP polling option. Since the apiserver is stateless (yay!), it should scale out reasonably well as long as the backing store can handle it...but we already have that problem.

Perhaps only push updates on state changes?


@brendandburns (Contributor)

I hear you about the problems with mixed topologies, but pushing updates on state changes actually has the same fan-in problem: suppose all of your tasks fail at the same time from a packet of death -- you'll still see a storm of messages.

Hacking in the polling from the apiserver is going to be easier in the short term, so I think we'll at least do that first.
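For reference, the usual way to blunt (though not eliminate) that kind of storm on the sender side is jitter plus coalescing; a hedged sketch, with all names illustrative:

```go
package main

import (
	"log"
	"math/rand"
	"time"
)

// reportWithJitter delays each state-change report by a random amount
// and coalesces changes that arrive while it waits, so simultaneous
// failures across a fleet reach the master as a trickle, not a spike.
func reportWithJitter(changes <-chan string, maxJitter time.Duration, send func(string)) {
	for change := range changes {
		time.Sleep(time.Duration(rand.Int63n(int64(maxJitter))))
		// Only the latest state matters; drop intermediate changes.
		drained := false
		for !drained {
			select {
			case change = <-changes:
			default:
				drained = true
			}
		}
		send(change)
	}
}

func main() {
	changes := make(chan string, 16)
	go reportWithJitter(changes, 5*time.Second, func(s string) {
		log.Printf("reporting: %s", s)
	})
	changes <- "container-a exited" // e.g. the packet-of-death scenario
	time.Sleep(6 * time.Second)
}
```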

@bgrant0607 (Member)

The main reasons for the apiserver to contact kubelets, rather than the other way around, are:

  • connection storms, which admittedly can be mitigated by fuzzing
  • other communication storms, which can be mitigated by the apiserver returning callback times to the kubelets (see the sketch after this comment)
  • facilitating use of kubelets without an apiserver, which admittedly could be controlled via configuration
  • supporting multiple independent state scrapers
  • sharding state scraping amongst multiple servers, which admittedly could be handled with redirects
  • being natural for heartbeating, starting new containers, and remote management of the kubelet itself

Regardless of which component initiates the connection, we may want to implement a number of optimizations, especially once we start to collect resource stats:

  • state caching
  • change notification rather than polling
  • change significance filtering
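To illustrate the callback-times idea from the second bullet above: the response to each status report could tell the kubelet when it may next report, letting the master pace the flow. A sketch, with the endpoint and field names as assumptions:

```go
package main

import (
	"encoding/json"
	"net/http"
	"time"
)

// reportAck is the (hypothetical) response to a status report; the
// server uses it to pace how often this kubelet calls back.
type reportAck struct {
	NextReportAfterSeconds int `json:"nextReportAfterSeconds"`
}

func reportLoop(apiserver string) {
	wait := 10 * time.Second // default interval
	for {
		time.Sleep(wait)
		resp, err := http.Post(apiserver+"/nodeStatus", "application/json", nil)
		if err != nil {
			continue // retry at the current interval
		}
		var ack reportAck
		if json.NewDecoder(resp.Body).Decode(&ack) == nil && ack.NextReportAfterSeconds > 0 {
			// An overloaded master stretches the interval; a quiet one
			// can shrink it again.
			wait = time.Duration(ack.NextReportAfterSeconds) * time.Second
		}
		resp.Body.Close()
	}
}

func main() {
	reportLoop("http://apiserver:8080")
}
```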

@jjhuff (Contributor, Author) commented Jun 18, 2014

My concern has mainly been figuring out the blockers to actually deploying Kubernetes outside of a strictly GCE environment, given security and network topology constraints. That makes me want to take scissors to these all-to-all communication patterns. :)

Perhaps a hybrid approach would be best:

  • Cache state on the master -- either in memory or backed by etcd/whatever. In memory should be reasonable for a single apiserver instance, but we'd need persistence for anything more.
  • Optional state push (interval- or change-based) from the kubelets, much like all of its existing options. The state push would just populate the cache.

This adds caching to the baseline config, gives the option to reverse the apiserver-kubelet communication, and preserves the ability for other tools to scrape the kubelets.
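A sketch of that hybrid: a single cache on the master, populated either by the apiserver's poll loop or by an optional push handler that kubelets use when they can reach the master. Paths, port, and types are illustrative assumptions:

```go
package main

import (
	"io"
	"net/http"
	"sync"
	"time"
)

type nodeCache struct {
	mu      sync.RWMutex
	entries map[string][]byte // host -> last known status payload
}

func (c *nodeCache) set(host string, payload []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[host] = payload
}

// pushHandler lets kubelets that can reach the master populate the
// cache directly instead of waiting to be polled.
func (c *nodeCache) pushHandler(w http.ResponseWriter, r *http.Request) {
	payload, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	c.set(r.URL.Query().Get("host"), payload)
}

// pollLoop covers nodes that cannot push, feeding the same cache.
func (c *nodeCache) pollLoop(hosts []string) {
	for range time.Tick(30 * time.Second) {
		for _, h := range hosts {
			resp, err := http.Get("http://" + h + ":10250/podInfo")
			if err != nil {
				continue
			}
			body, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			c.set(h, body)
		}
	}
}

func main() {
	cache := &nodeCache{entries: map[string][]byte{}}
	go cache.pollLoop([]string{"node-1"})
	http.HandleFunc("/nodeStatus/push", cache.pushHandler)
	http.ListenAndServe(":8080", nil)
}
```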


@brendandburns (Contributor)

SGTM.

I will send a PR for the poll and cache support today, and then work on the optional Kubelet -> apiserver code path.

Thanks for bearing with us as we sort through this stuff ;)

--brendan


@jjhuff (Contributor, Author) commented Jun 18, 2014

No problem! Thanks for taking on the work -- I'd have been happy to do it myself. I was thinking of tackling #134 as well, but I can wait if you want to avoid conflicts.

@brendandburns (Contributor)

We're happy to take the work! Feel free to take on #134, and I might delegate the push stuff to you too. We'll see how the pull cache goes....

Best
--brendan


@brendandburns (Contributor)

Polling from the master was added in #171.

I'll work on optional push next.

@smarterclayton (Contributor)

#846 isolates the kubelet from etcd, except for writes of logs (covered by #285), with the same semantics preserved via the apiserver.

bgrant0607 added the sig/scalability and priority/awaiting-more-evidence labels on Dec 3, 2014
@bgrant0607 (Member)

Related to #2726 and #2483. The Kubelet is effectively a "pod controller", and should be providing pod status.

We're also pursuing auto-registration, #2303.

dchen1107 added the sig/node label on Feb 4, 2015
@smarterclayton (Contributor)

David is going to open an issue on the security of the kubelet in general, with a proposal to restrict the kubelet via TLS client certs. We may also need to let the kubelet ask the master whether certain things are allowed, which would be covered under his SubjectAccessReview proposal.

@bgrant0607 (Member)

cc @erictune

@alex-mohr (Contributor)

@roberthbailey @cjcullen Robby and CJ have also been looking into securing the kubelet <-> master communication with TLS certs. We definitely need to harden that path, but I think that's a separate issue from this one?

@smarterclayton (Contributor)

@deads2k


@bgrant0607 (Member)

This proposal was to change to unidirectional communication from kubelet to apiserver. If we replicated the apiserver (#473), I think that could work reasonably well, and it would be in line with the approach used by other components, such as the controller-manager and scheduler.

@bgrant0607 (Member)

@yujuhong

@bgrant0607 (Member)

Related: #3168, #2435, #2303. All issues predicted by @jjhuff.

zmerlynn added a commit to zmerlynn/kubernetes that referenced this issue Mar 10, 2015

Change provisioning to pass all variables to both master and node. Run Salt in a masterless setup on all nodes a la http://docs.saltstack.com/en/latest/topics/tutorials/quickstart.html, which involves ensuring the Salt daemon is NOT running after install. Kill the Salt master install. And fix push to actually work in this new flow.

As part of this, the GCE Salt config no longer has access to the Salt mine, which is primarily obnoxious for two reasons:

  • The minions can't use Salt to see the master: this is easily fixed by static config.
  • The master can't see the list of all the minions: this is fixed temporarily by static config in util.sh, but later, by other means (see kubernetes#156, which should eventually remove this direction).

As part of it, flatten all of cluster/gce/templates/* into configure-vm.sh, using a single, separate piece of YAML to drive the environment variables, rather than constantly rewriting the startup script.
@dchen1107 (Member)

Should we close this one?

@bgrant0607 (Member)

Yes

akram pushed a commit to akram/kubernetes that referenced this issue Apr 7, 2015
resouer pushed a commit to resouer/kubernetes that referenced this issue Dec 5, 2016
xingzhou pushed a commit to xingzhou/kubernetes that referenced this issue Dec 15, 2016
devel/local-up: doc cfssl requirement
iaguis pushed a commit to kinvolk/kubernetes that referenced this issue Feb 6, 2018
seans3 pushed a commit to seans3/kubernetes that referenced this issue Apr 10, 2019
marun added a commit to marun/kubernetes that referenced this issue Jun 24, 2020
pjh pushed a commit to pjh/kubernetes that referenced this issue Jan 31, 2022
linxiulei pushed a commit to linxiulei/kubernetes that referenced this issue Jan 18, 2024