A better story of container metrics for runtime integration #27097

Closed
yujuhong opened this issue Jun 8, 2016 · 45 comments
Assignees
Labels
area/extensibility area/kubelet-api area/monitoring sig/node Categorizes an issue or PR as relevant to SIG Node.
Milestone

Comments

@yujuhong
Contributor

yujuhong commented Jun 8, 2016

Currently, kubelet gets container stats/metrics through cadvisor and exposes them through the /stats endpoint.
This forces container runtimes to integrate with cadvisor first before they can properly integrate with kubelet.

Now that we are defining the new container runtime interface for kubelet (#22964), we should define container metrics and associated methods in the new interfaces so that runtimes can integrate directly with kubelet. This will greatly reduce the integration pain.
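To make this concrete, below is a minimal sketch (in Go) of the kind of stats methods a runtime could expose directly to kubelet. The type and method names are illustrative assumptions only; the actual CRI definitions had not been written at this point.

```go
// Illustrative sketch only -- not the actual CRI definitions.
package cri

import "time"

// ContainerStats is a minimal stats payload a runtime might return.
type ContainerStats struct {
	ContainerID string
	Timestamp   time.Time

	CPUUsageNanoCores     uint64 // instantaneous CPU usage
	MemoryWorkingSetBytes uint64 // working-set memory in bytes
	FilesystemUsedBytes   uint64 // writable-layer disk usage in bytes
}

// StatsProvider sketches the methods a runtime could implement so that
// kubelet no longer has to go through cadvisor for container stats.
type StatsProvider interface {
	// ContainerStats returns stats for a single container.
	ContainerStats(containerID string) (*ContainerStats, error)
	// ListContainerStats returns stats for all containers managed by the runtime.
	ListContainerStats() ([]*ContainerStats, error)
}
```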

Before doing that, we should audit existing cadvisor usage in kubelet and break down the cadvisor interface for different purposes.

/cc @timstclair @vishh @mtaufen

@yujuhong yujuhong added area/kubelet-api area/extensibility sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jun 8, 2016
@yujuhong
Contributor Author

/cc @kubernetes/sig-node

@feiskyer
Member

+1

@yifan-gu
Contributor

yifan-gu commented Jun 15, 2016

cc @kubernetes/sig-rktnetes

@timstclair

One thing to consider is that this will make adding additional metrics significantly more difficult. We may want to distinguish between required and nice-to-have metrics. It also depends on our future monitoring story, which is still up in the air (e.g. will these metrics be surfaced in the summary API and collected by Heapster? I suppose even if we went with a standalone solution, it could implement the client interface to collect the metrics).

@yujuhong
Contributor Author

+1 for a set of core metrics.

As for nice-to-have metrics, we can either 1) allow the runtime to return empty stats if it doesn't support them, or 2) go with the standalone solution, in which case it's up to the runtime to integrate with heapster or other components.

@timstclair @dchen1107, what's the status of custom metrics and monitoring in general?

@timstclair

I don't think any progress has been made since we decided not to prioritize it for 1.3, and I haven't seen anything about it on 1.4. @dchen1107 do you have any more context?

@alexbrand
Contributor

As @timstclair mentioned in google/cadvisor#1394 (comment), it'd be great to keep Windows in mind when defining this interface.

@ncdc
Member

ncdc commented Jul 25, 2016

FYI @DirectXMan12

@tmrts
Contributor

tmrts commented Jul 27, 2016

@timstclair @yujuhong the next action item for this is to analyze & refactor the current cAdvisor usage in kubelet, is that correct?

@timothysc
Member

/cc @ConnorDoyle

@ConnorDoyle
Contributor

@yujuhong
Contributor Author

@timstclair @yujuhong the next action item for this is to analyze & refactor the current cAdvisor usage in kubelet, is that correct?

Yes, that'd give us a better idea of what metrics we need to support the current implementation.
I may not have time to get to this soon, so help would be welcome.

@timstclair

We've had a lot of discussions around separating "core" metrics that are required for internal kubernetes functionality from monitoring metrics that are user facing, and would be used for monitoring, alerting, etc. With this split, the CRI would only need to provide the core metrics (other metrics would be collected through alternate channels).

I'd really like to make this split before defining an API for it, but I think this is still a ways off.

@DirectXMan12
Contributor

Do HPA metrics count as "required for internal kubernetes functionality"? How do you define what "core" is, especially if we open up the HPA to allow referring to arbitrary metrics? Do we say "by default, the only metrics available in your cluster to scale on are CPU, memory, etc, and for anything else you need to write your own solution to gather and push to Heapster"?

@timstclair

@DirectXMan12 And this is why I think it's a ways off :) In all seriousness though, HPA metrics are definitely part of the discussion. There needs to be a way to ingest them, whether or not they're a part of the core metrics exported by the runtime is another question.

@ConnorDoyle
Contributor

Just brainstorming, but how committed are folks to making metrics part of the CRI? Based on the above it sounds like: "HPA requires collecting some metrics, those metrics can come from multiple sources, one likely metrics source is the container runtime." Would it make sense to define a metrics source interface and then let container runtimes optionally implement it?
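A rough sketch of that optional-source idea in Go, using hypothetical interface names: a runtime that also implements a metrics interface would be discovered with a type assertion, and kubelet would fall back to another collector otherwise.

```go
// Hypothetical names; a sketch of letting runtimes optionally act as a metrics source.
package metrics

// Metric and Runtime are placeholder types for this sketch.
type Metric struct {
	Name  string
	Value float64
}

type Runtime interface {
	Version() (string, error)
}

// MetricsSource is an optional interface a runtime may choose to implement.
type MetricsSource interface {
	CollectMetrics() ([]Metric, error)
}

// metricsSourceFor returns the runtime as a MetricsSource if it implements
// the optional interface; otherwise the caller falls back to another
// collector such as cadvisor.
func metricsSourceFor(rt Runtime) (MetricsSource, bool) {
	src, ok := rt.(MetricsSource)
	return src, ok
}
```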

@vishh
Contributor

vishh commented Jul 29, 2016

To clarify:

  • system metrics refers to metrics provided by the OS, like cpu, memory, and disk usage.
  • service metrics refers to metrics provided by applications running in containers.

IIUC, HPA metrics here is referring to service metrics.
I'd argue that service metrics should not belong in CRI. A limited set of system metrics (cpu, memory & disk) is required for kubelet and core k8s functionality like scheduling (see the sketch below).
A broader set of system metrics and service metrics can be made available to HPA (via heapster) through means that do not involve the runtime and kubelet.
Requiring all runtimes to support a wide array of system metrics and service metrics is unreasonable and unnecessary.
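As a rough illustration of that distinction (not an agreed-upon API; the field names are assumptions), the two kinds of metrics might look like this in Go:

```go
// Illustrative only: the system vs. service metrics distinction as Go types.

// SystemMetrics come from the OS / cgroups: cpu, memory, disk.
// A limited set like this is what kubelet itself needs.
type SystemMetrics struct {
	CPUUsageNanoCores     uint64
	MemoryWorkingSetBytes uint64
	FsUsedBytes           uint64
}

// ServiceMetrics come from the application running inside the container
// (e.g., requests served) and would flow through a separate pipeline, not CRI.
type ServiceMetrics struct {
	Name   string // e.g. "http_requests_total" (hypothetical)
	Value  float64
	Labels map[string]string
}
```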

@DirectXMan12
Contributor

A broader set of system metrics and service metrics can be made available to HPA (via heapster) through means that do not involve the runtime and kubelet. Requiring all runtimes to support a wide array of system metrics and service metrics is unreasonable and unnecessary.

I think we should come up with a story for how that's going to happen. At the end of the day, it does not necessarily have to involve the CRI, but it may (I do like the idea of an optional interface, as @ConnorDoyle suggested), so I think we should work it out. I'm wary of a scenario where every kubernetes cluster/platform has a different method for allowing users to expose "service" metrics, which would make applications not as easily portable between clusters.

@yujuhong
Contributor Author

Just brainstorming, but how committed are folks to making metrics part of the CRI? Based on the above it sounds like: "HPA requires collecting some metrics, those metrics can come from multiple sources, one likely metrics source is the container runtime." Would it make sense to define a metrics source interface and then let container runtimes optionally implement it?

This is one possible route, but it requires kubelet to figure out whether the runtime supports metrics (i.e., optional features) and whether kubelet's metrics library (a.k.a. cadvisor) supports that type of runtime (e.g., hypervisor-based). It may make more sense to just push this down to the runtime.

@DirectXMan12
Contributor

It may make more sense to just push this down to the runtime.

I had interpreted the suggestion as something like defining a well-known interface that had two different actions:

a) fetching metrics
b) indicating where to look for "service" metrics in a standard format from pods

So Kubelet can a) ask the container runtime for metrics, and b) say "by the way, expect to see 'service' metrics in format X in location Y for these pods" (where "in location Y" might be "on port Z" or something like that). The container runtime can then choose to implement those two pieces however it wants. Runtimes would be free to use cAdvisor for this purpose, but would not be obligated to do so; use of cAdvisor would be behind the metrics interface, so that Kubelet would not know about cAdvisor at all.

Fetching core metrics would be a separate interface (that would be mandatory), but all the metrics would be presented together by Kubelet for collection by Heapster.
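A sketch of that two-part shape in Go, with hypothetical names; the "format X in location Y" hint becomes a small declaration kubelet passes down per pod.

```go
// Hypothetical sketch of the two actions described above.
package metrics

// ServiceMetricsHint tells the runtime where and in what format to expect
// application ("service") metrics for a pod, e.g. a scrape format on a port.
type ServiceMetricsHint struct {
	PodID  string
	Format string // e.g. "prometheus" (assumption)
	Port   int32
}

type RuntimeMetrics interface {
	// a) fetch metrics the runtime has collected for a pod
	GetMetrics(podID string) (map[string]float64, error)
	// b) tell the runtime where "service" metrics will be exposed for a pod
	DeclareServiceMetrics(hint ServiceMetricsHint) error
}
```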

@ConnorDoyle
Contributor

Or even more simply, configure Kubelet to look to one or more endpoints as metrics sources. One of them might happen to be a process that also implements the CRI interface. With gRPC you can stack multiple services. If guaranteeing system metrics are present is a concern, maybe the system metrics source endpoint could be a separate config value.
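For the "stack multiple services" point, here is a sketch of a single gRPC server that could serve both a CRI runtime service and a metrics service on the same endpoint; the socket path and the commented-out registration calls are placeholders, not real generated code.

```go
// Sketch: one gRPC server exposing both a runtime service and a metrics
// service; registration calls are placeholders for generated code.
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
)

func main() {
	// The socket path is an assumption for illustration.
	lis, err := net.Listen("unix", "/var/run/example-runtime.sock")
	if err != nil {
		log.Fatal(err)
	}

	s := grpc.NewServer()
	// runtimeapi.RegisterRuntimeServiceServer(s, &runtimeServer{})  // hypothetical
	// metricsapi.RegisterMetricsServiceServer(s, &metricsServer{})  // hypothetical

	if err := s.Serve(lis); err != nil {
		log.Fatal(err)
	}
}
```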

@yujuhong
Contributor Author

Fetching core metrics would be a separate interface (that would be mandatory)

Agreed.

but all the metrics would be presented together by Kubelet for collection by Heapster.

This is debatable. A separate monitoring pipeline could allow more flexibility without changing the kubelet stats api.

Or even more simply, configure Kubelet to look to one or more endpoints as metrics sources. One of them might happen to be a process that also implements the CRI interface.

If kubelet is going to be the hub of all custom metrics, this interface/api should be completely separate from the CRI.

I think we've all agreed that core metrics should be part of CRI, and we should aim to finish that first.
Whether non-core metrics are optional in CRI, or exposed through a standalone interface, is worth a separate discussion.

@andrzej-k

andrzej-k commented Aug 1, 2016

This is my first message on Kubernetes community forum so: Hello all 👋

I think that, from the point of view of telemetry solutions, having an interface to integrate with Kubelet (to provide core and/or non-core metrics) would be a great solution.
In our kubesnap project we integrated snap with Kubernetes to demonstrate new features, like custom metrics on the fly, support for application-level metrics, metrics processing, and publishing to different sinks.
Metrics delivered by snap were consumed by Heapster (we added a new data source to Heapster), but since it seems that Heapster can handle only one data source at a time (link), we had to configure Heapster to talk to snap only.
If we could integrate with Kubelet directly, no changes on the Heapster side would be needed. That's probably an obvious truth, but I feel that loose coupling of monitoring with Kubelet could allow new features to show up more quickly, just my 2 cents...

@dchen1107 dchen1107 assigned dchen1107 and unassigned yujuhong Aug 5, 2016
@davidopp
Member

davidopp commented Aug 5, 2016

Splitting cAdvisor into a part that handles core metrics and is compiled into Kubelet, and a part that runs as a separate pod and handles everything else (including custom metrics) and is interchangeable with other collection agents, seems reasonable to me.

I didn't fully understand why container vs. pod was the criterion you are using to split non-core metrics vs. core metrics, but it sounds like it makes the CRI simpler, and I don't know enough to object. :)

In terms of what the scheduler needs,

  1. short-term it needs per-pod usage information for usage-based scheduling (Scheduler should consider the current status of cpu/memory of hosts while scheduling. #12455)
  2. longer-term it needs some kind of value like Borg reservation (Section 5.5 here for folks who aren't familiar with it) on a per-pod basis
  3. It is conceivable that even longer-term (and this is very speculative), we might want to dynamically schedule containers into pods, in which case the scheduler would probably want usage and/or reservation on a per-container basis. But this is so speculative I do not think we should consider it now.

I assume that (2) will actually come through the separate monitoring pipeline, not from the core cAdvisor? In that case it is not quite correct to say that the core metrics provide all the information needed for scheduling.

@yujuhong
Contributor Author

yujuhong commented Aug 5, 2016

@yujuhong #1 is simpler - do you think it is viable? Can we do #1 now and do #2 if and only if we have to?

I think it's viable. Let's wait it out and see if other tasks make progress in the meantime. I'm a bit unsure whether the standalone cadvisor work will be picked up anytime soon, though.

@dchen1107
Member

@thockin re: #27097 (comment)
Yes, Option 1 is the approach we chose for now, and meanwhile:

  1. Pod-level cgroup support is under development (Adding support for QoS cgroup Hierarchy and Pod level resource management in Kubelet #27204). Hopefully we can get it done for 1.4, but for sure in 1.5. We need this w/o the CRI work anyway.
  2. Standalone cAdvisor (Standalone cAdvisor for monitoring #18770) work will be picked up by someone in sig-node when we do 1.5 planning, once we've agreed on the high level (looks like we have now). Not sure who yet. cc/ @matchstick

@davidopp re: #27097 (comment)
Thanks for providing the requirements from the scheduler perspective. Here are my thoughts on scheduling requirements for resource metrics:

  1. Splitting core metrics vs. non-core metrics should cover both of the scheduler needs (1 and 2) you listed above.
  2. The new CRI actually makes dynamically scheduling containers into pods possible, since the new API separates the container lifecycle from the pod lifecycle. This is one of the main reasons I initiated this project.
  3. To really unblock the CRI work without any functionality & usability regression, the node has to export the monitoring metrics defined at https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/api/v1alpha1/stats/types.go (an abbreviated sketch of those types follows below). The current reference implementation is standalone cAdvisor, and we need someone working on heapster to make sure it can retrieve the metrics from a daemon other than kubelet.
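For context, here is an abbreviated, non-authoritative sketch of the summary stats types referenced above; see the linked types.go for the real definitions.

```go
// Abbreviated sketch of the summary stats types; see the linked types.go
// for the authoritative definitions.
package stats

import "time"

type Summary struct {
	Node NodeStats
	Pods []PodStats
}

type NodeStats struct {
	NodeName string
	CPU      *CPUStats
	Memory   *MemoryStats
}

type PodStats struct {
	PodRef     PodReference
	Containers []ContainerStats
}

type ContainerStats struct {
	Name   string
	CPU    *CPUStats
	Memory *MemoryStats
}

type PodReference struct {
	Name      string
	Namespace string
	UID       string
}

type CPUStats struct {
	Time           time.Time
	UsageNanoCores *uint64
}

type MemoryStats struct {
	Time            time.Time
	WorkingSetBytes *uint64
}
```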

@bgrant0607
Member

Users specify resource requests and limits per container. Thus, container-level metrics will be needed for vertical auto-scaling, resource estimation, and predictive horizontal auto-scaling. Presumably HPA currently uses container-level metrics as well.

@yujuhong
Contributor Author

yujuhong commented Aug 8, 2016

Users specify resource requests and limits per container. Thus, container-level metrics will be needed for vertical auto-scaling, resource estimation, and predictive horizontal auto-scaling. Presumably HPA currently uses container-level metrics as well.

@bgrant0607, the conclusion drawn from this discussion is that container metrics should be handled in a separate pipeline (standalone cadvisor + heapster), where kubelet may not be involved.

@bgrant0607
Member

@yujuhong I'm fine with the metrics being surfaced via another component. I was just pointing out that the data will be required.

@yujuhong
Contributor Author

yujuhong commented Oct 7, 2016

We've decided to punt the monitoring issues to after v1.5, in the hope that the monitoring pipeline will be better defined by then.

@michmike
Contributor

@yujuhong can you please involve me in the planning for the monitoring aspect of the container runtime interface? I would like to represent the Windows aspect of this work.

thanks!

@yujuhong yujuhong added this to the v1.7 milestone Mar 1, 2017
@yujuhong yujuhong assigned yujuhong and unassigned dchen1107 May 10, 2017
vdemeester pushed a commit to vdemeester/kubernetes that referenced this issue May 27, 2017
Automatic merge from submit-queue (batch tested with PRs 45809, 46515, 46484, 46516, 45614)

CRI: add methods for container stats

**What this PR does / why we need it**:
Define methods in CRI to get container stats.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: 
Part of  kubernetes/enhancements#290; addresses kubernetes#27097

**Special notes for your reviewer**:
This PR defines the *minimum required* container metrics for the existing components to function, loosely based on the previous discussion on [core metrics](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/core-metrics-pipeline.md) as well as the existing cadvisor/summary APIs.
 
Two new RPC calls are added to the RuntimeService: `ContainerStats` and `ListContainerStats`. The former retrieves stats for a given container, while the latter gets stats for all containers in one call.
 
The stats-gathering time of each subsystem can vary substantially (e.g., cpu vs. disk), so even though the on-demand model is preferred due to its simplicity, we’d rather give the container runtime more flexibility to determine the collection frequency for each subsystem*. As a trade-off, each piece of stats for a subsystem must contain a timestamp to let kubelet know how fresh/recent the stats are. In the future, we should also recommend a guideline for how recent the stats should be in order to ensure the reliability (e.g., eviction) and the responsiveness (e.g., autoscaling) of the kubernetes cluster.
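Roughly, the per-subsystem timestamps look like this; a sketch modeled loosely on the messages added by the PR, not copied from them.

```go
// Sketch only: per-subsystem stats, each carrying its own timestamp so
// kubelet can judge how fresh the sample is.
type CpuUsage struct {
	Timestamp            int64  // nanoseconds since epoch, when this sample was taken
	UsageCoreNanoSeconds uint64 // cumulative CPU time consumed
}

type MemoryUsage struct {
	Timestamp       int64
	WorkingSetBytes uint64
}

type FilesystemUsage struct {
	Timestamp int64
	UsedBytes uint64
}

type ContainerStats struct {
	ContainerID   string
	Cpu           *CpuUsage
	Memory        *MemoryUsage
	WritableLayer *FilesystemUsage
}
```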
 
The next step is to plumb this through kubelet so that kubelet can choose to consume container stats from CRI or cadvisor.
 
**Alternatively, we could add calls to get stats for individual subsystems. However, kubelet does not have complete knowledge of the runtime environment, so this would only lead to unnecessary complexity in kubelet.*


**Release note**:

```release-note
Augment CRI to support retrieving container stats from the runtime.
```
@yujuhong
Contributor Author

yujuhong commented Jun 5, 2017

#45614 added methods in CRI to get container stats. We didn't have time to plumb this through kubelet, so CRI runtimes cannot use it yet. I will file a separate issue for finishing that.
