
CRI: expose optional runtime features #32803

Closed
yujuhong opened this issue Sep 15, 2016 · 44 comments
Assignees
Labels
area/kubelet-api · lifecycle/rotten · priority/important-soon · sig/node

Comments

@yujuhong
Contributor

Meta-issue for CRI: #28789

For some features, kubelet checks whether the underlying runtime supports them and rejects the pod if necessary. One example is sysctl support. Right now, kubelet simply checks (1) that the runtime is docker, and (2) that the docker version is higher than the minimum requirement. With CRI, checking runtime versions directly is no longer viable. One way to solve this is to ask the runtime to expose the set of features it supports through CRI. E.g., an Info call could be added to CRI to retrieve a list of supported features.

Thoughts? Objections?

/cc @kubernetes/sig-node @thockin

Note: sysctl is still an experimental feature and relies on pod annotations in the k8s API. However, the pod admission logic is already live in kubelet.
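(For illustration: a minimal Go sketch of what such a feature-listing call could look like on the kubelet side. These names are hypothetical, not actual CRI definitions.)

```go
// Hypothetical sketch of the proposed feature-listing call. None of these
// names are real CRI API; they only illustrate the idea.
package runtimeapi

// Feature names an optional capability a runtime may or may not support.
type Feature string

const (
	FeatureSysctl   Feature = "sysctl"
	FeatureAppArmor Feature = "apparmor"
)

// Info is the response of the proposed Info call.
type Info struct {
	SupportedFeatures []Feature
}

// RuntimeService is the part of the runtime interface kubelet would call
// once at startup to learn what the runtime supports.
type RuntimeService interface {
	Info() (*Info, error)
}
```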

@yujuhong yujuhong added area/kubelet-api sig/node Categorizes an issue or PR as relevant to SIG Node. labels Sep 15, 2016
@yifan-gu
Contributor

@yujuhong Why is checking runtime versions not viable?

@feiskyer
Member

+1 for this.

> @yujuhong Why is checking runtime versions not viable?

@yifan-gu Do you mean extending the Version() API to include features? I think that's better than adding a new API.

@thockin
Member

thockin commented Sep 16, 2016

Do we need this as an explicit API? Why not just pass sysctl to the runtime, and if it doesn't support them, it can either reject the request or log-and-run-anyway...


@feiskyer
Member

> Do we need this as an explicit API? Why not just pass sysctl to the runtime, and if it doesn't support them, it can either reject the request or log-and-run-anyway...

@thockin One benefit is failing early, before calling the runtime API, if features are not supported.

@yifan-gu
Contributor

> @yifan-gu Do you mean extending the Version() API to include features? I think that's better than adding a new API.

I'm not saying we should include features in versions. My thought is that the version itself is sufficient to tell what features are supported (e.g., docker 1.1x supports features a, b, c, etc.).

Once the runtime tells us the version, we can infer what features are supported, right?

@Random-Liu
Member

Random-Liu commented Sep 16, 2016

> I'm not saying we should include features in versions. My thought is that the version itself is sufficient to tell what features are supported (e.g., docker 1.1x supports features a, b, c, etc.).

@yifan-gu Versions of different runtimes mean different things. If we just let the runtime return its version, kubelet must have runtime-specific logic to figure out what docker 1.1x supports, what rkt 1.0x supports, etc.

@yifan-gu
Contributor

> @yifan-gu Versions of different runtimes mean different things. If we just let the runtime return its version, kubelet must have runtime-specific logic to figure out what docker 1.1x supports, what rkt 1.0x supports, etc.

Fair enough.

@thockin
Member

thockin commented Sep 17, 2016

I am not sure that is a compelling enough reason on its own.


@yujuhong
Contributor Author

> Do we need this as an explicit API? Why not just pass sysctl to the runtime, and if it doesn't support them, it can either reject the request or log-and-run-anyway...

  1. kubelet right now "rejects" a pod if the runtime doesn't support sysctl. To achieve the same effect, we'd need to either a) ask the runtime to return a list of supported features (sketched below), or b) add a new "error" indicating the pod is not supported so that kubelet can reject it.
  2. Handling version skew explicitly. If sysctl support is added in version 1.1, an older server/runtime, e.g., 1.0, would simply ignore the newly added fields and not return errors. This means kubelet would have to check which CRI version the runtime is using and whether a particular feature is supported in that version.
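(A minimal sketch of option (a), assuming kubelet has already fetched a hypothetical supported-features list from the runtime; this is not actual kubelet code.)

```go
// Sketch of option (a): reject at admission time, before any runtime call.
// Types and the feature list are hypothetical.
package admission

import "fmt"

type Pod struct {
	Name    string
	Sysctls []string // sysctls requested for this pod
}

// supportedFeatures would be populated from the runtime's feature list.
var supportedFeatures = map[string]bool{
	"sysctl": false, // e.g. an older runtime without sysctl support
}

// Admit fails early, so the pod is rejected without calling the runtime API.
func Admit(pod *Pod) error {
	if len(pod.Sysctls) > 0 && !supportedFeatures["sysctl"] {
		return fmt.Errorf("pod %q rejected: runtime does not support sysctls", pod.Name)
	}
	return nil
}
```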

@yujuhong yujuhong added this to the v1.5 milestone Sep 19, 2016
@thockin
Member

thockin commented Sep 19, 2016

> kubelet right now "rejects" a pod if the runtime doesn't support sysctl. To achieve the same effect, we'd need to either a) ask the runtime to return a list of supported features, or b) add a new "error" indicating the pod is not supported so that kubelet can reject it.

This is what I was thinking.

> Handling version skew explicitly. If sysctl support is added in version 1.1, an older server/runtime, e.g., 1.0, would simply ignore the newly added fields and not return errors. This means kubelet would have to check which CRI version the runtime is using and whether a particular feature is supported in that version.

Kubelet should NOT be checking versions of plugins. It must be neutral. The question is which behavior is correct: rejecting pods that use unsupported features, or ignoring the features? Note this has implications for defaulting logic, too. If we reject, we can never default a new field lest we risk the kubelet rejecting it with version skew (which empirically is the normal state of many clusters).

@yujuhong
Contributor Author

Ignoring features may be ok in some cases, and I think this is what we do in k8s today (older kubelets ignore new features).

On the other hand, if the feature is related to security (e.g., apparmor?), letting the runtime silently ignore it seems dangerous.

@thockin
Member

thockin commented Sep 19, 2016

Can we somehow close the loop? Maybe pass back the runtime's pod, then do a diff or something? I guess I am OK with feature flags, but I am worried about the defaulting problem.
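(A rough sketch of this "pass back and diff" idea, under the assumption that the runtime echoes back the config it actually applied; all types are hypothetical.)

```go
// Sketch of the "close the loop" idea: the runtime returns the config it
// actually applied, and kubelet diffs it against what was requested.
// Types and fields are hypothetical.
package loop

type SecurityConfig struct {
	AppArmorProfile string
	Sysctls         map[string]string
}

// DroppedFields reports requested settings the runtime silently ignored,
// which kubelet could then surface as warnings or rejections.
func DroppedFields(requested, applied SecurityConfig) []string {
	var dropped []string
	if requested.AppArmorProfile != "" && requested.AppArmorProfile != applied.AppArmorProfile {
		dropped = append(dropped, "apparmor")
	}
	for name := range requested.Sysctls {
		if _, ok := applied.Sysctls[name]; !ok {
			dropped = append(dropped, "sysctl:"+name)
		}
	}
	return dropped
}
```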


@yujuhong
Contributor Author

> Can we somehow close the loop? Maybe pass back the runtime's pod, then do a diff or something? I guess I am OK with feature flags, but I am worried about the defaulting problem.

Let's break down the problem a bit more.

  1. Assume there is no version skew, does runtime need to support all the fields listed in the container/sandbox configuration?
  2. With version skew, how can kubelet know whether a specific feature is enabled?

apiserver <-> kubelet doesn't have problem (1). I'd like to enforce that runtimes should support all fields in the config and it should reject the request if it can't. This does have the implication that every time we want to add a new feature, we'd have to force all runtimes to support this (or they cannot upgrade). cc @kubernetes/sig-rktnetes

As for (2), we also don't handle this in apiserver <-> kubelet in general, with some exceptions.

  • kubelet checks the docker version and rejects pods that request sysctls if the runtime doesn't support them.
  • For apparmor, kubelet writes an annotation on the pod object to acknowledge that it enabled the apparmor profile on the pod.

This either requires kubelet to check runtime+version, or the runtime to ACK back. If (1) is true, kubelet could potentially just check the API version that the runtime uses and determine whether the feature is supported (but it wouldn't be able to reject a pod outright without trying the request first).

@yujuhong yujuhong added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 26, 2016
@thockin
Member

thockin commented Sep 27, 2016

> Assume there is no version skew, does runtime need to support all the fields listed in the container/sandbox configuration?
>
> With version skew, how can kubelet know whether a specific feature is enabled?
>
> apiserver <-> kubelet doesn't have problem (1). I'd like to enforce that runtimes should support all fields in the config and it should reject the request if it can't. This does have the implication that every time we want to add a new feature, we'd have to force all runtimes to support this (or they cannot upgrade).

Will this cause huge friction? Our API is not perfectly abstract, and maybe there will be things that don't make sense on some runtimes? I don't know here..

> As for (2), we also don't handle this in apiserver <-> kubelet in general, with some exceptions.

We already have the security problem. If kubelet is 1.3.x and reads a Pod from apiserver that sets AppArmor, it will happily start the pod without AppArmor. Won't it? Why is that acceptable but kubelet <-> shim version skew is not? And if it is OK to drop fields in the face of version skew, why is it not OK to drop fields when a runtime doesn't support it?

@euank
Contributor

euank commented Oct 1, 2016

This is also partly a distribution problem I think, or at least can be solved by making sufficiently opinionated distribution coupling decisions.

If we can mandate "the shim is tightly tied to kubelet in distribution and version", then things become much easier.
Some of the benefit of having external remote runtimes is that their development and release can be decoupled a bit, but it doesn't have to be that loose.

It could be mandated that each shim must tightly tie to a kubelet version. For rktlet/dockerlet for now, this will be enforced by being in the same binary.

I'm suggesting that other runtimes could do something similar to that; e.g. you could have the frakti 1.5.x release series which works with kubelet 1.5.x release, and the kubelet could enforce that the runtime is on the right series.

It would then be up to the shim to either have a tighter or looser coupling; right now kubelet couples a little loosely with docker, supporting multiple different versions, but not all runtime-shims will need to do so.

This would at least let kubelet assume an arbitrary runtime shim knows about every field it might set (whether or not it ignores it, which would then be a runtime shim implementation choice).

Thoughts?
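(A minimal sketch of the release-series coupling described here, assuming kubelet compares major.minor version prefixes; hypothetical, not actual kubelet logic.)

```go
// Sketch of the tight-coupling idea: kubelet only accepts a runtime shim
// whose version is in the same major.minor release series. Hypothetical.
package coupling

import (
	"fmt"
	"strings"
)

// sameSeries reports whether two versions share the major.minor prefix,
// e.g. "1.5.3" and "1.5.0" are in the same series.
func sameSeries(a, b string) bool {
	ap := strings.SplitN(a, ".", 3)
	bp := strings.SplitN(b, ".", 3)
	if len(ap) < 2 || len(bp) < 2 {
		return false
	}
	return ap[0] == bp[0] && ap[1] == bp[1]
}

// CheckRuntime would run once at kubelet startup.
func CheckRuntime(kubeletVersion, runtimeVersion string) error {
	if !sameSeries(kubeletVersion, runtimeVersion) {
		return fmt.Errorf("runtime %s is not in kubelet's %s release series",
			runtimeVersion, kubeletVersion)
	}
	return nil
}
```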

@thockin
Member

thockin commented Oct 1, 2016

It's a little draconian, but not a terrible idea. But is this not covered by API versioning? Kubelet says "I want API v1.5" and runtime says "I don't support that". == fail. It's a nefarious failure mode for end-users though.


@yifan-gu
Contributor

yifan-gu commented Oct 3, 2016

> It's a little draconian, but not a terrible idea. But is this not covered by API versioning? Kubelet says "I want API v1.5" and runtime says "I don't support that". == fail. It's a nefarious failure mode for end-users though.

Seems we need to achieve:

1. Don't just drop support when versions are different.
2. Let users know that some features are not supported.

So it sounds like a BIG WARNING is the way to go...

k8s-github-robot pushed a commit that referenced this issue Oct 27, 2016
Automatic merge from submit-queue

Add sysctls for dockershim

This PR adds sysctls support for dockershim. All sysctls e2e tests are passed in my local settings.

Note that sysctls runtimeAdmit is not included in this PR, it is addressed in #32803.

cc/ @yujuhong @Random-Liu
@dims
Member

dims commented Nov 16, 2016

This needs to be triaged as a release-blocker or not for 1.5 @yujuhong @thockin @euank

@euank
Contributor

euank commented Nov 17, 2016

Bump to 1.6 @dims

@yujuhong yujuhong modified the milestones: v1.6, v1.5 Nov 17, 2016
@dims
Member

dims commented Nov 17, 2016

thanks @euank

@mikebrow
Member

mikebrow commented Aug 24, 2017

Yes, that's the sort of thing we need. Then there may be a need to provide other hints as well, for example, whether certain capabilities within each feature are also supported. Some defaults may not be specified on the platform/environment (docker/default, unconfined, ...). As another example, there may be a minimum version number for the capability that is necessary for the container.

@mikebrow
Member

mikebrow commented Aug 24, 2017

Or we make the demand that, to be a CRI implementation of a certain type, you must implement a specified set of features. That might be easier?

By type I mean qualifying the difference between linux/windows, VMs/containers, ...

@feiskyer
Member

> we make the demand that, to be a CRI implementation of a certain type, you must implement a specified set of features.

This lgtm, or else it will be too complex.

@mikebrow any ideas on the initial capabilities list?

@feiskyer feiskyer self-assigned this Aug 25, 2017
@mikebrow
Member

I've been trying to figure that out as I go through implementing some of them :-) The pattern I'm using is to crawl through the docker shim implementation, derive the expectations for each security feature, and code it up on the cri-containerd side. For example, for seccomp: {minimum version, docker/default, unconfined, localhost/, runtime/default}

@mikebrow
Member

mikebrow commented Aug 25, 2017

IMO the best way to ensure the expected features are there is to have tests for them in e2e node or cri-tools validate. For example, we have the SYSCTL and APPARMOR stuff working on cri-containerd. But verifying via e2e node is not a guarantee that we've done all of the needed features for each. For example, as far as I saw, none of the e2e node apparmor tests confirm if runtime/default is supported.

@mikebrow
Member

mikebrow commented Aug 25, 2017

Side note: I'd rather see an external skip list than internal skips. For example, when testing apparmor support for a runtime, I shouldn't have to first PR a filter somewhere in a validator in kubelet and maybe another filter in the e2e tests to convince it that, while docker or rkt might not support the feature, maybe remote crio does. An external skip file would give us the ability to manage the filtering without editing the validators and test cases themselves. That said, where would the skip list go and who would manage it? One alternative would be the above discussion with an API that queries the capabilities (and possibly sub-features of capabilities). Another would be more like how we do whitelists and blacklists for focus/skips in testing. A sketch of the skip-file idea is below.
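(A sketch of the external skip-file idea: the validation suite loads a plain-text list of tests to skip for a given runtime instead of hard-coding filters. The file format and names are hypothetical, not an existing e2e/cri-tools mechanism.)

```go
// Sketch of an external skip list: one test name per line, '#' for comments.
package skiplist

import (
	"bufio"
	"os"
	"strings"
)

// Load reads the skip file into a set of test names to filter out.
func Load(path string) (map[string]bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	skips := make(map[string]bool)
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or comment
		}
		skips[line] = true
	}
	return skips, sc.Err()
}
```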

@mikebrow
Member

Another example just popped up: #51319

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 3, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 8, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@feiskyer feiskyer reopened this May 29, 2018
@feiskyer feiskyer removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 29, 2018
@feiskyer
Member

@jessfraz Any comments on this?

@jessfraz
Contributor

Yeah, I think /info or something like it should return the supported features, so the admission controllers can deny or not, like they do with docker versions.

@feiskyer
Member

#64005 is adding a similar GetRuntimeConfigInfo() interface to CRI, which returns the UID/GID mapping config of runtimes. These could be combined into one info API.

@feiskyer
Member

@yujuhong @Random-Liu @mikebrow @mrunalp For the list of features, any more ideas for this list?

```proto
enum RuntimeCapability {
    CAPABILITY_SYSCTL = 0;
    CAPABILITY_SECCOMP = 1;
    CAPABILITY_SELINUX = 2;
    CAPABILITY_APPARMOR = 3;
}
```
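(If an enum like this were added to CRI, kubelet could translate it into the admission checks discussed earlier. A hypothetical kubelet-side sketch in Go; neither the enum nor an Info RPC carrying it exists in CRI today.)

```go
// Hypothetical kubelet-side view of the proposed RuntimeCapability enum,
// illustrating how kubelet might consume such a list.
package capabilities

type RuntimeCapability int32

const (
	CapabilitySysctl RuntimeCapability = iota
	CapabilitySeccomp
	CapabilitySELinux
	CapabilityAppArmor
)

var names = map[RuntimeCapability]string{
	CapabilitySysctl:   "sysctl",
	CapabilitySeccomp:  "seccomp",
	CapabilitySELinux:  "selinux",
	CapabilityAppArmor: "apparmor",
}

// FeatureSet turns the runtime's reported capabilities into a lookup set
// that pod admission can consult.
func FeatureSet(caps []RuntimeCapability) map[string]bool {
	set := make(map[string]bool, len(caps))
	for _, c := range caps {
		set[names[c]] = true
	}
	return set
}
```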

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 27, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 26, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
> Reopen the issue with /reopen.
> Mark the issue as fresh with /remove-lifecycle rotten.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
