
Make it possible to shell into any container, even if it has no shell #10834

Closed
thockin opened this issue Jul 7, 2015 · 45 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@thockin
Member

thockin commented Jul 7, 2015

Today kubectl exec sh depends on a shell being present in each container image. I think we can do better.

What if we had something like kubectl sh, which would run a statically linked busybox (or something like that) provided by the host node, entering the cgroups, namespaces, and chroot of the container? We already have a short list of features that only work if the host node has some piece of software installed; this would be the same.
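
A minimal sketch of the node-side mechanics this suggests, assuming root on the node, a Docker-era runtime for the PID lookup, and a hypothetical static busybox shipped on the host at /usr/local/bin/busybox.static; none of these paths or steps are part of Kubernetes:

```sh
# Find the container's init PID via the runtime (Docker shown; hypothetical setup).
PID=$(docker inspect --format '{{.State.Pid}}' "$CONTAINER_ID")

# Copy the host's static busybox into the container's root filesystem,
# which is reachable from the host at /proc/<pid>/root.
cp /usr/local/bin/busybox.static "/proc/$PID/root/.debug-busybox"

# Enter the container's mount, UTS, IPC, network, and PID namespaces and run
# the injected shell; static linking means it needs no libraries from the image.
nsenter --target "$PID" --mount --uts --ipc --net --pid /.debug-busybox sh
```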

@thockin thockin added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Jul 7, 2015
@thockin thockin added this to the v1.0-post milestone Jul 7, 2015
@bgrant0607 bgrant0607 removed this from the v1.0-post milestone Jul 24, 2015
@bgrant0607 bgrant0607 added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Aug 4, 2015
@smarterclayton
Contributor

@ncdc

Yes, I've hit this several times. We added oc rsh, which provides a helper for -itp -- bash, but a more generic static-linking solution that works on any container would be good.

It can't totally depend on busybox (has to be replaceable/configurable) because we can't ship/support busybox.

Probably bonus points for allowing this to become more sophisticated down the road and inject other things (e.g. a statically linked ssh).
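
For context, a sketch of what that helper amounts to today and why it breaks on shell-less images; pod and shell names are placeholders, and the exact error text varies by runtime:

```sh
# Works only if the image ships a shell (roughly what oc rsh wraps):
kubectl exec -it mypod -- /bin/bash

# On a shell-less image (e.g. FROM scratch), exec fails with something like:
#   exec: "/bin/bash": executable file not found in $PATH
```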

@luxas
Member

luxas commented Jan 4, 2017

Is this something we can prioritize?
What would the actual implementation look like?
A flag to the kubelet, sigh, that lets the user choose the shell interpreter on the host (default /bin/sh, since it exists nearly everywhere), and when the user runs kubectl sh it would just run nsenter, but with sh from the host, right?

cc @timstclair

Anyone up for this?
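
One wrinkle with literally reusing sh from the host: once nsenter has entered the container's mount namespace, paths (and the loader and libc a dynamically linked host shell would need) resolve against the container's filesystem, which is why the sketches above assume a statically linked binary. A hypothetical illustration:

```sh
# Entering only the mount namespace of a shell-less container:
nsenter --target "$PID" --mount /bin/sh
# fails, because /bin/sh now resolves inside the container's rootfs:
#   nsenter: failed to execute /bin/sh: No such file or directory
```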

@smarterclayton
Contributor

I don't think we'd want to override the image's shell; it would have to be a fallback, used iff the shell is missing.

@verb
Contributor

verb commented Jan 4, 2017

@luxas I have this problem as well, see #35584 for how I'd like to address it. I'm looking for additional use cases, so your input would be most welcome.

This isn't a priority for SIG Node (at least as defined in the tentative 2017Q1 roadmap), but it's the only thing I'm attempting to contribute, so it's a priority for me.

@thockin
Member Author

thockin commented Jan 5, 2017 via email

@smarterclayton
Contributor

Let me clarify - it is unacceptable for exec to change to magically override the built-in shell, because that is not backwards compatible and breaks large classes of apps on Kube. It's acceptable to make static injection a net-new mechanism, and it might be acceptable to fall back.

I also don't want a generic shell command that is tied too closely to one particular shell impl - we would not be able to ship that for lots of reasons, most of which involve support. So I'm -1 on a "special" exec command that always uses a shell binary we ship.

@thockin
Member Author

thockin commented Jan 5, 2017 via email

@verb
Contributor

verb commented Jan 5, 2017

It's a requirement for the runtimes rather than the kubelet, and the CRI will need to add support. It's the runtimes that would have to ship a binary. Implementation would be pretty straightforward for the current container-based runtimes and slightly more difficult for hypervisor/cloud runtimes.

This solution is constrained to a single static binary specified by the Kubernetes developers, limiting its usefulness. I'd much prefer to be able to use an arbitrary container image for debugging, but the two aren't mutually exclusive.

@smarterclayton
Contributor

smarterclayton commented Jan 5, 2017 via email

@thockin
Member Author

thockin commented Jan 5, 2017 via email

@smarterclayton
Contributor

smarterclayton commented Jan 5, 2017 via email

@verb
Contributor

verb commented Jan 5, 2017

Agreed. In the very best case my proposal will take a long time to fully implement, but the complete troubleshooting story is important for Kubernetes overall. Ideally we could find a limited solution that makes progress towards a complete solution, like kubectl debug to troubleshoot a duplicate pod before being able to troubleshoot a running one.

The most contentious part of my proposal is mutating the pod spec, but that's not a requirement. We could instead create a new resource to model this sort of ephemeral troubleshooting container, but there were concerns in SIG Node about the proliferation of container types and backwards compatibility with tools that introspect the pod (e.g. for security/auditing).

I can't think of a way to inject a binary in service of a more complete solution. Binary injection seems brittle if done simply, and becomes a larger, diverging problem if done robustly.

@thockin
Member Author

thockin commented Jan 5, 2017 via email

@verb
Contributor

verb commented Jan 5, 2017

Assuming this is implemented using something like container image volumes (i.e. reusing the existing image build system, distribution channels, resource efficiency via caching & layer sharing, etc.), it has only a few disadvantages:

  1. It couples troubleshooting tools to pod creation, whereas one of our goals was to get the freshest utilities at debug time and free us from updates due to CVEs unrelated to our code.
  2. It constrains troubleshooting to a single, previously configured toolbox. We (Google) care about this because we'd like to implement a security policy where audit events depend on what image is run (e.g. a shell triggers an alert but the automated troubleshooting script doesn't).
  3. It doesn't help troubleshoot a container that's crashlooping, since exec is unavailable.
  4. It means we'd have to construct an awkward container image where bash (et al.) uses paths like /k8s/bin, /k8s/etc, /k8s/lib (for .status.toolboxPath='/k8s').

That being said, this suggestion simplifies many things. #2 and #3 can be solved in another fashion. #4 detracts from the user experience but isn't the end of the world, and gcr.io/google_containers could host a canonical version.

We could relax the requirement for binary version decoupling in #1 if everyone really likes this solution, especially if we could figure out a way to asynchronously update the container image volumes in the future.

I am having a little trouble getting past the awkwardness of squatting on a filesystem path like '/k8s' for every container. It feels like meddling in the container's affairs, but I could get over it if it's a path forward.
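
To make the toolbox-path idea concrete, usage might have looked something like this; purely hypothetical, since the image-volume/toolbox mechanism described above was never implemented:

```sh
# If every container had a toolbox mounted at .status.toolboxPath='/k8s',
# exec could always fall back to the toolbox shell:
kubectl exec -it mypod -- /k8s/bin/sh
```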

@thockin
Member Author

thockin commented Jan 6, 2017 via email

@verb
Contributor

verb commented Jan 6, 2017 via email

@timstclair

One concern I have with auto-mounting debug tools into every container is that it makes the tools available to potential attackers as well. For example, if an arbitrary command execution vulnerability gets you to a shell, a lot of attack paths become much easier. I thought this was the original motivation for the tool being discussed, so that we didn't need to build a shell into every container?

@thockin
Member Author

thockin commented Jan 7, 2017 via email

@sdminonne
Contributor

FYI @EricMountain-1A

@verb
Contributor

verb commented Jan 20, 2017

I've updated the proposal in #35584 with some of the ideas discussed here (and added a link). The latest revision of that proposal favors an approach which I think will give a better user experience.

It's not exactly what's being requested in this issue, but it may serve the same purpose. You'd have the choice between creating a copy of a pod with arbitrary changes (such as image or entrypoint) or attaching a new container to the running pod (initially with access to pod volumes but perhaps eventually with access to container filesystems).

I'd be interested to know whether this meets your needs.
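
For readers arriving later: this is roughly the shape the proposal eventually took in kubectl debug, built on ephemeral containers. A sketch using a modern kubectl; pod and container names are placeholders:

```sh
# Attach a new debugging container to a running pod:
kubectl debug -it mypod --image=busybox --target=mycontainer

# Or create a modified copy of the pod to troubleshoot:
kubectl debug mypod -it --copy-to=mypod-debug --container=mycontainer --image=busybox -- sh
```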

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 26, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 24, 2018
@kfox1111

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 25, 2018

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 23, 2018
@verb
Contributor

verb commented Sep 24, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 24, 2018
@kfox1111

Does this get subsumed by debug containers, or is this a slightly different issue?

@verb
Contributor

verb commented Oct 17, 2018

@kfox1111 I think this will be solved by debug containers, but if you disagree let me know and I'll unassign.


@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 15, 2019
@sbley

sbley commented Jan 15, 2019

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 15, 2019

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 15, 2019
@zanetworker
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 12, 2019

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 10, 2019
@verb
Contributor

verb commented Aug 13, 2019

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 13, 2019
@tallclair
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Aug 13, 2019
@verb
Contributor

verb commented Aug 27, 2019

With #59484 merged, this has become "possible". We still have a way to go to make it useful out of the box and to graduate the feature. I'm going to close this issue because we have a few others tracking this.

If you're interested in following along, keep an eye on issue #27140 and kubernetes/enhancements#277.

/close

@k8s-ci-robot
Contributor

@verb: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
