
Support removing ephemeral container from pod #84764

Closed
shuiqing05 opened this issue Nov 5, 2019 · 48 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@shuiqing05

What would you like to be added:
An ephemeral container can be removed from a pod gracefully.
Why is this needed:
In our K8s cluster there are many deployed pods, and we aim to use ephemeral containers as troubleshooting containers. For troubleshooting, many ephemeral containers are therefore potentially created.
Once an ephemeral container is no longer needed, it should be removed gracefully: its resources should be released, it should no longer be accessible (for security), and it should be removed from the pod spec.
Today, unless the pod that contains the ephemeral container is destroyed, there is no way to remove the ephemeral container completely.
However, destroying the pod is really "intrusive" (for example, traffic may be impacted). We need a way to remove an ephemeral container as gracefully as we create it.
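
For context, this is roughly how an ephemeral container is added today, via the pod's `ephemeralcontainers` subresource. The sketch below uses client-go and assumes a version recent enough to expose `UpdateEphemeralContainers`; the container name and image are purely illustrative. This issue asks for the reverse operation, which the subresource currently rejects.

```go
// Sketch only: how an ephemeral container is added today through the
// "ephemeralcontainers" pod subresource. Assumes a client-go version that
// exposes UpdateEphemeralContainers; "debugger"/"busybox" are illustrative.
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func addDebugContainer(ctx context.Context, cs kubernetes.Interface, ns, podName string) error {
	pod, err := cs.CoreV1().Pods(ns).Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Append a new entry; ephemeral container names must be unique in the pod.
	pod.Spec.EphemeralContainers = append(pod.Spec.EphemeralContainers, corev1.EphemeralContainer{
		EphemeralContainerCommon: corev1.EphemeralContainerCommon{
			Name:  "debugger",
			Image: "busybox",
		},
	})
	// The subresource currently only accepts additions; an update that drops
	// an existing entry is rejected, which is exactly what this issue asks to change.
	_, err = cs.CoreV1().Pods(ns).UpdateEphemeralContainers(ctx, podName, pod, metav1.UpdateOptions{})
	return err
}
```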

@shuiqing05 shuiqing05 added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 5, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Nov 5, 2019
@shuiqing05
Author

/assign @verb

@verb
Contributor

verb commented Nov 5, 2019

/sig node

Thanks @shuiqing05 for opening this issue! Is your concern mainly about reclaiming the resources once the ephemeral container has exited? This is a thing that should happen automatically regardless of whether the container has been removed from the PodSpec. It shouldn't require a manual step. (To be clear, I'm not saying this is current behavior.)

Deleting the container would be required if, for example, you wanted to reclaim the resources without the container having exited.

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 5, 2019
@dguendisch

My use cases for removing closely match @shuiqing05's:

  • security: imagine I have a minimal regular container (e.g. with shareProcessNamespace: true); after deploying an almighty ephemeral debug container, I want to remove it once debugging is done so that my minimal container stays minimal
  • restarts: without removal, it's pretty inconvenient to deploy an ephemeral debug container again if the previous one terminated (for whatever reason)

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 5, 2020
@tedyu
Contributor

tedyu commented Feb 5, 2020

@dguendisch @shuiqing05
What would be the desired policy for removing ephemeral containers?
For security purposes, should the container be denied access to the Pod after a configurable duration?

@dguendisch

For my use cases I wouldn't need duration-based policies. My main point is being able to manually delete/remove the ephemeral container (after I'm done debugging the problem). A nice-to-have would be an option/policy to remove it automatically after the container exits.

@duglin

duglin commented Feb 7, 2020

Hi - what's the status of this? I'm really interested in using ephemeral containers for some of my use cases, but, like others, I'd like to be able to remove them and reclaim all resources, since the pod itself might live for a long time while these ephemeral containers come and go quite frequently.

@verb
Contributor

verb commented Feb 10, 2020

@dguendisch @tedyu Rather than functionality in the kubelet to automatically remove a container based on a policy enforced by the kubelet, I prefer marking a pod as tainted when an ephemeral container has been added (#84353) and leaving it up to the administrator how to handle this situation. I could imagine a controller that implements an automatic removal policy.

@duglin It hasn't been designed yet. Right now I'm focused on a debugging MVP. If anyone wants to pick this up, the next step would be to gather requirements and propose an update to the KEP.

@verb
Contributor

verb commented Mar 5, 2020

I'm considering how to do this for 1.19

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 5, 2020
@verb
Contributor

verb commented Mar 21, 2020

A lot of use cases have been discussed here, so I thought it would be useful to enumerate them:

  1. reclaim resources after completion: We don't actually need to remove an ephemeral container from the spec for this. Resources could be reclaimed after the container exits.
  2. restarting debug containers: I'm not sure there's a strong argument here vs just starting a container with a new name, as kubectl alpha debug does.
  3. terminating the ephemeral container via the API: Currently you have to exit the container process to kill the container.
  4. clean up the pod spec: Frequent use of ephemeral containers will cause the spec to accumulate cruft, and recreating the pod may not be warranted.

Given these, I'm planning on proposing simple deletes without in-place restarts or other updates (sketched below).
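
As a purely hypothetical illustration of the "simple delete" idea, the sketch below shows what removing one named entry could look like from the client side, again using client-go. Today the ephemeralcontainers subresource rejects any update that drops an existing entry, so this is not something the current API supports.

```go
// Hypothetical sketch only: a client-side "simple delete" of one ephemeral
// container, if the ephemeralcontainers subresource ever accepted removals.
// Today API validation rejects an update that drops an existing entry.
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func removeEphemeralContainer(ctx context.Context, cs kubernetes.Interface, ns, podName, ecName string) error {
	pod, err := cs.CoreV1().Pods(ns).Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Keep every entry except the one being removed; order is preserved.
	kept := pod.Spec.EphemeralContainers[:0]
	for _, ec := range pod.Spec.EphemeralContainers {
		if ec.Name != ecName {
			kept = append(kept, ec)
		}
	}
	pod.Spec.EphemeralContainers = kept
	// Hypothetical: would require the API server and kubelet to allow removal.
	_, err = cs.CoreV1().Pods(ns).UpdateEphemeralContainers(ctx, podName, pod, metav1.UpdateOptions{})
	return err
}
```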

@verb
Contributor

verb commented Mar 21, 2020

One concern about allowing ephemeral containers to be removed is that it removes information that could have been used for policy decisions. For this reason it will need to be configurable.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 19, 2020
@verb
Contributor

verb commented Jun 24, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 24, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 22, 2020
@verb
Contributor

verb commented Sep 22, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 22, 2020
@zhhray

zhhray commented Oct 29, 2020

I want to know whether anyone is working on this issue now. Is there a PR for it?

@verb
Contributor

verb commented Oct 29, 2020

@zhhray I have a goal to land this in 1.21. It needs a KEP modification; the PR is kubernetes/enhancements#1690.

@verb
Contributor

verb commented Jan 12, 2021

/priority important-longterm

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 29, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 29, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@m-yosefpor

/reopen

@k8s-ci-robot
Contributor

@m-yosefpor: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@m-yosefpor

@shuiqing05 can you please reopen this issue?

@tonyaw

tonyaw commented May 12, 2022

@verb and @pacoxu,
Could you please share the latest status of support for removing ephemeral containers?
I'd like to use an ephemeral container to run some periodic checks, and my concern is whether the pod spec can grow beyond 1.5 MiB, which is the etcd limit.

@pacoxu
Member

pacoxu commented May 12, 2022

The KEP was updated in the v1.23 release cycle kubernetes/enhancements#2892.

There are some discussions in #103354 (comment).

@tonyaw

tonyaw commented May 12, 2022

Is it fair to say that I can't use an ephemeral container to run a periodic check in a long-lived pod with the current release?
I need the ephemeral container to run and exit after each check.

@pacoxu
Member

pacoxu commented May 12, 2022

Is it fair to say that I can't use an ephemeral container to run a periodic check in a long-lived pod with the current release?
I need the ephemeral container to run and exit after each check.

No, you can run the periodic check. (The pod's ephemeral container list would grow by one each time, so after a long period there would be too many ephemeral containers.)

A Pod can have many ephemeral containers with different names. This issue is about removing exited containers from the pod's ephemeral container list. Once an ephemeral container has been added to the Pod, it cannot be removed.

@tonyaw

tonyaw commented May 12, 2022

Right. My concern is that if ephemeral containers run many times over a long period, the list will grow so long that the pod manifest reaches the 1.5 MiB etcd limitation.
Will that be a problem? :-)
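
For a rough sense of scale, a back-of-the-envelope estimate (assuming, purely for illustration, on the order of 1 KiB per serialized ephemeral container entry plus its status; this is not a measured figure): 1.5 MiB ≈ 1536 KiB, so roughly 1,500 such entries would approach the 1.5 MiB limit mentioned above on their own. A check that adds one entry per hour would get there in about two months, before even counting the rest of the pod object.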

@liggitt
Member

liggitt commented May 12, 2022

Using ephemeral containers to run a planned periodic check indefinitely for a long-running pod will monotonically increase the length of the ephemeral containers list, and is not a good use case for this feature. You could include what you need to run the periodic check in one of the normal containers and use something like exec to run the check.
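
A minimal sketch of that exec-based approach, assuming the check tool is baked into one of the normal containers as suggested. The container name "app", the command, and the use of a recent client-go (StreamWithContext) are illustrative assumptions, not anything prescribed in this thread.

```go
// Sketch only: running a periodic check inside an existing container via the
// pod "exec" subresource. The container name, command, and client-go version
// are assumptions; the command must exist in that container's image.
package sketch

import (
	"context"
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/remotecommand"
)

func runCheck(ctx context.Context, cfg *rest.Config, cs kubernetes.Interface, ns, podName string) error {
	req := cs.CoreV1().RESTClient().Post().
		Resource("pods").Namespace(ns).Name(podName).
		SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Container: "app",
			Command:   []string{"sh", "-c", "wget -qO- http://127.0.0.1:8080/healthz"},
			Stdout:    true,
			Stderr:    true,
		}, scheme.ParameterCodec)

	exec, err := remotecommand.NewSPDYExecutor(cfg, "POST", req.URL())
	if err != nil {
		return err
	}
	// Stream the check's output; a non-zero exit code surfaces as an error.
	return exec.StreamWithContext(ctx, remotecommand.StreamOptions{
		Stdout: os.Stdout,
		Stderr: os.Stderr,
	})
}
```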

@tonyaw

tonyaw commented May 12, 2022

Thanks @liggitt.

The problem here is:
Our application container is delivered without any debug tools, such as ping, and I want to use an ephemeral container to run a periodic ping check to ensure the network is good.
This aligns with one of the benefits of ephemeral containers mentioned here:
https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/#uses-for-ephemeral-containers
"In particular, distroless images enable you to deploy minimal container images that reduce attack surface and exposure to bugs and vulnerabilities. Since distroless images do not include a shell or any debugging utilities, it's difficult to troubleshoot distroless images using kubectl exec alone."
But as you mentioned, the ephemeral containers list will grow monotonically, so removing ephemeral containers is a mandatory feature for my use case.
If there is any other way to do it, could you please share it?

@liggitt
Member

liggitt commented May 12, 2022

What you're describing sounds more like a health or liveness check, not one-off debugging... including what you need to run that in the normal containers for the app is required.

@tonyaw

tonyaw commented May 12, 2022

  1. We don't want to put any debug tools inside the application container image.
  2. The application container has dropped all capabilities, including NET_RAW, which is needed by ping.

Based on these two reasons, we can't run ping inside the application container. An ephemeral container is a perfect match, since it can not only use a different container image from the application container but also gets the default capabilities, including NET_RAW. The only thing missing is the ability to remove the ephemeral container. :-)
As also mentioned in the following link, ephemeral containers were designed for one-off debugging at the beginning, but there is indeed some need for repeated runs:
allow removing ephemeral containers since 1.23 #103354 (comment)

Could you please explain why the feature to remove ephemeral containers has not been added? Are ephemeral containers still designed only for one-off debugging?

@liggitt
Member

liggitt commented May 12, 2022

The reason remove was not supported was for auditability (losing API-visible information about what has been added to the pod and run) and to avoid race conditions where an ephemeral container with the same name was removed and re-added and appeared to the kubelet to change in-place.

@tonyaw

tonyaw commented May 12, 2022

@liggitt Thanks for your info.
From the KEP, it looks like there is no plan to add the removal feature in the short term, right?
Also, for the use case I mentioned, I think there is no solution so far, right? :-(

@verb
Contributor

verb commented Jun 10, 2022

@tonyaw That's correct, we won't be implementing this in the context of kubernetes/enhancements#277. It's possible we'd revisit this in the context of a new KEP, since it's popular, but I wouldn't expect it in the short term.

I agree that your use case sounds like a bad fit for ephemeral containers, which are best effort and shouldn't be part of normal pod operation. You could implement this as a health check that doesn't use ICMP (and so doesn't require NET_RAW) but I'd be worried about all of the pods exiting at once if the destination host became unavailable for some reason.
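
A minimal sketch of the kind of non-ICMP check mentioned above: a plain TCP reachability probe, which needs no NET_RAW and could run in the app container or a sidecar. The target address and timeout are illustrative.

```go
// Sketch only: a TCP-based reachability check that needs no NET_RAW, as an
// alternative to ICMP ping. The address and timeout are illustrative.
package main

import (
	"fmt"
	"net"
	"time"
)

func tcpReachable(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return fmt.Errorf("tcp check to %s failed: %w", addr, err)
	}
	// A successful TCP handshake is enough evidence of reachability here.
	return conn.Close()
}

func main() {
	if err := tcpReachable("10.0.0.1:443", 2*time.Second); err != nil {
		fmt.Println(err)
	}
}
```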

@unmarshall

unmarshall commented Aug 18, 2022

The reason remove was not supported was for auditability

If you need it for audit reasons, then why bloat the pod spec and status? Instead, create a companion resource that holds the audit information (created when the first debug/attach is issued). This way you don't have to change the pod spec, and it also lets you control how much audit information is kept, which could be made configurable. Operators/consumers could also switch auditing on/off for selected resources they control, etc.

The spec describes the desired state and should not be polluted with audit-trail information that can keep growing. Auditing is a cross-cutting concern; a desired state should not have ever-growing audit information as part of it. IMHO it simply does not belong there.
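
To make the companion-resource idea concrete, here is a purely hypothetical sketch of what such a type might look like. The EphemeralContainerAudit kind and its fields do not exist in Kubernetes; they are invented only to illustrate the suggestion.

```go
// Hypothetical sketch only: a possible shape for the companion audit resource
// described above. Neither this kind nor its fields exist in Kubernetes.
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EphemeralContainerAudit records every ephemeral container ever added to a
// pod, so entries could be pruned from the pod spec without losing the trail.
type EphemeralContainerAudit struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// PodName is the pod the audited ephemeral containers were attached to.
	PodName string `json:"podName"`
	// Records lists every ephemeral container ever added, even after removal.
	Records []EphemeralContainerRecord `json:"records,omitempty"`
}

// EphemeralContainerRecord is one audit entry.
type EphemeralContainerRecord struct {
	Name    string      `json:"name"`
	Image   string      `json:"image"`
	AddedBy string      `json:"addedBy,omitempty"`
	AddedAt metav1.Time `json:"addedAt"`
}
```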

@jackmtpt

Is there any update on this? Any plans to allow deleting ephemeral containers?

@chengjoey
Contributor

#124271
Draft PR for removing ephemeral containers from a pod.
