
Support removing ephemeral container from pod #84764

Closed
shuiqing05 opened this issue Nov 5, 2019 · 48 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@shuiqing05

What would you like to be added:
An ephemeral container can be removed from a pod gracefully.
Why is this needed:
In our K8s cluster there are many deployed pods, and we aim to use ephemeral containers as troubleshooting containers. For troubleshooting, many ephemeral containers are therefore potentially created.
Once an ephemeral container is no longer needed, it should be removed gracefully: its resources should be released, it should no longer be accessible (for security), and it should be removed from the pod spec.
Today, unless the pod that contains the ephemeral container is destroyed, there is no way to remove the ephemeral container completely.
However, destroying the pod is really "intrusive" (for example, traffic may be impacted). We need a way to remove an ephemeral container as gracefully as we create it.
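
For context, this is roughly how an ephemeral container is added today, via the pod's `ephemeralcontainers` subresource. The sketch below uses client-go and assumes a version recent enough to expose `UpdateEphemeralContainers`; the container name and image are purely illustrative. This issue asks for the reverse operation, which the subresource currently rejects.

```go
// Sketch only: how an ephemeral container is added today through the
// "ephemeralcontainers" pod subresource. Assumes a client-go version that
// exposes UpdateEphemeralContainers; "debugger"/"busybox" are illustrative.
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func addDebugContainer(ctx context.Context, cs kubernetes.Interface, ns, podName string) error {
	pod, err := cs.CoreV1().Pods(ns).Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Append a new entry; ephemeral container names must be unique in the pod.
	pod.Spec.EphemeralContainers = append(pod.Spec.EphemeralContainers, corev1.EphemeralContainer{
		EphemeralContainerCommon: corev1.EphemeralContainerCommon{
			Name:  "debugger",
			Image: "busybox",
		},
	})
	// The subresource currently only accepts additions; an update that drops
	// an existing entry is rejected, which is exactly what this issue asks to change.
	_, err = cs.CoreV1().Pods(ns).UpdateEphemeralContainers(ctx, podName, pod, metav1.UpdateOptions{})
	return err
}
```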

@shuiqing05 shuiqing05 added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 5, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Nov 5, 2019
@shuiqing05
Author

/assign @verb

@verb
Contributor

verb commented Nov 5, 2019

/sig node

Thanks @shuiqing05 for opening this issue! Is your concern mainly about reclaiming the resources once the ephemeral container has exited? This is a thing that should happen automatically regardless of whether the container has been removed from the PodSpec. It shouldn't require a manual step. (To be clear, I'm not saying this is current behavior.)

Deleting the container would be required if, for example, you wanted to reclaim the resources without the container having exited.

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 5, 2019
@dguendisch

My use cases for removing closely match @shuiqing05's:

  • security: imagine I have a minimal regular container (e.g. with shareProcessNamespace: true); after deploying an almighty ephemeral debug container, I want to remove it once debugging is done so that my minimal container stays minimal
  • restarts: without removal, it's pretty inconvenient to deploy an ephemeral debug container again if the previous one terminated (for whatever reason)

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 5, 2020
@tedyu
Contributor

tedyu commented Feb 5, 2020

@dguendisch @shuiqing05
What would be the desired policy for removing ephemeral containers?
For security purposes, should the container be denied access to the Pod after a configurable duration?

@dguendisch

For my use cases I wouldn't need duration-based policies. My main point is being able to manually delete/remove the ephemeral container (after I'm done debugging the problem). A nice-to-have would be an option/policy to remove it automatically after the container exits.

@duglin

duglin commented Feb 7, 2020

Hi - what's the status of this? I'm really interested in using ephemeral containers for some of my use cases, but, like others, I'd like to be able to remove them and reclaim all resources, since the pod itself might live for a long time while these ephemeral containers come and go quite frequently.

@verb
Contributor

verb commented Feb 10, 2020

@dguendisch @tedyu Rather than functionality in the kubelet to automatically remove a container based on a policy enforced by the kubelet, I prefer marking a pod as tainted when an ephemeral container has been added (#84353) and leaving it up to the administrator how to handle this situation. I could imagine a controller that implements an automatic removal policy.

@duglin It hasn't been designed yet. Right now I'm focused on a debugging MVP. If anyone wants to pick this up, the next step would be to gather requirements and propose an update to the KEP.

@verb
Contributor

verb commented Mar 5, 2020

I'm considering how to do this for 1.19

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 5, 2020
@verb
Contributor

verb commented Mar 21, 2020

A lot of use cases have been discussed here, so I thought it would be useful to enumerate them:

  1. reclaim resources after completion: We don't actually need to remove an ephemeral container from the spec for this. Resources could be reclaimed after the container exits.
  2. restarting debug containers: I'm not sure there's a strong argument here vs just starting a container with a new name, as kubectl alpha debug does.
  3. terminating the ephemeral container via the API: Currently you have to exit the container process to kill the container.
  4. clean up the pod spec: Frequent use of ephemeral containers will cause the spec to accumulate cruft, and recreating the pod may not be warranted.

Given these, I'm planning on proposing simple deletes without in-place restarts or other updates (sketched below).
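
As a purely hypothetical illustration of the "simple delete" idea, the sketch below shows what removing one named entry could look like from the client side, again using client-go. Today the ephemeralcontainers subresource rejects any update that drops an existing entry, so this is not something the current API supports.

```go
// Hypothetical sketch only: a client-side "simple delete" of one ephemeral
// container, if the ephemeralcontainers subresource ever accepted removals.
// Today API validation rejects an update that drops an existing entry.
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func removeEphemeralContainer(ctx context.Context, cs kubernetes.Interface, ns, podName, ecName string) error {
	pod, err := cs.CoreV1().Pods(ns).Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Keep every entry except the one being removed; order is preserved.
	kept := pod.Spec.EphemeralContainers[:0]
	for _, ec := range pod.Spec.EphemeralContainers {
		if ec.Name != ecName {
			kept = append(kept, ec)
		}
	}
	pod.Spec.EphemeralContainers = kept
	// Hypothetical: would require the API server and kubelet to allow removal.
	_, err = cs.CoreV1().Pods(ns).UpdateEphemeralContainers(ctx, podName, pod, metav1.UpdateOptions{})
	return err
}
```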

@verb
Contributor

verb commented Mar 21, 2020

One concern about allowing ephemeral containers to be removed is that it removes information that could have been used for policy decisions. For this reason it will need to be configurable.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 19, 2020
@verb
Contributor

verb commented Jun 24, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 24, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 22, 2020
@verb
Contributor

verb commented Sep 22, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 22, 2020
@zhhray

zhhray commented Oct 29, 2020

I want to know whether anyone is working on this issue now. Is there a PR for it?

@verb
Contributor

verb commented Oct 29, 2020

@zhhray I have a goal to land this in 1.21. It needs a KEP modification; the PR is kubernetes/enhancements#1690.

@verb
Contributor

verb commented Jan 12, 2021

/priority important-longterm

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 29, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 29, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@m-yosefpor

/reopen

@k8s-ci-robot
Contributor

@m-yosefpor: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@m-yosefpor

@shuiqing05 can you please reopen this issue?

@tonyaw

tonyaw commented May 12, 2022

@verb and @pacoxu,
Could you please share the latest status of support for removing ephemeral containers?
I'd like to use an ephemeral container to run some periodic checks, and my concern is whether the pod spec can grow beyond 1.5 MiB, which is the etcd limit.

@pacoxu
Member

pacoxu commented May 12, 2022

The KEP was updated in the v1.23 release cycle kubernetes/enhancements#2892.

There are some discussions in #103354 (comment).

@tonyaw

tonyaw commented May 12, 2022

Is it fair to say that I can't use an ephemeral container to run a periodic check in a long-lived pod with the current release?
I need the ephemeral container to run and exit after each check.

@pacoxu
Member

pacoxu commented May 12, 2022

Is it fair to say that I can't use an ephemeral container to run a periodic check in a long-lived pod with the current release?
I need the ephemeral container to run and exit after each check.

No, you can run the periodic check. (The pod's ephemeral container list would grow by one each time, so after a long period there would be too many ephemeral containers.)

A Pod can have many ephemeral containers with different names. This issue is about removing exited containers from the pod's ephemeral container list. Once an ephemeral container has been added to the Pod, it cannot be removed.

@tonyaw

tonyaw commented May 12, 2022

Right. My concern is that if ephemeral containers run many times over a long period, the list will grow so long that the pod manifest reaches the 1.5 MiB etcd limitation.
Will that be a problem? :-)
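
For a rough sense of scale, a back-of-the-envelope estimate (assuming, purely for illustration, on the order of 1 KiB per serialized ephemeral container entry plus its status; this is not a measured figure): 1.5 MiB ≈ 1536 KiB, so roughly 1,500 such entries would approach the 1.5 MiB limit mentioned above on their own. A check that adds one entry per hour would get there in about two months, before even counting the rest of the pod object.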

@liggitt
Member

liggitt commented May 12, 2022

Using ephemeral containers to run a planned periodic check indefinitely for a long-running pod will monotonically increase the length of the ephemeral containers list, and is not a good use case for this feature. You could include what you need to run the periodic check in one of the normal containers and use something like exec to run the check.
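
A minimal sketch of that exec-based approach, assuming the check tool is baked into one of the normal containers as suggested. The container name "app", the command, and the use of a recent client-go (StreamWithContext) are illustrative assumptions, not anything prescribed in this thread.

```go
// Sketch only: running a periodic check inside an existing container via the
// pod "exec" subresource. The container name, command, and client-go version
// are assumptions; the command must exist in that container's image.
package sketch

import (
	"context"
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/remotecommand"
)

func runCheck(ctx context.Context, cfg *rest.Config, cs kubernetes.Interface, ns, podName string) error {
	req := cs.CoreV1().RESTClient().Post().
		Resource("pods").Namespace(ns).Name(podName).
		SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Container: "app",
			Command:   []string{"sh", "-c", "wget -qO- http://127.0.0.1:8080/healthz"},
			Stdout:    true,
			Stderr:    true,
		}, scheme.ParameterCodec)

	exec, err := remotecommand.NewSPDYExecutor(cfg, "POST", req.URL())
	if err != nil {
		return err
	}
	// Stream the check's output; a non-zero exit code surfaces as an error.
	return exec.StreamWithContext(ctx, remotecommand.StreamOptions{
		Stdout: os.Stdout,
		Stderr: os.Stderr,
	})
}
```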

@tonyaw

tonyaw commented May 12, 2022

Thanks @liggitt.

The problem here is:
Our application container is delivered without any debug tools, such as ping, and I want to use an ephemeral container to run a periodic ping check to ensure the network is good.
This aligns with one of the benefits of ephemeral containers mentioned here:
https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/#uses-for-ephemeral-containers
"In particular, distroless images enable you to deploy minimal container images that reduce attack surface and exposure to bugs and vulnerabilities. Since distroless images do not include a shell or any debugging utilities, it's difficult to troubleshoot distroless images using kubectl exec alone."
But as you mentioned, the ephemeral containers list will grow monotonically, so removing ephemeral containers is a mandatory feature for my use case.
If there is any other way to do it, could you please share it?

@liggitt
Member

liggitt commented May 12, 2022

What you're describing sounds more like a health or liveness check, not one-off debugging... including what you need to run that in the normal containers for the app is required.

@tonyaw

tonyaw commented May 12, 2022

  1. We don't want to put any debug tools inside the application container image.
  2. The application container has dropped all capabilities, including NET_RAW, which is needed by ping.

Based on these two reasons, we can't run ping inside the application container. An ephemeral container is a perfect match, since it can not only use a different container image from the application container but also gets the default capabilities, including NET_RAW. The only thing missing is the ability to remove the ephemeral container. :-)
As also mentioned in the following link, ephemeral containers were designed for one-off debugging at the beginning, but there is indeed some need for repeated runs:
allow removing ephemeral containers since 1.23 #103354 (comment)

Could you please explain why the feature to remove ephemeral containers has not been added? Are ephemeral containers still designed only for one-off debugging?

@liggitt
Member

liggitt commented May 12, 2022

The reason remove was not supported was for auditability (losing API-visible information about what has been added to the pod and run) and to avoid race conditions where an ephemeral container with the same name was removed and re-added and appeared to the kubelet to change in-place.

@tonyaw

tonyaw commented May 12, 2022

@liggitt Thanks for your info.
From the KEP, it looks like there is no plan to add the removal feature in the short term, right?
Also, for the use case I mentioned, I think there is no solution so far, right? :-(

@verb
Contributor

verb commented Jun 10, 2022

@tonyaw That's correct, we won't be implementing this in the context of kubernetes/enhancements#277. It's possible we'd revisit this in the context of a new KEP, since it's popular, but I wouldn't expect it in the short term.

I agree that your use case sounds like a bad fit for ephemeral containers, which are best effort and shouldn't be part of normal pod operation. You could implement this as a health check that doesn't use ICMP (and so doesn't require NET_RAW) but I'd be worried about all of the pods exiting at once if the destination host became unavailable for some reason.
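
A minimal sketch of the kind of non-ICMP check mentioned above: a plain TCP reachability probe, which needs no NET_RAW and could run in the app container or a sidecar. The target address and timeout are illustrative.

```go
// Sketch only: a TCP-based reachability check that needs no NET_RAW, as an
// alternative to ICMP ping. The address and timeout are illustrative.
package main

import (
	"fmt"
	"net"
	"time"
)

func tcpReachable(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return fmt.Errorf("tcp check to %s failed: %w", addr, err)
	}
	// A successful TCP handshake is enough evidence of reachability here.
	return conn.Close()
}

func main() {
	if err := tcpReachable("10.0.0.1:443", 2*time.Second); err != nil {
		fmt.Println(err)
	}
}
```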

@unmarshall

unmarshall commented Aug 18, 2022

The reason remove was not supported was for auditability

If you need it for audit reasons, then why bloat the pod spec and status? Instead, create a companion resource that holds the audit information (created when the first debug/attach is issued). This way you don't have to change the pod spec, and it also lets you control how much audit information is kept, which could be made configurable. Operators/consumers could also switch auditing on/off for selected resources they control, etc.

The spec describes the desired state and should not be polluted with audit-trail information that can keep growing. Auditing is a cross-cutting concern; a desired state should not have ever-growing audit information as part of it. IMHO it simply does not belong there.
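
To make the companion-resource idea concrete, here is a purely hypothetical sketch of what such a type might look like. The EphemeralContainerAudit kind and its fields do not exist in Kubernetes; they are invented only to illustrate the suggestion.

```go
// Hypothetical sketch only: a possible shape for the companion audit resource
// described above. Neither this kind nor its fields exist in Kubernetes.
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EphemeralContainerAudit records every ephemeral container ever added to a
// pod, so entries could be pruned from the pod spec without losing the trail.
type EphemeralContainerAudit struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// PodName is the pod the audited ephemeral containers were attached to.
	PodName string `json:"podName"`
	// Records lists every ephemeral container ever added, even after removal.
	Records []EphemeralContainerRecord `json:"records,omitempty"`
}

// EphemeralContainerRecord is one audit entry.
type EphemeralContainerRecord struct {
	Name    string      `json:"name"`
	Image   string      `json:"image"`
	AddedBy string      `json:"addedBy,omitempty"`
	AddedAt metav1.Time `json:"addedAt"`
}
```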

@jackmtpt

Is there any update on this? Any plans to allow deleting ephemeral containers?

@chengjoey
Contributor

#124271
Draft PR for removing ephemeral containers from a pod.
