
pod stuck with NodeAffinity status // using spot VMs under K8s 1.22.x and 1.23.x #112333

Closed
gillesdouaire opened this issue Sep 8, 2022 · 22 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@gillesdouaire

The same problem on 1.22.3-gke.700

Originally posted by @maxpain in #98534 (comment)

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Sep 8, 2022
@k8s-ci-robot
Contributor

@gillesdouaire: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Sep 8, 2022
@gillesdouaire gillesdouaire changed the title The same problem on 1.22.3-gke.700 NodeAffinity pod on Spot VMs under 1.22.3-gke.700 Sep 8, 2022
@gillesdouaire
Author

gillesdouaire commented Sep 8, 2022

/sig node

We are aware this was supposed to be fixed as of K8s 1.21, but we are experiencing it in the same context under a newer K8s version. All pods on a given node are stuck with NodeAffinity status and remain so until deleted, after which they are re-scheduled. The node is ready and otherwise healthy.

k8s version: v1.22.12-gke.1200
spot VMs: enabled
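
For reference, a minimal sketch of how the affected pods can be listed (commands are illustrative, not from the original report; the grep relies on `NodeAffinity` showing up in the STATUS column as in the outputs later in this thread):

# List stuck pods together with the node they were assigned to (-o wide adds the NODE column).
kubectl get pods --all-namespaces -o wide | grep NodeAffinity

# If the stuck pods are terminal, they can also be narrowed down with a field selector:
kubectl get pods --all-namespaces --field-selector=status.phase=Failed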

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 8, 2022
@gillesdouaire gillesdouaire changed the title NodeAffinity pod on Spot VMs under 1.22.3-gke.700 pod stuck with NodeAffinity status using VMs under 1.22.3-gke.700 Sep 8, 2022
@gillesdouaire gillesdouaire changed the title pod stuck with NodeAffinity status using VMs under 1.22.3-gke.700 pod stuck with NodeAffinity status // using spot VMs under K8s 1.22.x Sep 8, 2022
@pacoxu
Member

pacoxu commented Sep 13, 2022

BTW, End of Life for 1.22 is 2022-10-28.

How can we reproduce it? I cannot reproduce it by just restarting kubelet in my v1.24 cluster.

@gillesdouaire
Author

gillesdouaire commented Sep 15, 2022

@pacoxu In my case, I was able to reproduce the situation on our 1.22 GKE cluster by issuing a few `kubectl delete node ... --force` commands against our existing nodes hosted on spot VMs, each time waiting for a new node to respawn and stabilize before deleting again. It took 6 delete commands before pods stuck in the NodeAffinity state appeared.

As mentioned before, all the pods stuck in NodeAffinity are assigned to the same node.

Right now, I am leaving a few pods in that state, so if you need more details on the actual status of the workloads, let me know.
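
A rough sketch of that reproduction loop, for clarity (the spot-node label comes from the nodeSelector used later in this thread; node selection and waiting are illustrative):

# Pick one spot node, force-delete it, wait for GKE to respawn a replacement,
# then look for pods stuck with a NodeAffinity status. Repeat several times.
NODE=$(kubectl get nodes -l cloud.google.com/gke-spot=true -o name | head -n 1)
kubectl delete "$NODE" --force --grace-period=0

# ...once the replacement node has registered and become Ready:
kubectl get pods --all-namespaces | grep NodeAffinity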

@gillesdouaire
Author

@pacoxu Using the `kubectl delete node ... --force` approach, I was able to reproduce this on a Kubernetes cluster running 1.23.10.

Same behaviour: all the pods stuck in NodeAffinity are assigned to the same node and remain unready.

@gillesdouaire gillesdouaire changed the title pod stuck with NodeAffinity status // using spot VMs under K8s 1.22.x pod stuck with NodeAffinity status // using spot VMs under K8s 1.22.x and 1.23.x Sep 19, 2022
@pacoxu
Member

pacoxu commented Sep 20, 2022

"kube nodes delete... --force" approach

Does this mean that you delete a node and restart the kubelet several times?

  1. kubectl delete node --force
  2. restart kubelet on Node1

Can you share a sample pod yaml on that node with NodeAffinity state?

@gillesdouaire
Author

gillesdouaire commented Sep 21, 2022

Only the first step; once the node is force-deleted, Kubernetes respawns a new node, and the pods are then either reassigned correctly or fall into the NodeAffinity state.

The pods I had left in the NodeAffinity state have been flushed (the spot VMs were restarted), so I will need to regenerate a case; I will post a pod YAML here as soon as I have the data.

@gillesdouaire
Author

Good and/or bad news: I've seen the NodeAffinity status occur once under K8s 1.23, but now I have trouble reproducing.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 27, 2022
@jonpulsifer
Contributor

+1, after adding a new preemptible (not spot) node pool to a 1.23.13-gke.900 cluster and scheduling a deployment there I've also noticed this behaviour on the first couple preemptions

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2023
@pacoxu
Member

pacoxu commented Jan 28, 2023

+1, after adding a new preemptible (not spot) node pool to a 1.23.13-gke.900 cluster and scheduling a deployment there I've also noticed this behaviour on the first couple preemptions

@jonpulsifer Do you mind giving some more details on how to reproduce the issue?

Good and/or bad news: I've seen the NodeAffinity status occur once under K8s 1.23, but now I have trouble reproducing.

This is stuck for now, as I don't have stable steps to reproduce the issue in mind.

@SimSimY

SimSimY commented Jan 31, 2023

+1
This happens in our GKE cluster to about 5% of the pods that run on preemptible nodes.

1.23.14-gke.401/1.23.12-gke.100

@pacoxu
Member

pacoxu commented Mar 28, 2023

@SimSimY do you have some more details?

@vaibhavkhurana2018

vaibhavkhurana2018 commented Jun 13, 2023

This happens with 1.25.8-gke.500 as well.

Steps to reproduce:

  1. Create a cluster with a managed Spot node pool that has 1 node.
  2. Create an alpine deployment with 1 replica and a node selector on it (see the sketch manifest after the pod listing below):
      nodeSelector:
        cloud.google.com/gke-spot: "true"
  3. Add a pause or sleep to the container command.
  4. Simulate the node-maintenance event on the node the pod is scheduled on. Ref: https://cloud.google.com/compute/docs/instances/simulating-host-maintenance#gcloud
  5. Once the pod comes back up, one pod will be left dangling with NodeAffinity status and the other will be in Running state:
NAME                                  READY   STATUS         RESTARTS   AGE
spot-graceful-test-868b9dd54f-9sh7j   1/1     Running        0          23s
spot-graceful-test-ff77554fd-fv42r    0/1     NodeAffinity   0          3m36s
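
For reference, a minimal sketch of the manifest and maintenance simulation described in the steps above (the deployment name mirrors the pod listing; the image, labels, sleep duration, and INSTANCE_NAME/ZONE placeholders are illustrative):

# Steps 2-3: a 1-replica alpine deployment pinned to spot nodes, with a long sleep.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-graceful-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spot-graceful-test
  template:
    metadata:
      labels:
        app: spot-graceful-test
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"
      containers:
      - name: alpine
        image: alpine:3.18
        command: ["sleep", "3600000"]
EOF

# Step 4: simulate host maintenance on the spot VM backing the node the pod landed on
# (INSTANCE_NAME and ZONE are placeholders; see the Google Cloud doc linked above).
gcloud compute instances simulate-maintenance-event INSTANCE_NAME --zone=ZONE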

@shaneqld

I can confirm this also happens in our clusters running 1.25.8-gke.1000. It seems to happen when spot VMs are preempted, but only occasionally.

Here's sample events from kubectl describe on a pod stuck in NodeAffinity state:

Events:
  Type     Reason               Age                From                                   Message
  ----     ------               ----               ----                                   -------
  Warning  FailedScheduling     47m (x4 over 47m)  gke.io/optimize-utilization-scheduler  0/17 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }, 3 node(s) had untolerated taint {...}, 4 node(s) had untolerated taint {...}, 6 node(s) didn't match Pod's node affinity/selector. preemption: 0/17 nodes are available: 15 Preemption is not helpful for scheduling, 2 No preemption victims found for incoming pod.
  Normal   Scheduled            46m                gke.io/optimize-utilization-scheduler  Successfully assigned default/mypod-556d8c7486-xxxxx to mycluster--preemptible-a4fbde29-xxxx
  Warning  FailedMount          46m                kubelet                                MountVolume.SetUp failed for volume "istiod-ca-cert" : failed to sync configmap cache: timed out waiting for the condition
  Normal   Pulling              46m                kubelet                                Pulling image "docker.io/istio/proxyv2:1.xx.x"
  Normal   Pulled               45m                kubelet                                Successfully pulled image "docker.io/istio/proxyv2:1.xx.x" in 545.587643ms (41.513583798s including waiting)
  Normal   Created              45m                kubelet                                Created container istio-init
  Normal   Started              45m                kubelet                                Started container istio-init
  Normal   Pulling              45m                kubelet                                Pulling image "docker.io/istio/proxyv2:1.xx.x"
  Normal   Pulled               45m                kubelet                                Successfully pulled image "docker.io/istio/proxyv2:1.xx.x" in 479.028778ms (13.148100799s including waiting)
  Normal   Created              45m                kubelet                                Created container istio-proxy
  Normal   Started              45m                kubelet                                Started container istio-proxy
  Warning  ExceededGracePeriod  45m                kubelet                                Container runtime did not kill the pod within specified grace period.
  Warning  NodeAffinity         43m                kubelet                                Predicate NodeAffinity failed
  Warning  FailedMount          43m (x6 over 43m)  kubelet                                MountVolume.SetUp failed for volume "istiod-ca-cert" : object "default"/"istio-ca-root-cert" not registered
  Warning  FailedMount          43m (x5 over 43m)  kubelet                                MountVolume.SetUp failed for volume "kube-api-access-8bs7r" : object "default"/"kube-root-ca.crt" not registered  

@c4talyst

Same experience on v1.25.10-gke.1400; lots of NodeAffinity pods after spot nodes are preempted.

This was also happening on 1.24.13-gke.2500, and we upgraded to attempt to reduce the noise.

Google says this is 'fixed' in 1.25.7-gke.1000 or later (https://cloud.google.com/kubernetes-engine/docs/release-notes#April_14_2023), but it's not.

Sliced screenshot of the equivalent of `kubectl get po,no` (screenshot omitted).

@austinpray-mixpanel

Still able to reproduce on control plane 1.25.12-gke.500 / preemptible nodepool 1.25.12-gke.500


NodeAffinity status pods look like:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'Pod has become Healthy in NEG "Key{\"(snip)\",
      zone: \"us-central1-b\"}" attached to BackendService "Key{\"(snip)\"}".
      Marking condition "cloud.google.com/load-balancer-neg-ready" to True.'
    reason: LoadBalancerNegReady
    status: "True"
    type: cloud.google.com/load-balancer-neg-ready
  message: Pod Predicate NodeAffinity failed
  phase: Failed
  reason: NodeAffinity

@bobbypage
Member

Are you seeing that all the pods stuck in the NodeAffinity status have phase: Failed? If so, those pods are terminal and will not be started on any new nodes. Since those pods are terminal, they shouldn't affect anything -- the pod garbage collector will eventually clean them up, or they can be cleaned up manually.

GKE has a fix for this issue that automatically cleans up terminal pods on VM preemption; it is available from control plane version 1.27.2-gke.1800+. Please try that out as the long-term fix. Since this is a GKE-specific issue, please reach out to GKE support if you continue to have issues. Thanks!
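
For anyone cleaning these up manually in the meantime, a rough sketch (NAMESPACE and POD_NAME are placeholders; note the delete targets all Failed pods in the namespace, not only those that failed with NodeAffinity):

# Inspect a stuck pod's terminal status first:
kubectl get pod POD_NAME -n NAMESPACE -o jsonpath='{.status.phase}{" "}{.status.reason}{"\n"}'

# Then delete terminal (Failed) pods left behind after preemption:
kubectl delete pods -n NAMESPACE --field-selector=status.phase=Failed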

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 21, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 20, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
