Sometimes Liveness/Readiness Probes fail because of net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) #89898
Comments
/sig network |
/triage unresolved Comment 🤖 I am a bot run by vllry. 👩🔬 |
Are you sure the application doesn't hit its resource limits? In my case, the application starts fine, then the container starts using more and more resources until it hits the limit. After that the readiness probes fail. |
Yes, I am sure. I collected the metrics. |
/area kubelet |
Any update on this? We have been facing similar issues for a few weeks. |
This is the first such report I have seen. There's nothing obvious about why this would happen. It's possible kubelet is too busy and starved for CPU, and the probe happened to be the thing that got thrashed. How many pods are on this machine? How busy is it? It's possible the node itself is thrashing or something and OOM behavior is weird. Does dmesg show any OOMs? It's possible some other failure deep in kubelet is being translated into this. You could try running kubelet at a higher log level to get more details on what is happening. A lot of bugs have been fixed since 1.12, so we'd need to try to reproduce this and then try again on a more recent version. Is there any way you can help boil it down to a simpler reproduction? |
@thockin If you aren't able to handle this issue, consider unassigning yourself and/or adding the 🤖 I am a bot run by vllry. 👩🔬 |
@thockin For example, see the get pods output above. Also, this is happening for this particular deployment only; the other pods are running fine: "returns-console-558995dd78-pbjf8 1/1 Running 0 23h" |
We too are facing the same issue in our cluster. Kubernetes version 1.16.8, installed using Kubespray. In our case, the timeouts were not related to the application but to specific nodes in the cluster. In our 13-node cluster, nodes 1 to 4 had some kind of issue wherein the pods running on these nodes had random failures due to timeouts. We checked that there weren't any CPU or memory usage spikes either. P.S. We are using NodePort for our production use case. Is it possible that the NodePort setup cannot handle too many socket connections? |
I have no idea what might cause spurious probe failures. @derekwaynecarr have you heard any other reports of this? @tarunwadhwa13 are you saying PROBES failed (always on the same node) or user traffic failed? If you have any other information about what was going on with those nodes when the failures happen, it would help. Check for OOMs, high CPU usage, conntrack failures? |
@thockin Conntrack shows hardly 2 or 3 errors. Memory consumption is 60-65% per node. We just found that the timeouts were for almost all requests and not just probes. We added Istio recently to check connection stats and understand whether the behaviour was due to the application, but the findings were weird: Istio itself is now failing its readiness probe quite frequently - 157 failures in ~3 hours. I'd also like to add that Kubernetes is running in our datacenter, and since the iptables version is 1.4.21, --random-fully is not being used. But since all machines have the same configuration, we ruled out this possibility. |
I apologize for not having a lot of details to share, but I'd add my 2 cents. We recently upgraded from Istio 1.4.4 to 1.5.4 and started seeing the issues described by the OP - lots of liveness/readiness failures that were not happening before. It SEEMS like adding, say, a 20-second initial delay has helped in most cases. At this point we are still seeing it and are not sure what the root cause is. Running on EKS 1.15 (control plane) / 1.14 managed node groups. |
I'm afraid the only way to know more is to get something like a tcpdump from both inside and outside the pod, which captures one or more failing requests. Possible? |
I'm getting the same issue.
Liveness on an Nginx container looks like this:
liveness:
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3
  httpGet:
    path: /list/en/health/ping
    port: 80
UPD: The strange thing is that completely distinct deployments (the staging4, staging5, stagingN you can see above) - more than 10 deployments - fail at once. My possible problem might be in missing |
Having the same problem here.
livenessProbe:
  httpGet:
    path: /status
    port: 80
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 10
  failureThreshold: 3
The error occurs roughly 3-10 times a day:
The service operates normally and responds to /status in 5ms, though. Running on Azure Kubernetes Service |
I'm facing the same issue as well; increasing timeoutSeconds didn't help.
livenessProbe:
  httpGet:
    path: /ping
    port: http
  failureThreshold: 5
  initialDelaySeconds: 5
  periodSeconds: 20
  timeoutSeconds: 10
readinessProbe:
  httpGet:
    path: /ping
    port: http
  initialDelaySeconds: 5
  periodSeconds: 20
  timeoutSeconds: 10
Running on Kubernetes v1.16.7 on AWS, deployed via kops. |
I appreciate the extra reports. It sounds like something is really weird. I'll repeat myself from above: I'm afraid the only way to know more is to get something like a tcpdump from both inside and outside the pod, which captures one or more failing requests. Is that possible? Without that I am flying very blind. I don't see this experimentally and I'm not flooded with reports of this, so it's going to be difficult to pin down. If you say you see it repeatedly, please try to capture a pcap? |
@thockin I'll try to get a dump if I'm able to replicate this issue consistently, since it tends to happen randomly. Sorry, I'm relatively new to this 😅 |
Why is kubelet opening new connections for every probe? Shouldn't it re-use the connection when it can? Also: net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120 - isn't that the TIME_WAIT period? |
It's been that way since v1.2 (80287e3).
That is the conntrack wait period; it will disappear once the socket closes. The TIME_WAIT period for the socket is a constant of the kernel; however, users can override it per socket using the SO_LINGER option. I see 2 possible directions (a sketch of the second option follows below):
1. Reusing connections - that means we have to refactor all the probes (for HTTP I think that Go will handle it).
2. Setting SO_LINGER to 0 or a much lower value (this is probing, so the risk of losing un-ACKed data is not important) and decreasing the TIME_WAIT value. |
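For illustration only, a minimal Go sketch of what option 2 could look like - closing each probe connection with SO_LINGER set to 0 so the socket is reset on close instead of lingering in TIME_WAIT. This is Linux-specific, the target URL and timeout are placeholders, and it is not necessarily what the kubelet change ended up doing:

```go
package main

import (
	"net"
	"net/http"
	"syscall"
	"time"
)

// newProbeClient builds an HTTP client whose TCP connections are closed
// with SO_LINGER=0, i.e. close() sends an RST and the socket skips TIME_WAIT.
func newProbeClient() *http.Client {
	dialer := &net.Dialer{
		Control: func(network, address string, c syscall.RawConn) error {
			var serr error
			if err := c.Control(func(fd uintptr) {
				// Onoff=1, Linger=0: reset the connection on close and
				// discard any unsent data (acceptable for a probe).
				serr = syscall.SetsockoptLinger(int(fd), syscall.SOL_SOCKET,
					syscall.SO_LINGER, &syscall.Linger{Onoff: 1, Linger: 0})
			}); err != nil {
				return err
			}
			return serr
		},
	}
	return &http.Client{
		Timeout: 1 * time.Second, // placeholder; real probes use timeoutSeconds
		Transport: &http.Transport{
			DialContext:       dialer.DialContext,
			DisableKeepAlives: true, // one connection per probe, as today
		},
	}
}

func main() {
	resp, err := newProbeClient().Get("http://10.0.0.12/healthz") // placeholder pod IP and path
	if err == nil {
		resp.Body.Close()
	}
}
```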
Option 1 is the "right" answer, but may be harder. The probe code isn't my favorite... |
let me see how far I go with 1 |
I did 1 and 2 XD #115143 |
I have to leave 1 out; the results are very confusing and not deterministic. The way HTTP reuses connections differs between HTTP/1 and HTTP/2, and the Go stdlib also behaves differently for those protocols. I think that option 2 is the best option for probes: less disruptive, and the results I obtained locally are very promising. |
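For context on what option 1 (connection reuse) means in practice: with plain HTTP/1, Go's Transport already pools idle connections, so a single shared client would reuse the same socket across probe periods as long as the response body is drained. A rough sketch with placeholder values, assuming HTTP/1 and no TLS (not the kubelet code):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// One shared client: the HTTP/1 Transport keeps idle connections in a pool
// and reuses them, so repeated probes would not open a new socket each time.
// (HTTP/2 instead multiplexes streams over a single connection, which is
// part of why the two protocols behave differently here.)
var probeClient = &http.Client{
	Timeout: 1 * time.Second,
	Transport: &http.Transport{
		MaxIdleConnsPerHost: 2,
		IdleConnTimeout:     30 * time.Second,
	},
}

func probeOnce(url string) error {
	resp, err := probeClient.Get(url)
	if err != nil {
		return err
	}
	// The body must be fully read and closed, otherwise the connection
	// is not returned to the idle pool and cannot be reused.
	io.Copy(io.Discard, resp.Body)
	resp.Body.Close()
	if resp.StatusCode >= 400 {
		return fmt.Errorf("unhealthy: %s", resp.Status)
	}
	return nil
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(probeOnce("http://10.0.0.12/healthz")) // placeholder target
		time.Sleep(10 * time.Second)                       // placeholder periodSeconds
	}
}
```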
Will you revisit HTTP reuse? |
That's a can of worms; the HTTP/1 vs HTTP/2 thing is complex. Let's let this soak and revisit once we are able to get some data about the impact of this change. |
FWIW - I found this article useful in explaining TIME-WAIT and potential |
I am still facing the same issue. |
Same here, still facing this issue. |
Please indicate which version you are facing the issue with, how many probes are configured on the node, and whether they are HTTP/TCP probes or exec probes. |
Hi, I'm facing this issue right now too, in one of my namespaces, after DigitalOcean did an automatic update to 1.24.16-do.0. It pulls the image successfully, but then crashes and throws: Liveness probe failed: Get "http://<my.ip>/actuator/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers) |
No, that would fail to detect a pod doing graceful shutdown, wouldn't it? Kubelet needs to know whether the pod is accepting new connections, not just whether it's continuing to process old connections. |
@aojea do you have an example of how to implement option 2 (since option 1 doesn't seem to be doable at this point)? |
it is already implemented in #115143 |
@aojea OK, so this is still happening for me on a 1.26 host; I have to confirm on 1.27. Do you know which release this made it into? It seems like there is an open PR to fix the HTTP reuse, which would address option 1. |
Looks like it was part of 1.27. I’ll upgrade that cluster this week. |
@cdenneen did you get a chance to upgrade, and did you see any improvement? |
Hello, we are seeing events with "reason": "Unhealthy" on several objects. The probe failure message contains the API server health output: all the [+]poststarthook/... checks report ok, but it ends with "healthz check failed") has prevented the request from succeeding status=500. There is also this log line: 2024-03-27 04:48:27.774 [INFO][1] resources.go 350: Main client watcher loop. |
What happened:
In my cluster the readiness probes sometimes fail, but the application works fine.
What you expected to happen:
Successful Readiness Probe.
How to reproduce it (as minimally and precisely as possible):
We have a few clusters with different workloads.
Only in the cluster with a big number of short-lived pods do we have this issue, and not on all nodes.
We can't reproduce this error on our other clusters (which have the same configuration but a different workload).
How I found the issue:
I deployed a daemonset:
Then I started to watch events on all pods of this daemonset.
After some time I received the following events:
Those events were on ~50% of the pods of this daemonset.
On the nodes where the pods with failed probes were placed, I collected the kubelet logs.
There were errors like:
I thought that sometimes the nginx in the pod really does respond slowly.
To rule out this theory, I created a short script that curls the application in the pod and stores the response time in a file:
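(The script itself is not included in the report. A rough equivalent of such a measurement loop, with the pod IP, path, interval, and output file as placeholders, could look like this:)

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// Hit the pod's endpoint in a tight loop and append the response time to a
// file, to check whether the application ever answers slowly. The URL,
// interval, and file name below are placeholders, not values from the report.
func main() {
	out, err := os.OpenFile("/tmp/response-times.log",
		os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		panic(err)
	}
	defer out.Close()

	client := &http.Client{Timeout: 1 * time.Second}
	for {
		start := time.Now()
		resp, err := client.Get("http://10.32.0.5/")
		elapsed := time.Since(start).Seconds()
		if err != nil {
			fmt.Fprintf(out, "%s ERROR %v\n", start.Format(time.RFC3339Nano), err)
		} else {
			resp.Body.Close()
			fmt.Fprintf(out, "%s %d %.4f\n", start.Format(time.RFC3339Nano), resp.StatusCode, elapsed)
		}
		time.Sleep(2 * time.Millisecond)
	}
}
```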
I ran this script for 30 minutes on the node where the pod is placed, and got the following:
There were 482670 measurements and the longest response time was 0.007s.
In the logs of the pod there are only messages with response code 200 (from my requests and from the readiness probe requests):
It means that at least some of the probes are successful.
Then I stopped the curl script (because of the big number of logs).
I waited until a new failed-probe error appeared in the kubelet logs.
In the logs of that nginx pod I didn't find the request generated by this probe:
If I restart the kubelet, the error doesn't disappear.
Does anyone have any suggestions about this?
Environment:
- OS (e.g. from cat /etc/os-release): Ubuntu 16.04
- Kernel (e.g. uname -a): 4.15.0-66-generic
Seems like the problem appears in many different installations - #51096
/sig network