Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cri: legacy symlink breaks non-Docker CRI implementations #44043

Closed
cyphar opened this issue Apr 4, 2017 · 14 comments
Closed

cri: legacy symlink breaks non-Docker CRI implementations #44043

cyphar opened this issue Apr 4, 2017 · 14 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@cyphar
Copy link

cyphar commented Apr 4, 2017

While working on cri-o/cri-o#162 @sameo found that the logPath that we are given by kubelet is actually a symlink to the "legacy container log path" which is a Docker-specific path that doesn't exist with non-Docker CRI implementations.

As a result, currently kubelet breaks the CRI for non-Docker implementations. This was originally merged in #34858, but it seems like this symlink should be created by dockershim not by kubelet.

@yujuhong yujuhong added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Apr 4, 2017
@yujuhong
Copy link
Contributor

yujuhong commented Apr 4, 2017

@Random-Liu I thought the legacy symlinks are just there for fluentd to consume. Do we actually serve logs through those symlinks?

@Random-Liu
Copy link
Member

@cyphar I don't understand. Do you mean that the path passed to cri-o by CRI is legacy container log path?
We pass in the sandbox log directory path here: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_sandbox.go#L98
pass in the container log path here:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_container.go#L154

I didn't see any problem.

@cyphar
Copy link
Author

cyphar commented Apr 4, 2017

@Random-Liu From this comment the path provided by CRI is something like /var/log/pods/7075157cfd4524dbe0951e00a8e3129e/etcd_0.log (which is fine). However, that path is actually a symlink to somewhere inside /var/lib/docker created here. Which obviously won't exist if you're not running Docker.

This means that an attempt for a non-Docker runtime to O_CREAT|O_APPEND that file will fail (not to mention it doesn't make sense for CRI to be hardcoding Docker paths like that).

@Random-Liu
Copy link
Member

@cyphar Actually it's not /var/lib/docker, it's the legacy kubernetes container log directory /var/log/containers.

We do that because even though we defined the new CRI log path /var/lib/pods, the cluster logging is not changed yet - fluentd is still looking at /var/lib/containers. For not breaking cluster logging, we symlink the log in new CRI log path /var/lib/pods/container/**.log to legacy kubernetes log path /var/lib/container/**.log.

@cyphar
Copy link
Author

cyphar commented Apr 4, 2017

@Random-Liu Do you know why @sameo's cluster had symlinks to /var/lib/docker/containers/df19314697ecce0c9c30754f97ce7947d231fd4b04424b16448d1a28c3075876/df19314697ecce0c9c30754f97ce7947d231fd4b04424b16448d1a28c3075876-json.log then? In any case, does kubelet ensure that the legacy kubernetes log path exists (or at the least the directory exists) when it gets passed to the runtime?

@yujuhong
Copy link
Contributor

yujuhong commented Apr 4, 2017

kubelet creates legacy symlinks in /var/log/containers to point to the new CRI log paths:

# ls /var/log/containers/ -lh
total 172K
lrwxrwxrwx 1 root root 64 Apr  4 18:31 busy2_default_busybox-9a2fa1efb496ec14b540cc60f8650f5046366421dd423525c973fc9bc9471543.log -> /var/log/pods/f8adb3b7-1964-11e7-ac5f-42010af00002/busybox_0.log
lrwxrwxrwx 1 root root 65 Mar 15 17:49 e2e-image-puller-e2e-test-yjhong-minion-group-38x1_kube-system_image-puller-08f73adc1fa6a0553d20ed5b314c6c0a7c7c21346cf5604bf94bf3e9637ff35d.log -> /var/log/pods/e7746ccbe7101f6199fec3c7d56c903c/image-puller_0.log
lrwxrwxrwx 1 root root 68 Mar 15 17:49 e2e-image-puller-e2e-test-yjhong-minion-group-38x1_kube-system_nethealth-check-e3962d95544fc57bb8a041a01c6f2946dec7027d48f97025031da52aa7474005.log -> /var/log/pods/e7746ccbe7101f6199fec3c7d56c903c/nethealth-check_0.log
lrwxrwxrwx 1 root root 68 Apr  4 19:13 fluentd-gcp-v2.0-b4h0v_kube-system_fluentd-gcp-57827b1ab1a8c72d3beb92aa74f54521889e10f6c456043b0f30e7646c620d26.log -> /var/log/pods/cf9e3369-09a7-11e7-bb7e-42010af00002/fluentd-gcp_4.log

dockershim (which should not run in your case) creates the symlinks from the CRI log paths (/var/log/pods/*) to the logs in /var/lib/docker

# ls /var/log/pods/2152edfc-0f3f-11e7-ac5f-42010af00002/ -lh
total 8.0K
lrwxrwxrwx 1 root root 165 Apr  4 14:08 hostnet_110.log -> /var/lib/docker/containers/c188fa70e2e999198ff1f75010ec72442febff411c71964a1f5fab34db188935/c188fa70e2e999198ff1f75010ec72442febff411c71964a1f5fab34db188935-json.log
lrwxrwxrwx 1 root root 165 Apr  4 18:30 hostnet_111.log -> /var/lib/docker/containers/c009517e49704a4d0d2c2cd9ca2505085c4f65faf8005e04a6d41d6a2cdb2506/c009517e49704a4d0d2c2cd9ca2505085c4f65faf8005e04a6d41d6a2cdb2506-json.log

@cyphar unless you are actually running dockershim, or the legacy non-cri implementation, you should not have symlinks to /var/lib/docker/containers.

@cyphar
Copy link
Author

cyphar commented Apr 4, 2017

@sameo was the person who ran into this issue, so I can't be sure about how he ran things, but from what he said he restarted a k8s cluster to switch from dockershim to cri-o.

@yujuhong
Copy link
Contributor

yujuhong commented Apr 4, 2017

@sameo was the person who ran into this issue, so I can't be sure about how he ran things, but from what he said he restarted a k8s cluster to switch from dockershim to cri-o.

Sounds very likely that those symlinks were the artifact from dockershim before making the switch.

@sameo could you confirm whether there are any problems and close this if there are none? Thanks!

@yujuhong yujuhong added the kind/support Categorizes issue or PR as a support question. label Apr 4, 2017
@castrojo
Copy link
Member

/remove-kind support

@k8s-ci-robot k8s-ci-robot removed the kind/support Categorizes issue or PR as a support question. label Jun 30, 2017
@yujuhong
Copy link
Contributor

yujuhong commented Jun 30, 2017

@castrojo could you explain why you removed the label?

@castrojo
Copy link
Member

Yep, kind/support is deprecated, there's only a handful of them left with this label so I'm helping clean them up: https://groups.google.com/forum/#!topic/kubernetes-wg-contribex/vfHAHhHFBDg

@yujuhong yujuhong added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Jun 30, 2017
@yujuhong
Copy link
Contributor

Yep, kind/support is deprecated, there's only a handful of them left with this label so I'm helping clean them up: https://groups.google.com/forum/#!topic/kubernetes-wg-contribex/vfHAHhHFBDg

Thanks for the explanation. I added the priority/awaiting-more-evidence label.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 31, 2017
@yujuhong
Copy link
Contributor

yujuhong commented Jan 2, 2018

Closing since the issue has been inactive for over 6 months. Feel free to open a new issue if you run into the problem again (using a newer kubernetes version).

@yujuhong yujuhong closed this as completed Jan 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
None yet
Development

No branches or pull requests

6 participants