
kube-proxy in LocalModeNodeCIDR mode may cache stale Node.PodCIDR if the Node object is recreated #111321

Closed
bowei opened this issue Jul 21, 2022 · 7 comments · Fixed by #111344 or #118499
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@bowei
Member

bowei commented Jul 21, 2022

What happened?

If LocalModeNodeCIDR is true, kube-proxy reads the Node.PodCIDR value and uses it to determine whether or not traffic is local to the Node.

In certain cases (e.g. when kube-proxy runs before the Node information has been completely updated), it is possible for kube-proxy to read and store a stale PodCIDR value. This causes kube-proxy to route traffic incorrectly, as its local-traffic determination is no longer correct.
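To make the failure concrete, here is a small standalone sketch of how a stale CIDR flips the local-traffic decision. `isLocal` is an illustrative helper, not kube-proxy's actual API; the addresses are made up:

```go
package main

import (
	"fmt"
	"net"
)

// isLocal reports whether ip falls inside the node's PodCIDR.
// This mirrors the idea behind NodeCIDR-based local-traffic
// detection; the helper name is illustrative only.
func isLocal(podCIDR string, ip string) bool {
	_, cidr, err := net.ParseCIDR(podCIDR)
	if err != nil {
		return false
	}
	return cidr.Contains(net.ParseIP(ip))
}

func main() {
	// The Node was recreated: the real PodCIDR is now 10.0.5.0/24,
	// but kube-proxy still caches the stale 10.0.1.0/24.
	stale, fresh := "10.0.1.0/24", "10.0.5.0/24"
	podIP := "10.0.5.7" // a pod on the recreated node
	fmt.Println(isLocal(stale, podIP)) // false: local traffic wrongly treated as non-local
	fmt.Println(isLocal(fresh, podIP)) // true: the correct determination
}
```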

Note: PodCIDR is immutable once set, but it is possible to delete and recreate the Node object to get into this state. (Yes, this can happen.)

/sig network

What did you expect to happen?

kube-proxy should detect this state and reconfigure/restart with the newly updated PodCIDR.

How can we reproduce it (as minimally and precisely as possible)?

Recreate the Node object with a different PodCIDR after kube-proxy has started on the Node.

Kubernetes version

Impacts all current versions that support LocalModeNodeCIDR

Cloud provider

Observed for GCE preemptible/spot VMs, but is a generic problem.
@bowei bowei added the kind/bug Categorizes issue or PR as related to a bug. label Jul 21, 2022
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 21, 2022
@bobbypage
Member

/cc

@aojea
Member

aojea commented Jul 21, 2022

We have been watching the Node object since we implemented topology hints, so it would be easy to adapt that code to react to these changes.

@aojea
Member

aojea commented Jul 21, 2022

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 21, 2022
@aojea
Member

aojea commented Jul 21, 2022

For iptables:

```go
func (proxier *Proxier) OnNodeAdd(node *v1.Node) {
	if node.Name != proxier.hostname {
		klog.ErrorS(nil, "Received a watch event for a node that doesn't match the current node",
			"eventNode", node.Name, "currentNode", proxier.hostname)
		return
	}

	if reflect.DeepEqual(proxier.nodeLabels, node.Labels) {
		return
	}

	proxier.mu.Lock()
	proxier.nodeLabels = map[string]string{}
	for k, v := range node.Labels {
		proxier.nodeLabels[k] = v
	}
	proxier.mu.Unlock()
	klog.V(4).InfoS("Updated proxier node labels", "labels", node.Labels)

	proxier.Sync()
}

// OnNodeUpdate is called whenever modification of an existing
// node object is observed.
func (proxier *Proxier) OnNodeUpdate(oldNode, node *v1.Node) {
	if node.Name != proxier.hostname {
		klog.ErrorS(nil, "Received a watch event for a node that doesn't match the current node",
			"eventNode", node.Name, "currentNode", proxier.hostname)
		return
	}

	if reflect.DeepEqual(proxier.nodeLabels, node.Labels) {
		return
	}

	proxier.mu.Lock()
	proxier.nodeLabels = map[string]string{}
	for k, v := range node.Labels {
		proxier.nodeLabels[k] = v
	}
	proxier.mu.Unlock()
	klog.V(4).InfoS("Updated proxier node labels", "labels", node.Labels)

	proxier.Sync()
}

// OnNodeDelete is called whenever deletion of an existing node
// object is observed.
func (proxier *Proxier) OnNodeDelete(node *v1.Node) {
	if node.Name != proxier.hostname {
		klog.ErrorS(nil, "Received a watch event for a node that doesn't match the current node",
			"eventNode", node.Name, "currentNode", proxier.hostname)
		return
	}

	proxier.mu.Lock()
	proxier.nodeLabels = nil
	proxier.mu.Unlock()

	proxier.Sync()
}
```

For IPVS:

```go
func (proxier *Proxier) OnNodeAdd(node *v1.Node) {
	if node.Name != proxier.hostname {
		klog.ErrorS(nil, "Received a watch event for a node that doesn't match the current node", "eventNode", node.Name, "currentNode", proxier.hostname)
		return
	}

	if reflect.DeepEqual(proxier.nodeLabels, node.Labels) {
		return
	}

	proxier.mu.Lock()
	proxier.nodeLabels = map[string]string{}
	for k, v := range node.Labels {
		proxier.nodeLabels[k] = v
	}
	proxier.mu.Unlock()
	klog.V(4).InfoS("Updated proxier node labels", "labels", node.Labels)

	proxier.Sync()
}

// OnNodeUpdate is called whenever modification of an existing
// node object is observed.
func (proxier *Proxier) OnNodeUpdate(oldNode, node *v1.Node) {
	if node.Name != proxier.hostname {
		klog.ErrorS(nil, "Received a watch event for a node that doesn't match the current node", "eventNode", node.Name, "currentNode", proxier.hostname)
		return
	}

	if reflect.DeepEqual(proxier.nodeLabels, node.Labels) {
		return
	}

	proxier.mu.Lock()
	proxier.nodeLabels = map[string]string{}
	for k, v := range node.Labels {
		proxier.nodeLabels[k] = v
	}
	proxier.mu.Unlock()
	klog.V(4).InfoS("Updated proxier node labels", "labels", node.Labels)

	proxier.Sync()
}

// OnNodeDelete is called whenever deletion of an existing node
// object is observed.
func (proxier *Proxier) OnNodeDelete(node *v1.Node) {
	if node.Name != proxier.hostname {
		klog.ErrorS(nil, "Received a watch event for a node that doesn't match the current node", "eventNode", node.Name, "currentNode", proxier.hostname)
		return
	}

	proxier.mu.Lock()
	proxier.nodeLabels = nil
	proxier.mu.Unlock()

	proxier.Sync()
}
```
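Note that both sets of handlers compare only `node.Labels`, so a change to `spec.podCIDRs` on a recreated Node goes unnoticed. A minimal sketch (illustrative names, not the merged fix) of a handler that caches PodCIDRs instead:

```go
package main

import (
	"fmt"
	"reflect"
)

// podCIDRHandler is a hypothetical handler that tracks the node's
// spec.podCIDRs rather than its labels.
type podCIDRHandler struct {
	podCIDRs []string
}

// onNodeUpdate reports whether the cached CIDRs changed; a true result
// is the signal to resync (or, more safely, restart) the proxier.
func (h *podCIDRHandler) onNodeUpdate(newCIDRs []string) bool {
	if reflect.DeepEqual(h.podCIDRs, newCIDRs) {
		return false
	}
	h.podCIDRs = newCIDRs
	return true
}

func main() {
	h := &podCIDRHandler{podCIDRs: []string{"10.0.1.0/24"}}
	fmt.Println(h.onNodeUpdate([]string{"10.0.1.0/24"})) // false: no change
	fmt.Println(h.onNodeUpdate([]string{"10.0.2.0/24"})) // true: node recreated with a new range
}
```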

@aojea
Member

aojea commented Jul 22, 2022

/assign

@aojea
Member

aojea commented May 30, 2023

/reopen

The fix in #111344 partially solved this problem, but there is still a race in the code.

So, when kube-proxy starts, it calls newProxyServer

https://github.com/kubernetes/kubernetes/blob/7935006af2925dc081605c17fcb5e656cedee286/cmd/kube-proxy/app/server.go#L315-L325C29

which calls createProxier

```go
s.Proxier, err = s.createProxier(config)
```

which, if detectLocalMode is set to NodeCIDR, waits for the PodCIDR to be assigned to the Node object:

```go
func (s *ProxyServer) createProxier(config *proxyconfigapi.KubeProxyConfiguration) (proxy.Provider, error) {
	var proxier proxy.Provider
	var err error

	var nodeInfo *v1.Node
	if config.DetectLocalMode == proxyconfigapi.LocalModeNodeCIDR {
		klog.InfoS("Watching for node, awaiting podCIDR allocation", "hostname", s.Hostname)
		nodeInfo, err = waitForPodCIDR(s.Client, s.Hostname)
```

However, the value obtained here is not the value used by the event handler, which is started later in the o.runLoop call:

```go
proxyServer, err := newProxyServer(o.config, o.master)
if err != nil {
	return err
}
o.proxyServer = proxyServer
return o.runLoop()
```

```go
nodeConfig := config.NewNodeConfig(currentNodeInformerFactory.Core().V1().Nodes(), s.Config.ConfigSyncPeriod.Duration)
// https://issues.k8s.io/111321
if s.Config.DetectLocalMode == kubeproxyconfig.LocalModeNodeCIDR {
	nodeConfig.RegisterEventHandler(&proxy.NodePodCIDRHandler{})
}
nodeConfig.RegisterEventHandler(s.Proxier)
```

If something changes between the time the node informer is started and the time the startup code reads the PodCIDR, the change goes unnoticed and kube-proxy operates with an outdated value, causing the issue described above.
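One way to close that gap is to have startup consume the same event stream the handler feeds, instead of doing a separate one-shot read. A minimal sketch under that assumption (all names illustrative, not kube-proxy's code):

```go
package main

import (
	"fmt"
)

// cidrWatcher models a node event handler that publishes every
// observed spec.podCIDRs value on a channel. Startup blocks on the
// same channel, so the initial value and later updates come from a
// single source and no window exists between them.
type cidrWatcher struct {
	updates chan []string
}

// OnNodeChange is what the informer would invoke for add/update events.
func (w *cidrWatcher) OnNodeChange(podCIDRs []string) {
	w.updates <- podCIDRs
}

func main() {
	w := &cidrWatcher{updates: make(chan []string, 1)}
	// The informer delivers the node (possibly already recreated)...
	w.OnNodeChange([]string{"10.0.2.0/24"})
	// ...and startup reads the same stream instead of a separate GET,
	// so it cannot observe a value the handler never saw.
	initial := <-w.updates
	fmt.Println(initial[0])
}
```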

@k8s-ci-robot
Contributor

@aojea: Reopened this issue.

