
Add a client/server implementation of the container runtime #13768

Closed
brendandburns opened this issue Sep 9, 2015 · 31 comments
Labels
area/extensibility priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@brendandburns
Contributor

Currently, any container runtime has to be linked into the kubelet. This makes experimentation difficult, and prevents users from landing an alternate container runtime without landing code in core kubernetes.

To facilitate experimentation and to enable user choice, we should add a client/server implementation of the container runtime interface.

This implementation will simply encode the requests, send them to a server where they will be decoded and sent into an instance of the container runtime interface.

More importantly, this enables container runtime implementations to be built and maintained outside of the core kubernetes tree.
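A minimal sketch of that shape in Go, using JSON over HTTP for concreteness; the RunPodRequest type, the /runPod endpoint, and the method set are illustrative assumptions, not a proposed API:

```go
package remote

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// RunPodRequest is a hypothetical wire type; a real shim would mirror the
// kubelet's Runtime methods one-for-one.
type RunPodRequest struct {
	PodName   string `json:"podName"`
	Namespace string `json:"namespace"`
}

// remoteRuntime is the client half: it encodes each request and sends it to
// the runtime server instead of calling an in-process implementation.
type remoteRuntime struct {
	endpoint string
}

func (r *remoteRuntime) RunPod(req RunPodRequest) error {
	body, err := json.Marshal(req)
	if err != nil {
		return err
	}
	resp, err := http.Post(r.endpoint+"/runPod", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("runtime server: %s", resp.Status)
	}
	return nil
}

// runtimeServer is the server half: it decodes each request and dispatches
// it to a real runtime implementation maintained outside the core tree.
func runtimeServer(rt interface{ RunPod(RunPodRequest) error }) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/runPod", func(w http.ResponseWriter, r *http.Request) {
		var req RunPodRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if err := rt.RunPod(req); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
	})
	return mux
}
```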

@dchen1107 @smarterclayton
@kubernetes/goog-node

@smarterclayton
Contributor

@kubernetes/rh-cluster-infra

@brendandburns
Contributor Author

@feiskyer

@dchen1107 dchen1107 added sig/node Categorizes an issue or PR as relevant to SIG Node. help-wanted priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Sep 9, 2015
@bgrant0607
Member

cc @thockin re. consistency and interaction with network and storage plugins

@bgrant0607 bgrant0607 added this to the v1.2-candidate milestone Sep 9, 2015
@vishh
Contributor

vishh commented Sep 9, 2015

We need two kinds of runtime abstractions in kubelet as of now - one for docker which operates at the container level, and another for rocket which operates at the pod level. Are we planning to expose both pod and container runtime plugins?

@yujuhong
Contributor

yujuhong commented Sep 9, 2015

We need two kinds of runtime abstractions in kubelet as of now - one for docker which operates at the container level, and another for rocket which operates at the pod level. Are we planning to expose both pod and container runtime plugins?

I think we need to redefine the abstraction (container runtime interface) if we are going to expose it. We may need to ask rkt to expose a container-level API to work with the kubelet, since most container runtimes operate at that level. That said, the kubelet still relies on the container runtime (docker or rkt) to save pod-level information (e.g., what pod a container belongs to), so a container runtime would still need to be aware of the existence of pods.

Also, I am working on a pod event generator (#12802) to improve the kubelet's performance, so I'd appreciate being kept in the loop on related changes :)
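For context on the pod-awareness mentioned above: the kubelet's docker integration encodes pod identity into container names so that pods can be reconstructed from a flat container list. A simplified sketch; the exact delimiter and field order are illustrative, and it assumes names contain no underscores:

```go
package dockertools

import (
	"fmt"
	"strings"
)

// containerNamePrefix marks containers managed by the kubelet.
const containerNamePrefix = "k8s"

// buildContainerName records pod-level identity on a docker container.
func buildContainerName(podName, podNamespace, podUID, containerName string, attempt int) string {
	return strings.Join([]string{
		containerNamePrefix, containerName, podName, podNamespace, podUID,
		fmt.Sprintf("%d", attempt),
	}, "_")
}

// parseContainerName recovers the pod identity from a container name.
func parseContainerName(name string) (podName, podNamespace, podUID, containerName string, err error) {
	parts := strings.Split(name, "_")
	if len(parts) != 6 || parts[0] != containerNamePrefix {
		return "", "", "", "", fmt.Errorf("not a kubelet-managed container: %q", name)
	}
	return parts[2], parts[3], parts[4], parts[1], nil
}
```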

@feiskyer
Member

I think we need to redefine the abstraction (container runtime interface) if we are going to expose it. We may need to ask rkt to expose a container-level API to work with the kubelet, since most container runtimes operate at that level. That said, the kubelet still relies on the container runtime (docker or rkt) to save pod-level information (e.g., what pod a container belongs to), so a container runtime would still need to be aware of the existence of pods.

I think we should provide both pod-level and container-level APIs to work with the kubelet. As you say, the container runtime must be aware of the existence of pods, so APIs like creating and deleting pods should be pod-level. But other APIs like exec/logs/attach should be container-level, since they operate directly on containers.
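A sketch of that split, with illustrative placeholder types: lifecycle operations stay at the pod level, while introspection and streaming operate on containers.

```go
package runtime

import "io"

// PodRuntime covers pod lifecycle: runtimes like rkt or hyper map to this
// directly; docker-style runtimes implement it on top of containers.
type PodRuntime interface {
	RunPod(pod *PodSpec) (PodID, error)
	KillPod(id PodID) error
	ListPods() ([]PodStatus, error)
}

// ContainerRuntime covers per-container operations such as exec/logs/attach.
type ContainerRuntime interface {
	ExecInContainer(id ContainerID, cmd []string, stdin io.Reader, stdout, stderr io.Writer) error
	GetContainerLogs(id ContainerID, tail int, out io.Writer) error
	AttachContainer(id ContainerID, stdin io.Reader, stdout, stderr io.Writer) error
}

type (
	PodID       string
	ContainerID string
	PodSpec     struct{ Name, Namespace string }
	PodStatus   struct {
		ID         PodID
		Containers []ContainerID
	}
)
```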

@yujuhong
Contributor

The upside of keeping a container-level API is that we can keep the majority of the logic in the kubelet, which would reduce code duplication and maintenance overhead.
I think we can maybe get away with passing the pod UID & name as arguments when querying the runtime, if necessary. I need to think more about it.

/cc @yifan-gu for rkt.

@feiskyer
Member

@yujuhong Rkt and hyper (at #13079) both operate at the pod level; their concepts are consistent with kubernetes' pods. I think it's a good idea to reuse their pod APIs in kubernetes rather than splitting them into container-level ones.

@brendandburns
Contributor Author

We currently have an existing interface, Runtime (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/container/runtime.go#L53),

that both Docker and RKT implement.

I'm suggesting a network client/server version of that interface.

We can refactor/rewrite to our hearts' content; those refactors will simply carry over to the client/server interface as well.

--brendan
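The point generalizes: because the kubelet depends only on the interface, an in-process runtime and a networked client are interchangeable. A heavily abbreviated, illustrative sketch (not the real interface):

```go
package kubeletiface

type Pod struct{ Name string }

// Runtime stands in for the real interface in pkg/kubelet/container/runtime.go.
type Runtime interface {
	SyncPod(pod *Pod) error
}

type dockerRuntime struct{} // linked into the kubelet

func (d *dockerRuntime) SyncPod(pod *Pod) error { return nil } // elided

type remoteRuntime struct{ endpoint string } // talks to an external server

func (r *remoteRuntime) SyncPod(pod *Pod) error { return nil } // elided

// Compile-time checks: the kubelet's callers cannot tell the two apart, so
// interface refactors carry over to the client/server shim mechanically.
var (
	_ Runtime = (*dockerRuntime)(nil)
	_ Runtime = (*remoteRuntime)(nil)
)
```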

@thockin
Member

thockin commented Sep 10, 2015

+1 to the idea

I don't see why we would keep any container-centric API. The abstraction is pods. If a given runtime has to jump through hoops to implement pods (I'm looking at you, docker), that's their own problem.


@brendandburns
Contributor Author

100% agree; the Runtime interface is pod-based, and that's what we should have people implement.

@yifan-gu
Contributor

+1 for the direction.

The core of the runtime interface is SyncPod(), inside which the container runtime does the reconciliation. This is a bad abstraction, as it hands too much work to the container runtime and results in a lot of code duplication.

Ideally the container runtime should just implement RunPod() and KillPod(), and the kubelet should tell it which action to take during reconciliation (see the sketch below). The trade-off is that we will not be able to update a single container in a pod; that is, if a pod has two containers and one container's spec changes, the other one will be killed as well.
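A sketch of that trade-off, assuming illustrative Pod and RunningPod types: the kubelet owns reconciliation and only asks the runtime to run or kill whole pods, so any spec change recreates the entire pod.

```go
package podsync

type Pod struct {
	UID      string
	SpecHash string // stands in for the full container specs
}

type RunningPod struct {
	UID      string
	SpecHash string
}

// PodRuntime is the narrow, pod-level contract a runtime would implement.
type PodRuntime interface {
	RunPod(desired *Pod) error
	KillPod(uid string) error
	ListPods() ([]RunningPod, error)
}

// reconcile drives the runtime toward the desired state. Note the coarse
// granularity: a change to one container's spec kills the whole pod.
func reconcile(rt PodRuntime, desired map[string]*Pod) error {
	running, err := rt.ListPods()
	if err != nil {
		return err
	}
	seen := map[string]bool{}
	for _, rp := range running {
		seen[rp.UID] = true
		want, ok := desired[rp.UID]
		if ok && want.SpecHash == rp.SpecHash {
			continue // pod is up to date
		}
		if err := rt.KillPod(rp.UID); err != nil {
			return err
		}
		if ok {
			if err := rt.RunPod(want); err != nil {
				return err
			}
		}
	}
	for uid, want := range desired {
		if !seen[uid] {
			if err := rt.RunPod(want); err != nil {
				return err
			}
		}
	}
	return nil
}
```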

@yifan-gu
Contributor

Besides, I would suggest using gRPC (sketched below), because:

  • No encoding/decoding code needs to be written.
  • It is supposed to be faster than JSON (though currently it's not :( ).
  • We get client bindings for free.
  • Server interfaces are generated as well; each runtime only needs to implement them.
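For a sense of what that buys, here is a rough Go sketch of the artifacts protoc would generate from a hypothetical runtime.proto; the service and message names are illustrative, not a real API:

```go
package runtimepb

import (
	"context"

	"google.golang.org/grpc"
)

// Generated message types (fields elided).
type RunPodRequest struct{}
type RunPodResponse struct{}

// Generated client binding: callers get typed RPCs for free, with all
// marshalling handled by the framework.
type RuntimeServiceClient interface {
	RunPod(ctx context.Context, in *RunPodRequest, opts ...grpc.CallOption) (*RunPodResponse, error)
}

// Generated server interface: a runtime only has to implement this and
// register it with a grpc.Server.
type RuntimeServiceServer interface {
	RunPod(ctx context.Context, in *RunPodRequest) (*RunPodResponse, error)
}
```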

@yujuhong
Contributor

We currently have an existing interface, Runtime (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/container/runtime.go#L53)

That's the interface I was talking about; it was defined in a rush and needs more polish (iteration). I have filed issues about this before, and as @yifan-gu mentioned, even at the pod-level abstraction there are many obvious imperfections.
I agree that a client-server architecture is good in the long run. It's also important that we work towards a good API before committing to it and exposing it for public consumption.

@yujuhong
Contributor

100% agree; the Runtime interface is pod-based, and that's what we should have people implement.

Not that I am against the pod-level abstraction, but this alone is not the best argument IMO. We still have code (container garbage collection, image management, etc.) bypassing the interface to talk to docker directly for container information. We should work out the requirements and fix the interface.

@pmorie
Member

pmorie commented Sep 11, 2015

+1 to this direction

@feiskyer
Member

I notice there are three places that talk directly to the docker client outside of the runtime:

  • container garbage collection
  • image management
  • securitycontext

I suggest moving securitycontext into docker itself, as it is docker-specific.

For container garbage collection and image management, I suggest refactoring them to talk to the Runtime interface instead (see the sketch below).
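A sketch of that refactor with illustrative method shapes: image management and container garbage collection become part of the Runtime interface, so no component reaches for the docker client directly. The GCPolicy fields mirror the kind of knobs the kubelet's container GC uses, but the names here are assumptions.

```go
package runtimeapi

import "time"

type Image struct {
	ID   string
	Tags []string
	Size int64
}

// GCPolicy is an illustrative stand-in for the kubelet's GC knobs.
type GCPolicy struct {
	MinAge             time.Duration // how long a container must be dead before removal
	MaxPerPodContainer int           // dead containers to keep per pod (for logs/debugging)
	MaxContainers      int           // global cap on dead containers
}

type Runtime interface {
	// ... existing pod lifecycle methods elided ...

	// Image management goes through the runtime rather than the docker client.
	PullImage(image string) error
	ListImages() ([]Image, error)
	RemoveImage(image string) error

	// The runtime owns cleanup of its own dead containers.
	GarbageCollect(policy GCPolicy) error
}
```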

@thockin
Member

thockin commented Sep 15, 2015

Yu-Ju,

There are certainly leaks in the abstraction, but if you consider something like rkt or hyper, a container interface doesn't make much sense. Everything that is more granular than a pod should be encapsulated behind the pod interface.

The core of the kubelet should only know about pods and abstract containers, and nothing about the details of the runtime. Garbage collection is the runtime's problem.

I understand that this may not be the case right now :)


@yifan-gu
Contributor

@yujuhong I think this is also something we need to consider when implementing the event generator in #13571

@yujuhong
Contributor

Yu-Ju,

There are certainly leaks in the abstraction, but if you consider something like rkt or hyper, a container interface doesn't make much sense. Everything that is more granular than a pod should be encapsulated behind the pod interface.

Hmm...I wasn't actually advocating the container-level abstraction. I brought it up because @dchen1107 mentioned it the other day, and I thought it was worth throwing into the discussion :)

The core of the kubelet should only know about pods and abstract containers, and nothing about the details of the runtime. Garbage collection is the runtime's problem.

I understand that this may not be the case right now :)

Agreed, but at the same time it seems like we want to have tighter control over disk space allocation and garbage collection, so the interface may not be as high-level as we'd like. (@vishh would probably know more about the requirements.)
These are probably indications that we need to put more thought into this, or wait until we have a more mature interface for the client-server architecture.

@yujuhong
Contributor

@yujuhong I think this is also something we need to consider when implementing the event generator in #13571

Agreed. For now, the generic event generator in #13571 uses the runtime interface to support container runtimes that don't provide container/pod event streams. It shouldn't be too hard to adapt it to the new interface, or to the client-server architecture. Container runtimes that do support container event streams can implement their own generator.
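A minimal sketch of such a relist-based generator, with illustrative types: poll the runtime through the interface, diff against the previous snapshot, and emit pod-level events.

```go
package pleg

import "time"

// PodLifecycleEvent is a pod-level event that may carry container data.
type PodLifecycleEvent struct {
	PodUID string
	Type   string // e.g. "ContainerStarted", "ContainerDied"
	Data   any    // e.g. the container ID involved
}

// lister is the slice of the runtime interface the generator needs.
type lister interface {
	ListPodContainers() (map[string][]string, error) // pod UID -> running container IDs
}

// relistLoop polls the runtime and emits events for containers that
// appeared or disappeared since the last relist.
func relistLoop(rt lister, period time.Duration, events chan<- PodLifecycleEvent) {
	old := map[string][]string{}
	for range time.Tick(period) {
		cur, err := rt.ListPodContainers()
		if err != nil {
			continue // retry on the next relist
		}
		for uid, containers := range cur {
			for _, c := range diff(containers, old[uid]) {
				events <- PodLifecycleEvent{PodUID: uid, Type: "ContainerStarted", Data: c}
			}
		}
		for uid, containers := range old {
			for _, c := range diff(containers, cur[uid]) {
				events <- PodLifecycleEvent{PodUID: uid, Type: "ContainerDied", Data: c}
			}
		}
		old = cur
	}
}

// diff returns the elements of a that are not in b.
func diff(a, b []string) []string {
	inB := map[string]bool{}
	for _, x := range b {
		inB[x] = true
	}
	var out []string
	for _, x := range a {
		if !inB[x] {
			out = append(out, x)
		}
	}
	return out
}
```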

@yifan-gu
Contributor

@yujuhong What I meant is that we might need to add pod-level events, and most of the time the kubelet should only manage pod-level events. Container events should be handled by the runtime according to the restart policy. However, I tend to agree that container-level events/interfaces are also necessary if we want to provide some fine-grained introspection (e.g., getting a container's logs, status, etc.). Though I am trying to think about whether we can just include these in the pod-level events/interfaces.

@yujuhong
Contributor

@yifan-gu, the current container events are packaged as pod-level events, even though they include some container information (ContainerStarted, etc.). Having individual runtimes handle all container-level events is a separate question, though. If the kubelet can fire and forget a pod without babysitting it, we may not even need pod-level events (other than maybe sending the pod status to the apiserver) :)

@feiskyer
Member

@brendandburns The image manager has been refactored for the client/server implementation of the container runtime.

@jdef
Contributor

jdef commented Sep 24, 2015

+1 for this. The mesos team is thinking of implementing a runtime for better integration with the mesos-native containerizer.

@smarterclayton
Contributor

I see some significant downsides to client/server, including more hops into the container for exec/attach/portforward scenarios. I do think that we should support a client/server containerizer, but for the foreseeable future I would want my containerizer compiled in if I were shipping a deployment.

Conditionally compiled kubelets provided with a particular host type (mesos, rocket, systemd+docker) have the same benefits without adding an extra hop.
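One way to realize such conditional compilation is Go build constraints; a self-contained sketch where the tag names and types are illustrative, not how the kubelet is actually built:

```go
//go:build runtime_rkt

// runtime_rkt.go, selected with `go build -tags runtime_rkt`; a sibling
// runtime_docker.go would carry the opposite constraint, so each shipped
// kubelet binary contains exactly one compiled-in runtime and no extra hop.
package kubelet

// Runtime and rktRuntime stand in for the real types.
type Runtime interface{ Name() string }

type rktRuntime struct{}

func (r *rktRuntime) Name() string { return "rkt" }

// newRuntime is the single seam the rest of the kubelet calls.
func newRuntime() Runtime { return &rktRuntime{} }
```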

@dchen1107
Member

@smarterclayton The inefficiency caused by a client/server runtime is known. But please note that we don't plan to make all container runtimes use this model, and we do plan to continue supporting docker and rocket as container runtimes as we do today, just with a better API.

The only candidate we have today for this client / server model is hyper:
#13079 (comment)

@resouer
Contributor

resouer commented Oct 13, 2015

@brendandburns I'm interested in whether runc is suitable for this model.

@vishh
Contributor

vishh commented Oct 13, 2015

runc is essentially a thin wrapper around the Linux APIs as of now, so I don't think it will be a suitable candidate for being a container runtime, at least in its initial versions.


@resouer
Contributor

resouer commented Oct 16, 2015

@vishh agreed. Actually, I am keeping a close eye on docker 1.9, which has the same intention.

[update] moby/moby#14212

k8s-github-robot pushed a commit that referenced this issue Jul 12, 2016
Automatic merge from submit-queue

Proposal: client/server container runtime

Ref #25899  #13768 

Proposal for client/server container runtime

CC @brendandburns @dchen1107 @kubernetes/goog-node @kubernetes/sig-node
xingzhou pushed a commit to xingzhou/kubernetes that referenced this issue Dec 15, 2016
@yujuhong
Contributor

yujuhong commented Jun 2, 2017

We have already implemented CRI, and it is used by default in 1.6. Closing.

@yujuhong yujuhong closed this as completed Jun 2, 2017