Image volumes and container volumes #831

Open
thockin opened this issue Aug 8, 2014 · 144 comments
Labels
area/app-lifecycle area/usability kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/service-catalog Categorizes an issue or PR as relevant to SIG Service Catalog. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@thockin
Member

thockin commented Aug 8, 2014

This would map closely to Docker's native volumes support, and allow people to build and version pre-baked data as containers. Maybe read-only? Haven't thought that far...

@brendandburns
Contributor

I guess so? But why not just use a git repo?


@thockin
Member Author

thockin commented Aug 8, 2014

More plugins more better? I wanted to put it out there, since we do sort
of diverge from Docker's native volumes support. Clearly not urgent :)


@bgrant0607
Member

More potential uses of this:

  • Deployment of scripts/programs for lifecycle hooks: One of the main points of the lifecycle hooks (PreStart and PostStop event hooks #140) is to decouple applications from the execution environment (Kubernetes in this case). If the hook scripts/programs must be deployed as part of the application container, that compromises this objective.
  • Dynamic package composition more generally: This would be more similar to our internal package model, where we can independently manage the base filesystem, language runtime, application, utility programs for debugging, etc.
  • Configuration deployment
  • Input data deployment

A git repo could be used for some of these cases, but for others it would be less than ideal.

@erictune
Member

erictune commented Oct 1, 2014

If the base image of a Dockerfile (e.g. FROM fedora) is a Linux distro,
then isn't it going to be annoying to have a bunch of Linux Standard Base
type of files in what is really supposed to be a data-only package?

On the other hand, if it is created using tar -c . | docker import - myimage, then what is the advantage of a docker image over a tar file?
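
For concreteness, a minimal sketch of the two packaging options being compared; image and file names are hypothetical:

# data-only image: versioned, taggable, pullable from any registry
tar -c . | docker import - mydata:v1
# plain tarball: must be hosted and fetched by some other means
tar -czf mydata-v1.tar.gz .

The image mainly buys you registry distribution, tagging, and caching; the tarball avoids the image tooling entirely.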


@thockin
Member Author

thockin commented Oct 1, 2014

How about the VOLUME directives of a container? Any container can declare
itself to be exposing any number of volumes. Maybe the functionality to
expose is not the whole container, but just the volumes from that container?

On the other hand, people can create data containers "FROM scratch", and
who are we to say it's annoying?


@dchen1107
Member

@thockin I like the initial idea of having a new volume type to support docker's data volume containers, which matches the common package, shared package, etc. concepts internally. I can also see the potential use cases listed by @bgrant0607. But please don't go down the road Docker took: declaring a container as a data volume, in which case we would introduce another level of dependency complexity between containers within a pod, or even dependencies between pods if a pod only holds a data volume container. I think your initial idea of a volume type that actually refers to a docker volume container or other read-only volume is the better approach in the long run.

@thockin
Member Author

thockin commented Oct 1, 2014

The interesting thing about docker volumes is that a container does not
have to be RUNNING for the volumes to exist. It's a weird model, but I
think it could work.

I don't think we know what people really want in this space yet, though.


@erictune
Member

erictune commented Oct 6, 2014

It appears the same net effect as this issue can be achieved without introducing a new volume type, using two containers in a pod and some shell wrapped around the underlying container command lines (see #1589 (comment)); a sketch in current pod-spec syntax follows the list below.

How to decide between container-as-a-volume and command-line-based sequencing?

  • more portability between Kubernetes and non-Kubernetes docker use cases with container-as-volume.
  • easier for user to discover container-as-volume concept and identify that it is the right solution?
  • having fewer and less general mechanisms for setting up "packages" may lend itself to more tightly integrated build/deploy systems. But, maybe that is not a goal for Kubernetes.
  • either solution can integrate with data durability, I think.
  • liveness checking is more complex with command-line-based sequencing, since the pod to be liveness checked goes through a waiting phase and then a running phase.
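
A hedged sketch of that command-line-based sequencing in current pod-spec syntax; a sentinel file stands in for real coordination, and all image names and paths are hypothetical:

apiVersion: v1
kind: Pod
metadata:
  name: sequenced
spec:
  volumes:
    - name: shared
      emptyDir: {}
  containers:
    - name: data                 # fills the volume, then signals readiness
      image: mydata:v1           # hypothetical data image
      command: ["sh", "-c", "cp -r /data/. /shared/ && touch /shared/.ready && sleep infinity"]
      volumeMounts:
        - name: shared
          mountPath: /shared
    - name: app                  # waits for the sentinel before starting
      image: myapp:v1            # hypothetical app image
      command: ["sh", "-c", "until [ -f /shared/.ready ]; do sleep 1; done; exec /app/serve --data=/shared"]
      volumeMounts:
        - name: shared
          mountPath: /shared
          readOnly: true

The liveness-checking complexity noted above is visible here: the app container sits in its wait loop before the real process ever starts.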

@stp-ip
Member

stp-ip commented Nov 21, 2014

There are a lot of ways to go about it and a new volume type is in my opinion not needed. I tried to standardize the way we structure data and make different volume providers possible. These range from Host Volumes, Data Volume Containers, Side Containers with additional logic to Volume as a Service, which is where k8s could integrate greatly. The start is already available via git as volume. I think the native Volumes in Docker are enough, but just lack a standard. The more detailed ideas are available at moby/moby#9277.

@bgrant0607
Member

I think the question is whether the data volume container should be represented as a container or as a volume. I prefer to think of them as volumes and find passive containers to be non-intuitive for users, problematic for management systems to deal with, and the source of weird corner-case behaviors.

@stp-ip
Member

stp-ip commented Nov 21, 2014

@bgrant0607 still they are supported in docker and therefore we should acknowledge they exist. I would love to see more integrated methods in k8s itself, which just expose a specific type of volume. I was hinting at that in my proposal via the VaaS approach. But I would dislike this approach reducing compatibility.

@bgrant0607 bgrant0607 added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Dec 3, 2014
@mindscratch

+1 for supporting a container as a volume. I have a scenario where I have a container that has a bunch of data baked into it for use by other containers; it helps keep the data "local" to the work being done.

@rehevkor5

Whatever you decide, I hope you will make it clear in the documentation, to save people the time of searching around to find this information. Currently, the documentation for both Compute Engine and Container Engine:

  1. makes no mention that VOLUME/--volumes-from/container-as-volume is not supported
  2. makes no mention of possible work-arounds
  3. makes no mention of the possibility of retrieving things from a git repo

It's important to note that using a git repo isn't the same. It requires the git repo to be securely accessible from Google Cloud (or wherever Kubernetes is being used). Further, it's unclear how non-public repositories would be accessible, unless the username and password are hard-coded into the Kubernetes GitRepo#repository JSON/YAML string. Also, it requires that the desired artifact(s) be checked in to source control. And it decouples the Docker image from the artifact (which may or may not be desirable).

I will be working around this issue by moving the data that's in my container volume into a Dockerfile that layers on top of the container that wanted to use the volume, with ADD. The problem you're running into is that the community at large is encouraging the "container as volume" approach in websites and blog posts, and as a result people will continue to have difficulty. For example, the docker website itself says, "If you have some persistent data that you want to share between containers, or want to use from non-persistent containers, it's best to create a named Data Volume Container, and then to mount the data from it." (emphasis mine).

Also, @erictune a container-only volume can be (and probably should be) written as "FROM scratch". I'd argue that if the user doesn't do it that way, that's their choice.
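
A minimal sketch of that layering workaround; image name and paths are hypothetical:

# Dockerfile: bake the data into a layer on top of the consuming image,
# instead of mounting it from a separate data container
FROM myapp:latest
ADD ./data /data

This trades independent versioning of the data for a setup that works anywhere images work.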

@AshleyAitken

+1 @rehevkor5

I am disappointed to hear that k8s doesn't support data volume containers.

I am not sure how I am supposed to abstract r/w data away from the host now. I was under the impression that k8s was about abstracting away the host, but a host volume introduces all sorts of host-specifics, like having to share the same username/group for access to the data.

@rehevkor5 I thought the same thing about data volume containers at first (should be written as FROM scratch) until I read this (which may or may not be correct): http://container42.com/2014/11/18/data-only-container-madness/ Your workaround seems to do just this?

@thockin
Member Author

thockin commented Jan 11, 2015

There are a few things going on here, most importantly (I think) some
confusion.

Kubernetes supports the notion of a writable "empty volume" that is shared
across containers within a pod, without resorting to host directories -
this is what an emptyDir volume is.

Now the question comes down to "but I don't want an empty directory, I
want <something> in it". The question I think we need to sort out is what
the <something> is. We currently support pull-from-git, which is just
a single instance of a larger pattern - "go fetch some pre-cooked data once
before starting my pod".

We could support myriad such "fetch once" volume kinds, git, cvs, svn,
docker containers (more below), URLs, in-line base64-encoded tarfiles,
stdout of another program, etc. I do NOT think we want to support those
all as independent volume plugins - they can almost all be done by an
unprivileged container without any help from kubelet. More, you quickly
arrive at the followup features like "...and re-pull from git every 10
minutes" - things that stop being "fetch once" and start being active
management, but do not require privileges. We make great use of such
things internally.

IMO: all of these things that can be run as a side-car container in your
pod (writing to a shared emptyDir) SHOULD BE. Git should stop being a
first-class volume, and should instead be a container of some sort. This
brings a slew of new design questions: Is it just a container like all the
other app containers? How do I ensure it runs BEFORE my app containers?
What if it experiences a failure? I don't have answers to all of these yet.

Now, let's think about the case of docker data volume containers. What is
the semantic that people are really asking for? Is it:

  • "run" a data container and then expose the entire chroot of that run?
  • "run" a data container and then expose all of the VOLUME (from
    Dockerfile) dirs as kubernetes volumes?
  • "run" a data container and then do the equivalent of --volumes-from into
    kubernetes containers?

These are all subtly different semantically, especially in the face of a
data container that has multiple VOLUME statements. Some operating modes
also make it hard to verify input until after a pod has been accepted,
scheduled, and attempted on a kubelet (we try to validate as much as we can
up front).

ACTION ITEM: I'd very much like for people who use docker data containers
to describe what behavior they would expect here.

Back to side-car containers as volumes. I could imagine something like:

Pod {
  spec: {
    volumes: [
      { Name: "git-data",
        Source: FromContainer {
          Name: "awkward",  // what goes here?
          Image: "kubernetes/git-volume",
          EnvVars: [
            { REPO: "http://github.com/yourname/something" },
            { RESYNC: "true" },
            { RESYNC_INTERVAL: 60 }
          ]
        }
      }
    ],
    containers: [ ... use git-data ... ]
  }
}

Sadly, there's not much of the container-ness that you can really hide, so
you end up re-using the Container schema, which is at least somewhat
redundant and awkward.

Alternately, I could imagine just making them extra containers:

Pod {
  spec: {
    volumes: [
      { Name: "git-data", Source: EmptyDir {} }
    ],
    containers: [
      {
        Name: "git-puller",
        Image: "kubernetes/git-volume",
        EnvVars: [
          { DATADIR: "/vol" },
          { REPO: "http://github.com/yourname/something" },
          { RESYNC: "true" },
          { RESYNC_INTERVAL: 60 }
        ],
        VolumeMounts: [ { Name: "git-data", MountPath: "/vol" } ]
      },
      { ... container that reads git-data ... }
    ]
  }
}

It's still pretty verbose. Can we do better?

In this sort of model, almost any sort of logic can be bundled up as a
container and published, and anyone can use it immediately.

Something like this is, I think, the way to go. Details to emerge, input
wanted.
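
For reference, a hedged sketch of the second shape above in current pod-spec YAML; the kubernetes/git-volume image and its environment variables are carried over from the example and remain hypothetical:

apiVersion: v1
kind: Pod
metadata:
  name: git-data-demo
spec:
  volumes:
    - name: git-data
      emptyDir: {}
  containers:
    - name: git-puller                 # sidecar that populates the shared volume
      image: kubernetes/git-volume     # hypothetical image
      env:
        - name: DATADIR
          value: "/vol"
        - name: REPO
          value: "http://github.com/yourname/something"
        - name: RESYNC
          value: "true"
        - name: RESYNC_INTERVAL
          value: "60"
      volumeMounts:
        - name: git-data
          mountPath: /vol
    - name: app                        # any container that reads git-data
      image: nginx:stable
      volumeMounts:
        - name: git-data
          mountPath: /data
          readOnly: true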


@thockin
Member Author

thockin commented Jan 11, 2015

To follow up to myself - all of this assumes that any data mutations you
make have a lifetime equivalent to the pod. If the pod dies for any reason
(the machine goes down, it gets deleted in the API, some non-recoverable
failure in kubelet, etc) the data dies with it.

Durable data is a MUCH larger topic :)


@mindscratch

I'll try to describe a use case for a data container. I'll have a pod with
3 containers; I'll name them "ingest", "process" and "data".

The ingest container is responsible for getting messages in some fashion
and telling the "process" container to do work.

The process container does work, but it requires access to data provided by
the "data" container; outside of kubernetes this is done using docker's
"volumes-from". This "data" can be hundreds of megabytes, but most often is
10-15 gigabytes.

The data container has a process responsible for pulling the data that will
be needed by the process container. While the process container is doing
work, it's possible that a new set of data becomes available. The data
container can fetch the data and use something like symlinks to swap it so
the next time the process container begins a new process it's using the
newly available data.

Hopefully that makes some sense.

Thanks


@thockin
Member Author

thockin commented Jan 11, 2015

To me, this does not make a strong argument. Everything you describe is
possible if your data container just writes to a shared emptyDir volume.

Now, there IS a gotcha with the initial load of data, but that has to be
handled in any similar model. Either the data is immutable, in which case
the data container can load it once and go to sleep, or else it is changing
over time, in which case you have to wait for it to get current. In
the former case, the initial data is ALL that matters. Is that an
interesting use? In the latter, does the initial data matter, or only
"current" data?

The only other argument is to be exactly docker compatible semantically,
but frankly the volumes-from behavior is so semantically rigid, it may not
be worth being compatible with.

@mindscratch

Using an emptyDir that is available between containers sounds sufficient.
The initial data could actually be baked into the docker image, then the
process in the "data" container could make sure it updates it when
necessary.

After understanding "emptydir" better, I agree, the use case I provided
would work with what Kubernetes supports today.

Thanks.


@kfox1111

kfox1111 commented Oct 4, 2017

I really like the idea of a docker container as a volume from the standpoint that it lets you reuse all the architecture for distributing containers: scalable docker registries, image caching, etc. It also lets you separate concerns. You could build a scalable static website by having a volume container with your web content in it and a main container running nginx. Each pod would then have two containers as a scalable unit, each updatable separately.

@dims
Member

dims commented Oct 4, 2017

@renewooller did you try the flex volume? https://github.com/dims/docker-flexvol

@kfox1111

kfox1111 commented Oct 4, 2017

No, I missed that. Thanks for the pointer. It's very interesting. :)

@ieugen

ieugen commented Nov 20, 2017

Hi,

I'm adding my use case as an example. docker-flexvol seems to be what I need. Using emptyDir with copy should also work, but it's nasty.

We have a few single-page front-end applications: compiled JS+CSS and other resources.
I would like to serve these apps using a single nginx server, replicated. No need to use more.
I would like to consume the apps and benefit from things like easy rollback to a previous version. This use case is not nice to solve by copying files; it also takes time and 2x the space.
I would also like to use the exact image that upstream provides (with matching hashes and gpg signatures and such).

The idea is to create a pod with a single nginx container and mount the files from the docker containers as volumes.

I imagine that I am able to deploy a new version for one of my applications by changing the volume somehow and then trigger a pod restart or something.

So if I have an image called my-container-image in my private repo:

- name: test
  flexVolume:
    driver: "dims.io/docker-flexvol"
    options:
      image: "my-container-image:v1"
      name: "/data-store"

and update the pod to contain:

- name: test
  flexVolume:
    driver: "dims.io/docker-flexvol"
    options:
      image: "my-container-image:v2"
      name: "/data-store"

I expect Kubernetes will do a rolling update to the new version of my app with my-container-image:v2

@dims: Is this use case supported by your plugin? How safe is the plugin and could we see it upstream to be installed via kubeadm? I think these are questions that should be on the project issues list.

@dims
Member

dims commented Nov 20, 2017

@ieugen I believe yes, you should be able to do that. It's a pretty simple shell script, so feel free to try it and let me know if you see issues.

As for upstream, kubeadm, etc.: if someone wants to take the initiative, I can help.

Thanks,
Dims

@kfox1111

Would be interested in getting this into a helm chart somehow... I saw containerized mount utils merged, but they say it doesn't work with flexVolumes. Maybe something like:
https://github.com/openstack/kolla-kubernetes/blob/master/helm/microservice/ceph-rbd-daemonset/templates/ceph-rbd-daemonset.yaml

with shared mount namespaces?

@kfox1111

alternately, if we could get a statically linked jq, we might be able to just slide in the two files directly onto the host....

@kfox1111

Or, I guess we could split the difference and just run jq in a container... docker run -i --rm jq....
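
Spelled out, a hedged one-liner; the image name is hypothetical, so substitute any image whose entrypoint is jq:

echo '{"kind":"Pod"}' | docker run -i --rm some/jq-image -r '.kind'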

@sg3s

sg3s commented Jan 3, 2018

It's been a month since I started studying how to use kubernetes (properly). It didn't take me long to find pretty much all use cases mentioned in this issue on my own.

In all cases it comes down to wanting to expose static files to more than one process/container.

For us that means:

  • Expose dir from container A (php runtime with static frontend) to container B (generic nginx) to serve only the static (css/js) files.
  • Expose dir from container A (static js frontend) to container B (generic nginx) to serve those files.

As I understand it, the facilities that would allow this functionality are currently only properly supported by containerd(?) in the form of volumes, but it would be very helpful to have...

Some cases could probably be solved by obscuring the copy action needed and ensuring it is successful.

Any other (simple) solution that would allow packaging static files as a single artifact and then using it inside of a pod/container, without copying it (with postStart commands) each time, would probably also work for most use cases mentioned in this issue. The thing is, with pipelines to build containers and registries to hold/version them all pretty much figured out, containers are an extremely handy vehicle for this (and in my opinion not outside their scope; it helps with the single responsibility principle).

Anyway, just my 2 cents.

@gjcarneiro

Expose dir from container A (php runtime with static frontend) to container B (generic nginx) to serve only the static (css/js) files.
Expose dir from container A (static js frontend) to container B (generic nginx) to serve those files.

I have found that recent Docker multi-stage builds pretty much allow you to do this at docker build level.
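
For anyone reaching for that workaround, a hedged multi-stage Dockerfile sketch; the stage name, base images, and paths are all hypothetical:

# build stage: compile the static assets, then discard the toolchain
FROM node:18 AS build
COPY . /src
RUN cd /src && npm ci && npm run build

# final stage: only the built files land next to nginx
FROM nginx:stable
COPY --from=build /src/dist /usr/share/nginx/html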

@kfox1111

kfox1111 commented Jan 3, 2018

Multistage builds are not the same thing. Multistage builds let you do stuff like build, throw away the build environment, and copy the built artefacts to the final container. In the end you are left with one container, which in this situation contains, for example, both nginx and your static files.

In the spirit of k8s composability, though, the desire is to have one container for nginx that can be independently updated from a second container storing your static files. They are combined together at runtime via k8s pod semantics. This is what this issue is about.

@gjcarneiro

Yes, I agree docker images as volumes are nicer. I'm just leaving a clue for whoever is reading this bug on how to work around the missing feature in the meantime.

@spiffxp spiffxp removed the triaged label Mar 16, 2018
@bgrant0607 bgrant0607 added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Apr 3, 2018
@kfox1111

The recent ephemeral csi volume support along with https://github.com/kubernetes-csi/csi-driver-image-populator should make this possible. :)
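
A hedged sketch of what that combination could look like; the driver name and volumeAttributes follow the csi-driver-image-populator project's examples and should be checked against its current README, and the content image is hypothetical:

apiVersion: v1
kind: Pod
metadata:
  name: image-volume-demo
spec:
  containers:
    - name: nginx
      image: nginx:stable
      volumeMounts:
        - name: content
          mountPath: /usr/share/nginx/html
  volumes:
    - name: content
      csi:                             # inline (ephemeral) CSI volume
        driver: image.csi.k8s.io       # driver name from the csi-driver-image-populator project
        volumeAttributes:
          image: myorg/static-content:v1   # hypothetical content image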

@bgrant0607
Member

@kfox1111 Cool, thanks!

BTW, Google's internal composable "package" mechanism is described in this talk:
https://www.usenix.org/sites/default/files/conference/protected-files/lisa_2014_talk.pdf
which is mentioned in the SRE book:
https://landing.google.com/sre/sre-book/chapters/release-engineering/

@huangyoukun

+1

@ianroberts

In the spirit of k8s composability though, the desire is to have one container for nginx that can be independently updated from a second container storing your static files. they are combined together at runtime via k8s pod semantics. This is what the issue is about.

I have a similar requirement in my project - I'm building a platform that exposes a number of REST services for language processing. This includes tools like speech recognition which are structured as an "engine" (a few MB in size, the code that implements the service) and a separate "model" (1-5GB in size, for each specific language/domain/etc.). The two can evolve independently, and we might have three engine versions and 20 different models active at any given time. Currently we have 60 different images, and when a new engine is released we have to re-build 20 images of many GB each to embed all the different models.

Ideally we'd be able to build three images for the engines and 20 images for the models, and mount the model into the engine container via the pod spec. This would make updating the engine a much lighter weight process as the nodes only have to pull one new engine image (a few MB) rather than 20 new engine+model images (potentially up to 100GB).

@willzhang

I want to mount docker registry auth and certs into the official registry image. I'd like to put the auth and certs in another image, so I don't need to prepare them on a local host path.

@kfox1111

I want to mount docker registry auth and certs into the official registry image. I'd like to put the auth and certs in another image, so I don't need to prepare them on a local host path.

Sensitive data like this is generally better put into Kubernetes Secrets than into images, IMO. Easier to encrypt/audit. You can treat them like volumes. Could you try that and see if it will do what you want?
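
A minimal sketch of that approach, assuming a TLS secret named registry-tls (all names hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: registry
spec:
  containers:
    - name: registry
      image: registry:2
      volumeMounts:
        - name: registry-certs       # certs appear as files under /certs
          mountPath: /certs
          readOnly: true
  volumes:
    - name: registry-certs
      secret:
        secretName: registry-tls     # e.g. kubectl create secret tls registry-tls --cert=tls.crt --key=tls.key

Unlike an image, the Secret can be rotated without rebuilding or re-pulling anything.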

@toms-place

I already commented here: #6120

But to summarize, I managed to share the container filesystem using shareProcessNamespace like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: datasharing
spec:
  selector:
    matchLabels:
      app: datasharing
  template:
    metadata:
      labels:
        app: datasharing
    spec:
      shareProcessNamespace: true
      containers:
        - name: data
          image: ubuntu
          imagePullPolicy: IfNotPresent
          command: ["/bin/bash", "-c"]
          args: ["trap 'echo signal;exit 0' SIGINT; sleep infinity"]
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          lifecycle:
            postStart:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - mkdir -p /DATA && touch /DATA/empty_test_file
        - name: nginx
          image: nginx:stable
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          ports:
            - containerPort: 8080
          env:
            - name: GET_DATA_PROC_DIR
              value: 'find /proc -maxdepth 1 -type d -regex "/proc/[0-9]*" | head -2 | tail -1'
          lifecycle:
            postStart:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - ln -s $(eval $GET_DATA_PROC_DIR)/root/DATA /mnt/DATA

@arkadijs

arkadijs commented Apr 16, 2024

If you can put a native executable into the data container, then it's less tricky to just symlink from there via an emptyDir intermediary. It also leaves less opportunity for the symlink not being in place when the service starts.

shareProcessNamespace: true
securityContext:
  runAsUser: 0 # /proc/<pid>/root resolve fails if not root
volumes:
- name: share
  emptyDir: {}
containers:
- name: data
  image: data:1 # FROM busybox
  volumeMounts:
  - name: share
    mountPath: /share
  command:
  - "/bin/sh"
  - "-c"
  - "ln -s /proc/$$$$/root/model /share/; sleep infinity"
- name: service
  image: service:1
  volumeMounts:
  - name: share
    mountPath: /share
    readOnly: true
  command:
  - "/bin/sh"
  - "-c"
  - "while test ! -L /share/model; do sleep 1; done; exec /bin/service --data=/share/model"
