Distributed CRON jobs in k8s #2156

Closed
jeefy opened this issue Nov 4, 2014 · 26 comments
Labels
area/batch area/usability priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done.

Comments

@jeefy
Member

jeefy commented Nov 4, 2014

Talked with @brendandburns and @jbeda about this briefly.

Being able to submit a job that fires in the k8s cluster periodically (like Chronos) would be a good feature to offer. Something that mirrors the replicationController (periodicController was @jbeda's off-the-cuff name).

The JSON would be pretty similar (in my mind), mirroring replicationController's, with the addition of a timing attribute and possibly a means to notify output.

Thoughts? (Typing this up quick before I run to a bunch of meetings)
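
To make the idea above concrete, here is a minimal sketch of what such an object might look like, built as a plain Python dict and printed as JSON. Everything in it is an assumption for illustration: the `PeriodicController` kind, the `schedule` field, the crontab syntax for the timing attribute, and the `example/reports` image are hypothetical, and the layout is only loosely modeled on a replicationController.

```python
import json


def periodic_controller(name, schedule, image, labels):
    """Build a hypothetical periodicController manifest as a plain dict."""
    return {
        "kind": "PeriodicController",  # hypothetical kind, not an existing API object
        "id": name,
        "desiredState": {
            # The proposed "timing attribute"; crontab syntax is an assumption.
            "schedule": schedule,
            "podTemplate": {
                "labels": labels,
                "desiredState": {
                    "manifest": {"containers": [{"name": name, "image": image}]}
                },
            },
        },
        "labels": labels,
    }


manifest = periodic_controller(
    name="nightly-reports",
    schedule="0 2 * * *",
    image="example/reports",  # placeholder image name
    labels={"app": "reports"},
)
print(json.dumps(manifest, indent=2))
```

The only structural difference from a replica-style controller in this sketch is that a schedule takes the place of a replica count; how output would be reported is left open, as in the comment above.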

@brendandburns
Contributor

Yes, I think we should do this. Let's implement it as a plugin to the API server.

Brendan

@bgrant0607
Member

+1. We should finish #170 to make this easier.

@bgrant0607
Member

@smarterclayton Have you thought about a cron-like controller in OpenShift? Almost every service at Google uses such a thing internally, such as for periodically regenerating serving data.

@smarterclayton
Contributor

I think we definitely assumed there had to be something like it. In OpenShift today we find 20% of apps opt to use cron (which runs inside the container, driven by a host-level cron process: a scheduled docker exec). I had originally assumed similarities between a cluster-wide parallel docker exec at an imperative level (kubectl run command --on pod-labels) and a controller that would do the same periodically. I hadn't thought very far into the resource isolation aspects (i.e., do you want to do the same thing but with pods).


@smarterclayton
Contributor

Creating pods to spin off doesn't solve a lot of the common "in container" operations you might want to schedule (invoke DB stats collection, trim logs, etc.). I'd like to see a discussion about that aspect.

@brendandburns
Contributor

I started to poke at this. One big problem is that, despite the apiserver being modular, we rely heavily on IsAPIObject being implemented in the api package for all API objects (for encode/decode).

We need to either be OK with there being a bunch of non-core types in those files, or have some schema-based encode/decode option.

--brendan


@jeefy
Member Author

jeefy commented Nov 6, 2014

From a user perspective, I'd like to define both internal (docker exec?) and external (new pod) jobs through a single object. There's merit in allowing both, and I like the idea of defining "in container" jobs at the k8s level when you consider people will be using third party containers that they can't (or won't) embed cron jobs into.

Being able to go "I want usage stats dumped every hour from every pod labeled 'prod' and 'db' using 'docker exec ....' " as well as "I want to generate our nightly reports and email them out, spin up the reports pod" in the same breath would be ideal.

@smarterclayton
Contributor

----- Original Message -----

From a user perspective, I'd like to define both internal (docker exec?) and
external (new pod) jobs through a single object. There's merit in allowing
both, and I like the idea of defining "in container" jobs at the k8s level
when you consider people will be using third party containers that they
can't (or won't) embed cron jobs into.

I think the challenge is that they are fundamentally different actions. One is declarative: "I want this pod to be created once every X hours". The other is imperative: "I want to run this command and collect the output and statistics on all of these pods", which needs some object / container to mediate the action, gather the output, retry on failures, etc.

It would be worth considering whether we could offer a simple container that could do that action for you so that you could define it as a pod to run (with potentially some API magic making it easier to do). That allows the pod to be the unit of execution, and the logs and success of the total action to be defined in the pod.

Image: kubernetes/docker-executor
Env:
  SCRIPT: "bunch of arbitrary script"
  CREDENTIALS: "my kube credentials"
  SELECTOR: "pod label selector"
Command: /a/remoteExecutionBinary

Binary takes the script ENV and runs it on all hosts:

  #!/bin/bash
  kubectl exec --token="$CREDENTIALS" -l "$SELECTOR" -- /bin/bash -c "$SCRIPT"

The internal job can then become an external job, and the CLI can support the external job.
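
As a rough illustration of what the executor binary described above might do, here is a minimal Python sketch that turns a label selector and a script into one `kubectl exec` invocation per matching pod. It only builds the argument lists and does not run them. The helper names are hypothetical, credentials handling is left out, and the sketch assumes pods are first listed with `kubectl get pods -l <selector> -o name` and then addressed one at a time, rather than relying on a selector flag on `exec`.

```python
def list_pods_command(selector):
    """Command that lists the names of pods matching a label selector."""
    return ["kubectl", "get", "pods", "-l", selector, "-o", "name"]


def pod_name(line):
    """Strip a resource prefix such as 'pod/' from one line of '-o name' output."""
    return line.strip().rsplit("/", 1)[-1]


def exec_commands(get_pods_output, script):
    """Build one 'kubectl exec' argument list per pod in the listing output."""
    names = [pod_name(line) for line in get_pods_output.splitlines() if line.strip()]
    return [
        ["kubectl", "exec", name, "--", "/bin/bash", "-c", script]
        for name in names
    ]


# Example with canned listing output instead of a live cluster.
listing = "pod/db-1\npod/db-2\n"
for argv in exec_commands(listing, "du -sh /var/log"):
    print(" ".join(argv))
```

Passing the script as a single list element avoids the word-splitting problem an unquoted shell variable would have. A real executor would also need to collect output and retry failures, which is the mediation role discussed above.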

The challenge with internal job execution is the streaming of potentially very large sets of data (tar omcf - / | cat) across the wire from hundreds of pods. That requires some sort of proxy to mediate the execution of the job, buffer, etc. We haven't talked about it much yet.


@bgrant0607 bgrant0607 added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Dec 4, 2014
@bgrant0607 bgrant0607 added status/help-wanted area/usability sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Feb 28, 2015
@smarterclayton
Contributor

@soltysh please link your proposal here once it's more complete

@soltysh
Contributor

soltysh commented Mar 6, 2015

@smarterclayton will do so as soon as I gather some more feedback around that.

In the meantime I have a question for @jeefy. Given what @smarterclayton proposed with the ability to run in-container commands (e.g., using #3763), are you convinced about having just one job type which will, similarly to RCs, specify which image to run and the schedule?

@xudifsd
Contributor

xudifsd commented Mar 12, 2015

Hi, I'm interested in implementing this as GSoC work, but it seems @soltysh is already doing this. Is this still a GSoC idea?

@soltysh
Contributor

soltysh commented Mar 12, 2015

@xudifsd a couple of folks and I are currently working on a proposal to drive the discussion on this topic. I can't tell you exactly when the implementation will start.

@bgrant0607
Member

@soltysh Does that imply you plan to build it?

@xudifsd Sorry, it's a bit hard to tell what people will start working on.

@soltysh
Contributor

soltysh commented Mar 13, 2015

On Fri, Mar 13, 2015 at 5:44 AM, Brian Grant wrote:

@soltysh Does that imply you plan to build it?

Definitely I'll be one of the implementers.

@smarterclayton
Contributor

http://queue.acm.org/detail.cfm?id=2745840 A little light reading for the weekend.

@bgrant0607
Member

Thanks for the pointer. Yes, Stepan was TL of our internal cron service.

@bgrant0607
Member

Also worth a look: https://github.com/mesos/chronos

@davidopp
Member

#11980 is implementing a version of this.

@davidopp davidopp added team/control-plane and removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. team/master labels Aug 22, 2015
@DrRibosome

just wondering if this feature is still in the works

@soltysh
Contributor

soltysh commented Jun 2, 2016

It's still being worked on. Unfortunately it didn't meet the 1.3 deadline, so it'll slip into 1.4.

@eghobo

eghobo commented Jun 4, 2016

@soltysh: sorry to hear that this feature missed 1.3 ):. Do you have any ETA in mind? We have a big interest in this feature; how can we help?

@soltysh
Contributor

soltysh commented Jun 6, 2016

@eghobo we're waiting for 1.3 to land and additionally we need to figure out some multi-versioning problems. But the general idea is to have it as soon as possible.

@SEJeff
Contributor

SEJeff commented Jul 27, 2016

Not seeing a link to it in this issue, so to help the next person looking: this is fixed in git via #24970. It implements the scheduled job API, which seemingly uses a normal crontab-style syntax for job scheduling.
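
For readers unfamiliar with crontab-style schedules, here is a minimal Python sketch of how a five-field expression (minute, hour, day of month, month, day of week) can be checked against a point in time. It is an illustration only, not the code from the PR: the function names are hypothetical, only `*`, numbers, ranges, lists and `/step` are handled, and the day-of-month and day-of-week fields are simply ANDed together, which is a simplification of what classic cron does.

```python
from datetime import datetime


def _field_matches(field, value, low, high):
    """Return True if value satisfies one crontab field such as '*/15' or '1-5,7'."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expr, when):
    """Check a five-field crontab expression against a datetime (simplified)."""
    minute, hour, dom, month, dow = expr.split()
    return (
        _field_matches(minute, when.minute, 0, 59)
        and _field_matches(hour, when.hour, 0, 23)
        and _field_matches(dom, when.day, 1, 31)
        and _field_matches(month, when.month, 1, 12)
        # isoweekday() is 1..7 with Sunday = 7; crontab uses 0..6 with Sunday = 0.
        and _field_matches(dow, when.isoweekday() % 7, 0, 6)
    )


# "0 * * * *" means the top of every hour.
print(cron_matches("0 * * * *", datetime(2016, 7, 27, 13, 0)))
```

A controller built on this idea would evaluate the expression on each tick and create a job whenever it matches, which is roughly the "every hour" and "nightly" use cases raised earlier in the thread.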

@soltysh
Contributor

soltysh commented Jul 27, 2016

@SEJeff not quite, that PR introduces only the API part. The remaining (not merged yet) parts are tracked in kubernetes/enhancements#19.

@SEJeff
Contributor

SEJeff commented Jul 27, 2016

Gotcha, thanks

@erictune
Member

ScheduledJob is alpha in 1.4.
