Depending on the nature of your jobs, Cascading may help: it has a
built-in topological scheduler that runs each piece of work as soon as
its dependencies are satisfied. The dependencies here are source data
and inter-job intermediate data.
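To make "topological" concrete, here is a rough bash sketch of the idea. It is not how Cascading implements its scheduler; it only orders jobs so that each one runs after the jobs it depends on, using the standard tsort utility. The function name run_in_order and the job names in the comments are made up for illustration.

```shell
#!/usr/bin/env bash
# Illustration only: order jobs by their dependencies with tsort(1).
# This is NOT Cascading's scheduler, just the same ordering idea in bash.

# run_in_order RUNNER
#   Reads "prerequisite dependent" pairs on stdin, one pair per line,
#   then calls RUNNER once per job in an order that respects every pair.
#   Stops at the first job that fails, or if tsort reports a cycle.
run_in_order() {
  local runner=$1 order job
  order=$(tsort) || return 1
  for job in $order; do
    "$runner" "$job" || return 1
  done
}

# Example (hypothetical job names, commented out so nothing runs on load):
#   run_job() { echo "running $1"; }
#   run_in_order run_job <<'EOF'
#   extract clean
#   clean join
#   lookup join
#   EOF
```

Jobs with no dependency between them (extract and lookup above) may come out in either order; only the pairs you list are guaranteed.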
http://www.cascading.org
The first catch is that you will still need bash to start/stop your
cluster and to start the Cascading job (per your example below).
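On the bash side, a sketch that may help, under a few assumptions. Foreground commands in a script already run to completion before the next line starts, and wait only applies to background children of the current shell, so the wait calls are not what sequences the jobs. The "Job tracker still initializing" error suggests the first job was submitted before the JobTracker was ready, so one option is to poll with some readiness probe before submitting. The helper name retry_until_ok and the probe shown in the comments are my assumptions, not something from the Hadoop scripts.

```shell
#!/usr/bin/env bash
# Sketch: retry a probe command until it succeeds before submitting jobs.

# retry_until_ok MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
#   Runs COMMAND until it exits 0, sleeping DELAY_SECONDS between attempts.
#   Returns 1 after MAX_TRIES failed attempts.
retry_until_ok() {
  local max_tries=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_tries" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (commented out so loading this file starts nothing).
# The probe "bin/hadoop job -list" is an assumption; use whatever command
# fails while the JobTracker is initializing on your Hadoop version.
#   sh bin/start-all.sh
#   retry_until_ok 30 10 bin/hadoop job -list || exit 1
#   bin/hadoop jar hadoop-0.17.0-examples.jar randomwriter -D test.randomwrite.bytes_per_map=107374182 rand || exit 1
#   bin/hadoop jar hadoop-0.17.0-examples.jar randomtextwriter -D test.randomtextwrite.total_bytes=107374182 rand-text || exit 1
#   bin/stop-all.sh
```

The "|| exit 1" after each job stops the script at the first failure instead of running later jobs against missing input.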
The second catch is that you currently must use the Cascading API (or
the Groovy API) to assemble your data processing flows. Hopefully in
the next couple of weeks we will have a means to support custom/raw
Hadoop jobs as members of a set of dependent jobs.
This feature is being delayed by our adding support for stream
assertions: the ability to validate data at runtime but have the
assertions 'planned' out of the process flow on demand, i.e. for
production runs.
And for stream traps: built-in support for siphoning off bad data into
side files, so long-running (or low-fidelity) jobs can continue running
without losing any data.
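For a feel of what a trap does, here is a small bash sketch of the same idea outside Cascading: records that fail a check are appended to a side file rather than aborting the run. The function name filter_with_trap and the field-count check are assumptions chosen for illustration; this is not the Cascading trap API.

```shell
#!/usr/bin/env bash
# Illustration only: siphon malformed records into a side file.

# filter_with_trap EXPECTED_FIELDS TRAP_FILE
#   Reads tab-separated records on stdin. Records with exactly
#   EXPECTED_FIELDS fields go to stdout; all others are appended to
#   TRAP_FILE so the rest of the pipeline keeps running.
filter_with_trap() {
  local expected=$1 trap_file=$2
  awk -F '\t' -v n="$expected" -v trap="$trap_file" '
    NF == n { print; next }   # good record: pass it downstream
    { print >> trap }         # bad record: keep it in the side file
  '
}

# Example (hypothetical file names):
#   filter_with_trap 3 bad-records.tsv < input.tsv > clean.tsv
```

Nothing is dropped: every input record ends up either on stdout or in the trap file, so the bad ones can be inspected or reprocessed later.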
You can read more about these features here:
http://groups.google.com/group/cascading-user
ckw
On Jun 10, 2008, at 2:48 PM, Meng Mao wrote:
> I'm interested in the same thing -- is there a recommended way to batch
> Hadoop jobs together?
>
> On Tue, Jun 10, 2008 at 5:45 PM, Richard Zhang <richardtechzh@...> wrote:
>
>> Hello folks:
>> I am running several Hadoop applications on HDFS. To save the effort of
>> issuing the set of commands every time, I am trying to use a bash script
>> to run the applications sequentially. To let each job finish before
>> proceeding to the next one, I am using wait in the script, like below.
>>
>> sh bin/start-all.sh
>> wait
>> echo cluster start
>> (bin/hadoop jar hadoop-0.17.0-examples.jar randomwriter -D test.randomwrite.bytes_per_map=107374182 rand)
>> wait
>> bin/hadoop jar hadoop-0.17.0-examples.jar randomtextwriter -D test.randomtextwrite.total_bytes=107374182 rand-text
>> bin/stop-all.sh
>> echo finished hdfs randomwriter experiment
>>
>>
>> However, it always gives an error like the one below. Does anyone have a
>> better idea of how to run multiple sequential jobs with a bash script?
>>
>> HadoopScript.sh: line 39: wait: pid 10 is not a child of this shell
>>
>> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.JobTracker$IllegalStateException: Job tracker still initializing
>>     at org.apache.hadoop.mapred.JobTracker.ensureRunning(JobTracker.java:1722)
>>     at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:1730)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
>>
>>     at org.apache.hadoop.ipc.Client.call(Client.java:557)
>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
>>     at $Proxy1.getNewJobId(Unknown Source)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>     at $Proxy1.getNewJobId(Unknown Source)
>>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:696)
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
>>     at org.apache.hadoop.examples.RandomWriter.run(RandomWriter.java:276)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.hadoop.examples.RandomWriter.main(RandomWriter.java:287)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>     at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:53)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>     at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>     at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
>>
>
>
>
> --
> hustlin, hustlin, everyday I'm hustlin
--
Chris K Wensel
chris@...
http://chris.wensel.net/
http://www.cascading.org/