Re: does anyone have idea on how to run multiple sequential jobs with bash script

by Chris K Wensel-2


Depending on the nature of your jobs, Cascading has a built-in
topological scheduler. It schedules each job as soon as its
dependencies are satisfied, where dependencies are source data and
inter-job intermediate data.

http://www.cascading.org
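
As a rough illustration of what topological scheduling means, here is a
sketch built on the standard tsort utility. It is not how Cascading
implements its scheduler. The job names, the dependency pairs, and the
run_job placeholder are all hypothetical.

```shell
#!/bin/sh
# Each input line to tsort is "prerequisite dependent" (hypothetical job names).
# tsort prints an order in which every prerequisite precedes its dependents.
job_order() {
    tsort <<'EOF'
import_logs parse_logs
parse_logs count_users
parse_logs count_pages
count_users build_report
count_pages build_report
EOF
}

# Placeholder launcher: replace the echo with the real job command.
run_job() {
    echo "running $1"
}

# Run the jobs one at a time in dependency order, stopping at the first failure.
run_all() {
    for job in $(job_order); do
        run_job "$job" || return 1
    done
}
```

Calling run_all starts import_logs first and build_report last. The
relative order of count_users and count_pages is left to tsort, since
neither depends on the other.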

The first catch is that you will still need bash to start/stop your  
cluster and to start the cascading job (per your example below).
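
A minimal sketch of such a wrapper script follows. A foreground command
already blocks until it exits, so no wait is needed between jobs. The
helpers retry_until and run_sequential are hypothetical names, the
attempt count and delay are arbitrary, and the commented wiring assumes
bin/hadoop returns a non-zero exit status when a submission fails (for
example while the JobTracker is still initializing).

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempt limit is reached.
# Usage: retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
    _max=$1
    _delay=$2
    shift 2
    _attempt=1
    until "$@"; do
        [ "$_attempt" -lt "$_max" ] || return 1
        _attempt=$((_attempt + 1))
        sleep "$_delay"
    done
}

# Run each argument as a command line, in order, stopping at the first failure.
run_sequential() {
    for _cmd in "$@"; do
        echo "starting: $_cmd"
        sh -c "$_cmd" || { echo "failed: $_cmd" >&2; return 1; }
    done
}

# Example wiring, commented out; the commands are taken from the quoted script.
# Retrying the first job is only safe if a failed submission leaves no output.
# sh bin/start-all.sh
# retry_until 30 10 bin/hadoop jar hadoop-0.17.0-examples.jar randomwriter \
#     -D test.randomwrite.bytes_per_map=107374182 rand || exit 1
# run_sequential \
#     "bin/hadoop jar hadoop-0.17.0-examples.jar randomtextwriter -D test.randomtextwrite.total_bytes=107374182 rand-text"
# bin/stop-all.sh
```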

The second catch is that you currently must use the Cascading API (or
the Groovy API) to assemble your data processing flows. Hopefully in
the next couple of weeks we will have a means to support custom/raw
Hadoop jobs as members of a set of dependent jobs.

This feature is being delayed by our adding support for stream
assertions: the ability to validate data at runtime, but have the
assertions 'planned' out of the process flow on demand, i.e. for
production runs.

And for stream traps: built-in support for siphoning off bad data into
side files, so long-running (or low-fidelity) jobs can continue running
without losing any data.
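
The idea behind a trap can be sketched in plain shell. This is only an
illustration of siphoning bad records into a side file, not Cascading's
implementation. The function name trap_bad_records and the rule for a
well-formed record (exactly three tab-separated fields) are assumptions
made for the example.

```shell
#!/bin/sh
# Copy well-formed records to stdout and divert malformed ones to a side file,
# so downstream processing keeps running and no input is discarded.
# Hypothetical rule: a record is well-formed if it has exactly 3 tab-separated fields.
# Usage: trap_bad_records TRAP_FILE < input > good_output
trap_bad_records() {
    awk -F '\t' -v trap_file="$1" '
        NF == 3 { print; next }      # good record: pass it downstream
        { print > trap_file }        # bad record: keep it in the side file
    '
}
```

The trapped records can be inspected or reprocessed after the run
without having stopped the job that encountered them.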

You can read more about these features here:
http://groups.google.com/group/cascading-user

ckw

On Jun 10, 2008, at 2:48 PM, Meng Mao wrote:

> I'm interested in the same thing -- is there a recommended way to batch
> Hadoop jobs together?
>
> On Tue, Jun 10, 2008 at 5:45 PM, Richard Zhang <richardtechzh@...> wrote:
>
>> Hello folks:
>> I am running several Hadoop applications on HDFS. To save the effort of
>> issuing the same set of commands every time, I am trying to use a bash
>> script to run the applications sequentially. To let each job finish
>> before proceeding to the next one, I am using wait in the script, as
>> shown below.
>>
>> sh bin/start-all.sh
>> wait
>> echo cluster start
>> (bin/hadoop jar hadoop-0.17.0-examples.jar randomwriter -D test.randomwrite.bytes_per_map=107374182 rand)
>> wait
>> bin/hadoop jar hadoop-0.17.0-examples.jar randomtextwriter -D test.randomtextwrite.total_bytes=107374182 rand-text
>> bin/stop-all.sh
>> echo finished hdfs randomwriter experiment
>>
>>
>> However, it always gives an error like the one below. Does anyone have a
>> better idea of how to run multiple sequential jobs with a bash script?
>>
>> HadoopScript.sh: line 39: wait: pid 10 is not a child of this shell
>>
>> org.apache.hadoop.ipc.RemoteException:
>> org.apache.hadoop.mapred.JobTracker$IllegalStateException: Job tracker still initializing
>>       at org.apache.hadoop.mapred.JobTracker.ensureRunning(JobTracker.java:1722)
>>       at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:1730)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
>>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
>>
>>       at org.apache.hadoop.ipc.Client.call(Client.java:557)
>>       at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
>>       at $Proxy1.getNewJobId(Unknown Source)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>       at $Proxy1.getNewJobId(Unknown Source)
>>       at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:696)
>>       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
>>       at org.apache.hadoop.examples.RandomWriter.run(RandomWriter.java:276)
>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>       at org.apache.hadoop.examples.RandomWriter.main(RandomWriter.java:287)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>       at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>       at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>       at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:53)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>       at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>       at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>       at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
>>
>
>
>
> --
> hustlin, hustlin, everyday I'm hustlin

--
Chris K Wensel
chris@...
http://chris.wensel.net/
http://www.cascading.org/




