Hadoop only processing the first 64 meg block of a 2 gig file

Hello,

I'm trying to get Hadoop to process a 2 gig file but it seems to only be processing the first block. I'm running the exact Hadoop vmware image that is available here http://dl.google.com/edutools/hadoop-vmware.zip without any tweaks or modifications to it. I think my file has been properly loaded into HDFS (hdfs reports it as having 2270607035 bytes) but when I run the example wordcount task it only seems to operate on the first 64 meg chunk (Map input bytes is reported as 67239230 when the job completes). Is the image set up to only run the first block, and if so how do I change this so it runs over the whole file? Any help would be greatly appreciated.

Thanks,

--Matt

P.S. Here are the commands I've actually run to verify that the file is in the hdfs and to run the wordcount example along with their output:

hadoop dfs -ls /clickdir
Found 1 items
/clickdir/cf709.txt <r 1> 2270607035

hadoop jar hadoop-examples.jar wordcount /clickdir /wordTEST3
08/01/18 00:18:59 INFO mapred.FileInputFormat: Total input paths to process : 1
08/01/18 00:19:00 INFO mapred.JobClient: Running job: job_0023
08/01/18 00:19:01 INFO mapred.JobClient: map 0% reduce 0%
08/01/18 00:19:28 INFO mapred.JobClient: map 2% reduce 0%
08/01/18 00:19:34 INFO mapred.JobClient: map 3% reduce 0%
08/01/18 00:19:37 INFO mapred.JobClient: map 5% reduce 0%
08/01/18 00:19:43 INFO mapred.JobClient: map 6% reduce 1%
08/01/18 00:19:45 INFO mapred.JobClient: map 9% reduce 1%
08/01/18 00:19:54 INFO mapred.JobClient: map 12% reduce 2%
08/01/18 00:20:02 INFO mapred.JobClient: map 15% reduce 3%
08/01/18 00:20:11 INFO mapred.JobClient: map 18% reduce 4%
08/01/18 00:20:19 INFO mapred.JobClient: map 21% reduce 4%
08/01/18 00:20:25 INFO mapred.JobClient: map 21% reduce 6%
08/01/18 00:20:26 INFO mapred.JobClient: map 24% reduce 6%
08/01/18 00:20:34 INFO mapred.JobClient: map 27% reduce 7%
08/01/18 00:20:45 INFO mapred.JobClient: map 27% reduce 8%
08/01/18 00:20:46 INFO mapred.JobClient: map 30% reduce 8%
08/01/18 00:20:54 INFO mapred.JobClient: map 33% reduce 8%
08/01/18 00:20:56 INFO mapred.JobClient: map 33% reduce 9%
08/01/18 00:21:03 INFO mapred.JobClient: map 36% reduce 10%
08/01/18 00:21:11 INFO mapred.JobClient: map 39% reduce 11%
08/01/18 00:21:19 INFO mapred.JobClient: map 41% reduce 12%
08/01/18 00:21:25 INFO mapred.JobClient: map 44% reduce 13%
08/01/18 00:21:31 INFO mapred.JobClient: map 47% reduce 13%
08/01/18 00:21:36 INFO mapred.JobClient: map 50% reduce 14%
08/01/18 00:21:42 INFO mapred.JobClient: map 53% reduce 16%
08/01/18 00:21:47 INFO mapred.JobClient: map 56% reduce 16%
08/01/18 00:21:52 INFO mapred.JobClient: map 59% reduce 17%
08/01/18 00:21:56 INFO mapred.JobClient: map 62% reduce 18%
08/01/18 00:22:01 INFO mapred.JobClient: map 65% reduce 19%
08/01/18 00:22:06 INFO mapred.JobClient: map 68% reduce 20%
08/01/18 00:22:11 INFO mapred.JobClient: map 71% reduce 20%
08/01/18 00:22:15 INFO mapred.JobClient: map 74% reduce 22%
08/01/18 00:22:20 INFO mapred.JobClient: map 77% reduce 24%
08/01/18 00:22:25 INFO mapred.JobClient: map 80% reduce 24%
08/01/18 00:22:30 INFO mapred.JobClient: map 83% reduce 25%
08/01/18 00:22:35 INFO mapred.JobClient: map 86% reduce 27%
08/01/18 00:22:40 INFO mapred.JobClient: map 89% reduce 28%
08/01/18 00:22:45 INFO mapred.JobClient: map 89% reduce 29%
08/01/18 00:22:46 INFO mapred.JobClient: map 91% reduce 29%
08/01/18 00:22:51 INFO mapred.JobClient: map 94% reduce 30%
08/01/18 00:22:56 INFO mapred.JobClient: map 97% reduce 30%
08/01/18 00:23:06 INFO mapred.JobClient: map 98% reduce 32%
08/01/18 00:25:06 INFO mapred.JobClient: map 99% reduce 32%
08/01/18 00:26:16 INFO mapred.JobClient: map 100% reduce 32%
08/01/18 00:27:08 INFO mapred.JobClient: map 100% reduce 66%
08/01/18 00:27:16 INFO mapred.JobClient: map 100% reduce 71%
08/01/18 00:27:27 INFO mapred.JobClient: map 100% reduce 77%
08/01/18 00:27:28 INFO mapred.JobClient: map 100% reduce 78%
08/01/18 00:27:37 INFO mapred.JobClient: map 100% reduce 100%
08/01/18 00:27:38 INFO mapred.JobClient: Job complete: job_0023
08/01/18 00:27:38 INFO mapred.JobClient: Counters: 11
08/01/18 00:27:38 INFO mapred.JobClient: org.apache.hadoop.examples.WordCount$Counter
08/01/18 00:27:38 INFO mapred.JobClient: WORDS=13050362
08/01/18 00:27:38 INFO mapred.JobClient: VALUES=13976767
08/01/18 00:27:38 INFO mapred.JobClient: Map-Reduce Framework
08/01/18 00:27:38 INFO mapred.JobClient: Map input records=277434
08/01/18 00:27:38 INFO mapred.JobClient: Map output records=13050362
08/01/18 00:27:38 INFO mapred.JobClient: Map input bytes=67239230
08/01/18 00:27:38 INFO mapred.JobClient: Map output bytes=118620427
08/01/18 00:27:38 INFO mapred.JobClient: Combine input records=13050362
08/01/18 00:27:38 INFO mapred.JobClient: Combine output records=926405
08/01/18 00:27:38 INFO mapred.JobClient: Reduce input groups=709097
08/01/18 00:27:38 INFO mapred.JobClient: Reduce input records=926405
08/01/18 00:27:38 INFO mapred.JobClient: Reduce output records=709097

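For anyone following along, the numbers in the original post can be sanity-checked with a little arithmetic. The sketch below is only an illustration: it assumes the "64 meg" block is exactly 64 * 1024 * 1024 bytes (the thread never states the configured block size), and the names FILE_BYTES, MAP_INPUT_BYTES, BLOCK_BYTES and block_count are made up for this example. It works out how many blocks a file of the reported size would span and how much of the file the reported "Map input bytes" counter covers.

```python
# Figures copied from the original post.
FILE_BYTES = 2_270_607_035       # size shown by "hadoop dfs -ls /clickdir"
MAP_INPUT_BYTES = 67_239_230     # "Map input bytes" counter from the job output

# Assumption: the "64 meg" block is exactly 64 MiB. The thread does not
# state the configured block size, so treat this value as a guess.
BLOCK_BYTES = 64 * 1024 * 1024


def block_count(file_bytes: int, block_bytes: int) -> int:
    """Number of blocks needed to hold file_bytes (integer ceiling division)."""
    return (file_bytes + block_bytes - 1) // block_bytes


blocks = block_count(FILE_BYTES, BLOCK_BYTES)
fraction_read = MAP_INPUT_BYTES / FILE_BYTES
blocks_read = MAP_INPUT_BYTES / BLOCK_BYTES

print(f"blocks needed for the file: {blocks}")
print(f"fraction of the file read by the maps: {fraction_read:.2%}")
print(f"bytes read, measured in blocks: {blocks_read:.3f}")
```

Under that assumption the bytes read come to only slightly more than one block, which matches the symptom described in the post: roughly one block's worth of input out of a file that spans dozens.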
Re: Hadoop only processing the first 64 meg block of a 2 gig file

Go into the web interface and look at the file.

See if you can see all of the blocks.

On 1/18/08 7:46 AM, "Matt Herndon" <mherndon@...> wrote:

> I'm trying to get Hadoop to process a 2 gig file but it seems to only be
> processing the first block.

RE: Hadoop only processing the first 64 meg block of a 2 gig file

Yep, I can see all 34 blocks and view chunks of actual data from each using the web interface (quite a nifty tool). Any other suggestions?

--Matt

-----Original Message-----
From: Ted Dunning [mailto:tdunning@...]
Sent: Friday, January 18, 2008 11:23 AM
To: hadoop-user@...
Subject: Re: Hadoop only processing the first 64 meg block of a 2 gig file

Go into the web interface and look at the file.

See if you can see all of the blocks.

Re: Hadoop only processing the first 64 meg block of a 2 gig file

Look at the map/reduce control panel on the web to look at your map tasks. If you drill all the way down, you can look at the output from the tasks.

There is a good chance that your map task is exiting abnormally.

On 1/18/08 8:37 AM, "Matt Herndon" <mherndon@...> wrote:

> Yep, I can see all 34 blocks and view chunks of actual data from each
> using the web interface (quite a nifty tool). Any other suggestions?
>
> --Matt

RE: Hadoop only processing the first 64 meg block of a 2 gig file

Drilling into it I see that there were 34 map tasks, but the first one is the only one that really did anything. Looking at the counters for it I see that the first map task processed 276,884 input records, but that every other map task processed only 17 records. If all map tasks had processed the same amount as the first then the file would have been totally processed.

I don't see any abnormal exits on these map tasks but I should look into that further. It would seem strange for one to work and the copies of it to fail.

We're getting closer to the root, and I'll look more into this when I'm back next week. Thanks for your help so far. If anyone else has suggestions over the weekend feel free to share.

--Matt

-----Original Message-----
From: Ted Dunning [mailto:tdunning@...]
Sent: Friday, January 18, 2008 12:05 PM
To: hadoop-user@...
Subject: Re: Hadoop only processing the first 64 meg block of a 2 gig file

Look at the map/reduce control panel on the web to look at your map tasks. If you drill all the way down, you can look at the output from the tasks.

There is a good chance that your map task is exiting abnormally.
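The per-task figures in the last message can be cross-checked against the job-level counter from the original post. This is a rough sketch, not a diagnosis: the variable names are invented for the example, the "17 records" figure is taken as Matt's approximate reading of the task pages, and the extrapolation at the end assumes every block holds about as many records as the first, which the thread does not establish.

```python
# Figures copied from the thread.
FIRST_TASK_RECORDS = 276_884      # input records of the first map task
JOB_MAP_INPUT_RECORDS = 277_434   # "Map input records" counter for the whole job
MAP_TASKS = 34                    # number of map tasks seen in the web interface

# Records left over for the other 33 tasks once the first task is accounted for.
other_tasks = MAP_TASKS - 1
remaining_records = JOB_MAP_INPUT_RECORDS - FIRST_TASK_RECORDS
average_per_other_task = remaining_records / other_tasks

# Hypothetical total if every task had read as many records as the first one.
# Assumption: records are spread evenly across blocks.
records_if_all_matched_first = MAP_TASKS * FIRST_TASK_RECORDS

print(f"records read by the other {other_tasks} tasks: {remaining_records}")
print(f"average per remaining task: {average_per_other_task:.1f}")
print(f"records if every task matched the first: {records_if_all_matched_first}")
```

The average for the remaining tasks comes out close to the 17 records Matt reports, so the job-level counter and the per-task view appear to agree: nearly all of the input that was counted came through the first map task.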