HBase 0.20.1 on Ubuntu 9.04: master fails to start

by shoolc

Hello.

We are testing the latest HBase 0.20.1 in pseudo-distributed mode with
Hadoop 0.20.1 in the following environment:
*h/w*: Intel C2D 1.86 GHz, 2 GB RAM at 667 MHz, 1 TB Seagate SATA2 HDD, 7200 rpm
*s/w*: Ubuntu 9.04, *ext3* filesystem, Java 1.6.0_16-b01, Hadoop 0.20.1,
HBase 0.20.1

File */etc/hosts*

> 127.0.0.1       localhost
>
> # The following lines are desirable for IPv6 capable hosts
> ::1     localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
>
Hadoop and HBase are running in pseudo-distributed mode.
Two options were added to *hadoop-env.sh*:

> export JAVA_HOME=/usr/lib/jvm/java-6-sun
> export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
>
*core-site.xml*:

> <configuration>
> <property>
>   <name>fs.default.name</name>
>   <value>hdfs://127.0.0.1:9000</value>
> </property>
> <property>
>   <name>hadoop.tmp.dir</name>
>   <value>/hadoop/tmp/hadoop-${user.name}</value>
>   <description>A base for other temporary directories.</description>
> </property>
> </configuration>
>
*hdfs-site.xml*:

> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
> <property>
>   <name>dfs.name.dir</name>
>   <value>/hadoop/hdfs/name</value>
> </property>
> <property>
>   <name>dfs.data.dir</name>
>   <value>/hadoop/hdfs/data</value>
> </property>
> <property>
>   <name>dfs.datanode.socket.write.timeout</name>
>   <value>0</value>
> </property>
> <property>
>    <name>dfs.datanode.max.xcievers</name>
>    <value>1023</value>
> </property>
> </configuration>
>
*mapred-site.xml*:

> <configuration>
> <property>
>   <name>mapred.job.tracker</name>
>   <value>127.0.0.1:9001</value>
> </property>
> </configuration>
>
*hbase-site.xml:*

> <configuration>
>   <property>
>     <name>hbase.rootdir</name>
>     <value>hdfs://localhost:9000/</value>
>     <description>The directory shared by region servers.
>     Should be fully-qualified to include the filesystem to use.
>     E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
>     </description>
>   </property>
>   <property>
>     <name>hbase.master</name>
>     <value>127.0.0.1:60000</value>
>     <description>The host and port that the HBase master runs at.
>     </description>
>   </property>
>   <property>
>      <name>hbase.tmp.dir</name>
>      <value>/hadoop/tmp/hbase-${user.name}</value>
>      <description>Temporary directory on the local
> filesystem.</description>
>   </property>
>     <property>
>         <name>hbase.zookeeper.quorum</name>
>         <value>127.0.0.1</value>
>         <description>The directory shared by region servers.
>         </description>
>     </property>
> </configuration>
>
Hadoop and HBase run as the *hbase* user, and all the necessary directories
(the */hadoop* directory and all its subdirectories) are owned by that user.

The first launch was successful, but after several days of work we ran into
a problem: the HBase master went down. We tried to restart it
(*stop-hbase.sh*, then *start-hbase.sh*), but the restart fails with this error:

> 2009-10-26 13:34:30,031 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
> Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /hbase.version could only be replicated to 0 nodes, instead of 1
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1267)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>

Then I tried reformatting HDFS (and then also removing all Hadoop and HBase
data and formatting HDFS again) and starting Hadoop and HBase again, but the
HBase master fails to start with the same error.

Could someone review our configuration and tell us what causes this
behaviour of the HBase master?

Thanks in advance, Artyom
-------------------------------------------------
Best wishes, Artyom Shvedchikov

Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

by Tatsuya Kawano

Hi Artyom,

Your configuration files look just fine.


>> 2009-10-26 13:34:30,031 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
>> Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
>> /hbase.version could only be replicated to 0 nodes, instead of 1

I'm not totally sure, but I think this exception occurs when there is
no HDFS data node available in the cluster.

Can you access the HDFS name node status screen at
<http://servers-ip:50070/> from a web browser to see if there is a
data node available?

Thanks,

--
Tatsuya Kawano (Mr.)
Tokyo, Japan


Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

by shoolc

Hello, Tatsuya
Thank you for the fast assistance.

> I'm not totally sure, but I think this exception occurs when there is
> no HDFS data node available in the cluster.
>
> Can you access the HDFS name node status screen at
> <http://servers-ip:50070/> from a web browser to see if there is a
> data node available?
>

Yes, the HDFS name node status page is accessible in a web browser at
<http://servers-ip:50070/>, and it shows the data node as available.

Could you give some examples of situations in which a data node is not
available to the cluster and to the HBase master?
-------------------------------------------------
Best wishes, Artyom Shvedchikov


Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

by Tatsuya Kawano

Hi Artyom,

>> I'm not totally sure, but I think this exception occurs when there is
>> no HDFS data node available in the cluster.
>>
>> Can you access the HDFS name node status screen at
>> <http://servers-ip:50070/> from a web browser to see if there is a
>> data node available?

> Yes, the HDFS name node status page is accessible in a web browser at
> <http://servers-ip:50070/>, and it shows the data node as available.
>
> Could you give some examples of situations in which a data node is not
> available to the cluster and to the HBase master?


I happen to have an Ubuntu 9.04 virtual server installation, so I set
up HDFS on it to see if I could reproduce the exception you had. I
found I can easily reproduce it with the following steps:

1. Delete the Hadoop data directory.
2. Run bin/hadoop namenode -format
3. Run bin/start-all.sh
    -> The name node starts immediately and goes into service, but the
data node makes a long pause (almost seven minutes) in the middle of
its startup.

4. Before the data node becomes ready, do an HDFS write operation
(e.g. "bin/hadoop fs -put conf input"). The write operation fails
with the following error:
------------------------------------------------
09/10/28 09:00:19 WARN hdfs.DFSClient: DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/tatsuya/input/capacity-scheduler.xml could only be replicated to
0 nodes, instead of 1
...

09/10/28 09:00:19 WARN hdfs.DFSClient: Error Recovery for block null
bad datanode[0] nodes == null
09/10/28 09:00:19 WARN hdfs.DFSClient: Could not get block locations.
Source file "/user/tatsuya/input/capacity-scheduler.xml" - Aborting...
------------------------------------------------

This doesn't seem to be the desired behavior for HDFS; shouldn't HDFS be
in safe mode while the data node is not ready?


Also, if I skip steps 1 and 2, the problem doesn't happen. The data
node still makes the long pause at startup, but the HDFS cluster starts
in safe mode and waits for the data node to become ready. HBase
handles HDFS safe mode, so HBase should work fine in this case.

Can you check whether this is your case? If so, you can avoid the problem by
not running "start-hbase.sh" until HDFS has a data node available.
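
A minimal sketch of that wait, in Python. It assumes that
"bin/hadoop dfsadmin -report" prints a line of the form
"Datanodes available: 1 (1 total, 0 dead)"; please check the output of
your own Hadoop version first. The function names, the timeout and the
polling interval are all made up for illustration.

```python
import re
import subprocess
import time

# Assumed shape of one line of `bin/hadoop dfsadmin -report` output,
# e.g. "Datanodes available: 1 (1 total, 0 dead)". Verify on your version.
_AVAILABLE = re.compile(r"Datanodes available:\s*(\d+)")


def live_datanodes(report_text):
    """Return the number of live data nodes named in the report, or 0."""
    match = _AVAILABLE.search(report_text)
    return int(match.group(1)) if match else 0


def wait_for_datanode(hadoop="bin/hadoop", timeout_s=900, poll_s=10):
    """Poll the report until a data node is live; return True on success."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        result = subprocess.run(
            [hadoop, "dfsadmin", "-report"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
        if live_datanodes(result.stdout) >= 1:
            return True
        time.sleep(poll_s)  # data node not registered yet; try again
    return False
```

Calling wait_for_datanode() from the Hadoop directory after
bin/start-all.sh, and running start-hbase.sh only once it returns True,
would keep the master from writing /hbase.version too early.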



I have investigated a little further why the data node makes the
long pause on Ubuntu 9.04. There seems to be a problem with the Sun JRE
SecureRandom implementation on Linux, which makes Jetty (used in
the data node) slow to create its session ID manager.


Here is the data node log, with a seven-minute pause while it's trying
to start Jetty.
------------------------------------------------
2009-10-28 09:00:10,559 INFO org.mortbay.log: jetty-6.1.14
2009-10-28 09:06:54,165 INFO org.mortbay.log: Started
SelectChannelConnector@...:50075
------------------------------------------------
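
The length of the pause can be read straight off the two timestamps.
The small Python sketch below only illustrates the arithmetic; the
helper names are made up, and the second log line is shortened to the
part that fits on one line.

```python
from datetime import datetime

# Timestamp layout used at the start of each log line: 2009-10-28 09:00:10,559
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def log_time(line):
    """Parse the leading 23-character timestamp of a log line."""
    return datetime.strptime(line[:23], LOG_TIME_FORMAT)


def pause_between(first_line, second_line):
    """Return the time elapsed between two log lines as a timedelta."""
    return log_time(second_line) - log_time(first_line)


before = "2009-10-28 09:00:10,559 INFO org.mortbay.log: jetty-6.1.14"
after = "2009-10-28 09:06:54,165 INFO org.mortbay.log: Started"
pause = pause_between(before, after)
print(pause)
```

For these two lines the gap is 6 minutes 43.6 seconds.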


Here is part of a full thread dump;
sun.security.provider.SecureRandom is taking a long time (forever?) to
finish.
------------------------------------------------
"main" prio=10 tid=0x00000000409a8800 nid=0xba2 runnable [0x00007ff762a32000]
   java.lang.Thread.State: RUNNABLE
     at java.io.FileInputStream.readBytes(Native Method)
     ...
     - locked <0x00007ff749edfbb8> (a java.io.BufferedInputStream)

     at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:453)
     at sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:123)
     at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:118)
     at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114)
     at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171)
     - locked <0x00007ff749edf388> (a sun.security.provider.SecureRandom)
     at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
     - locked <0x00007ff749edf6b8> (a java.security.SecureRandom)
     at java.security.SecureRandom.next(SecureRandom.java:455)
     at java.util.Random.nextLong(Random.java:284)
     at org.mortbay.jetty.servlet.HashSessionIdManager.doStart(HashSessionIdManager.java:139)
     ...

     at org.apache.hadoop.http.HttpServer.start(HttpServer.java:460)
     at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:375)
     at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:216)
     at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
     at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
     at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
     at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)
------------------------------------------------


And I found that this is a known issue in Jetty:
http://jira.codehaus.org/browse/JETTY-331

It says you can work around it by changing the Jetty settings to use
"java.util.Random" instead of "sun.security.provider.SecureRandom". I
don't know whether this is the right workaround. It would be better to ask
the HDFS folks on the hdfs-user mailing list for a solution. (I'm currently
not a member of that mailing list.)
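
One assumption that may be worth checking (it is not confirmed anywhere
in this thread): in the dump above, URLSeedGenerator is sitting in a
file read, which would be consistent with the JRE seeding itself from
the kernel's blocking random device while the entropy pool is nearly
empty, a situation that is common on virtual servers. On Linux the
kernel publishes its entropy estimate in
/proc/sys/kernel/random/entropy_avail. The Python sketch below just
reads that number; the function names and the threshold are arbitrary
values chosen for illustration.

```python
ENTROPY_FILE = "/proc/sys/kernel/random/entropy_avail"


def available_entropy(path=ENTROPY_FILE):
    """Return the kernel's entropy estimate in bits, read from `path`."""
    with open(path) as handle:
        return int(handle.read().strip())


def looks_starved(bits, threshold=200):
    """Heuristic only: `threshold` is an arbitrary illustrative value."""
    return bits < threshold
```

If available_entropy() stays very low while the data node is stuck at
the Jetty startup line, that would support this explanation; if it is
comfortably high, the cause is probably something else.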


Hope this helps,

--
Tatsuya Kawano (Mr.)
Tokyo, Japan



Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

by shoolc

Hello, Tatsuya.

Yesterday we ran into the same problem: the master went down.

Here is part of the HBase master log from after HBase became unavailable through both the HBase shell and the Java HBase client:
 
2009-10-29 00:00:36,920 INFO org.apache.hadoop.hbase.master.ServerManager: 1 region servers, 0 dead, average load 5.0
2009-10-29 00:00:37,150 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 127.0.0.1:53169, regionname: -ROOT-,,0, startKe$
2009-10-29 00:00:37,151 WARN org.apache.hadoop.hbase.master.BaseScanner: Scan ROOT region
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:308)
        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:831)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:328)
        at $Proxy2.openScanner(Unknown Source)
        at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:160)
        at org.apache.hadoop.hbase.master.RootScanner.scanRoot(RootScanner.java:54)
        at org.apache.hadoop.hbase.master.RootScanner.maintenanceScan(RootScanner.java:79)
        at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:136)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:68)
This repeats until the end of the log.

Part of the ZooKeeper log:

2009-10-29 02:45:14,138 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 17b
2009-10-29 08:34:04,751 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /127.0.0.1:56897 lastZxid 0
2009-10-29 08:34:04,751 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x124962e1e530013
2009-10-29 08:34:04,776 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x124962e1e530013 valid:true
2009-10-29 08:34:09,689 WARN org.apache.zookeeper.server.PrepRequestProcessor: Got exception when processing sessionid:0x124962e1e530013 type:create cxid:0x2 zxid:0xfffffffffff$
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
        at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
        at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
This repeats until the end of the log.

HBase became unavailable after we scanned a table with 6,000,000 rows several times.

HBase Java client log:

Error during table scan: java.lang.RuntimeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 127.0.0.1:53169 for region channel_products,,1256660737751, row '', but failed after 10 attempts.
Exceptions:
java.lang.NoClassDefFoundError: org/mortbay/log/Log
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
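
All of the "Connection refused" entries point at 127.0.0.1:53169, the
address the master has recorded for the region server. A plain TCP
connect is a quick way to confirm whether anything is still listening
there. The Python sketch below is only an illustration and is not part
of HBase; the helper name port_is_open is made up.

```python
import socket


def port_is_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False
```

Here port_is_open("127.0.0.1", 53169) returning False would mean that
the region server process is gone or no longer listening on that port,
which would match the errors in the master log.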

HBase shell log:

hbase@localhost:/hadoop$ ./hbase/bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Version: 0.20.1, r822817, Wed Oct  7 11:55:42 PDT 2009
hbase(main):001:0> status
1 servers, 0 dead, 5.0000 average load
hbase(main):002:0> list
09/10/29 08:34:09 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:11 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:13 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:15 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:19 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:19 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:21 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:23 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:25 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:29 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:31 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:33 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:35 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:37 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:41 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:41 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:43 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:45 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:47 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:51 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:53 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:55 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:57 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:34:59 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:03 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:03 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:05 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:07 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:09 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:13 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:15 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:17 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:19 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:21 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:25 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:25 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:27 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:29 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:31 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:35 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:39 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:41 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:43 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:45 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:49 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:49 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:51 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:53 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:55 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
09/10/29 08:35:59 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 5 attempts.
Exceptions:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /127.0.0.1:53169 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /127.0.0.1:53169 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /127.0.0.1:53169 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /127.0.0.1:53169 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /127.0.0.1:53169 after attempts=1

    from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in `getRegionServerWithRetries'
    from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
    from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
    from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in `listTables'
    from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
    from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
    from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
    from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
    from java/lang/reflect/Method.java:597:in `invoke'
    from org/jruby/javasupport/JavaMethod.java:298:in `invokeWithExceptionHandling'
    from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
    from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
    from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
    from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
    from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
    from org/jruby/ast/ForNode.java:104:in `interpret'
... 112 levels...
    from hadoop/hbase/bin/$_dot_dot_/bin/hirb#start:-1:in `call'
    from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in `call'
    from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in `call'
    from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in `call'
    from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
    from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
    from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:497:in `__file__'
    from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
    from org/jruby/Ruby.java:577:in `runScript'
    from org/jruby/Ruby.java:480:in `runNormally'
    from org/jruby/Ruby.java:354:in `runFromMain'
    from org/jruby/Main.java:229:in `run'
    from org/jruby/Main.java:110:in `run'
    from org/jruby/Main.java:94:in `main'
    from /hadoop/hbase/bin/../bin/hirb.rb:338:in `list'
    from (hbase):3hbase(main):003:0>

The HDFS name node is still available through the web interface.

NameNode 'localhost:9000'

Started: Tue Oct 27 03:12:08 EET 2009
Version: 0.20.1, r810220
Compiled: Tue Sep 1 20:55:56 UTC 2009 by oom
Upgrades: There are no upgrades in progress.

Browse the filesystem
Namenode Logs

Cluster Summary

116 files and directories, 98 blocks = 214 total. Heap Size is 10.94 MB / 963 MB (1%)
Configured Capacity : 229.36 GB
DFS Used : 3.04 GB
Non DFS Used : 14.46 GB
DFS Remaining : 211.87 GB
DFS Used% : 1.32 %
DFS Remaining% : 92.37 %
Live Nodes : 1
Dead Nodes : 0


NameNode Storage:

Storage Directory    Type             State
/hadoop/hdfs/name    IMAGE_AND_EDITS  Active


Hadoop, 2009.


Could you take a look at this? Maybe you will have some other ideas.
-------------------------------------------------
Best wishes, Artyom Shvedchikov




hadoop.logs.tar.gz (732K)
hbase.logs.tar.gz (140K)

Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

by shoolc

Hello, Tatsuya.

Yesterday we ran into the same problem: the master was down.

Here is part of the HBase master log from after HBase became unavailable
through the HBase shell and the Java HBase client.


> 2009-10-29 00:00:36,920 INFO org.apache.hadoop.hbase.master.ServerManager:
> 1 region servers, 0 dead, average load 5.0
> 2009-10-29 00:00:37,150 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.rootScanner scanning meta region {server: 127.0.0.1:53169,
> regionname: -ROOT-,,0, startKe$
> 2009-10-29 00:00:37,151 WARN org.apache.hadoop.hbase.master.BaseScanner:
> Scan ROOT region
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:308)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:831)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
>         at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:328)
>         at $Proxy2.openScanner(Unknown Source)
>         at
> org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:160)
>         at
> org.apache.hadoop.hbase.master.RootScanner.scanRoot(RootScanner.java:54)
>         at
> org.apache.hadoop.hbase.master.RootScanner.maintenanceScan(RootScanner.java:79)
>         at
> org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:136)
>         at org.apache.hadoop.hbase.Chore.run(Chore.java:68)
>
and this repeats to the end of the log.

Part of the ZooKeeper log:

> 2009-10-29 02:45:14,138 INFO
> org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 17b
> 2009-10-29 08:34:04,751 INFO org.apache.zookeeper.server.NIOServerCnxn:
> Connected to /127.0.0.1:56897 lastZxid 0
> 2009-10-29 08:34:04,751 INFO org.apache.zookeeper.server.NIOServerCnxn:
> Creating new session 0x124962e1e530013
> 2009-10-29 08:34:04,776 INFO org.apache.zookeeper.server.NIOServerCnxn:
> Finished init of 0x124962e1e530013 valid:true
> 2009-10-29 08:34:09,689 WARN
> org.apache.zookeeper.server.PrepRequestProcessor: Got exception when
> processing sessionid:0x124962e1e530013 type:create cxid:0x2
> zxid:0xfffffffffff$
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
> NodeExists
>         at
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
>         at
> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
>
and this repeats to the end of the log.

HBase became unavailable after we tried to scan a table with 6,000,000 rows
several times.

HBase Java client log:

> Error during table scan: java.lang.RuntimeException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server 127.0.0.1:53169 for region channel_products,,1256660737751,
> row '', but failed after 10 attempts.
> Exceptions:
> java.lang.NoClassDefFoundError: org/mortbay/log/Log
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
>

HBase shell log:

> hbase@localhost:/hadoop$ ./hbase/bin/hbase shell
> HBase Shell; enter 'help<RETURN>' for list of supported commands.
> Version: 0.20.1, r822817, Wed Oct  7 11:55:42 PDT 2009
> hbase(main):001:0> status
> 1 servers, 0 dead, 5.0000 average load
> hbase(main):002:0> list
> 09/10/29 08:34:09 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:11 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:13 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:15 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:19 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:19 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:21 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:23 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:25 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:29 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:31 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:33 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:35 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:37 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:41 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:41 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:43 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:45 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:47 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:51 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:53 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:55 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:57 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:34:59 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:03 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:03 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:05 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:07 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:09 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:13 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:15 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:17 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:19 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:21 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:25 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:25 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:27 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:29 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:31 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:35 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:39 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:41 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:43 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:45 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:49 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:49 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:51 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:53 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:55 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> 09/10/29 08:35:59 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
> be reached after 1 tries, giving up.
> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server null for region , row '', but failed after 5
> attempts.
> Exceptions:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
> proxy to /127.0.0.1:53169 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
> proxy to /127.0.0.1:53169 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
> proxy to /127.0.0.1:53169 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
> proxy to /127.0.0.1:53169 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
> proxy to /127.0.0.1:53169 after attempts=1
>
>     from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in
> `getRegionServerWithRetries'
>     from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
>     from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
>     from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in
> `listTables'
>     from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
>     from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
>     from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>     from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>     from java/lang/reflect/Method.java:597:in `invoke'
>     from org/jruby/javasupport/JavaMethod.java:298:in
> `invokeWithExceptionHandling'
>     from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>     from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
>     from org/jruby/runtime/callsite/CachingCallSite.java:253:in
> `cacheAndCall'
>     from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>     from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
>     from org/jruby/ast/ForNode.java:104:in `interpret'
> ... 112 levels...
>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb#start:-1:in `call'
>     from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in
> `call'
>     from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in
> `call'
>     from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in
> `call'
>     from org/jruby/runtime/callsite/CachingCallSite.java:253:in
> `cacheAndCall'
>     from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:497:in `__file__'
>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
>     from org/jruby/Ruby.java:577:in `runScript'
>     from org/jruby/Ruby.java:480:in `runNormally'
>     from org/jruby/Ruby.java:354:in `runFromMain'
>     from org/jruby/Main.java:229:in `run'
>     from org/jruby/Main.java:110:in `run'
>     from org/jruby/Main.java:94:in `main'
>     from /hadoop/hbase/bin/../bin/hirb.rb:338:in `list'
>     from (hbase):3hbase(main):003:0>
>

The HDFS name node is still available through the web interface.

NameNode 'localhost:9000'

Started: Tue Oct 27 03:12:08 EET 2009
Version: 0.20.1, r810220
Compiled: Tue Sep 1 20:55:56 UTC 2009 by oom
Upgrades: There are no upgrades in progress.

Browse the filesystem <http://77.122.169.205:50070/nn_browsedfscontent.jsp>
Namenode Logs <http://77.122.169.205:50070/logs/>

------------------------------
Cluster Summary

116 files and directories, 98 blocks = 214 total. Heap Size is 10.94 MB / 963 MB (1%)
Configured Capacity : 229.36 GB
DFS Used : 3.04 GB
Non DFS Used : 14.46 GB
DFS Remaining : 211.87 GB
DFS Used% : 1.32 %
DFS Remaining% : 92.37 %
Live Nodes <http://77.122.169.205:50070/dfsnodelist.jsp?whatNodes=LIVE> : 1
Dead Nodes <http://77.122.169.205:50070/dfsnodelist.jsp?whatNodes=DEAD> : 0

------------------------------
NameNode Storage:

Storage Directory    Type             State
/hadoop/hdfs/name    IMAGE_AND_EDITS  Active

------------------------------
Hadoop <http://hadoop.apache.org/core>, 2009.


Could you take a look at this? Maybe you will have some other ideas.

Thanks a lot for your time.
-------------------------------------------------
Best wishes, Artyom Shvedchikov

Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

by Tatsuya Kawano

Hi Artyom,

I should have made it clear that I was giving you advice on only one of
the problems you have had. It seems you have at least three different
problems:

In your first email:
1. HBase master went down after a few days of testing.
2. HBase didn't start again because of an HDFS error.

In your last email:
3. The HBase region server stopped responding after you tried to scan a
table with 6 million rows several times.

The possible cause and solution I have been describing applied only
to problem #2, not to the others.


So, for problem #3, your master and client logs show that the
HBase region server is not responding on port 53169. However, they
don't say why it's not responding. You should have a region server
log in the logs directory as well, so can you check whether it contains
any error messages?
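
Before digging through the logs, it can help to confirm from the server itself whether anything is listening on that port at all. The following is only a minimal sketch in Python, not part of HBase; the function name `port_is_open` is made up for illustration, and the host and port are the ones reported in the logs above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (nothing is listening) and timeouts.
        return False


if __name__ == "__main__":
    # 127.0.0.1:53169 is the region server address from the logs above.
    print(port_is_open("127.0.0.1", 53169))
```

A result of False only says that nothing accepted the connection. It does not say why the region server process went away, so the region server log is still the place to look.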


Also, in your first mail, you said your server has only 2GB of RAM.

On Tue, Oct 27, 2009 at 11:24 AM, Artyom Shvedchikov <shoolc@...> wrote:
> We are testing the latest HBase 0.20.1 in pseudo-distributed mode with
> Hadoop 0.20.1 on such environment:
> *h/w*: Intel C2D 1.86 GHz, RAM 2 Gb 667 MHz, HDD 1TB Seagate SATA2 7200 Rpm
> *s/w*: Ubuntu 9.04, Filesystem type is *ext3*, Java  1.6.0_16-b01, Hadoop
> 0.20.1, HBase 0.20.1


2GB of RAM is definitely too small to fit an entire Hadoop and HBase
cluster. You should be aware that you are trying to run all of the following
Java processes on your server, and 2GB of RAM is too small for them.

1. Hadoop DFS Name Node
2. Hadoop DFS Secondary Name Node
3. Hadoop DFS Data Node
4. ZooKeeper
5. HBase Master
6. HBase Region Server
7. Hadoop Job Tracker
8. Hadoop Task Tracker
9….  Hadoop Map/Reduce processes


I would suggest having at least 8GB of RAM even for light-load
testing, and adding more servers for heavy-load testing. HBase is memory
intensive and becomes unstable when it doesn't have enough memory. I
bet your problems #1 and #3 will magically disappear if you add more
RAM to your server.
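
To put a rough number on this, here is a back-of-the-envelope sketch in Python. It assumes every daemon in the list above is allowed a maximum heap of 1000 MB. That figure is an assumption, not something stated in this thread, although it is consistent with the 963 MB heap ceiling shown on the NameNode page. Real processes rarely use their whole ceiling, and the Map/Reduce child tasks are left out.

```python
# Assumption (not from the thread): each daemon may grow to a 1000 MB heap.
ASSUMED_MAX_HEAP_MB = 1000

# The eight long-running daemons listed above; Map/Reduce tasks are excluded.
DAEMONS = [
    "Hadoop DFS Name Node",
    "Hadoop DFS Secondary Name Node",
    "Hadoop DFS Data Node",
    "ZooKeeper",
    "HBase Master",
    "HBase Region Server",
    "Hadoop Job Tracker",
    "Hadoop Task Tracker",
]

# The server in this thread has 2 GB of physical RAM.
PHYSICAL_RAM_MB = 2 * 1024


def worst_case_heap_mb(daemons, heap_mb=ASSUMED_MAX_HEAP_MB):
    """Sum of the maximum heaps if every daemon filled its ceiling."""
    return len(daemons) * heap_mb


if __name__ == "__main__":
    total = worst_case_heap_mb(DAEMONS)
    print(f"{len(DAEMONS)} daemons x {ASSUMED_MAX_HEAP_MB} MB = {total} MB")
    print(f"physical RAM: {PHYSICAL_RAM_MB} MB")
```

Under that assumption the worst-case heap demand is several times the physical memory, which is why swapping or out-of-memory failures are plausible under load.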


So please check the region server log, and consider adding more
RAM to your server.
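
For the log check, a small filter can pull out the lines worth reading first. This is a minimal sketch in Python that assumes the region server log uses the same "timestamp LEVEL class: message" layout as the master log quoted in this thread. The function name `find_problems` and the choice of levels and keywords are illustrative assumptions; the sample lines are taken from the master log above.

```python
import re

# Matches the prefix seen in the master log quoted in this thread, e.g.
# "2009-10-29 00:00:37,151 WARN org.apache...BaseScanner: Scan ROOT region"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<rest>.*)$"
)


def find_problems(lines, levels=("WARN", "ERROR", "FATAL"),
                  keywords=("OutOfMemoryError",)):
    """Return lines whose level is in `levels` or that contain a keyword."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        match = LOG_LINE.match(line)
        level_hit = match is not None and match.group("level") in levels
        keyword_hit = any(keyword in line for keyword in keywords)
        if level_hit or keyword_hit:
            hits.append(line)
    return hits


# Sample lines copied (and unwrapped) from the master log in this thread.
SAMPLE = [
    "2009-10-29 00:00:36,920 INFO org.apache.hadoop.hbase.master."
    "ServerManager: 1 region servers, 0 dead, average load 5.0",
    "2009-10-29 00:00:37,151 WARN org.apache.hadoop.hbase.master."
    "BaseScanner: Scan ROOT region",
    "java.net.ConnectException: Connection refused",
]

if __name__ == "__main__":
    for hit in find_problems(SAMPLE):
        print(hit)
```

To use it on a real file, pass an open file object instead of `SAMPLE`. The path to the region server log depends on the installation.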

Thanks,


--
Tatsuya Kawano (Mr.)
Tokyo, Japan



On Thu, Oct 29, 2009 at 7:06 PM, Artyom Shvedchikov <shoolc@...> wrote:

> Hello, Tatsuya.
>
> Yesterday we trapped in the same problem - master was down.
>
> Here is a part of hbase master log after hbase became unavailable through
> hbase shell and Java hbase client.
>
>
>> 2009-10-29 00:00:36,920 INFO org.apache.hadoop.hbase.master.ServerManager:
>> 1 region servers, 0 dead, average load 5.0
>> 2009-10-29 00:00:37,150 INFO org.apache.hadoop.hbase.master.BaseScanner:
>> RegionManager.rootScanner scanning meta region {server: 127.0.0.1:53169,
>> regionname: -ROOT-,,0, startKe$
>> 2009-10-29 00:00:37,151 WARN org.apache.hadoop.hbase.master.BaseScanner:
>> Scan ROOT region
>> java.net.ConnectException: Connection refused
>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>         at
>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
>>         at
>> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:308)
>>         at
>> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:831)
>>         at
>> org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
>>         at
>> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:328)
>>         at $Proxy2.openScanner(Unknown Source)
>>         at
>> org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:160)
>>         at
>> org.apache.hadoop.hbase.master.RootScanner.scanRoot(RootScanner.java:54)
>>         at
>> org.apache.hadoop.hbase.master.RootScanner.maintenanceScan(RootScanner.java:79)
>>         at
>> org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:136)
>>         at org.apache.hadoop.hbase.Chore.run(Chore.java:68)
>>
> and this repeats to the end of log.
>
> Zookeeper log part:
>
> 2009-10-29 02:45:14,138 INFO
>> org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 17b
>> 2009-10-29 08:34:04,751 INFO org.apache.zookeeper.server.NIOServerCnxn:
>> Connected to /127.0.0.1:56897 lastZxid 0
>> 2009-10-29 08:34:04,751 INFO org.apache.zookeeper.server.NIOServerCnxn:
>> Creating new session 0x124962e1e530013
>> 2009-10-29 08:34:04,776 INFO org.apache.zookeeper.server.NIOServerCnxn:
>> Finished init of 0x124962e1e530013 valid:true
>> 2009-10-29 08:34:09,689 WARN
>> org.apache.zookeeper.server.PrepRequestProcessor: Got exception when
>> processing sessionid:0x124962e1e530013 type:create cxid:0x2
>> zxid:0xfffffffffff$
>> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
>> NodeExists
>>         at
>> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
>>         at
>> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
>>
> and this repeats to the end of log.
>
> HBase became unavailable after we try to scan table with 6 000 000 rows
> several times.
>
> Hbase Java client log:
>
> Error during table scan: java.lang.RuntimeException:
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
>> region server 127.0.0.1:53169 for region channel_products,,1256660737751,
>> row '', but failed after 10 attempts.
>> Exceptions:
>> java.lang.NoClassDefFoundError: org/mortbay/log/Log
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>>
>
> HBase shell log:
>
> hbase@localhost:/hadoop$ ./hbase/bin/hbase shell
>> HBase Shell; enter 'help<RETURN>' for list of supported commands.
>> Version: 0.20.1, r822817, Wed Oct  7 11:55:42 PDT 2009
>> hbase(main):001:0> status
>> 1 servers, 0 dead, 5.0000 average load
>> hbase(main):002:0> list
>> 09/10/29 08:34:09 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:11 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:13 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:15 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:19 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:19 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:21 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:23 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:25 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:29 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:31 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:33 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:35 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:37 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:41 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:41 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:43 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:45 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:47 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:51 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:53 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:55 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:57 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:34:59 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:03 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:03 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:05 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:07 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:09 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:13 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:15 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:17 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:19 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:21 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:25 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:25 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:27 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:29 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:31 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:35 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:39 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:41 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:43 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:45 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:49 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:49 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:51 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:53 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:55 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> 09/10/29 08:35:59 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> Trying to contact region server null for region , row '', but failed after 5
>> attempts.
>> Exceptions:
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
>> proxy to /127.0.0.1:53169 after attempts=1
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
>> proxy to /127.0.0.1:53169 after attempts=1
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
>> proxy to /127.0.0.1:53169 after attempts=1
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
>> proxy to /127.0.0.1:53169 after attempts=1
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
>> proxy to /127.0.0.1:53169 after attempts=1
>>
>>     from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in
>> `getRegionServerWithRetries'
>>     from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
>>     from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
>>     from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in
>> `listTables'
>>     from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
>>     from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
>>     from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>>     from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>>     from java/lang/reflect/Method.java:597:in `invoke'
>>     from org/jruby/javasupport/JavaMethod.java:298:in
>> `invokeWithExceptionHandling'
>>     from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>>     from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
>>     from org/jruby/runtime/callsite/CachingCallSite.java:253:in
>> `cacheAndCall'
>>     from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>>     from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
>>     from org/jruby/ast/ForNode.java:104:in `interpret'
>> ... 112 levels...
>>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb#start:-1:in `call'
>>     from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in
>> `call'
>>     from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in
>> `call'
>>     from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in
>> `call'
>>     from org/jruby/runtime/callsite/CachingCallSite.java:253:in
>> `cacheAndCall'
>>     from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:497:in `__file__'
>>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
>>     from org/jruby/Ruby.java:577:in `runScript'
>>     from org/jruby/Ruby.java:480:in `runNormally'
>>     from org/jruby/Ruby.java:354:in `runFromMain'
>>     from org/jruby/Main.java:229:in `run'
>>     from org/jruby/Main.java:110:in `run'
>>     from org/jruby/Main.java:94:in `main'
>>     from /hadoop/hbase/bin/../bin/hirb.rb:338:in `list'
>>     from (hbase):3
>> hbase(main):003:0>
>>
>
> HDFS name node still available through web interface.
> NameNode 'localhost:9000'
> Started: Tue Oct 27 03:12:08 EET 2009
> Version: 0.20.1, r810220
> Compiled: Tue Sep 1 20:55:56 UTC 2009 by oom
> Upgrades: There are no upgrades in progress.
>
> Browse the filesystem <http://77.122.169.205:50070/nn_browsedfscontent.jsp>
> Namenode Logs <http://77.122.169.205:50070/logs/>
> ------------------------------
> Cluster Summary
> 116 files and directories, 98 blocks = 214 total. Heap Size is 10.94 MB / 963 MB (1%)
>
> Configured Capacity : 229.36 GB
> DFS Used : 3.04 GB
> Non DFS Used : 14.46 GB
> DFS Remaining : 211.87 GB
> DFS Used% : 1.32 %
> DFS Remaining% : 92.37 %
> Live Nodes <http://77.122.169.205:50070/dfsnodelist.jsp?whatNodes=LIVE> : 1
> Dead Nodes <http://77.122.169.205:50070/dfsnodelist.jsp?whatNodes=DEAD> : 0
>
> ------------------------------
> NameNode Storage:
> Storage Directory: /hadoop/hdfs/name
> Type: IMAGE_AND_EDITS
> State: Active
>
> ------------------------------
> Hadoop <http://hadoop.apache.org/core>, 2009.
>
>
> Could you check this, maybe some other thoughts will appear.
>
> Thanks a lot for your time.
> -------------------------------------------------
> Best wishes, Artyom Shvedchikov

Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

by shoolc

Dear Tatsuya,

> 1. Delete hadoop data directory
> 2. bin/hadoop namenode -format
> 3. bin/start-all.sh
>    -> namenode will start immediately and go in service, but data
> node will be making a long (almost seven minutes) pause in a middle of
> the startup.
>
> 4. Before the data node becomes ready, do an HDFS write operation
> (e.g. "bin/hadoop fs -put conf input"), and then the write operations
> will fail with the following error:
>

Today I restarted Hadoop and HBase, skipping steps #1 and #2. I stopped
HBase first, then Hadoop; then I started Hadoop, waited 10 minutes, and
started HBase. It worked: no data was lost and everything was readable.
Then I scanned the table with 6 000 000 rows several times, and HBase hung
again with the same exceptions as in my previous post (see the post at
Thu, 29 Oct, 10:06).

hbase(main):006:0> list

> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server 127.0.0.1:57613 for region .META.,,1, row
> '', but failed after 5 attempts.
> Exceptions:
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
>
>     from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in
> `getRegionServerWithRetries'
>     from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
>     from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
>     from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in
> `listTables'
>     from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
>     from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
>     from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>     from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>     from java/lang/reflect/Method.java:597:in `invoke'
>     from org/jruby/javasupport/JavaMethod.java:298:in
> `invokeWithExceptionHandling'
>     from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>     from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
>     from org/jruby/runtime/callsite/CachingCallSite.java:70:in `call'
>     from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
>     from org/jruby/ast/ForNode.java:104:in `interpret'
>     from org/jruby/ast/NewlineNode.java:104:in `interpret'
> ... 110 levels...
>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb#start:-1:in `call'
>     from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in
> `call'
>     from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in
> `call'
>     from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in
> `call'
>     from org/jruby/runtime/callsite/CachingCallSite.java:253:in
> `cacheAndCall'
>     from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:497:in `__file__'
>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
>     from org/jruby/Ruby.java:577:in `runScript'
>     from org/jruby/Ruby.java:480:in `runNormally'
>     from org/jruby/Ruby.java:354:in `runFromMain'
>     from org/jruby/Main.java:229:in `run'
>     from org/jruby/Main.java:110:in `run'
>     from org/jruby/Main.java:94:in `main'
>     from /hadoop/hbase/bin/../bin/hirb.rb:338:in `list'
>     from (hbase):7
> hbase(main):007:0> status
> 0 servers, 0 dead, NaN average load
> hbase(main):008:0> exit
>

Full HBase and Hadoop logs can be found in my post at Thu, 29 Oct, 07:52.

The main issue now is that HBase hangs every time I scan the table (after
the second or third scan). By the way, this time it was enough to restart
only HBase; after that it was available for scan/get/put operations again.

Table structure:

hbase(main):003:0> describe 'channel_products'

> DESCRIPTION                                                   ENABLED
>  {NAME => 'channel_products', FAMILIES => [                   true
>   {NAME => 'active', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>   {NAME => 'channel_category_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>   {NAME => 'channel_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>   {NAME => 'contract_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>   {NAME => 'created_at', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>   {NAME => 'shop_category_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>   {NAME => 'shop_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>   {NAME => 'shop_product_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>   {NAME => 'updated_at', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>
> 1 row(s) in 0.0630 seconds
>

The table contains ~6 000 000 rows; each value is a String.

Code to scan the table:

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    Date startDate = new Date();
    Date finishDate;
    log(startDate + ": Get activation status started");

    String shop_id = request.getParameter("shop_id");
    String[] shop_product_ids = request.getParameterValues("shop_product_ids");
    // A single parameter value may carry a comma-separated list of ids.
    if (shop_product_ids != null && shop_product_ids.length == 1) {
        shop_product_ids = shop_product_ids[0].split(",");
    }
    String channel_id = request.getParameter("channel_id");
    String channel_category_id = request.getParameter("channel_category_id");

    String tableName = "channel_products";
    StringBuffer result = new StringBuffer("<?xml version=\"1.0\"?>");

    if (this.admin.tableExists(tableName)) {
        result.append("<result>");
        HTable table = new HTable(this.configuration, tableName);
        Scan scan = new Scan();

        // Top-level list: a row must pass every filter added to it.
        FilterList mainFilterList = new FilterList();
        if (shop_id != null) {
            mainFilterList.addFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("shop_id"), Bytes.toBytes(""),
                    CompareFilter.CompareOp.EQUAL, Bytes.toBytes(shop_id)));
        }
        if (channel_id != null) {
            mainFilterList.addFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("channel_id"), Bytes.toBytes(""),
                    CompareFilter.CompareOp.EQUAL, Bytes.toBytes(channel_id)));
        }
        if (channel_category_id != null) {
            mainFilterList.addFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("channel_category_id"), Bytes.toBytes(""),
                    CompareFilter.CompareOp.EQUAL, Bytes.toBytes(channel_category_id)));
        }

        // Nested list: a row passes if it matches any one of the product ids.
        if (shop_product_ids != null && shop_product_ids.length > 0) {
            List<Filter> filterList = new ArrayList<Filter>();
            for (String shop_product_id : shop_product_ids) {
                filterList.add(new SingleColumnValueFilter(
                        Bytes.toBytes("shop_product_id"), Bytes.toBytes(""),
                        CompareFilter.CompareOp.EQUAL, Bytes.toBytes(shop_product_id)));
            }
            FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ONE, filterList);
            mainFilterList.addFilter(filters);
        }

        scan.setFilter(mainFilterList);
        ResultScanner scanner = null;
        try {
            scanner = table.getScanner(scan);
            for (Result item : scanner) {
                getItemXml(result, item);
            }
        } catch (Exception e) {
            logError("Error during table scan: ", e);
            result.append("<error>").append("Error during table scan: " + e).append("</error>");
        } finally {
            // The scanner is null if getScanner() itself failed.
            if (scanner != null) {
                scanner.close();
            }
            result.append("</result>");
        }
    } else {
        result.append("<result>").append("Table " + tableName + " does not exist!").append("</result>");
    }

    finishDate = new Date();
    log(finishDate + ": Get activation status finished, duration: "
            + (finishDate.getTime() - startDate.getTime()) + " ms");

    response.getOutputStream().print(result.toString());
}


I checked the region server logs, but the region server was not started:

> 2009-10-29 13:34:13,754 WARN
> org.apache.hadoop.hbase.regionserver.HRegionServer: Not starting a distinct
> region server because hbase.cluster.distributed is false

HBase and Hadoop were configured according to the "Getting Started" section
on *hadoop.org*. Both are running in pseudo-distributed mode.
Maybe I should set *hbase.cluster.distributed* to true?
I'll try to increase the RAM capacity and then report the results here.
-------------------------------------------------
Best wishes, Artyom Shvedchikov

Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

by Tatsuya Kawano

Hi Artyom,

Thanks for the extra info; it is very helpful.

>> So, for the problem #3, your master and client logs tell that the
>> HBase region server is not responding on port 53169. However it
>> doesn't tell why it's not responding. You should have region server
>> log in the logs directory as well, so can you check it if there is any
>> error message?

> I checked regionserver logs, but regionserver was not started:
>
>> 2009-10-29 13:34:13,754 WARN
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Not starting a distinct
>> region server because hbase.cluster.distributed is false

Sorry, I was wrong. You're running HBase in pseudo-distributed mode,
so the region server doesn't run as a separate process but is embedded
in the same process that runs the HBase master. The master server's log
therefore contains the region server's log as well.


> Full hbase and hadoop logs can be found in my post at Thu, 29 Oct, 07:52

I hadn't noticed those attachments. I have now checked them and found
the following in "hbase-hbase-master-localhost.log.2009-10-28":

==========================================================================
2009-10-28 15:26:36,888 INFO
org.apache.hadoop.hbase.master.ServerManager: 1 region servers, 0
dead, average load 5.0
2009-10-28 15:26:37,114 INFO
org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner
scanning meta region {server: 127.0.0.1:53169, regionname: -ROOT-,,0,
startKey: <>}
2009-10-28 15:26:37,135 INFO
org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner
scan of 1 row(s) of meta region {server: 127.0.0.1:53169, regionname:
-ROOT-,,0, startKey: <>} complete
2009-10-28 15:26:37,311 INFO
org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scanning meta region {server: 127.0.0.1:53169, regionname: .META.,,1,
startKey: <>}
2009-10-28 15:26:37,382 INFO
org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scan of 3 row(s) of meta region {server: 127.0.0.1:53169, regionname:
.META.,,1, startKey: <>} complete
2009-10-28 15:26:37,382 INFO
org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s)
scanned
2009-10-28 15:26:58,670 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
-1247814523023719701 lease expired
2009-10-28 15:27:01,847 INFO org.apache.hadoop.ipc.HBaseServer:
Stopping server on 53169
2009-10-28 15:27:01,848 INFO org.apache.hadoop.ipc.HBaseServer:
Stopping IPC Server listener on 53169
2009-10-28 15:27:01,848 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 9 on 53169: exiting
2009-10-28 15:27:01,960 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping
infoServer
2009-10-28 15:27:02,009 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 7 on 53169: exiting
2009-10-28 15:27:02,025 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 0 on 53169: exiting
2009-10-28 15:27:02,025 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 8 on 53169: exiting
2009-10-28 15:27:02,026 INFO org.apache.hadoop.ipc.HBaseServer:
Stopping IPC Server Responder
2009-10-28 15:27:02,039 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 4 on 53169: exiting
2009-10-28 15:27:02,039 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 6 on 53169: exiting
2009-10-28 15:27:02,039 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 2 on 53169: exiting
2009-10-28 15:27:02,039 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 3 on 53169: exiting
2009-10-28 15:27:02,481 INFO org.apache.hadoop.hdfs.DFSClient: Could
not obtain block blk_5009444910783378943_2681 from any node:
java.io.IOException: No live nodes contain current block
2009-10-28 15:27:02,504 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
RegionServer:0.cacheFlusher exiting
2009-10-28 15:27:02,505 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker:
RegionServer:0.majorCompactionChecker exiting
2009-10-28 15:27:02,505 INFO
org.apache.hadoop.hbase.regionserver.LogFlusher:
RegionServer:0.logFlusher exiting
2009-10-28 15:27:02,505 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
RegionServer:0.compactor exiting
2009-10-28 15:27:02,505 INFO
org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2009-10-28 15:27:02,702 INFO org.apache.hadoop.hdfs.DFSClient: Could
not obtain block blk_6684004629716332214_2681 from any node:
java.io.IOException: No live nodes contain current block
2009-10-28 15:27:03,578 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread
exiting
2009-10-28 15:27:14,816 WARN
org.apache.hadoop.hbase.regionserver.Store: Not in
setorg.apache.hadoop.hbase.regionserver.StoreScanner@3cf294
2009-10-28 15:27:14,828 WARN
org.apache.hadoop.hbase.regionserver.Store: Not in
setorg.apache.hadoop.hbase.regionserver.StoreScanner@1611ef6
2009-10-28 15:27:14,828 WARN
org.apache.hadoop.hbase.regionserver.Store: Not in
setorg.apache.hadoop.hbase.regionserver.StoreScanner@137c0c7
2009-10-28 15:27:14,828 WARN
org.apache.hadoop.hbase.regionserver.Store: Not in
setorg.apache.hadoop.hbase.regionserver.StoreScanner@15151a5
2009-10-28 15:27:14,829 WARN org.apache.hadoop.ipc.HBaseServer: IPC
Server Responder, call next(-1247814523023719701, 1) from
127.0.0.1:41489: output error
2009-10-28 15:27:14,830 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 5 on 53169 caught:
java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943)

2009-10-28 15:27:14,830 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 5 on 53169: exiting
2009-10-28 15:27:36,888 INFO
org.apache.hadoop.hbase.master.ServerManager: 1 region servers, 0
dead, average load 5.0
==========================================================================


Your region server gracefully shut itself down because it detected
lease expiration on a scanner. Lease expiration means it lost the
heartbeat signal from another part of the cluster, which indicates a
serious failure in the cluster.

A lease can expire because of a network problem, or because a cluster
process was killed or stalled. Since this occurred while the region
server was busy working through a series of SingleColumnValueFilters in
a low-RAM environment, some other cluster processes may have been
swapped out of RAM and nearly stalled.
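
To make the lease idea concrete, here is a minimal, self-contained Java sketch. It is not HBase's implementation: the class `ScannerLease`, its methods, and the 60-second period are all illustrative assumptions. It models only the basic rule that a lease expires when its holder fails to renew it within a fixed period, which is what can happen when a stalled or swapped-out process goes silent.

```java
// Toy model of a lease; NOT HBase's implementation. All names are hypothetical.
final class ScannerLease {
    private final long periodMillis;   // how long the holder may stay silent
    private long lastRenewedMillis;    // time of the most recent renewal

    ScannerLease(long periodMillis, long nowMillis) {
        this.periodMillis = periodMillis;
        this.lastRenewedMillis = nowMillis;
    }

    // The holder calls this on every request, e.g. each time it asks for more rows.
    void renew(long nowMillis) {
        lastRenewedMillis = nowMillis;
    }

    // The server checks this periodically; true means the holder was silent too long.
    boolean isExpired(long nowMillis) {
        return nowMillis - lastRenewedMillis > periodMillis;
    }
}

class LeaseDemo {
    public static void main(String[] args) {
        ScannerLease lease = new ScannerLease(60000L, 0L); // assumed 60 s period
        lease.renew(30000L);                               // holder is heard from at t = 30 s
        System.out.println(lease.isExpired(80000L));       // 50 s of silence: prints false
        System.out.println(lease.isExpired(95000L));       // 65 s of silence: prints true
    }
}
```

Timestamps are passed in explicitly rather than read from the system clock, so the behaviour is easy to follow by hand.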

(By the way, it's somewhat confusing that the HBase master was still
reporting 1 region server available when there wasn't one. I guess the
master got confused because of pseudo-distributed mode.)


> I'll  try to increase RAM capacity. And then I'll write here about results

So adding more RAM to the server will definitely help. Also, I have
never tried this and it may be bad advice, but you could decrease the
Java VM heap size of the HBase and Hadoop processes so that they are
not swapped out of RAM and the leases never(?) expire. You can
configure the heap size in "hbase-env.sh" and "hadoop-env.sh", but do
this with caution: decreasing the heap size could cause an
OutOfMemoryError or a serious slowdown of the process, and you could
end up in the same (or an even worse) situation than you have now.
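
As a rough back-of-the-envelope check, the Java sketch below adds up the maximum heap sizes of the JVMs that would be running on this machine. The list of daemons and the 1000 MB per-process figure are assumptions (1000 MB being what I understand to be the default when HADOOP_HEAPSIZE and HBASE_HEAPSIZE are left unset), and the class `HeapBudget` is made up for illustration. A maximum heap is only an upper bound on what a JVM may use, so this shows how far the box could be oversubscribed, not what it actually consumes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative arithmetic only; daemon list and heap sizes are assumptions.
class HeapBudget {
    // Assumed JVMs on a pseudo-distributed box, each with an assumed 1000 MB max heap.
    static Map<String, Integer> assumedDaemons() {
        Map<String, Integer> daemons = new LinkedHashMap<String, Integer>();
        daemons.put("NameNode", 1000);
        daemons.put("SecondaryNameNode", 1000);
        daemons.put("DataNode", 1000);
        daemons.put("JobTracker", 1000);
        daemons.put("TaskTracker", 1000);
        daemons.put("HBase master (with embedded region server)", 1000);
        return daemons;
    }

    // Sum of the configured maximum heaps, in MB.
    static int totalMaxHeapMb(Map<String, Integer> maxHeapMbByDaemon) {
        int total = 0;
        for (int mb : maxHeapMbByDaemon.values()) {
            total += mb;
        }
        return total;
    }

    public static void main(String[] args) {
        int ramMb = 2048; // the 2 GB of RAM reported at the start of the thread
        int total = totalMaxHeapMb(assumedDaemons());
        System.out.println("Sum of max heaps: " + total + " MB, physical RAM: " + ramMb + " MB");
        System.out.println("Could exceed RAM: " + (total > ramMb));
    }
}
```

Under these assumptions the heaps could grow to roughly three times the physical RAM, which is consistent with processes being swapped out under load.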


Also, I saw you have many column families in your table "channel_products":

-- active
-- channel_category_id
-- channel_id
-- contract_id
-- shop_category_id
-- shop_id
-- shop_product_id
-- created_at
-- updated_at

Try to have fewer. You can group them into a smaller number of
column families, so that the region server can access each column
faster and consume less memory. Be aware that different column
families are stored in different files on disk, so you can
optimize the locality of columns by grouping them into a few
column families.
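
One possible regrouping, sketched in plain Java so it runs without HBase on the classpath. The family names "ids" and "meta" and the choice of which columns go together are purely hypothetical; the right grouping depends on which columns you read together. Each current family becomes a qualifier inside one of two families, written in the usual "family:qualifier" form.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical regrouping of the nine families of 'channel_products'.
class FamilyGrouping {
    // Current family name -> proposed "family:qualifier" column.
    static Map<String, String> proposedColumns() {
        Map<String, String> columns = new LinkedHashMap<String, String>();
        columns.put("channel_category_id", "ids:channel_category_id");
        columns.put("channel_id", "ids:channel_id");
        columns.put("contract_id", "ids:contract_id");
        columns.put("shop_category_id", "ids:shop_category_id");
        columns.put("shop_id", "ids:shop_id");
        columns.put("shop_product_id", "ids:shop_product_id");
        columns.put("active", "meta:active");
        columns.put("created_at", "meta:created_at");
        columns.put("updated_at", "meta:updated_at");
        return columns;
    }

    // Number of distinct families used by the proposed columns.
    static int familyCount(Map<String, String> columns) {
        TreeSet<String> families = new TreeSet<String>();
        for (String column : columns.values()) {
            families.add(column.substring(0, column.indexOf(':')));
        }
        return families.size();
    }

    public static void main(String[] args) {
        Map<String, String> columns = proposedColumns();
        System.out.println(columns.size() + " columns in "
                + familyCount(columns) + " families"); // prints "9 columns in 2 families"
    }
}
```

With a layout like this, the filters in the servlet would name the family and qualifier separately (for example family "ids", qualifier "shop_id") instead of a family with an empty qualifier.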

Good luck,

--
Tatsuya Kawano (Mr.)
Tokyo, Japan



On Thu, Oct 29, 2009 at 9:50 PM, Artyom Shvedchikov <shoolc@...> wrote:

> Dear, Tatsuya
>
> 1. Delete hadoop data directory
>> 2. bin/hadoop namenode -format
>> 3. bin/start-all.sh
>>    -> namenode will start immediately and go in service, but data
>> node will be making a long (almost seven minutes) pause in a middle of
>> the startup.
>>
>> 4. Before the data node becomes ready, do an HDFS write operation
>> (e.g. "bin/hadoop fs -put conf input"), and then the write operations
>> will fail with the following error:
>>
>
> Today I tried to restart Hadoop and HBase skipping step #1 and step #2.
> First I stop HBase, then Hadoop and then start Hadoop, wait for 10 minutes
> and start HBase - it works. Data was not lost and was available to read and
> etc. Then I tried to scan several times the table with 6 000 000 rows and
> HBase hanged down again with the same exceptions as in my previous post (see
> post at Thu, 29 Oct, 10:06).
>
> hbase(main):006:0> list
>> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> Trying to contact region server 127.0.0.1:57613 for region .META.,,1, row
>> '', but failed after 5 attempts.
>> Exceptions:
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>>
>>     from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in
>> `getRegionServerWithRetries'
>>     from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
>>     from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
>>     from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in
>> `listTables'
>>     from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
>>     from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
>>     from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>>     from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>>     from java/lang/reflect/Method.java:597:in `invoke'
>>     from org/jruby/javasupport/JavaMethod.java:298:in
>> `invokeWithExceptionHandling'
>>     from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>>     from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
>>     from org/jruby/runtime/callsite/CachingCallSite.java:70:in `call'
>>     from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
>>     from org/jruby/ast/ForNode.java:104:in `interpret'
>>     from org/jruby/ast/NewlineNode.java:104:in `interpret'
>> ... 110 levels...
>>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb#start:-1:in `call'
>>     from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in
>> `call'
>>     from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in
>> `call'
>>     from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in
>> `call'
>>     from org/jruby/runtime/callsite/CachingCallSite.java:253:in
>> `cacheAndCall'
>>     from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:497:in `__file__'
>>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
>>     from org/jruby/Ruby.java:577:in `runScript'
>>     from org/jruby/Ruby.java:480:in `runNormally'
>>     from org/jruby/Ruby.java:354:in `runFromMain'
>>     from org/jruby/Main.java:229:in `run'
>>     from org/jruby/Main.java:110:in `run'
>>     from org/jruby/Main.java:94:in `main'
>>     from /hadoop/hbase/bin/../bin/hirb.rb:338:in `list'
>>     from (hbase):7hbase(main):007:0> status
>> 0 servers, 0 dead, NaN average load
>> hbase(main):008:0> exit
>>
>
> Full hbase and hadoop logs can be found in my post at Thu, 29 Oct, 07:52
>
> The main issue for now is that HBase hangs down each time after I try to
> scan the table (after second or third time). By the way, this time it was
> enough to restart HBase only. And it was became available to scan/get/put
> operations.
>
> Table structure:
>
> hbase(main):003:0> describe 'channel_products'
>> DESCRIPTION
>> ENABLED
>>  {NAME => 'channel_products', FAMILIES => [{NAME => 'active', VERSIONS
>> true
>>  => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE =>
>> '6553
>>  6', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
>> 'channel_cat
>>  egory_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
>> '2147483647'
>>  , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>> {
>>  NAME => 'channel_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
>> '
>>  2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE
>> =>
>>   'true'}, {NAME => 'contract_id', VERSIONS => '3', COMPRESSION =>
>> 'NON
>>  E', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>> B
>>  LOCKCACHE => 'true'}, {NAME => 'created_at', VERSIONS => '3',
>> COMPRESS
>>  ION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY
>> =>
>>   'false', BLOCKCACHE => 'true'}, {NAME => 'shop_category_id',
>> VERSIONS
>>   => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE =>
>> '655
>>  36', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
>> 'shop_id',
>>  VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647',
>> BLOCKSIZE
>>   => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
>> 'sh
>>  op_product_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
>> '214748
>>  3647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
>> 'true
>>  '}, {NAME => 'updated_at', VERSIONS => '3', COMPRESSION => 'NONE',
>> TTL
>>   => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>> BLOCKCAC
>>  HE =>
>> 'true'}]}
>>
>> 1 row(s) in 0.0630 seconds
>>
>
> Table contains ~ 6 000 000 rows, each value is a String.
>
> Code to scan the table:
>
>  protected void doGet(HttpServletRequest request, HttpServletResponse
>> response) throws ServletException, IOException {
>>
>  Date startDate = new Date();
>>
>  Date finishDate;
>>   log(startDate + ": Get activation status started");
>>   String shop_id = request.getParameter("shop_id");
>>
>>   String[] shop_product_ids = request.getParameterValues("shop_product_ids");
>>   if (shop_product_ids != null && shop_product_ids.length == 1) {
>>     shop_product_ids = shop_product_ids[0].split(",");
>>   }
>>
>>   String channel_id = request.getParameter("channel_id");
>>   String channel_category_id = request.getParameter("channel_category_id");
>>
>>   String tableName = "channel_products";
>>   StringBuffer result = new StringBuffer("<?xml version=\"1.0\"?>");
>>
>>   if (this.admin.tableExists(tableName)) {
>>     result.append("<result>");
>>
>>     HTable table = new HTable(this.configuration, tableName);
>>
>>     Scan scan = new Scan();
>>
>>     FilterList mainFilterList = new FilterList();
>>
>>     if (shop_id != null) {
>>       mainFilterList.addFilter(new SingleColumnValueFilter(
>>           Bytes.toBytes("shop_id"), Bytes.toBytes(""),
>>           CompareFilter.CompareOp.EQUAL, Bytes.toBytes(shop_id)));
>>     }
>>     if (channel_id != null) {
>>       mainFilterList.addFilter(new SingleColumnValueFilter(
>>           Bytes.toBytes("channel_id"), Bytes.toBytes(""),
>>           CompareFilter.CompareOp.EQUAL, Bytes.toBytes(channel_id)));
>>     }
>>     if (channel_category_id != null) {
>>       mainFilterList.addFilter(new SingleColumnValueFilter(
>>           Bytes.toBytes("channel_category_id"), Bytes.toBytes(""),
>>           CompareFilter.CompareOp.EQUAL, Bytes.toBytes(channel_category_id)));
>>     }
>>
>>     if (shop_product_ids != null && shop_product_ids.length > 0) {
>>       List<Filter> filterList = new ArrayList<Filter>();
>>       for (String shop_product_id : shop_product_ids) {
>>         filterList.add(new SingleColumnValueFilter(
>>             Bytes.toBytes("shop_product_id"), Bytes.toBytes(""),
>>             CompareFilter.CompareOp.EQUAL, Bytes.toBytes(shop_product_id)));
>>       }
>>       FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ONE, filterList);
>>       mainFilterList.addFilter(filters);
>>     }
>>
>>     scan.setFilter(mainFilterList);
>>     ResultScanner scanner = null;
>>     try {
>>       scanner = table.getScanner(scan);
>>       for (Result item : scanner) {
>>         getItemXml(result, item);
>>       }
>>     } catch (Exception e) {
>>       logError("Error during table scan: ", e);
>>       result.append("<error>").append("Error during table scan: " + e).append("</error>");
>>     } finally {
>>       try {
>>         scanner.close();
>>       } catch (Exception e1) {
>>         //Can be null, skip
>>       }
>>       result.append("</result>");
>>     }
>>   } else {
>>     result.append("<result>").append("Table " + tableName + " not exists!").append("</result>");
>>   }
>>   finishDate = new Date();
>>   log(finishDate + ": Get activation status finihed, duration: " +
>>       (finishDate.getTime() - startDate.getTime()) + " ms");
>>
>>   response.getOutputStream().print(result.toString());
>> }
>
> I checked regionserver logs, but regionserver was not started:
>
>> 2009-10-29 13:34:13,754 WARN
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Not starting a distinct
>> region server because hbase.cluster.distributed is false
>
> HBase and Hadoop were configured according to the "Getting Started" section on
> *hadoop.org*. They are both started in pseudo-distributed mode.
> Maybe I should set *hbase.cluster.distributed* to true?
> I'll try to increase RAM capacity, and then I'll write here about the results.
> -------------------------------------------------
> Best wishes, Artyom Shvedchikov
>

Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

by shoolc

Hello, Tatsuya

Thanks a lot for your help.

I'll try both:
1. Increase RAM capacity
2. Decrease heap size

Also I'll try to optimize table structure.
-------------------------------------------------
Best wishes, Artyom Shvedchikov

Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

by Tatsuya Kawano

Hello, Artyom,

> I'll try both:
> 1. Increase RAM capacity
> 2. Decrease heap size
>
> Also I'll try to optimize table structure.

Good luck!

One more thing about the table structure: is the channel_products table
a kind of join table, as in the SQL world? If so, you could de-normalize
the table structure and eliminate that table.

Since HBase doesn't provide foreign key indexes or table joins, your
current implementation of looking up the join table results in a full
table scan of 3 million records, which will take a few seconds to
complete. If you de-normalize the table structure and eliminate the
join table, the same query could complete in a few milliseconds and,
of course, consume far less memory.

So please take a look at HBase FAQ #10 at http://bit.ly/2RyrI3 , as
well as the case studies by Evan Liu at http://bit.ly/1eGU2r . They
will show you how to eliminate the join table.
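
To make the idea concrete, here is a minimal sketch in plain Java. It is
not HBase code: a java.util.TreeMap stands in for a table, because it
keeps its keys sorted the way HBase keeps row keys sorted. The key layout
(shop_id/channel_id/shop_product_id joined by '/') and the names
CompositeKeyDemo, rowKey and prefixScan are assumptions made up for this
illustration, not something taken from your schema. With the lookup
attributes in the key, the query reads only one key range instead of
filtering every row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Stand-in for an HBase table: a TreeMap keeps keys sorted, like HBase row keys.
class CompositeKeyDemo {

    // Hypothetical key layout: the three lookup attributes joined by '/'.
    static String rowKey(String shopId, String channelId, String shopProductId) {
        return shopId + "/" + channelId + "/" + shopProductId;
    }

    // Returns the values of all rows whose key starts with the given prefix.
    // Only the matching key range is visited, not the whole map.
    static List<String> prefixScan(TreeMap<String, String> table, String prefix) {
        // '\uffff' sorts after every character used in these keys,
        // so [prefix, prefix + '\uffff') covers exactly the prefixed keys.
        SortedMap<String, String> range = table.subMap(prefix, prefix + '\uffff');
        return new ArrayList<>(range.values());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(rowKey("shop1", "chanA", "p1"), "active");
        table.put(rowKey("shop1", "chanA", "p2"), "inactive");
        table.put(rowKey("shop1", "chanB", "p1"), "active");
        table.put(rowKey("shop2", "chanA", "p1"), "active");

        // All products of shop1 in channel chanA: a range read.
        System.out.println(prefixScan(table, rowKey("shop1", "chanA", "")));
        // Exact lookup by full key, the analogue of fetching a single row.
        System.out.println(table.get(rowKey("shop1", "chanB", "p1")));
    }
}
```

In HBase itself, the rough equivalent of prefixScan would be a Scan
bounded by a start row and a stop row, and the exact lookup would be a
Get on the full row key. Whether this particular key order suits you
depends on which attributes your queries always supply.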


--
Tatsuya Kawano (Mr.)
Tokyo, Japan

Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

by Tatsuya Kawano

> So please take a look at HBase FAQ #10 at http://bit.ly/2RyrI3 , as
> well as the case studies by Evan Liu at http://bit.ly/1eGU2r . They
> will show you how to eliminate the join table.

Sorry, not FAQ #10, but #20. The link above is a direct link to #20,
so you won't miss it.

Thanks,
Tatsuya



On Fri, Oct 30, 2009 at 8:54 AM, Tatsuya Kawano
<tatsuyaml@...> wrote:

> Hello, Artyom,
>
>> I'll try both:
>> 1. Increase RAM capacity
>> 2. Decrease heap size
>>
>> Also I'll try to optimize table structure.
>
> Good luck!
>
> One more thing about the table structure, is channel_products table a
> kind of join table in SQL world? If so, you could de-normalize the
> table structure and eliminate that table.
>
> Since HBase doesn't provide foreign key index and table join, your
> current implementation of looking up the join table results a full
> table scan of 3 million records, which will take a few seconds to
> complete. If you de-normalize the table structure and eliminate the
> join table, the same query could complete in a few milli-seconds and
> of course consume much much smaller amount of memory.
>
> So please take a look at HBase FAQ #10 at http://bit.ly/2RyrI3 , as
> well as the case studies by Evan Liu at http://bit.ly/1eGU2r . They
> will show you how to eliminate the join table.
>
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan

Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

by shoolc

Hello, Tatsuya

Thank you for all the advice; I'll keep it all in mind.
Once I have some results from optimizing the HBase instance and the
table structure, I'll write here.
-------------------------------------------------
Best wishes, Artyom Shvedchikov


On Fri, Oct 30, 2009 at 2:04 AM, Tatsuya Kawano <tatsuyaml@...>wrote:

> > So please take a look at HBase FAQ #10 at http://bit.ly/2RyrI3 , as
> > well as the case studies by Evan Liu at http://bit.ly/1eGU2r . They
> > will show you how to eliminate the join table.
>
> Sorry, not FAQ #10, but #20. The link above is a direct link to #20,
> so you won't miss it.
>
> Thanks,
> Tatsuya