HBase 0.20.1 on Ubuntu 9.04: master fails to start

Hello.

We are testing the latest HBase 0.20.1 in pseudo-distributed mode with Hadoop 0.20.1 in the following environment:

*h/w*: Intel C2D 1.86 GHz, RAM 2 GB 667 MHz, HDD 1 TB Seagate SATA2 7200 rpm
*s/w*: Ubuntu 9.04, filesystem type is *ext3*, Java 1.6.0_16-b01, Hadoop 0.20.1, HBase 0.20.1

File */etc/hosts*:

> 127.0.0.1 localhost
>
> # The following lines are desirable for IPv6 capable hosts
> ::1 localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts

Hadoop and HBase are running in pseudo-distributed mode.

Two options added to *hadoop-env.sh*:

> export JAVA_HOME=/usr/lib/jvm/java-6-sun
> export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

*core-site.xml*:

> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://127.0.0.1:9000</value>
>   </property>
>   <property>
>     <name>hadoop.tmp.dir</name>
>     <value>/hadoop/tmp/hadoop-${user.name}</value>
>     <description>A base for other temporary directories.</description>
>   </property>
> </configuration>

*hdfs-site.xml*:

> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
>   <property>
>     <name>dfs.name.dir</name>
>     <value>/hadoop/hdfs/name</value>
>   </property>
>   <property>
>     <name>dfs.data.dir</name>
>     <value>/hadoop/hdfs/data</value>
>   </property>
>   <property>
>     <name>dfs.datanode.socket.write.timeout</name>
>     <value>0</value>
>   </property>
>   <property>
>     <name>dfs.datanode.max.xcievers</name>
>     <value>1023</value>
>   </property>
> </configuration>

*mapred-site.xml*:

> <configuration>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>127.0.0.1:9001</value>
>   </property>
> </configuration>

*hbase-site.xml*:

> <configuration>
>   <property>
>     <name>hbase.rootdir</name>
>     <value>hdfs://localhost:9000/</value>
>     <description>The directory shared by region servers.
>     Should be fully-qualified to include the filesystem to use.
>     E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
>     </description>
>   </property>
>   <property>
>     <name>hbase.master</name>
>     <value>127.0.0.1:60000</value>
>     <description>The host and port that the HBase master runs at.
>     </description>
>   </property>
>   <property>
>     <name>hbase.tmp.dir</name>
>     <value>/hadoop/tmp/hbase-${user.name}</value>
>     <description>Temporary directory on the local
>     filesystem.</description>
>   </property>
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>127.0.0.1</value>
>     <description>The directory shared by region servers.
>     </description>
>   </property>
> </configuration>

Hadoop and HBase are running under the *hbase* user, and all necessary directories are owned by the *hbase* user (I mean the */hadoop* directory and all its subdirectories).

The first launch was successful, but after several days of work we ran into a problem: the HBase master was down. We tried to restart it (*stop-hbase.sh*, then *start-hbase.sh*), but the restart fails with this error:

> 2009-10-26 13:34:30,031 WARN org.apache.hadoop.hdfs.dfsclient: datastreamer
> exception: org.apache.hadoop.ipc.remoteexception: java.io.ioexception: file
> /hbase.version could only be replicated to 0 nodes, instead of 1
>     at org.apache.hadoop.hdfs.server.namenode.fsnamesystem.getadditionalblock(fsnamesystem.java:1267)
>     at org.apache.hadoop.hdfs.server.namenode.namenode.addblock(namenode.java:422)

Then I tried reformatting HDFS (and after that, also removing all Hadoop and HBase data and formatting HDFS again) and started Hadoop and HBase again, but the HBase master fails to start with the same error.

Could someone review our configuration and tell us the reason for this behaviour of the HBase master?

Thanks in advance, Artyom
-------------------------------------------------
Best wishes, Artyom Shvedchikov

Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

Hi Artyom,

Your configuration files look just fine.

>> 2009-10-26 13:34:30,031 WARN org.apache.hadoop.hdfs.dfsclient: datastreamer
>> exception: org.apache.hadoop.ipc.remoteexception: java.io.ioexception: file
>> /hbase.version could only be replicated to 0 nodes, instead of 1

I'm not totally sure, but I think this exception occurs when there is no HDFS data node available in the cluster.

Can you access the HDFS name node status screen at <http://servers-ip:50070/> from a web browser to see if there is a data node available?

Thanks,

--
Tatsuya Kawano (Mr.)
Tokyo, Japan

On Tue, Oct 27, 2009 at 11:24 AM, Artyom Shvedchikov <shoolc@...> wrote:
> [...]

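The data node count shown on the name node status screen can also be read from the command line. What follows is only a minimal sketch: it assumes that `hadoop dfsadmin -report` prints a summary line of the form `Datanodes available: N (...)`, as Hadoop 0.20 does, and the helper name `live_datanodes` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): reads `hadoop dfsadmin -report`
# output on stdin and prints the number of live data nodes.
# Prints 0 when the "Datanodes available:" summary line is missing.
live_datanodes() {
    awk '/^Datanodes available:/ { print $3; found = 1; exit }
         END { if (!found) print 0 }'
}

# Query the real cluster only when the hadoop launcher is on PATH.
if command -v hadoop >/dev/null 2>&1; then
    count=$(hadoop dfsadmin -report 2>/dev/null | live_datanodes)
    echo "live data nodes: $count"
fi
```

A result of 0 right after `bin/start-all.sh` would be consistent with the "could only be replicated to 0 nodes" error, since the name node then has nowhere to place a new block.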
Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

Hello, Tatsuya

Thank you for the fast assistance.

> I'm not totally sure, but I think this exception occurs when there is
> no HDFS data node available in the cluster.
>
> Can you access to the HDFS name node status screen at
> <http://servers-ip:50070/> from a web browser to see if there is a
> data node available?

Yes, the HDFS name node status screen is accessible through a web browser at <http://servers-ip:50070/>, and it shows that the data node is available.

Could you give some examples of situations in which the data node is not available to the cluster and to the HBase master?

-------------------------------------------------
Best wishes, Artyom Shvedchikov

On Tue, Oct 27, 2009 at 10:01 AM, Tatsuya Kawano <tatsuyaml@...> wrote:
> [...]

Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

Hi Artyom,

>> I'm not totally sure, but I think this exception occurs when there is
>> no HDFS data node available in the cluster.
>>
>> Can you access to the HDFS name node status screen at
>> <http://servers-ip:50070/> from a web browser to see if there is a
>> data node available?

> Yes, the HDFS name node status is accessible and data node is available
> through a web browser using url <http://servers-ip:50070/>.
>
> Could you provide some examples when data node does not available in the
> cluster and for the HBase master?

I happen to have an Ubuntu 9.04 virtual server installation, so I set up HDFS on it to see if I could reproduce the exception you had. I found I can easily reproduce it with the following steps:

1. Delete the Hadoop data directory.
2. bin/hadoop namenode -format
3. bin/start-all.sh
   -> The name node will start immediately and go into service, but the data node will make a long pause (almost seven minutes) in the middle of its startup.
4. Before the data node becomes ready, do an HDFS write operation (e.g. "bin/hadoop fs -put conf input"). The write operation will fail with the following error:

------------------------------------------------
09/10/28 09:00:19 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/tatsuya/input/capacity-scheduler.xml could only be replicated to 0 nodes, instead of 1
...
09/10/28 09:00:19 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
09/10/28 09:00:19 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/tatsuya/input/capacity-scheduler.xml" - Aborting...
------------------------------------------------

This doesn't seem to be the desired behavior of HDFS; shouldn't HDFS be in safe mode while the data node is not ready?

Also, if I skip steps 1 and 2, the problem doesn't happen. The data node still makes the long pause at startup, but the HDFS cluster starts in safe mode and waits for the data node to become ready. HBase deals with HDFS safe mode, so HBase should work fine in this case.

Can you check if this is your case? If so, you can avoid it by not running "start-hbase.sh" until HDFS has a data node available.

I have done a little more investigation into why the data node makes the long pause on Ubuntu 9.04. It seems there is a problem with the Sun JRE SecureRandom implementation on Linux, and this makes Jetty (used in the data node) slow to create its session ID manager.

Here is the data node log, with a seven-minute pause while it's trying to start Jetty.

------------------------------------------------
2009-10-28 09:00:10,559 INFO org.mortbay.log: jetty-6.1.14
2009-10-28 09:06:54,165 INFO org.mortbay.log: Started SelectChannelConnector@...:50075
------------------------------------------------

Here is part of a full thread dump; sun.security.provider.SecureRandom is taking a long time (forever?) to finish.

------------------------------------------------
"main" prio=10 tid=0x00000000409a8800 nid=0xba2 runnable [0x00007ff762a32000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    ...
    - locked <0x00007ff749edfbb8> (a java.io.BufferedInputStream)
    at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:453)
    at sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:123)
    at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:118)
    at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114)
    at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171)
    - locked <0x00007ff749edf388> (a sun.security.provider.SecureRandom)
    at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
    - locked <0x00007ff749edf6b8> (a java.security.SecureRandom)
    at java.security.SecureRandom.next(SecureRandom.java:455)
    at java.util.Random.nextLong(Random.java:284)
    at org.mortbay.jetty.servlet.HashSessionIdManager.doStart(HashSessionIdManager.java:139)
    ...
    at org.apache.hadoop.http.HttpServer.start(HttpServer.java:460)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:375)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:216)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)
------------------------------------------------

And I found this is a known issue in Jetty:
http://jira.codehaus.org/browse/JETTY-331

It says you could work around it by changing the Jetty setting to use "java.util.Random" instead of "sun.security.provider.SecureRandom". I don't know if this is the correct way to work around it. I'd better ask the HDFS folks on the hdfs-user mailing list for a solution. (I'm currently not a member of the mailing list.)

Hope this helps,

--
Tatsuya Kawano (Mr.)
Tokyo, Japan

On Wed, Oct 28, 2009 at 7:12 AM, Artyom Shvedchikov <shoolc@...> wrote:
> [...]

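The advice in this reply, not running "start-hbase.sh" until HDFS has a data node available, can be scripted. The following is a rough sketch rather than a tested recipe: `wait_for` and `datanode_ready` are hypothetical helper names, the 900-second timeout and 10-second interval are arbitrary, and the probe assumes the `Datanodes available: N (...)` line that `hadoop dfsadmin -report` prints in Hadoop 0.20. Hadoop also offers `hadoop dfsadmin -safemode wait`, but in the reproduction described here the freshly formatted HDFS was not in safe mode, so the sketch polls the report instead.

```shell
#!/bin/sh
# Hypothetical helper: re-run a probe command every INTERVAL seconds until it
# succeeds or about TIMEOUT seconds have passed.
# Usage: wait_for TIMEOUT INTERVAL COMMAND [ARGS...]
# Returns 0 as soon as the probe succeeds, 1 on timeout.
wait_for() {
    wf_timeout=$1
    wf_interval=$2
    shift 2
    wf_waited=0
    until "$@"; do
        if [ "$wf_waited" -ge "$wf_timeout" ]; then
            return 1
        fi
        sleep "$wf_interval"
        wf_waited=$((wf_waited + wf_interval))
    done
    return 0
}

# Probe: succeeds once the name node reports at least one live data node.
# Assumes a report line such as "Datanodes available: 1 (1 total, 0 dead)".
datanode_ready() {
    hadoop dfsadmin -report 2>/dev/null | grep -q '^Datanodes available: [1-9]'
}

# Example start-up sequence, left commented out so that sourcing this file
# has no side effects:
#   bin/start-all.sh
#   wait_for 900 10 datanode_ready && start-hbase.sh
```

Polling the name node's own report, rather than the data node's port 50075, is a deliberate choice in this sketch: the HBase master only needs the name node to know about a data node, whichever way that data node came up.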
Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

Hello, Tatsuya.

Yesterday we ran into the same problem: the master was down.

Here is part of the HBase master log from after HBase became unavailable through the HBase shell and the Java HBase client:

> 2009-10-29 00:00:36,920 INFO org.apache.hadoop.hbase.master.ServerManager: 1 region servers, 0 dead, average load 5.0
> 2009-10-29 00:00:37,150 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 127.0.0.1:53169, regionname: -ROOT-,,0, startKe$
> 2009-10-29 00:00:37,151 WARN org.apache.hadoop.hbase.master.BaseScanner: Scan ROOT region
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:308)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:831)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:328)
>     at $Proxy2.openScanner(Unknown Source)
>     at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:160)
>     at org.apache.hadoop.hbase.master.RootScanner.scanRoot(RootScanner.java:54)
>     at org.apache.hadoop.hbase.master.RootScanner.maintenanceScan(RootScanner.java:79)
>     at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:136)
>     at org.apache.hadoop.hbase.Chore.run(Chore.java:68)

and this repeats to the end of the log.

ZooKeeper log part:

> 2009-10-29 02:45:14,138 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 17b
> 2009-10-29 08:34:04,751 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /127.0.0.1:56897 lastZxid 0
> 2009-10-29 08:34:04,751 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x124962e1e530013
> 2009-10-29 08:34:04,776 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x124962e1e530013 valid:true
> 2009-10-29 08:34:09,689 WARN org.apache.zookeeper.server.PrepRequestProcessor: Got exception when processing sessionid:0x124962e1e530013 type:create cxid:0x2 zxid:0xfffffffffff$
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
>     at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
>     at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)

and this repeats to the end of the log.

HBase became unavailable after we tried several times to scan a table with 6,000,000 rows.

HBase Java client log:

> Error during table scan: java.lang.RuntimeException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server 127.0.0.1:53169 for region channel_products,,1256660737751,
> row '', but failed after 10 attempts.
> Exceptions:
> java.lang.NoClassDefFoundError: org/mortbay/log/Log
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused

HBase shell log:

> hbase@localhost:/hadoop$ ./hbase/bin/hbase shell
> HBase Shell; enter 'help<RETURN>' for list of supported commands.
> Version: 0.20.1, r822817, Wed Oct 7 11:55:42 PDT 2009
> hbase(main):001:0> status
> 1 servers, 0 dead, 5.0000 average load
> hbase(main):002:0> list
> 09/10/29 08:34:09 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
> 09/10/29 08:34:11 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
> 09/10/29 08:34:13 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
> ...
> 09/10/29 08:35:59 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not be reached after 1 tries, giving up.
> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server null for region , row '', but failed after 5
> attempts.

The HDFS name node is still available through the web interface:

> NameNode 'localhost:9000'
> Cluster Summary: 116 files and directories, 98 blocks = 214 total. Heap Size is 10.94 MB / 963 MB (1%)

Could you take a look at this? Maybe it will give you some other ideas.

-------------------------------------------------
Best wishes, Artyom Shvedchikov

> Exceptions: > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up > proxy to /127.0.0.1:53169 after attempts=1 > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up > proxy to /127.0.0.1:53169 after attempts=1 > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up > proxy to /127.0.0.1:53169 after attempts=1 > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up > proxy to /127.0.0.1:53169 after attempts=1 > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up > proxy to /127.0.0.1:53169 after attempts=1 > > from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in > `getRegionServerWithRetries' > from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan' > from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan' > from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in > `listTables' > from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables' > from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0' > from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke' > from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke' > from java/lang/reflect/Method.java:597:in `invoke' > from org/jruby/javasupport/JavaMethod.java:298:in > `invokeWithExceptionHandling' > from org/jruby/javasupport/JavaMethod.java:259:in `invoke' > from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call' > from org/jruby/runtime/callsite/CachingCallSite.java:253:in > `cacheAndCall' > from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call' > from org/jruby/ast/CallNoArgNode.java:61:in `interpret' > from org/jruby/ast/ForNode.java:104:in `interpret' > ... 112 levels... 
> from hadoop/hbase/bin/$_dot_dot_/bin/hirb#start:-1:in `call' > from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in > `call' > from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in > `call' > from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in > `call' > from org/jruby/runtime/callsite/CachingCallSite.java:253:in > `cacheAndCall' > from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call' > from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:497:in `__file__' > from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:-1:in `load' > from org/jruby/Ruby.java:577:in `runScript' > from org/jruby/Ruby.java:480:in `runNormally' > from org/jruby/Ruby.java:354:in `runFromMain' > from org/jruby/Main.java:229:in `run' > from org/jruby/Main.java:110:in `run' > from org/jruby/Main.java:94:in `main' > from /hadoop/hbase/bin/../bin/hirb.rb:338:in `list' > from (hbase):3hbase(main):003:0> > HDFS name node still available through web interface. NameNode 'localhost:9000' Started: Tue Oct 27 03:12:08 EET 2009 Version: 0.20.1, r810220 Compiled: Tue Sep 1 20:55:56 UTC 2009 by oom Upgrades: There are no upgrades in progress. *Browse the filesystem <http://77.122.169.205:50070/nn_browsedfscontent.jsp> * *Namenode Logs <http://77.122.169.205:50070/logs/>* ------------------------------ Cluster Summary * * * 116 files and directories, 98 blocks = 214 total. Heap Size is 10.94 MB / 963 MB (1%) * Configured Capacity : 229.36 GB DFS Used : 3.04 GB Non DFS Used : 14.46 GB DFS Remaining : 211.87 GB DFS Used% : 1.32 % DFS Remaining% : 92.37 % Live Nodes <http://77.122.169.205:50070/dfsnodelist.jsp?whatNodes=LIVE> : 1 Dead Nodes <http://77.122.169.205:50070/dfsnodelist.jsp?whatNodes=DEAD> : 0 ------------------------------ NameNode Storage: *Storage Directory**Type**State*/hadoop/hdfs/nameIMAGE_AND_EDITSActive ------------------------------ Hadoop <http://hadoop.apache.org/core>, 2009. Could you check this, maybe some other thoughts will appear. 
Thanks a lot for your time. ------------------------------------------------- Best wishes, Artyom Shvedchikov |
|
|
Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start
Hi Artyom,
I should have made it clear that my advice addressed only one of the problems you have had. It seems you have at least three different problems:

In your first email:
1. HBase master went down after a few days of testing.
2. HBase didn't start again because of an HDFS error.

In your last email:
3. HBase region server stopped responding after you scanned a table with 6 million rows several times.

The possible cause and solution I described apply only to problem #2, not to the others.

So, for problem #3, your master and client logs show that the HBase region server is not responding on port 53169. However, they don't say why it's not responding. You should have a region server log in the logs directory as well, so can you check whether it contains any error messages?

Also, in your first mail, you said your server has only 2GB of RAM.

On Tue, Oct 27, 2009 at 11:24 AM, Artyom Shvedchikov <shoolc@...> wrote:
> We are testing the latest HBase 0.20.1 in pseudo-distributed mode with
> Hadoop 0.20.1 on such environment:
> *h/w*: Intel C2D 1.86 GHz, RAM 2 Gb 667 MHz, HDD 1TB Seagate SATA2 7200 Rpm
> *s/w*: Ubuntu 9.04, Filesystem type is *ext3*, Java 1.6.0_16-b01, Hadoop
> 0.20.1, HBase 0.20.1

2GB of RAM is definitely too small to fit an entire Hadoop and HBase cluster. You should be aware that you are trying to run the following Java processes on your server:

1. Hadoop DFS Name Node
2. Hadoop DFS Secondary Name Node
3. Hadoop DFS Data Node
4. ZooKeeper
5. HBase Master
6. HBase Region Server
7. Hadoop Job Tracker
8. Hadoop Task Tracker
9... Hadoop Map/Reduce processes

I would suggest having at least 8GB of RAM even for light-load testing, and adding more servers for heavy-load testing. HBase is memory intensive, and it becomes unstable when it doesn't have enough memory. I bet your problems #1 and #3 will magically disappear if you add more RAM to your server.
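To make the memory argument above concrete, here is a rough sketch of the arithmetic. It assumes every daemon in the list is allowed a maximum heap of about 1000 MB; that figure is an assumption (the NameNode page quoted earlier in the thread reports "Heap Size is 10.94 MB / 963 MB", which is consistent with it, but the real limits depend on hadoop-env.sh and hbase-env.sh). The class and method names are made up for illustration. The result is an upper bound on what the JVMs may request, not a measurement of what they actually use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HeapBudget {
    /** The eight daemons named in the reply, each with the same assumed maximum heap (MB). */
    static Map<String, Integer> daemons(int assumedMaxHeapMb) {
        String[] names = {
            "Hadoop DFS Name Node", "Hadoop DFS Secondary Name Node", "Hadoop DFS Data Node",
            "ZooKeeper", "HBase Master", "HBase Region Server",
            "Hadoop Job Tracker", "Hadoop Task Tracker"
        };
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String name : names) {
            result.put(name, assumedMaxHeapMb);
        }
        return result;
    }

    /** Sums the maximum heap sizes (MB) over all daemons. */
    static long totalMaxHeapMb(Map<String, Integer> maxHeapMbByDaemon) {
        long total = 0;
        for (int mb : maxHeapMbByDaemon.values()) {
            total += mb;
        }
        return total;
    }

    public static void main(String[] args) {
        int physicalRamMb = 2048;    // from the original post: 2 GB of RAM
        int assumedMaxHeapMb = 1000; // ASSUMPTION: per-daemon maximum heap, not taken from the thread's config
        long total = totalMaxHeapMb(daemons(assumedMaxHeapMb));
        System.out.println("Sum of maximum heaps: " + total + " MB");
        System.out.println("Physical RAM:         " + physicalRamMb + " MB");
        System.out.println("Over-commit factor:   " + (double) total / physicalRamMb);
    }
}
```

When HBase runs with hbase.cluster.distributed set to false, several of these roles may share one JVM, so the real number of processes can be lower than eight; the sketch deliberately takes the list at face value.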
So try to check the region server log, and try to consider to add more RAM to your server. Thanks, -- Tatsuya Kawano (Mr.) Tokyo, Japan |
|
|
Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start
Dear Tatsuya,
> 1. Delete hadoop data directory
> 2. bin/hadoop namenode -format
> 3. bin/start-all.sh
> -> namenode will start immediately and go in service, but data
> node will be making a long (almost seven minutes) pause in a middle of
> the startup.
>
> 4. Before the data node becomes ready, do an HDFS write operation
> (e.g. "bin/hadoop fs -put conf input"), and then the write operations
> will fail with the following error:

Today I tried to restart Hadoop and HBase, skipping step #1 and step #2. First I stopped HBase, then Hadoop; then I started Hadoop, waited for 10 minutes and started HBase. It works. No data was lost, and it was all available for reading and other operations.

Then I scanned the table with 6 000 000 rows several times, and HBase hung again with the same exceptions as in my previous post (see post at Thu, 29 Oct, 10:06).

hbase(main):006:0> list
> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server 127.0.0.1:57613 for region .META.,,1, row
> '', but failed after 5 attempts.
> Exceptions: > java.net.ConnectException: Connection refused > java.net.ConnectException: Connection refused > java.net.ConnectException: Connection refused > java.net.ConnectException: Connection refused > java.net.ConnectException: Connection refused > > from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in > `getRegionServerWithRetries' > from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan' > from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan' > from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in > `listTables' > from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables' > from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0' > from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke' > from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke' > from java/lang/reflect/Method.java:597:in `invoke' > from org/jruby/javasupport/JavaMethod.java:298:in > `invokeWithExceptionHandling' > from org/jruby/javasupport/JavaMethod.java:259:in `invoke' > from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call' > from org/jruby/runtime/callsite/CachingCallSite.java:70:in `call' > from org/jruby/ast/CallNoArgNode.java:61:in `interpret' > from org/jruby/ast/ForNode.java:104:in `interpret' > from org/jruby/ast/NewlineNode.java:104:in `interpret' > ... 110 levels... 
> from hadoop/hbase/bin/$_dot_dot_/bin/hirb#start:-1:in `call'
> from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in `call'
> from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in `call'
> from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in `call'
> from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
> from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
> from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:497:in `__file__'
> from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
> from org/jruby/Ruby.java:577:in `runScript'
> from org/jruby/Ruby.java:480:in `runNormally'
> from org/jruby/Ruby.java:354:in `runFromMain'
> from org/jruby/Main.java:229:in `run'
> from org/jruby/Main.java:110:in `run'
> from org/jruby/Main.java:94:in `main'
> from /hadoop/hbase/bin/../bin/hirb.rb:338:in `list'
> from (hbase):7
> hbase(main):007:0> status
> 0 servers, 0 dead, NaN average load
> hbase(main):008:0> exit

Full HBase and Hadoop logs can be found in my post at Thu, 29 Oct, 07:52.

The main issue for now is that HBase hangs each time I try to scan the table (after the second or third scan). By the way, this time it was enough to restart only HBase; after that it became available for scan/get/put operations again.
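The "Connection refused" errors above mean that nothing was accepting TCP connections on the region server's address at that moment. A quick way to confirm this independently of the HBase client is a plain socket probe. The sketch below uses only the Java standard library; the class name is made up for illustration, and the default port 57613 is simply the one from the shell output above (the region server port in this setup changes between restarts, so pass the current one as an argument).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused" as well as timeouts and unreachable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 57613; // port from the log above
        boolean up = isListening(host, port, 2000);
        System.out.println(host + ":" + port
                + (up ? " is accepting connections" : " refused the connection or timed out"));
    }
}
```

If the probe fails while the master process is still alive, the region server part has most likely stopped, and the cause should be visible earlier in the server-side log rather than in the client output.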
Table structure:

> hbase(main):003:0> describe 'channel_products'
> ENABLED: true
> DESCRIPTION:
> {NAME => 'channel_products', FAMILIES => [
>  {NAME => 'active', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>  {NAME => 'channel_category_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>  {NAME => 'channel_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>  {NAME => 'contract_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>  {NAME => 'created_at', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>  {NAME => 'shop_category_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>  {NAME => 'shop_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>  {NAME => 'shop_product_id', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>  {NAME => 'updated_at', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>
> 1 row(s) in 0.0630 seconds

Table contains ~ 6 000 000 rows, each value is a String.
Code to scan the table:

> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.Date;
> import java.util.List;
> import javax.servlet.ServletException;
> import javax.servlet.http.HttpServletRequest;
> import javax.servlet.http.HttpServletResponse;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.client.ResultScanner;
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.filter.CompareFilter;
> import org.apache.hadoop.hbase.filter.Filter;
> import org.apache.hadoop.hbase.filter.FilterList;
> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
> import org.apache.hadoop.hbase.util.Bytes;
>
> protected void doGet(HttpServletRequest request, HttpServletResponse response)
>     throws ServletException, IOException {
>   Date startDate = new Date();
>   Date finishDate;
>   log(startDate + ": Get activation status started");
>   String shop_id = request.getParameter("shop_id");
>
>   // Accept either repeated parameters or one comma-separated value.
>   String[] shop_product_ids = request.getParameterValues("shop_product_ids");
>   if (shop_product_ids != null && shop_product_ids.length == 1) {
>     shop_product_ids = shop_product_ids[0].split(",");
>   }
>
>   String channel_id = request.getParameter("channel_id");
>   String channel_category_id = request.getParameter("channel_category_id");
>
>   String tableName = "channel_products";
>   StringBuffer result = new StringBuffer("<?xml version=\"1.0\"?>");
>
>   if (this.admin.tableExists(tableName)) {
>     result.append("<result>");
>
>     HTable table = new HTable(this.configuration, tableName);
>     Scan scan = new Scan();
>
>     // Each request parameter that is present adds one equality filter.
>     FilterList mainFilterList = new FilterList();
>
>     if (shop_id != null) {
>       mainFilterList.addFilter(new SingleColumnValueFilter(Bytes.toBytes("shop_id"),
>           Bytes.toBytes(""), CompareFilter.CompareOp.EQUAL, Bytes.toBytes(shop_id)));
>     }
>     if (channel_id != null) {
>       mainFilterList.addFilter(new SingleColumnValueFilter(Bytes.toBytes("channel_id"),
>           Bytes.toBytes(""), CompareFilter.CompareOp.EQUAL, Bytes.toBytes(channel_id)));
>     }
>     if (channel_category_id != null) {
>       mainFilterList.addFilter(new SingleColumnValueFilter(Bytes.toBytes("channel_category_id"),
>           Bytes.toBytes(""), CompareFilter.CompareOp.EQUAL, Bytes.toBytes(channel_category_id)));
>     }
>
>     // A row matches if it has any one of the requested product ids.
>     if (shop_product_ids != null && shop_product_ids.length > 0) {
>       List<Filter> filterList = new ArrayList<Filter>();
>       for (String shop_product_id : shop_product_ids) {
>         filterList.add(new SingleColumnValueFilter(Bytes.toBytes("shop_product_id"),
>             Bytes.toBytes(""), CompareFilter.CompareOp.EQUAL, Bytes.toBytes(shop_product_id)));
>       }
>       FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ONE, filterList);
>       mainFilterList.addFilter(filters);
>     }
>
>     scan.setFilter(mainFilterList);
>     ResultScanner scanner = null;
>     try {
>       scanner = table.getScanner(scan);
>       for (Result item : scanner) {
>         getItemXml(result, item);
>       }
>     } catch (Exception e) {
>       logError("Error during table scan: ", e);
>       result.append("<error>").append("Error during table scan: " + e).append("</error>");
>     } finally {
>       // The scanner is null if getScanner() itself failed.
>       if (scanner != null) {
>         scanner.close();
>       }
>       result.append("</result>");
>     }
>   } else {
>     result.append("<result>").append("Table " + tableName + " does not exist!").append("</result>");
>   }
>   finishDate = new Date();
>   log(finishDate + ": Get activation status finished, duration: "
>       + (finishDate.getTime() - startDate.getTime()) + " ms");
>
>   response.getOutputStream().print(result.toString());
> }

I checked the regionserver logs, but the regionserver was not started:

> 2009-10-29 13:34:13,754 WARN
> org.apache.hadoop.hbase.regionserver.HRegionServer: Not starting a distinct
> region server because hbase.cluster.distributed is false

HBase and Hadoop were configured according to the "Getting Started" section on *hadoop.org*. They are both started in pseudo-distributed mode. Maybe I should set *hbase.cluster.distributed* to true?

I'll try to increase the RAM capacity and then report the results here.
-------------------------------------------------
Best wishes, Artyom Shvedchikov |
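When checking server logs for the reason a process stopped responding, it can help to strip out the routine INFO entries first. The sketch below is one way to do that with the Java standard library, assuming the log lines follow the layout seen in this thread (timestamp, level, logger name, message). The class name is made up for illustration, and the log path is whatever file you pass on the command line. Continuation lines such as stack-trace frames do not start with a timestamp, so this simple filter skips them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLevelFilter {
    // Matches e.g. "2009-10-29 13:34:13,754 WARN org.apache...: message" and captures the level.
    private static final Pattern LINE =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3} (\\w+) .*");
    private static final Set<String> PROBLEM_LEVELS =
            new HashSet<>(Arrays.asList("WARN", "ERROR", "FATAL"));

    /** Returns only the timestamped lines whose level is WARN, ERROR or FATAL. */
    static List<String> problems(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.matches() && PROBLEM_LEVELS.contains(m.group(1))) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LogLevelFilter <path-to-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (String line : problems(lines)) {
            System.out.println(line);
        }
    }
}
```

The first WARN or ERROR entry before the "Connection refused" messages begin is usually the most informative one to post back to the list.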
Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

Hi Artyom,
Thanks for the extra info; it is very helpful.

>> So, for the problem #3, your master and client logs tell that the
>> HBase region server is not responding on port 53169. However it
>> doesn't tell why it's not responding. You should have region server
>> log in the logs directory as well, so can you check it if there is any
>> error message?

> I checked regionserver logs, but regionserver was not started:
>
>> 2009-10-29 13:34:13,754 WARN
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Not starting a distinct
>> region server because hbase.cluster.distributed is false

Sorry, I was wrong. You're running HBase in pseudo-distributed mode, so the region server doesn't run as a separate process; it is embedded in the same process that runs the HBase master. The master's log therefore contains the region server's log as well.

> Full hbase and hadoop logs can be found in my post at Thu, 29 Oct, 07:52

I hadn't noticed those attachments. I have now checked them and found the following in "hbase-hbase-master-localhost.log.2009-10-28":

==========================================================================
2009-10-28 15:26:36,888 INFO org.apache.hadoop.hbase.master.ServerManager: 1 region servers, 0 dead, average load 5.0
2009-10-28 15:26:37,114 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 127.0.0.1:53169, regionname: -ROOT-,,0, startKey: <>}
2009-10-28 15:26:37,135 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 127.0.0.1:53169, regionname: -ROOT-,,0, startKey: <>} complete
2009-10-28 15:26:37,311 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 127.0.0.1:53169, regionname: .META.,,1, startKey: <>}
2009-10-28 15:26:37,382 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 3 row(s) of meta region {server: 127.0.0.1:53169, regionname: .META.,,1, startKey: <>} complete
2009-10-28 15:26:37,382 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
2009-10-28 15:26:58,670 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -1247814523023719701 lease expired
2009-10-28 15:27:01,847 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 53169
2009-10-28 15:27:01,848 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 53169
2009-10-28 15:27:01,848 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 53169: exiting
2009-10-28 15:27:01,960 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2009-10-28 15:27:02,009 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 53169: exiting
2009-10-28 15:27:02,025 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 53169: exiting
2009-10-28 15:27:02,025 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 53169: exiting
2009-10-28 15:27:02,026 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2009-10-28 15:27:02,039 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 53169: exiting
2009-10-28 15:27:02,039 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 53169: exiting
2009-10-28 15:27:02,039 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 53169: exiting
2009-10-28 15:27:02,039 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 53169: exiting
2009-10-28 15:27:02,481 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_5009444910783378943_2681 from any node: java.io.IOException: No live nodes contain current block
2009-10-28 15:27:02,504 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: RegionServer:0.cacheFlusher exiting
2009-10-28 15:27:02,505 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker: RegionServer:0.majorCompactionChecker exiting
2009-10-28 15:27:02,505 INFO org.apache.hadoop.hbase.regionserver.LogFlusher: RegionServer:0.logFlusher exiting
2009-10-28 15:27:02,505 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: RegionServer:0.compactor exiting
2009-10-28 15:27:02,505 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2009-10-28 15:27:02,702 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_6684004629716332214_2681 from any node: java.io.IOException: No live nodes contain current block
2009-10-28 15:27:03,578 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
2009-10-28 15:27:14,816 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.StoreScanner@3cf294
2009-10-28 15:27:14,828 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.StoreScanner@1611ef6
2009-10-28 15:27:14,828 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.StoreScanner@137c0c7
2009-10-28 15:27:14,828 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.StoreScanner@15151a5
2009-10-28 15:27:14,829 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(-1247814523023719701, 1) from 127.0.0.1:41489: output error
2009-10-28 15:27:14,830 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 53169 caught: java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943)
2009-10-28 15:27:14,830 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 53169: exiting
2009-10-28 15:27:36,888 INFO org.apache.hadoop.hbase.master.ServerManager: 1 region servers, 0 dead, average load 5.0
==========================================================================

Your region server shut itself down gracefully after it detected a lease expiration on a scanner. Lease expiration means it lost a heartbeat signal from another part of the cluster, which indicates a serious failure somewhere in the cluster. A lease can expire because of a network problem, or because a cluster process was killed or stalled. Since this happened while the region server was busy working through a series of SingleColumnValueFilters on a machine with little RAM, some other cluster processes may have been swapped out of RAM and nearly stalled.

(By the way, it is a bit confusing that the HBase master kept reporting 1 region server available when there was none. I guess this is because of pseudo-distributed mode, and the master got confused.)

> I'll try to increase RAM capacity. And then I'll write here about results

Adding more RAM to the server will definitely help.

Also, I have never tried this, so it may be wrong advice, but you could decrease the Java VM heap size of the HBase and Hadoop processes so that they avoid being swapped out of RAM and the leases never(?) expire. You can configure the heap size in "hbase-env.sh" and "hadoop-env.sh", but do this with caution: decreasing the heap size can cause an OutOfMemoryError or a serious slowdown of the process, and you could end up in the same situation you have now, or an even worse one.

Also, I saw that you have many column families in your table "channel_products":

-- active
-- channel_category_id
-- channel_id
-- contract_id
-- shop_category_id
-- shop_id
-- shop_product_id
-- created_at
-- updated_at

Try to have fewer. You can group them into a smaller number of column families, so that the region server can access each column faster and consume less memory.
Be aware that different column families are stored in different files on the disk, so you can optimize locality of the columns by grouping them together in some column families.

Good luck,

--
Tatsuya Kawano (Mr.)
Tokyo, Japan
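The heap-size advice above can be put into rough numbers. The following is only a back-of-the-envelope sketch under stated assumptions: that a pseudo-distributed setup started with start-all.sh runs five Hadoop daemons plus one HBase JVM, and that each JVM is allowed a maximum heap of 1000 MB (believed to be the default for HADOOP_HEAPSIZE and HBASE_HEAPSIZE, but please check your own hbase-env.sh and hadoop-env.sh). The 512 MB reserved for the operating system is an arbitrary illustrative figure, and all class and method names are hypothetical. A maximum heap is an upper bound, not actual resident memory, so this shows the worst case only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for estimating heap pressure; not part of Hadoop or HBase.
class HeapBudgetSketch {

    // One entry per JVM, each given the same maximum heap in MB (assumed daemon list).
    static Map<String, Integer> daemons(int heapMb) {
        Map<String, Integer> heaps = new LinkedHashMap<>();
        String[] names = {"NameNode", "SecondaryNameNode", "DataNode",
                          "JobTracker", "TaskTracker", "HBase (single JVM)"};
        for (String name : names) {
            heaps.put(name, heapMb);
        }
        return heaps;
    }

    // Worst-case total if every JVM grows to its maximum heap.
    static int totalMaxHeapMb(Map<String, Integer> heaps) {
        int total = 0;
        for (int mb : heaps.values()) {
            total += mb;
        }
        return total;
    }

    // Largest equal per-JVM heap that fits in RAM after reserving some for the OS.
    static int evenHeapMb(int ramMb, int reservedForOsMb, int jvmCount) {
        return (ramMb - reservedForOsMb) / jvmCount;
    }

    public static void main(String[] args) {
        Map<String, Integer> heaps = daemons(1000);
        int ramMb = 2048; // the 2 GB machine described in this thread
        System.out.println("Worst-case heap total: " + totalMaxHeapMb(heaps) + " MB of " + ramMb + " MB RAM");
        System.out.println("Even split per JVM:    " + evenHeapMb(ramMb, 512, heaps.size()) + " MB");
    }
}
```

Under these assumptions the worst case is 6000 MB of heap on a 2048 MB machine, and an even split leaves about 256 MB per JVM, which is small enough that the OutOfMemoryError risk mentioned above is real. The numbers are meant to show why adding RAM is the safer of the two options, not to recommend specific settings.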
Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

Hello, Tatsuya

Thanks a lot for your help. I'll try both:
1. Increase RAM capacity
2. Decrease heap size

Also I'll try to optimize the table structure.

-------------------------------------------------
Best wishes, Artyom Shvedchikov
Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

Hello, Artyom,

> I'll try both:
> 1. Increase RAM capacity
> 2. Decrease heap size
>
> Also I'll try to optimize table structure.

Good luck!

One more thing about the table structure: is the channel_products table a kind of join table, as in the SQL world? If so, you could denormalize the table structure and eliminate that table.

Since HBase doesn't provide foreign key indexes or table joins, your current way of looking up the join table results in a full table scan of all its rows (about 6 million), which will take a few seconds to complete. If you denormalize the table structure and eliminate the join table, the same query could complete in a few milliseconds and of course consume a much smaller amount of memory.

So please take a look at HBase FAQ #10 at http://bit.ly/2RyrI3 , as well as the case studies by Evan Liu at http://bit.ly/1eGU2r . They will show you how to eliminate the join table.

--
Tatsuya Kawano (Mr.)
Tokyo, Japan
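To illustrate why a lookup by row key can be so much cheaper than a filtered full scan, here is a minimal sketch that does not use the HBase client API at all. A TreeMap stands in for HBase's sorted row order, and the key layout (shop_id, then shop_product_id, joined by '|') is purely an assumption for illustration; it is not taken from the FAQ or the case studies linked above. All class and method names are hypothetical. The sketch also assumes plain ASCII ids, so that String ordering agrees with the byte ordering HBase uses.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration of a composite row key; not part of the HBase API.
class RowKeySketch {

    // Assumed separator; it must never occur inside the ids themselves.
    static final char SEP = '|';

    // Composes a row key so that all products of one shop sort next to each other.
    static String rowKey(String shopId, String shopProductId) {
        return shopId + SEP + shopProductId;
    }

    // Smallest key that sorts after every key starting with the given prefix
    // (valid as long as the prefix's last character is not the maximum char value).
    static String stopKey(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // Range read over the sorted map: only the rows of the given shop are visited.
    static SortedMap<String, String> rowsForShop(TreeMap<String, String> table, String shopId) {
        String start = shopId + SEP;
        return table.subMap(start, stopKey(start));
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(rowKey("shop1", "p1"), "active");
        table.put(rowKey("shop1", "p2"), "inactive");
        table.put(rowKey("shop10", "p1"), "active");
        table.put(rowKey("shop2", "p1"), "active");

        // Point lookup by full key, comparable in spirit to a Get by row key.
        System.out.println(table.get(rowKey("shop1", "p2")));
        // Prefix range, comparable in spirit to a Scan bounded by start and stop rows.
        System.out.println(rowsForShop(table, "shop1").keySet());
    }
}
```

In HBase terms the same idea would presumably be expressed with a Get on the composite key, or a Scan limited by a start row and a stop row, instead of SingleColumnValueFilters over the whole table. Whether this layout suits the real data depends on which lookups matter most, since the leading part of the key decides which queries become cheap.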
Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

> So please take a look at HBase FAQ #10 at http://bit.ly/2RyrI3 , as
> well as the case studies by Evan Liu at http://bit.ly/1eGU2r . They
> will show you how to eliminate the join table.

Sorry, not FAQ #10, but #20. The link above is a direct link to #20, so you won't miss it.

Thanks,
Tatsuya
Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start

Hello, Tatsuya

Thank you for all the advice; I'll keep it all in mind. Once I have some results from optimizing the HBase instance and the table structure, I'll write here.

-------------------------------------------------
Best wishes, Artyom Shvedchikov