[jira] Created: (HADOOP-2576) Namenode performance degradation over time

Namenode performance degradation over time
------------------------------------------

                 Key: HADOOP-2576
                 URL: https://issues.apache.org/jira/browse/HADOOP-2576
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.16.0
            Reporter: Christian Kunz

We have a cluster running the same applications again and again with a high turnover of files.
The performance of these applications seems to be correlated with the lifetime of the namenode:
After starting the namenode, the applications need increasingly more time to complete, with about 50% more time after 1 week.
During that time the namenode's average CPU usage increases from typically 10% to 30%, memory usage nearly doubles (although the average amount of data on dfs stays the same), and the average load increases by a factor of 2-3 (although it is still not significantly high, <2).
When looking at the namenode and datanode logs, I see many requests from the namenode to delete blocks that are not in the blockmap of the datanodes, repeatedly for the same blocks.
When I counted the number of blocks the namenode asked to be deleted, I saw a clear increase with the lifetime of the namenode (a factor of 2-3 after 1 week).
This makes me wonder whether the namenode fails to purge non-existing blocks from its list of invalid blocks.
But independently, the namenode has a degradation issue.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Kunz updated HADOOP-2576:
-----------------------------------
    Priority: Blocker  (was: Major)

[jira] Updated: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Kunz updated HADOOP-2576:
-----------------------------------
    Fix Version/s: 0.16.0

[jira] Commented: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558942#action_12558942 ]

dhruba borthakur commented on HADOOP-2576:
------------------------------------------

There is a command that dumps the namenode's internal data structures to a log file. When this problem occurs, can you please run it as

    bin/hadoop dfsadmin -metasave "filename"

The specified filename will be created in the namenode's log directory. This file will list blocks that are waiting to be replicated as well as blocks waiting to be deleted. Using this tool we can determine whether the namenode is failing to purge the list of blocks to be invalidated.

[jira] Commented: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560529#action_12560529 ]

Christian Kunz commented on HADOOP-2576:
----------------------------------------

I have 2 block reports now, one generated 1.5 days after namenode startup, and one 4.5 days after.
The build process has not yet slowed down to a large extent, but the block reports already indicate some leak:
The first block report lists about 20,000 blocks to delete from 14 nodes; the 2nd one lists about 140,000 blocks to delete from 10 nodes.
I checked the first block of the first node in the datanode log files: there were about 40 futile attempts to delete that block (not found in blockMap).
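
As a rough worked example of the growth implied by these two dumps, the sketch below computes the average increase in pending deletions per day. It assumes the two counts are comparable even though they cover different numbers of datanodes (14 versus 10), which the report does not confirm; the function name is made up for illustration.

```python
def pending_growth_per_day(count_a, days_a, count_b, days_b):
    """Average growth in blocks queued for deletion per day between two dumps."""
    return (count_b - count_a) / (days_b - days_a)

# Figures quoted in the comment: ~20,000 blocks at 1.5 days, ~140,000 at 4.5 days.
rate = pending_growth_per_day(20_000, 1.5, 140_000, 4.5)
print(f"about {rate:,.0f} additional blocks queued per day")
```

Under that assumption the backlog grows by roughly 40,000 blocks per day.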

[jira] Commented: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560601#action_12560601 ]

Raghu Angadi commented on HADOOP-2576:
--------------------------------------

What is the heartbeat interval on this cluster? Is it large, say 1 minute?

The invalidateSet that holds the blocks to delete for each datanode at the namenode is actually an array, so each block could be present multiple times in this array.

[jira] Commented: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560649#action_12560649 ]

Christian Kunz commented on HADOOP-2576:
----------------------------------------

Heartbeat is indeed 1 minute.
On the other hand, of the 143437 blocks in the 2nd block report, which lists blocks to be deleted on 10 datanodes, about 127650 are unique (including the one I checked that had 40 failed attempts to delete), about 15400 appear twice, and fewer than 400 appear 3 or more times.
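
One way to produce a breakdown like this is to count how often each block ID occurs in the dump. The sketch below assumes the block IDs have already been extracted into a list of strings; the extraction step and the sample IDs are hypothetical, since the dump format is not shown in this thread.

```python
from collections import Counter

def multiplicity_histogram(block_ids):
    """Map 'times a block is listed' to 'number of distinct blocks listed that often'."""
    per_block = Counter(block_ids)        # block id -> number of occurrences
    return Counter(per_block.values())    # occurrences -> number of distinct blocks

# Hypothetical sample: blk_1 listed once, blk_2 twice, blk_3 three times.
sample = ["blk_1", "blk_2", "blk_2", "blk_3", "blk_3", "blk_3"]
hist = multiplicity_histogram(sample)
for times, blocks in sorted(hist.items()):
    print(f"{blocks} block(s) listed {times} time(s)")
```

Run against the real list of block IDs, the same histogram would separate blocks listed once from those listed twice or more.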

[jira] Commented: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560655#action_12560655 ]

Raghu Angadi commented on HADOOP-2576:
--------------------------------------

Thanks. Maybe with access to the logs, this could be investigated better.

[jira] Assigned: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur reassigned HADOOP-2576:
----------------------------------------
    Assignee: Raghu Angadi

[jira] Commented: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561483#action_12561483 ]

Stu Hood commented on HADOOP-2576:
----------------------------------

For comparison, our cluster also runs jobs frequently with new files across 8 nodes, and we haven't experienced this issue with Hadoop 0.15.0. The cluster has been up for 2 months now.

[jira] Commented: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561485#action_12561485 ]

Raghu Angadi commented on HADOOP-2576:
--------------------------------------

Thanks Christian, I have access to the logs. The cluster seems to be running an old version of the trunk. Can you get the svn revision? Also, the Namenode was recently restarted.

It looks like there is another linked list attached to each datanode. {{metasave}} prints only the "recent invalidates". A loop in the Namenode moves the invalidated blocks from recent invalidates to the datanode list. So it is possible for a block to exist many more times in this list. This is most probably the reason.

I think it is better to relieve the Namenode from throttling the deletion of blocks. In cases like these there seems to be quite a penalty on Namenode memory, the most precious resource for HDFS. The Namenode could just ask the Datanode to delete anything that it wants to delete. The Datanode could throttle it; I think that would be more scalable. This would also remove the code related to management of throttling.
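
To make the proposal concrete, here is a minimal sketch of datanode-side throttling: the namenode hands over every block it wants deleted at once and keeps no per-datanode backlog, while the datanode works through its own queue at a bounded rate. The class name, method names, and the per-round limit are all hypothetical and are not taken from the Hadoop source.

```python
from collections import deque

class ThrottledDeleter:
    """Hypothetical datanode-side queue that deletes at most `max_per_round` blocks per round."""

    def __init__(self, max_per_round):
        self.max_per_round = max_per_round
        self.pending = deque()   # blocks waiting to be deleted, in arrival order
        self.queued = set()      # the same blocks, kept for fast duplicate checks

    def request_delete(self, block_ids):
        """Accept a deletion request from the namenode; repeated requests are ignored."""
        for block_id in block_ids:
            if block_id not in self.queued:
                self.queued.add(block_id)
                self.pending.append(block_id)

    def run_round(self):
        """Delete up to `max_per_round` blocks and return the ones handled this round."""
        handled = []
        while self.pending and len(handled) < self.max_per_round:
            block_id = self.pending.popleft()
            self.queued.discard(block_id)
            handled.append(block_id)   # a real datanode would remove the block file here
        return handled

deleter = ThrottledDeleter(max_per_round=2)
deleter.request_delete(["blk_1", "blk_2", "blk_3"])
deleter.request_delete(["blk_2"])      # duplicate request, not queued again
first = deleter.run_round()
second = deleter.run_round()
```

In this toy model the memory cost of a deletion burst sits on the datanode, and the namenode no longer has to track what it has already asked for.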

[jira] Commented: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561486#action_12561486 ]

Raghu Angadi commented on HADOOP-2576:
--------------------------------------

But for now, just changing the datanode list to a Set might be good enough.
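
The effect of that change can be illustrated with a toy model: if the same block is queued for the same datanode repeatedly, a list grows with every request while a set stays at one entry per block. The sketch uses plain Python containers as stand-ins; it does not reproduce the actual namenode data structures or the attached patch.

```python
def queue_repeatedly(container_add, block_id, attempts):
    """Queue the same block `attempts` times via the given add function."""
    for _ in range(attempts):
        container_add(block_id)

as_list = []      # stand-in for a list-backed per-datanode queue
as_set = set()    # stand-in for a set-backed per-datanode queue

# 40 is borrowed from the roughly 40 futile delete attempts reported earlier in
# the thread; whether those came from duplicate entries is not established here.
queue_repeatedly(as_list.append, "blk_1", 40)
queue_repeatedly(as_set.add, "blk_1", 40)

print(len(as_list), len(as_set))
```

The list ends up with 40 entries for one block and the set with a single entry, which is the memory difference the suggestion is aiming at.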

[jira] Commented: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561487#action_12561487 ]

Raghu Angadi commented on HADOOP-2576:
--------------------------------------

> For comparison, our cluster also runs jobs frequently with new files across 8 nodes, and we haven't experienced this issue with Hadoop 0.15.0. The cluster has been up for 2 months now.

I think a combination of a large heartbeat interval and bursts of deletions triggers this. What is dfs.heartbeat.interval set to?
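
A back-of-the-envelope model shows why that combination could matter. If the namenode releases at most a fixed number of blocks to a datanode per heartbeat, the time needed to drain a burst scales with the heartbeat interval. The per-heartbeat limit of 100, the burst size, and the 3-second comparison interval are assumptions chosen for illustration, not values taken from this cluster or from the Hadoop source.

```python
import math

def drain_time_seconds(burst_blocks, blocks_per_heartbeat, heartbeat_seconds):
    """Seconds needed to hand out a burst of deletions, one batch per heartbeat."""
    heartbeats = math.ceil(burst_blocks / blocks_per_heartbeat)
    return heartbeats * heartbeat_seconds

BURST = 14_000        # hypothetical burst of blocks to delete on one datanode
PER_HEARTBEAT = 100   # hypothetical per-heartbeat limit

short = drain_time_seconds(BURST, PER_HEARTBEAT, heartbeat_seconds=3)
long_ = drain_time_seconds(BURST, PER_HEARTBEAT, heartbeat_seconds=60)
print(short, long_)
```

With these made-up numbers the same burst takes 7 minutes to hand out at a 3-second heartbeat and 140 minutes at a 1-minute heartbeat, so repeated bursts are more likely to pile up on the namenode when the interval is long.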

[jira] Updated: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-2576:
---------------------------------
    Attachment: HADOOP-2576.patch

Could you try the attached patch? You might get a conflict in FSNameSystem.java since it changes so often. But it's only a one-line change there.

[jira] Commented: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561814#action_12561814 ]

dhruba borthakur commented on HADOOP-2576:
------------------------------------------

I agree with Raghu that the throttle to delete blocks from a datanode could be done by the Datanode. Currently, the namenode does this throttling. See HADOOP-774 for more discussion on this topic.

> Namenode performance degradation over time
> ------------------------------------------
>
>                 Key: HADOOP-2576
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2576
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>            Assignee: Raghu Angadi
>            Priority: Blocker
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2576.patch

[jira] Commented: (HADOOP-2576) Namenode performance degradation over time

[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561829#action_12561829 ]

Christian Kunz commented on HADOOP-2576:
----------------------------------------

I applied Raghu's patch, restarted the nameserver, and will monitor its performance.