[jira] Created: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare

[jira] Created: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare

by JIRA jira@apache.org

BasicTypeSorterBase.compare calls progress on each compare
----------------------------------------------------------

                 Key: HADOOP-2284
                 URL: https://issues.apache.org/jira/browse/HADOOP-2284
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
            Reporter: Owen O'Malley
            Assignee: Devaraj Das
             Fix For: 0.16.0


The inner loop of the sort is calling progress on each compare. I think it would make more sense to call progress in the sort rather than the compare, or at most every 10000 compares. In the performance numbers, the calls to progress as part of the sort are consuming 12% of the total CPU time when running word count under the local runner.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545627 ]

Owen O'Malley commented on HADOOP-2284:
---------------------------------------

Another important note on this is that the ratio of "overhead" in the compare looks really bad. In particular,
org.apache.hadoop.mapred.MergeSort.compare(Object,Object) is taking 2,503 cpu seconds and the work is being done in org.apache.hadoop.io.Text$Comparator.compare(byte[],int,int,byte[],int,int) is only 1158 seconds. Thus, it looks like there is 64% overhead in the abstraction levels wrapped around the compare. Part of that overhead is the progress, but I suspect that should strip out more of the overhead.



[jira] Issue Comment Edited: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545627 ]

owen.omalley edited comment on HADOOP-2284 at 11/26/07 2:21 PM:
-----------------------------------------------------------------

Another important note on this is that the ratio of "overhead" in the compare looks really bad. In particular,
org.apache.hadoop.mapred.MergeSort.compare(Object,Object) is taking 2,503 cpu seconds and the work being done in org.apache.hadoop.io.Text$Comparator.compare(byte[],int,int,byte[],int,int) is only 1158 seconds. Thus, it looks like there is 64% overhead in the abstraction levels wrapped around the compare. Part of that overhead is the progress, but I suspect that we should work on stripping out more of the overhead.

      was (Author: owen.omalley):
    Another important note on this is that the ratio of "overhead" in the compare looks really bad. In particular,
org.apache.hadoop.mapred.MergeSort.compare(Object,Object) is taking 2,503 cpu seconds and the work is being done in org.apache.hadoop.io.Text$Comparator.compare(byte[],int,int,byte[],int,int) is only 1158 seconds. Thus, it looks like there is 64% overhead in the abstraction levels wrapped around the compare. Part of that overhead is the progress, but I suspect that should strip out more of the overhead.
 



[jira] Commented: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554382 ]

Amar Kamat commented on HADOOP-2284:
------------------------------------

Looks like there are two possibilities:
-  _counter-threshold based technique_ : Send the progress after a certain number of compares.
-  _time-interval based technique_ : Send the progress after a fixed amount of time, where the time to wait is determined by *mapred.task.timeout*.
Currently time-based seems like a better technique. Comments?



[jira] Commented: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554508 ]

Arun C Murthy commented on HADOOP-2284:
---------------------------------------

bq. Currently time-based seems like a better technique. Comments?

I agree, however it should be a fraction of {{mapred.task.timeout}} (say 10%), else we run the risk of the tasks being timed-out by the {{TaskTracker}}.



[jira] Commented: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554515 ]

Amar Kamat commented on HADOOP-2284:
------------------------------------

Yeah. It should be some fraction of {{mapred.task.timeout}}. I was thinking of something like *2%*, so that sufficient attempts are made to declare progress (in this case 50 attempts per timeout period). 50 progress indications while sorting should not be a problem, no?
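
For illustration only, here is a minimal sketch of the time-interval idea, assuming a timeout of 600,000 ms and 50 attempts per timeout (the 2% figure above). The class {{TimeThrottle}} and its method names are hypothetical and are not taken from the attached patch; the caller passes in the current time so the sketch does not depend on any particular clock.

```java
/** Hypothetical helper: decides when to report progress, given the task
 *  timeout and the number of reporting attempts wanted per timeout. */
final class TimeThrottle {
  private final long intervalMillis;
  private long lastReportMillis;

  TimeThrottle(long timeoutMillis, int attemptsPerTimeout, long startMillis) {
    if (timeoutMillis <= 0 || attemptsPerTimeout <= 0) {
      throw new IllegalArgumentException("timeout and attempts must be positive");
    }
    // 2% of the timeout corresponds to attemptsPerTimeout = 50.
    this.intervalMillis = Math.max(1L, timeoutMillis / attemptsPerTimeout);
    this.lastReportMillis = startMillis;
  }

  long intervalMillis() {
    return intervalMillis;
  }

  /** Returns true at most once per interval; the caller supplies the time. */
  boolean shouldReport(long nowMillis) {
    if (nowMillis - lastReportMillis >= intervalMillis) {
      lastReportMillis = nowMillis;
      return true;
    }
    return false;
  }
}
```

With these assumed numbers the interval works out to 12,000 ms. Note that a caller still has to obtain the current time before each call to {{shouldReport}}.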



[jira] Commented: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554527 ]

Devaraj Das commented on HADOOP-2284:
-------------------------------------

I am worried about the time-based approach since it might end up making a native call, System.currentTimeMillis(), on every compare.



[jira] Commented: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554528 ]

Owen O'Malley commented on HADOOP-2284:
---------------------------------------

I agree with Devaraj. The cost of gettimeofday is huge when put into the inner loop like that. I think we'll be fine with every 10,000th compare calling progress. To time out, we'd need to do fewer than 20 compares/second...
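
A rough sketch of the counter-threshold approach, for illustration: a comparator wrapper that calls progress once every {{interval}} compares. The names {{ThrottledComparator}} and the local {{Progressable}} interface are stand-ins declared here so the example compiles on its own; this is not the code in the attached patch.

```java
import java.util.Comparator;

/** Local stand-in for a progress callback, declared so the sketch is self-contained. */
interface Progressable {
  void progress();
}

/** Hypothetical wrapper: delegates every compare, reports progress every `interval` compares. */
final class ThrottledComparator<T> implements Comparator<T> {
  private final Comparator<T> delegate;
  private final Progressable reporter;
  private final int interval;
  private int sinceLastReport = 0;

  ThrottledComparator(Comparator<T> delegate, Progressable reporter, int interval) {
    if (interval <= 0) {
      throw new IllegalArgumentException("interval must be positive");
    }
    this.delegate = delegate;
    this.reporter = reporter;
    this.interval = interval;
  }

  @Override
  public int compare(T a, T b) {
    // Only an integer increment and a branch sit in the inner loop.
    if (++sinceLastReport >= interval) {
      sinceLastReport = 0;
      reporter.progress();
    }
    return delegate.compare(a, b);
  }
}
```

On the arithmetic: assuming a task timeout of 600 seconds, one progress call per 10,000 compares only misses the timeout if the sort runs slower than 10,000 / 600, roughly 17, compares per second, which matches the "fewer than 20" estimate.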



[jira] Commented: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554529 ]

Arun C Murthy commented on HADOOP-2284:
---------------------------------------

On second thoughts, I have to agree with Devaraj/Owen.



[jira] Commented: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554545 ]

Amar Kamat commented on HADOOP-2284:
------------------------------------

I think the facility of _batch progress updates_ should be provided at the {{Reporter}} level rather than at the caller, so that we can set the interval and any call to the reporter within the interval will do nothing. The guess is that the problem reported is in the body of {{progress}}. The check to set the flag should be conditional. Comments?
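
One way to picture batching at the reporter level is a wrapper that swallows calls made inside the interval, so callers stay unchanged. The sketch below is hypothetical: {{IntervalProgressable}} and the local {{Progressable}} interface are illustrative names, and the clock is injected through a {{LongSupplier}} purely so the behaviour can be exercised without real time passing.

```java
import java.util.function.LongSupplier;

/** Local stand-in for a progress callback, declared so the sketch is self-contained. */
interface Progressable {
  void progress();
}

/** Hypothetical wrapper: forwards at most one progress call per interval. */
final class IntervalProgressable implements Progressable {
  private final Progressable delegate;
  private final LongSupplier clockMillis;
  private final long intervalMillis;
  private long nextAllowedMillis;

  IntervalProgressable(Progressable delegate, LongSupplier clockMillis, long intervalMillis) {
    if (intervalMillis <= 0) {
      throw new IllegalArgumentException("interval must be positive");
    }
    this.delegate = delegate;
    this.clockMillis = clockMillis;
    this.intervalMillis = intervalMillis;
    this.nextAllowedMillis = clockMillis.getAsLong() + intervalMillis;
  }

  @Override
  public void progress() {
    long now = clockMillis.getAsLong();
    // Calls that arrive before the interval has elapsed do nothing.
    if (now >= nextAllowedMillis) {
      nextAllowedMillis = now + intervalMillis;
      delegate.progress();
    }
  }
}
```

As written, this version still reads the clock on every call, which is the cost raised earlier in the thread; a count-based check inside the wrapper would avoid that.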



[jira] Assigned: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


     [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das reassigned HADOOP-2284:
-----------------------------------

    Assignee: Amar Kamat  (was: Devaraj Das)



[jira] Updated: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


     [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-2284:
-------------------------------

    Attachment: HADOOP-2284.patch

Attaching the patch that should reduce the calls to progress and also the time spent in unwrapping. Owen, could you please check and let us know? Can we do better if we know the number of elements getting sorted, say 'n'? That is, can we make it dynamic based on 'n', something like {{update-freq = n^2 / k}}, where k could be log(n), 100, or 1000?
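
To make the proposed formula concrete, here is a small sketch that evaluates n^2 / k. It assumes that update-freq means the number of compares between progress calls, which the comment does not state explicitly; the class and method names are hypothetical, and k is taken as a whole number (a choice such as log(n) would need rounding first).

```java
/** Hypothetical helper evaluating the proposed update frequency n^2 / k. */
final class UpdateFrequency {
  private UpdateFrequency() {}

  /**
   * Returns n^2 / k using integer division, clamped to at least 1 so that
   * very small inputs still report progress on every compare.
   */
  static long comparesBetweenReports(long n, long k) {
    if (n < 0 || k <= 0) {
      throw new IllegalArgumentException("n must be >= 0 and k must be > 0");
    }
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    long squared = Math.multiplyExact(n, n);
    return Math.max(1L, squared / k);
  }
}
```

For example, n = 1000 with k = 100 gives 10,000 compares between reports, and k = 1000 gives 1,000.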



[jira] Updated: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


     [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-2284:
-------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560082#action_12560082 ]

Hadoop QA commented on HADOOP-2284:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12373347/HADOOP-2284.patch
against trunk revision r612957.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests -1.  The patch failed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1626/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1626/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1626/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1626/console

This message is automatically generated.



[jira] Commented: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560215#action_12560215 ]

Amar Kamat commented on HADOOP-2284:
------------------------------------

This test passes on my machine (checked again).



[jira] Issue Comment Edited: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560215#action_12560215 ]

amar_kamat edited comment on HADOOP-2284 at 1/17/08 8:51 PM:
-------------------------------------------------------------

The core test failed on {{TestMiniMRDFSCaching}}. This test passes on my machine (checked again).

      was (Author: amar_kamat):
    This test passes on my machine (checked again).
 



[jira] Commented: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560236#action_12560236 ]

Owen O'Malley commented on HADOOP-2284:
---------------------------------------

+1, as long as the unit test is fine



[jira] Updated: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


     [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-2284:
-------------------------------

    Status: Patch Available  (was: Open)

Resubmitting for tests.



[jira] Updated: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


     [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-2284:
-------------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare


    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560614#action_12560614 ]

Hadoop QA commented on HADOOP-2284:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12373347/HADOOP-2284.patch
against trunk revision r613115.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests -1.  The patch failed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1643/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1643/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1643/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1643/console

This message is automatically generated.
