perf stat output

View: New views
7 Messages — Rating Filter:   Alert me  

perf stat output

by Lucas De Marchi :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi all!

Some questions about perf stat output. See example:


lucas@LMS-linux:~/programming/testprograms> perf stat -e
L1-dcache-loads -e L1-dcache-load-misses -- make -j
gcc test_schedchanges.c -o test_schedchanges
gcc -pthread test_taskaff1.c -o test_taskaff1
gcc -pthread test_taskaff2.c -o test_taskaff2
gcc -pthread test_taskaff3.c -o test_taskaff3

 Performance counter stats for 'make -j':

       161384667  L1-dcache-loads          #      0.000 M/sec
              24853791  L1-dcache-load-misses    #      0.000 M/sec

                  0.066893389  seconds time elapsed

Why do we have both L1-dcache-loads and  L1-dcache-load-misses with
0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e
cache-misses -e cache-references" but values different than 0 when
running "perf stat -a" without selecting the events?

The last question: what does the  "scaled from X%" mean? Is it related
to the maximum number of performance registers a processor can count
at a time?


Thanks

Lucas De Marchi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: perf stat output

by Peter Zijlstra-5 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, 2009-11-12 at 20:03 -0200, Lucas De Marchi wrote:

> Hi all!
>
> Some questions about perf stat output. See example:
>
>
> lucas@LMS-linux:~/programming/testprograms> perf stat -e
> L1-dcache-loads -e L1-dcache-load-misses -- make -j
> gcc test_schedchanges.c -o test_schedchanges
> gcc -pthread test_taskaff1.c -o test_taskaff1
> gcc -pthread test_taskaff2.c -o test_taskaff2
> gcc -pthread test_taskaff3.c -o test_taskaff3
>
>  Performance counter stats for 'make -j':
>
>        161384667  L1-dcache-loads          #      0.000 M/sec
>               24853791  L1-dcache-load-misses    #      0.000 M/sec
>
>          0.066893389  seconds time elapsed
>
> Why do we have both L1-dcache-loads and  L1-dcache-load-misses with
> 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e
> cache-misses -e cache-references" but values different than 0 when
> running "perf stat -a" without selecting the events?

No idea, you'd have to look at the code computing this M/sec stuff. I
think Ingo wrote that, so he might have an idea.

> The last question: what does the  "scaled from X%" mean? Is it related
> to the maximum number of performance registers a processor can count
> at a time?

Yes, if the hardware has only 2 counters and you specify 4, we'll
round-robin those 4 onto the 2. In that case you'll see things like
scaled from ~50% because each counter will only have been on the actual
PMU for about 50% of the time.

(RR happens with tick granularity, so if your runtime is of that order
or shorter you can get funny results with some counters being 0).

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: perf stat output

by Lucas De Marchi :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Nov 13, 2009 at 06:07, Peter Zijlstra <peterz@...> wrote:

> On Thu, 2009-11-12 at 20:03 -0200, Lucas De Marchi wrote:
>> Hi all!
>>
>> Some questions about perf stat output. See example:
>>
>>
>> lucas@LMS-linux:~/programming/testprograms> perf stat -e
>> L1-dcache-loads -e L1-dcache-load-misses -- make -j
>> gcc test_schedchanges.c -o test_schedchanges
>> gcc -pthread test_taskaff1.c -o test_taskaff1
>> gcc -pthread test_taskaff2.c -o test_taskaff2
>> gcc -pthread test_taskaff3.c -o test_taskaff3
>>
>>  Performance counter stats for 'make -j':
>>
>>        161384667  L1-dcache-loads          #      0.000 M/sec
>>               24853791  L1-dcache-load-misses    #      0.000 M/sec
>>
>>                 0.066893389  seconds time elapsed
>>
>> Why do we have both L1-dcache-loads and  L1-dcache-load-misses with
>> 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e
>> cache-misses -e cache-references" but values different than 0 when
>> running "perf stat -a" without selecting the events?
>
> No idea, you'd have to look at the code computing this M/sec stuff. I
> think Ingo wrote that, so he might have an idea.

But should it be, for example, L1-dcache-loads / seconds time elapsed,
or does it have another meaning?



Lucas De Marchi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: perf stat output

by Ingo Molnar :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


* Lucas De Marchi <lucas.de.marchi@...> wrote:

> Hi all!
>
> Some questions about perf stat output. See example:
>
>
> lucas@LMS-linux:~/programming/testprograms> perf stat -e
> L1-dcache-loads -e L1-dcache-load-misses -- make -j
> gcc test_schedchanges.c -o test_schedchanges
> gcc -pthread test_taskaff1.c -o test_taskaff1
> gcc -pthread test_taskaff2.c -o test_taskaff2
> gcc -pthread test_taskaff3.c -o test_taskaff3
>
>  Performance counter stats for 'make -j':
>
>        161384667  L1-dcache-loads          #      0.000 M/sec
>               24853791  L1-dcache-load-misses    #      0.000 M/sec
>
>          0.066893389  seconds time elapsed
>
> Why do we have both L1-dcache-loads and L1-dcache-load-misses with
> 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e
> cache-misses -e cache-references" but values different than 0 when
> running "perf stat -a" without selecting the events?

You need the 'task-clock' event to be able to see M/sec metrics. I.e.:

$ perf stat -e L1-dcache-loads -e L1-dcache-load-misses -e task-clock sleep 1

 Performance counter stats for 'sleep 1':

         201330  L1-dcache-loads          #    566.234 M/sec
          29916  L1-dcache-load-misses    #     84.138 M/sec
       0.355560  task-clock-msecs         #      0.000 CPUs

    1.000621650  seconds time elapsed

I agree with you that seeing '0.000 M/sec' is confusing and incorrect as
well. One solution would be to skip the printout in that case.

You can find the latest 'perf' code in:

  http://people.redhat.com/mingo/tip.git/README

( the tools/perf/ bits are backwards compatible with any perf kernel you
  are running right now, so no reboot is needed. )

You can find the stats printing in tools/perf/builtin-stat.c, in the
abs_printout() function:

        } else {
                total = avg_stats(&runtime_nsecs_stats);

                if (total)
                        ratio = 1000.0 * avg / total;

                fprintf(stderr, " # %10.3f M/sec", ratio);

I think if runtime_nsecs_stats is zero (i.e. if no 'task-clock' events
were measured), then we might be able to skip the printout via doing
something like:

        } else if (runtime_nsecs_stats.n != 0) {

Would you be interested in sending a (tested) patch for that? In theory
only that oneliner change should suffice - but i have not tested it.

        Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: perf stat output

by Lucas De Marchi :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Ingo, following the tested patch you asked. It's a bit different of what you
sent because branch-misses is counted in another struct.


Lucas De Marchi

--
commit 70e98c814153f0026a683c5d19674ed8185d02fa
Author: Lucas De Marchi <lucas.de.marchi@...>
Date:   Sun Nov 15 11:20:56 2009 -0200

    Do not print ratio when task-clock event is not counted
   
    The ratio between the number of events and the time elapsed makes sense only
    if task-clock event is counted. Otherwise it will be simply a (confusing)
   
    #      0.000 M/sec
   
    This patch outputs the ratio only if task-clock event is counted. Some test
    examples of before and after:
   
    Before:
   
    [lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1
   
    Performance counter stats for 'sleep 1':
   
    1367818  branch-misses            #      0.000 M/sec
   
       1.001494325  seconds time elapsed
   
    After (without task-clock):
   
    [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1
   
    Performance counter stats for 'sleep 1':
   
    1135044  branch-misses
   
       1.001370775  seconds time elapsed
   
    After (with task-clock):
   
    [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1
   
    Performance counter stats for 'sleep 1':
   
    1070111  branch-misses            #      0.534 M/sec
       2002.730893  task-clock-msecs         #      1.999 CPUs
   
       1.001640292  seconds time elapsed
   
    Signed-off-by: Lucas De Marchi <lucas.de.marchi@...>

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c6df377..c70d720 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -357,7 +357,8 @@ static void abs_printout(int counter, double avg)
  ratio = avg / total;
 
  fprintf(stderr, " # %10.3f IPC  ", ratio);
- } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter)) {
+ } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter) &&
+ runtime_branches_stats.n != 0) {
  total = avg_stats(&runtime_branches_stats);
 
  if (total)
@@ -365,7 +366,7 @@ static void abs_printout(int counter, double avg)
 
  fprintf(stderr, " # %10.3f %%    ", ratio);
 
- } else {
+ } else if (runtime_nsecs_stats.n != 0) {
  total = avg_stats(&runtime_nsecs_stats);
 
  if (total)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: perf stat output

by Ingo Molnar :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


* Lucas De Marchi <lucas.de.marchi@...> wrote:

> Ingo, following the tested patch you asked. It's a bit different of
> what you sent because branch-misses is counted in another struct.

Looks perfect - applied it to tip:perf/core for v2.6.33 merging, thanks
Lucas!

        Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[tip:perf/core] perf stat: Do not print ratio when task-clock event is not counted

by Lucas De Marchi :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Commit-ID:  7255fe2a42c612f2b8fe4c347f0a5f0c97d85a46
Gitweb:     http://git.kernel.org/tip/7255fe2a42c612f2b8fe4c347f0a5f0c97d85a46
Author:     Lucas De Marchi <lucas.de.marchi@...>
AuthorDate: Sun, 15 Nov 2009 12:05:08 -0200
Committer:  Ingo Molnar <mingo@...>
CommitDate: Sun, 15 Nov 2009 15:25:50 +0100

perf stat: Do not print ratio when task-clock event is not counted

The ratio between the number of events and the time elapsed makes
sense only if task-clock event is counted. Otherwise it will be
simply a (confusing)

        #      0.000 M/sec

This patch outputs the ratio only if task-clock event is counted.
Some test examples of before and after:

Before:

 [lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1

         Performance counter stats for 'sleep 1':

                1367818  branch-misses            #      0.000 M/sec

            1.001494325  seconds time elapsed

After (without task-clock):

 [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1

         Performance counter stats for 'sleep 1':

                1135044  branch-misses

            1.001370775  seconds time elapsed

After (with task-clock):

 [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1

         Performance counter stats for 'sleep 1':

                1070111  branch-misses            #      0.534 M/sec
            2002.730893  task-clock-msecs         #      1.999 CPUs

            1.001640292  seconds time elapsed

Signed-off-by: Lucas De Marchi <lucas.de.marchi@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>
Cc: Arnaldo Carvalho de Melo <acme@...>
LKML-Reference: <20091115140507.GB21561@...>
Signed-off-by: Ingo Molnar <mingo@...>
---
 tools/perf/builtin-stat.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c6df377..c70d720 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -357,7 +357,8 @@ static void abs_printout(int counter, double avg)
  ratio = avg / total;
 
  fprintf(stderr, " # %10.3f IPC  ", ratio);
- } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter)) {
+ } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter) &&
+ runtime_branches_stats.n != 0) {
  total = avg_stats(&runtime_branches_stats);
 
  if (total)
@@ -365,7 +366,7 @@ static void abs_printout(int counter, double avg)
 
  fprintf(stderr, " # %10.3f %%    ", ratio);
 
- } else {
+ } else if (runtime_nsecs_stats.n != 0) {
  total = avg_stats(&runtime_nsecs_stats);
 
  if (total)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/