|
View:
New views
7 Messages
—
Rating Filter:
Alert me
|
|
|
perf stat outputHi all!
Some questions about perf stat output. See example: lucas@LMS-linux:~/programming/testprograms> perf stat -e L1-dcache-loads -e L1-dcache-load-misses -- make -j gcc test_schedchanges.c -o test_schedchanges gcc -pthread test_taskaff1.c -o test_taskaff1 gcc -pthread test_taskaff2.c -o test_taskaff2 gcc -pthread test_taskaff3.c -o test_taskaff3 Performance counter stats for 'make -j': 161384667 L1-dcache-loads # 0.000 M/sec 24853791 L1-dcache-load-misses # 0.000 M/sec 0.066893389 seconds time elapsed Why do we have both L1-dcache-loads and L1-dcache-load-misses with 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e cache-misses -e cache-references" but values different than 0 when running "perf stat -a" without selecting the events? The last question: what does the "scaled from X%" mean? Is it related to the maximum number of performance registers a processor can count at a time? Thanks Lucas De Marchi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |
|
|
Re: perf stat outputOn Thu, 2009-11-12 at 20:03 -0200, Lucas De Marchi wrote:
> Hi all! > > Some questions about perf stat output. See example: > > > lucas@LMS-linux:~/programming/testprograms> perf stat -e > L1-dcache-loads -e L1-dcache-load-misses -- make -j > gcc test_schedchanges.c -o test_schedchanges > gcc -pthread test_taskaff1.c -o test_taskaff1 > gcc -pthread test_taskaff2.c -o test_taskaff2 > gcc -pthread test_taskaff3.c -o test_taskaff3 > > Performance counter stats for 'make -j': > > 161384667 L1-dcache-loads # 0.000 M/sec > 24853791 L1-dcache-load-misses # 0.000 M/sec > > 0.066893389 seconds time elapsed > > Why do we have both L1-dcache-loads and L1-dcache-load-misses with > 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e > cache-misses -e cache-references" but values different than 0 when > running "perf stat -a" without selecting the events? No idea, you'd have to look at the code computing this M/sec stuff. I think Ingo wrote that, so he might have an idea. > The last question: what does the "scaled from X%" mean? Is it related > to the maximum number of performance registers a processor can count > at a time? Yes, if the hardware has only 2 counters and you specify 4, we'll round-robin those 4 onto the 2. In that case you'll see things like scaled from ~50% because each counter will only have been on the actual PMU for about 50% of the time. (RR happens with tick granularity, so if your runtime is of that order or shorter you can get funny results with some counters being 0). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |
|
|
Re: perf stat outputOn Fri, Nov 13, 2009 at 06:07, Peter Zijlstra <peterz@...> wrote:
> On Thu, 2009-11-12 at 20:03 -0200, Lucas De Marchi wrote: >> Hi all! >> >> Some questions about perf stat output. See example: >> >> >> lucas@LMS-linux:~/programming/testprograms> perf stat -e >> L1-dcache-loads -e L1-dcache-load-misses -- make -j >> gcc test_schedchanges.c -o test_schedchanges >> gcc -pthread test_taskaff1.c -o test_taskaff1 >> gcc -pthread test_taskaff2.c -o test_taskaff2 >> gcc -pthread test_taskaff3.c -o test_taskaff3 >> >> Performance counter stats for 'make -j': >> >> 161384667 L1-dcache-loads # 0.000 M/sec >> 24853791 L1-dcache-load-misses # 0.000 M/sec >> >> 0.066893389 seconds time elapsed >> >> Why do we have both L1-dcache-loads and L1-dcache-load-misses with >> 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e >> cache-misses -e cache-references" but values different than 0 when >> running "perf stat -a" without selecting the events? > > No idea, you'd have to look at the code computing this M/sec stuff. I > think Ingo wrote that, so he might have an idea. But should it be, for example, L1-dcache-loads / seconds time elapsed, or does it have another meaning? Lucas De Marchi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |
|
|
Re: perf stat output* Lucas De Marchi <lucas.de.marchi@...> wrote: > Hi all! > > Some questions about perf stat output. See example: > > > lucas@LMS-linux:~/programming/testprograms> perf stat -e > L1-dcache-loads -e L1-dcache-load-misses -- make -j > gcc test_schedchanges.c -o test_schedchanges > gcc -pthread test_taskaff1.c -o test_taskaff1 > gcc -pthread test_taskaff2.c -o test_taskaff2 > gcc -pthread test_taskaff3.c -o test_taskaff3 > > Performance counter stats for 'make -j': > > 161384667 L1-dcache-loads # 0.000 M/sec > 24853791 L1-dcache-load-misses # 0.000 M/sec > > 0.066893389 seconds time elapsed > > Why do we have both L1-dcache-loads and L1-dcache-load-misses with > 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e > cache-misses -e cache-references" but values different than 0 when > running "perf stat -a" without selecting the events? You need the 'task-clock' event to be able to see M/sec metrics. I.e.: $ perf stat -e L1-dcache-loads -e L1-dcache-load-misses -e task-clock sleep 1 Performance counter stats for 'sleep 1': 201330 L1-dcache-loads # 566.234 M/sec 29916 L1-dcache-load-misses # 84.138 M/sec 0.355560 task-clock-msecs # 0.000 CPUs 1.000621650 seconds time elapsed I agree with you that seeing '0.000 M/sec' is confusing and incorrect as well. One solution would be to skip the printout in that case. You can find the latest 'perf' code in: http://people.redhat.com/mingo/tip.git/README ( the tools/perf/ bits are backwards compatible with any perf kernel you are running right now, so no reboot is needed. ) You can find the stats printing in tools/perf/builtin-stat.c, in the abs_printout() function: } else { total = avg_stats(&runtime_nsecs_stats); if (total) ratio = 1000.0 * avg / total; fprintf(stderr, " # %10.3f M/sec", ratio); I think if runtime_nsecs_stats is zero (i.e. if no 'task-clock' events were measured), then we might be able to skip the printout via doing something like: } else if (runtime_nsecs_stats.n != 0) { Would you be interested in sending a (tested) patch for that? In theory only that oneliner change should suffice - but i have not tested it. Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |
|
|
Re: perf stat outputIngo, following the tested patch you asked. It's a bit different of what you
sent because branch-misses is counted in another struct. Lucas De Marchi -- commit 70e98c814153f0026a683c5d19674ed8185d02fa Author: Lucas De Marchi <lucas.de.marchi@...> Date: Sun Nov 15 11:20:56 2009 -0200 Do not print ratio when task-clock event is not counted The ratio between the number of events and the time elapsed makes sense only if task-clock event is counted. Otherwise it will be simply a (confusing) # 0.000 M/sec This patch outputs the ratio only if task-clock event is counted. Some test examples of before and after: Before: [lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1 Performance counter stats for 'sleep 1': 1367818 branch-misses # 0.000 M/sec 1.001494325 seconds time elapsed After (without task-clock): [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1 Performance counter stats for 'sleep 1': 1135044 branch-misses 1.001370775 seconds time elapsed After (with task-clock): [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1 Performance counter stats for 'sleep 1': 1070111 branch-misses # 0.534 M/sec 2002.730893 task-clock-msecs # 1.999 CPUs 1.001640292 seconds time elapsed Signed-off-by: Lucas De Marchi <lucas.de.marchi@...> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index c6df377..c70d720 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -357,7 +357,8 @@ static void abs_printout(int counter, double avg) ratio = avg / total; fprintf(stderr, " # %10.3f IPC ", ratio); - } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter)) { + } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter) && + runtime_branches_stats.n != 0) { total = avg_stats(&runtime_branches_stats); if (total) @@ -365,7 +366,7 @@ static void abs_printout(int counter, double avg) fprintf(stderr, " # %10.3f %% ", ratio); - } else { + } else if (runtime_nsecs_stats.n != 0) { total = avg_stats(&runtime_nsecs_stats); if (total) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |
|
|
Re: perf stat output* Lucas De Marchi <lucas.de.marchi@...> wrote: > Ingo, following the tested patch you asked. It's a bit different of > what you sent because branch-misses is counted in another struct. Looks perfect - applied it to tip:perf/core for v2.6.33 merging, thanks Lucas! Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |
|
|
[tip:perf/core] perf stat: Do not print ratio when task-clock event is not countedCommit-ID: 7255fe2a42c612f2b8fe4c347f0a5f0c97d85a46
Gitweb: http://git.kernel.org/tip/7255fe2a42c612f2b8fe4c347f0a5f0c97d85a46 Author: Lucas De Marchi <lucas.de.marchi@...> AuthorDate: Sun, 15 Nov 2009 12:05:08 -0200 Committer: Ingo Molnar <mingo@...> CommitDate: Sun, 15 Nov 2009 15:25:50 +0100 perf stat: Do not print ratio when task-clock event is not counted The ratio between the number of events and the time elapsed makes sense only if task-clock event is counted. Otherwise it will be simply a (confusing) # 0.000 M/sec This patch outputs the ratio only if task-clock event is counted. Some test examples of before and after: Before: [lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1 Performance counter stats for 'sleep 1': 1367818 branch-misses # 0.000 M/sec 1.001494325 seconds time elapsed After (without task-clock): [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1 Performance counter stats for 'sleep 1': 1135044 branch-misses 1.001370775 seconds time elapsed After (with task-clock): [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1 Performance counter stats for 'sleep 1': 1070111 branch-misses # 0.534 M/sec 2002.730893 task-clock-msecs # 1.999 CPUs 1.001640292 seconds time elapsed Signed-off-by: Lucas De Marchi <lucas.de.marchi@...> Cc: Peter Zijlstra <a.p.zijlstra@...> Cc: Arnaldo Carvalho de Melo <acme@...> LKML-Reference: <20091115140507.GB21561@...> Signed-off-by: Ingo Molnar <mingo@...> --- tools/perf/builtin-stat.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index c6df377..c70d720 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -357,7 +357,8 @@ static void abs_printout(int counter, double avg) ratio = avg / total; fprintf(stderr, " # %10.3f IPC ", ratio); - } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter)) { + } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter) && + runtime_branches_stats.n != 0) { total = avg_stats(&runtime_branches_stats); if (total) @@ -365,7 +366,7 @@ static void abs_printout(int counter, double avg) fprintf(stderr, " # %10.3f %% ", ratio); - } else { + } else if (runtime_nsecs_stats.n != 0) { total = avg_stats(&runtime_nsecs_stats); if (total) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |
| Free embeddable forum powered by Nabble | Forum Help |