
Re: [rvm-research] Looking for Sources of Performance Variation

by Rhodes Brown


Hello again all,

I wanted to complete my talk (to the IFIP Software Implementation
Technology Working Group) and get some feedback before following up on
Steve's reply to my original post. For those who are interested, I
have posted my slides with notes at:
http://webhome.cs.uvic.ca/~rhodesb/research/JikesRVM_Performance-Notes.pdf

As Steve noted, the issues I'm raising aren't necessarily specific to
Jikes. Cliff Click confirmed that he'd observed similar behaviors
working with the HotSpot VM. However, the issue of measuring
performance under adaptive optimization is clearly of particular
importance to Jikes researchers, especially those of us whose work
doesn't afford the luxury of being able to turn off the adaptive
optimization system (AOS).

Steve's point about normalizing data is well-taken. In my posted
slides, I have included a second axis that shows iteration times
normalized against the best overall time observed for each benchmark.
I've also tried to switch from "repetition" to "iteration" of a
benchmark, to be more in line with the common terminology. However,
I've stuck with "execution" over "invocation" of a VM, since my own
work deals with method invocations and I don't want to confuse the
two.
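For concreteness, the normalization on that second axis amounts to the following minimal Python sketch. It assumes the raw data for one benchmark is a list of executions, each a list of iteration times; the function name `normalize_to_best` is hypothetical and not part of any harness.

```python
def normalize_to_best(times_by_execution):
    """Express every iteration time relative to the best (minimum) time
    observed across all executions of one benchmark.

    times_by_execution: one list of iteration times per VM execution.
    A value of 1.0 marks the best iteration; 1.25 means 25% slower.
    """
    best = min(t for execution in times_by_execution for t in execution)
    return [[t / best for t in execution] for execution in times_by_execution]


# Hypothetical data: two executions, three iterations each (ms).
print(normalize_to_best([[200, 120, 100], [180, 110, 105]]))
```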

That said, if we are on the topic of presentation clarity, I'd like to
raise a couple of questions of my own.

First is the use of the geometric mean ("geomean") as an aggregate
measure of performance. I see this all over the place in papers
reporting on Jikes performance, but I have not been able to find a
single one that justifies the use of this mean. John makes a fairly
cogent argument that performance results, in particular speedup
results, should not be summarized with the geomean [1]. Depending on
one's emphasis, a weighted arithmetic or harmonic mean is more
appropriate. It would seem that the geomean is only (arguably)
appropriate in cases where the results exhibit a log-normal
distribution *and* are representative of real workloads; both are
debatable points when it comes to the commonly used Java benchmarks.
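To make the objection concrete, here is a small Python sketch. It assumes each benchmark is summarized by one baseline time B_i and one new time N_i, and it compares the unweighted geomean of the speedups S_i = B_i / N_i with a weighted arithmetic mean (weights N_i) and a weighted harmonic mean (weights B_i). Both weighted forms reduce to sum(B) / sum(N), the speedup of the whole suite run back to back. This weighting is one illustration of the kind of argument made in [1], not a transcription of it, and the function name is hypothetical.

```python
from statistics import geometric_mean


def speedup_summaries(baseline, improved):
    """Summarize per-benchmark speedups S_i = B_i / N_i in several ways.

    baseline, improved: running times (same units) for each benchmark.
    """
    speedups = [b / n for b, n in zip(baseline, improved)]
    total_b, total_n = sum(baseline), sum(improved)
    # Arithmetic mean weighted by each benchmark's share of the new time.
    weighted_arith = sum((n / total_n) * s for n, s in zip(improved, speedups))
    # Harmonic mean weighted by each benchmark's share of the baseline time.
    weighted_harm = 1.0 / sum((b / total_b) / s for b, s in zip(baseline, speedups))
    return {
        "aggregate": total_b / total_n,  # speedup of the suite as a whole
        "weighted_arithmetic": weighted_arith,
        "weighted_harmonic": weighted_harm,
        "geomean": geometric_mean(speedups),
    }


# Hypothetical times (s): one benchmark gets 2x faster, the other 2x slower.
print(speedup_summaries([10.0, 2.0], [5.0, 4.0]))
```

On this made-up data the geomean is 1.0 ("no change"), even though running both benchmarks takes 9 s instead of 12 s, an aggregate speedup of about 1.33 that both weighted means reproduce.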

Second, and this is at the core of the point I was trying to make,
what is "bmtime"? Is this total running time for some number of
iterations? The time from a particular iteration, say the last? An
average (mean or median) of iterations within an execution? Does it
include JIT compilation, or is such a question even meaningful?
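The ambiguity matters because each reading yields a different number from the same run. A minimal Python sketch over made-up iteration times from a single execution; the labels are illustrative, and none is claimed to be what any particular harness reports.

```python
from statistics import mean, median


def bmtime_candidates(iteration_times):
    """Several plausible readings of "bmtime" for ONE VM execution.

    iteration_times: wall-clock time of each benchmark iteration, in order.
    """
    return {
        "total": sum(iteration_times),   # all iterations, warm-up included
        "first": iteration_times[0],     # includes start-up and early compilation
        "last": iteration_times[-1],     # a single, late iteration
        "mean": mean(iteration_times),
        "median": median(iteration_times),
        "best": min(iteration_times),
    }


# Hypothetical iteration times (ms) from a single execution.
print(bmtime_candidates([900, 400, 310, 300, 330]))
```

Here the candidate answers range from 300 ms (best) to 2240 ms (total), and even the "steady-state" readings disagree: last and median give 330 ms, the mean 448 ms.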

To be clear, let me try to re-state some of the points I was trying to
make earlier.

My primary intention was to debunk the myth of convergence. Some
benchmarks do, after executing a reasonable number of iterations,
approach a "typical" performance pattern with a coefficient of variation (CoV) below 0.02.
But some simply do not, regardless of how many executions or
iterations are run (antlr and hsqldb are examples). Some do converge
for some executions, but not others. Some stabilize, but not to the
same performance level. Moreover, many benchmarks actually begin to
destabilize when run longer with more time for adaptive optimization.
Thus, while the method suggested by Georges et al. does provide an
appropriate level of rigor, it will not always work and should not be
entirely relied upon. And certainly the rudimentary notions of
convergence built into DaCapo and SPECjvm98 should not be relied upon.
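A sketch of the kind of windowed steady-state test under discussion, assuming steady state is declared at the first window of k consecutive iterations whose CoV falls below a threshold. The defaults k = 10 and 0.02 are illustrative assumptions, and `first_stable_window` is a hypothetical name. Returning None is exactly the outcome described above for benchmarks that never settle.

```python
from statistics import mean, stdev


def first_stable_window(iteration_times, k=10, threshold=0.02):
    """Return the start index of the first window of k consecutive
    iterations whose coefficient of variation (stdev / mean) is below
    threshold, or None if no such window exists.
    """
    if k < 2:
        raise ValueError("k must be at least 2 to compute a sample stdev")
    for start in range(len(iteration_times) - k + 1):
        window = iteration_times[start:start + k]
        cov = stdev(window) / mean(window)
        if cov < threshold:
            return start
    return None


# Hypothetical runs (ms): one settles after warm-up, one keeps oscillating.
settling = [500, 300] + [100] * 10
oscillating = [100, 130] * 10
print(first_stable_window(settling))     # 2
print(first_stable_window(oscillating))  # None
```

Note that a non-None result only says that some window was quiet; it says nothing about whether a different execution would settle at the same level.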

My second point was to emphasize that there is an important
distinction between measuring "typical" performance over a range of
iterations from an execution (as done by Georges et al.), and
measuring the best performance potential of a particular VM
configuration. The former is appropriate when comparing modifications
that may affect several sequential iterations, as is the case for most
GC strategies. However, evaluating the effectiveness of a compilation
strategy clearly calls for the latter. In this case, we are interested
in identifying the maximum potential of the generated code while
discounting other factors. Of course, to be statistically valid, one
must find a mean-best result, not simply take the best overall value.
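One reading of "mean-best", sketched in Python under that assumption: take the best iteration time within each execution, then report the mean of those per-execution bests with a Student's t confidence interval. The standard library has no t quantile, so the critical value is passed in; 2.776 is the two-sided 95% value for 4 degrees of freedom (5 executions). The function name `mean_best` is hypothetical.

```python
import math
from statistics import mean, stdev


def mean_best(executions, t_crit):
    """Mean of per-execution best iteration times, with a confidence interval.

    executions: one list of iteration times per independent VM execution.
    t_crit: two-sided Student's t critical value for len(executions) - 1
            degrees of freedom at the desired confidence level.
    """
    if len(executions) < 2:
        raise ValueError("need at least two executions for an interval")
    bests = [min(times) for times in executions]  # best iteration per execution
    m = mean(bests)
    half_width = t_crit * stdev(bests) / math.sqrt(len(bests))
    return m, (m - half_width, m + half_width)


# Hypothetical data: 5 executions, 3 iterations each (ms).
runs = [[300, 250, 240], [310, 260, 250], [290, 245, 235],
        [305, 255, 245], [295, 240, 230]]
print(mean_best(runs, t_crit=2.776))
```

Contrast this with the minimum over all runs (230 ms here), which reports the single luckiest iteration and carries no measure of spread.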

I would concur with a sentiment seen in several papers on performance
analysis, and echoed by Steve above:
"There is no simple prescription. You need to understand your system
and your hypothesis, and carefully design the experiments to suit."
Indeed, but I would go further: When publishing performance results,
one must choose an approach that is properly aligned with the subject
of one's study (e.g., start-up, long-run GC, long-run adaptive
optimization, etc.) *and* present an argument for why the approach is
appropriate. This latter part seems absent from most papers on Jikes
performance that I have read.

As a final point, I think it is worth noting that we (as a community)
have made an inappropriate simplification in treating performance as a
"random" variable. While there are many complicated factors that can
influence performance from platform to platform, and run to run, the
results are still effectively determined by the VM and system
configuration. The true source of randomness is timing. Small and
unpredictable external pressures ultimately lead to executions that
unfold in a bounded, but chaotic fashion. In devising measurement
schemes, we ought to be conscious of this effect and aim to extract
results in a way that is, as much as possible, oblivious to timing
variations. Thus, I would reject methods that report results from a
specific iteration, or aggregates over a fixed interval of time or
iterations.

--
Rhodes H. F. Brown

Instructor & Ph.D. Candidate in Computer Science
University of Victoria - Victoria, BC, Canada
http://www.cs.uvic.ca

References:
[1] L. K. John. Performance Evaluation and Benchmarking, chapter 4:
Aggregating Performance Metrics Over a Benchmark Suite. CRC Press.
2005.

_______________________________________________
Jikesrvm-researchers mailing list
Jikesrvm-researchers@...
https://lists.sourceforge.net/lists/listinfo/jikesrvm-researchers
