time stamps of talos performance results & finding regressions

by alice nodelman

Since the beginning of the talos project, results gathered during a test
run have been collected and sent off to the graph server (be it
graphs.mozilla.org or graphs-stage.mozilla.org) and stored in a
database keyed on a time stamp.  When this system was first put in
place it was mostly viewed in isolation, without much investigation as to
how much data we were going to be collecting and how it was going to be
used.  Merely by default I ended up using talos testrun time as the time
stamp for sets of results, as I could guarantee that it would be unique
and always advancing.

As it turns out, time stamping with testrun time is less than ideal.
Tests are only run once a build is completed and a talos machine becomes
free for testing, meaning that testrun time ends up being 2-3 hours (or
more) after build start time. Determining what build is being used at a
given timestamp on graph server is non-trivial. You have to backtrack
from testrun time to build start time to bonsai checkins. Adding a
little wiggle room on each of these makes the regression range for a
result end up being 2-3 hours. This means that there are a lot of
check-ins that have to be investigated, and possibly backed out, simply
because it is so hard to correlate tinderbox waterfall information with
graph server information. To further complicate matters, there are a
bunch of infrastructure workarounds in place trying to make the tinderbox
waterfall page display build results and talos results in a way that
lines up, even though it's not strictly accurate.
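
To make the backtracking concrete, here is a rough sketch of the
arithmetic, not anything talos or the graph server actually runs. The
function name, the lag bounds and the wiggle room are all made-up
values for illustration; only the "2-3 hours" figure comes from the
description above.

```python
from datetime import datetime, timedelta

def checkin_window(testrun_time, min_lag, max_lag, wiggle):
    """Estimate which check-in times could explain a result that is
    keyed on testrun time (hypothetical helper, illustrative only).

    The build started somewhere between max_lag and min_lag before the
    test ran; wiggle is extra slack added on each side.
    """
    earliest = testrun_time - max_lag - wiggle
    latest = testrun_time - min_lag + wiggle
    return earliest, latest

# Assumed numbers: tests start 2-3 hours after the build starts,
# plus 15 minutes of slack on each side.
testrun = datetime(2008, 5, 14, 18, 0)
start, end = checkin_window(
    testrun,
    min_lag=timedelta(hours=2),
    max_lag=timedelta(hours=3),
    wiggle=timedelta(minutes=15),
)
print(start, end, end - start)
```

With these assumed numbers every check-in in a 90-minute window is a
suspect, and a regression is bracketed by two such results, so the
range to investigate grows quickly.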

Bug#419487 (change buildbot & talos to use buildtime, not testrun time)
changes talos so that the time stamp used for talos results for a given
build is no longer the testrun time, but the build start time as
reported by the waterfall for that build.  If you see a regression, you
can look at the specific build that caused it, instead of doing the
mental gymnastics of adding hours to the regression range before and
after the reported talos result.
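
As a sketch of what this buys us, assuming results and builds share the
build start time as their key: the data, the build names and the
threshold below are invented for illustration, and none of this is the
graph server's actual schema or API.

```python
from datetime import datetime

# Hypothetical waterfall data: build start time -> build identifier.
builds = {
    datetime(2008, 5, 14, 12, 0): "build A",
    datetime(2008, 5, 14, 14, 0): "build B",
    datetime(2008, 5, 14, 16, 0): "build C",
}

# Hypothetical talos results keyed on the same build start times.
results = [
    (datetime(2008, 5, 14, 12, 0), 310.0),
    (datetime(2008, 5, 14, 14, 0), 342.0),
    (datetime(2008, 5, 14, 16, 0), 341.0),
]

def first_regressing_build(results, builds, threshold):
    """Return the build whose result is worse than the previous one by
    more than threshold, or None if there is no such jump."""
    ordered = sorted(results)
    for (_, prev_value), (stamp, value) in zip(ordered, ordered[1:]):
        if value - prev_value > threshold:
            # Same key on both sides, so no backtracking is needed.
            return builds[stamp]
    return None

print(first_regressing_build(results, builds, threshold=20.0))
```

Because both sides use the same time stamp, the lookup is a direct
match rather than an estimate with hours of slack.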

So, what's the downside?  This would be going forward only.  For
existing data we do not have anything in place to re-time stamp results
that are already in a graph server database (and there is some question
as to whether it is possible to do so at all). For now, the plan is that
we would end up drawing a line in the sand and saying "For historic
results, we need to continue requiring large regression ranges, but from
now onwards, results can be pinpointed".

Overall, I think this is a great improvement on our infrastructure.  It
removes some daily complexity in how we debug regressions and determine
what patches need to be backed out. It also brings our build and talos
systems into sync with each other.

I want to get as much feedback as I can on this from people who
frequently use talos results to find regressions.  Please respond to
Bug#419487 so that all the discussion ends up in the same place.

Thanks,
alice.
_______________________________________________
dev-performance mailing list
dev-performance@...
https://lists.mozilla.org/listinfo/dev-performance

Re: time stamps of talos performance results & finding regressions

by Mike Shaver

On Wed, May 14, 2008 at 8:48 PM, alice nodelman <anodelman@...> wrote:
> Overall, I think this is a great improvement on our infrastructure.  It
> removes some daily complexity in how we debug regressions and determine
> what patches need to be backed out. It also brings our build and talos
> systems into sync with each other.

I agree, and think that the gains are very much worth the discontinuity.

Excellent!

Mike

Re: time stamps of talos performance results & finding regressions

by Jonas Sicking-2

Mike Shaver wrote:
> On Wed, May 14, 2008 at 8:48 PM, alice nodelman <anodelman@...> wrote:
>> Overall, I think this is a great improvement on our infrastructure.  It
>> removes some daily complexity in how we debug regressions and determine
>> what patches need to be backed out. It also brings our build and talos
>> systems into sync with each other.
>
> I agree, and think that the gains are very much worth the discontinuity.

Same here. Just documenting this on the talos wiki pages would go a long
way toward reducing confusion, I think.

/ Jonas