unit test speed for TestCustomScoreQuery

View: New views
8 Messages — Rating Filter:   Alert me  

unit test speed for TestCustomScoreQuery

by Erick Erickson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

OK, I'm actually getting of my duff and trying to do something. And I'm off today feeling ill, so I have a bit of time. So the rational place to start is to get all the code and run the unit tests, and I've even got them running in IntelliJ as well as the ant task. Will wonders never cease.

The unit tests in TestCustomScoreQuery take on the order of 80 seconds on my machine. Before I go off the deep end I wanted to run what I've found past y'all to see if it makes sense.

Virtually all the time is taken up in the method logResult on the call to QueryUtils.check(q, s). But the logResult method is called from verifyResults method in a loop like:

for (Iterator it = h1.keySet().iterator(); it.hasNext();) {
      call logResult for 5 different queries.
}
But the queries don't change that I can see...

Why couldn't we just call QueryUtils.check once for each of the 5 queries outside the for loop? Doing so reduces the time it takes for the test from ~80 seconds to 9 seconds.

If there're no objections, I'll make a patch. Which folks will probably have to be patient with me since it'll be the first one I've produced for this project.....

While I'm at it, what are we thinking about using JUnit4? Looking *briefly* at the code, this actually seems like it'd be difficult. We'd have to deal with the LuceneTestCase superclass...

Best
Erick

Re: unit test speed for TestCustomScoreQuery

by markrmiller :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

+1 - I made a similar observation a while back and started an issue to
address Junit test speeds:

https://issues.apache.org/jira/browse/LUCENE-1844

Erick Erickson wrote:

> OK, I'm actually getting of my duff and trying to do something. And
> I'm off today feeling ill, so I have a bit of time. So the rational
> place to start is to get all the code and run the unit tests, and I've
> even got them running in IntelliJ as well as the ant task. Will
> wonders never cease.
>
> The unit tests in TestCustomScoreQuery take on the order of 80 seconds
> on my machine. Before I go off the deep end I wanted to run what I've
> found past y'all to see if it makes sense.
>
> Virtually all the time is taken up in the method logResult on the call
> to QueryUtils.check(q, s). But the logResult method is called from
> verifyResults method in a loop like:
>
> for (Iterator it = h1.keySet().iterator(); it.hasNext();) {
>       call logResult for 5 different queries.
> }
> But the queries don't change that I can see...
>
> Why couldn't we just call QueryUtils.check once for each of the 5
> queries outside the for loop? Doing so reduces the time it takes for
> the test from ~80 seconds to 9 seconds.
>
> If there're no objections, I'll make a patch. Which folks will
> probably have to be patient with me since it'll be the first one I've
> produced for this project.....
>
> While I'm at it, what are we thinking about using JUnit4? Looking
> *briefly* at the code, this actually seems like it'd be difficult.
> We'd have to deal with the LuceneTestCase superclass...
>
> Best
> Erick


--
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...


Re: unit test speed for TestCustomScoreQuery

by Robert Muir :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

maybe we should link this issue to that as well: https://issues.apache.org/jira/browse/LUCENE-1786

except that one is also a problem because it downloads zip files from the internet, and if for some reason this fails, the test just passes...

one of these days maybe get off my duff and look at that one... (or if you want another one to look at, have at it!)

On Thu, Oct 29, 2009 at 3:12 PM, Mark Miller <markrmiller@...> wrote:
+1 - I made a similar observation a while back and started an issue to
address Junit test speeds:

https://issues.apache.org/jira/browse/LUCENE-1844

Erick Erickson wrote:
> OK, I'm actually getting of my duff and trying to do something. And
> I'm off today feeling ill, so I have a bit of time. So the rational
> place to start is to get all the code and run the unit tests, and I've
> even got them running in IntelliJ as well as the ant task. Will
> wonders never cease.
>
> The unit tests in TestCustomScoreQuery take on the order of 80 seconds
> on my machine. Before I go off the deep end I wanted to run what I've
> found past y'all to see if it makes sense.
>
> Virtually all the time is taken up in the method logResult on the call
> to QueryUtils.check(q, s). But the logResult method is called from
> verifyResults method in a loop like:
>
> for (Iterator it = h1.keySet().iterator(); it.hasNext();) {
>       call logResult for 5 different queries.
> }
> But the queries don't change that I can see...
>
> Why couldn't we just call QueryUtils.check once for each of the 5
> queries outside the for loop? Doing so reduces the time it takes for
> the test from ~80 seconds to 9 seconds.
>
> If there're no objections, I'll make a patch. Which folks will
> probably have to be patient with me since it'll be the first one I've
> produced for this project.....
>
> While I'm at it, what are we thinking about using JUnit4? Looking
> *briefly* at the code, this actually seems like it'd be difficult.
> We'd have to deal with the LuceneTestCase superclass...
>
> Best
> Erick


--
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...




--
Robert Muir
rcmuir@...

Re: unit test speed for TestCustomScoreQuery

by Erick Erickson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Yep, I remember reading something about that, which is what
prodded me to take a look after I ran the tests and saw how
long they took....

several more questions:
1> Why call QueryUtils.check in the first place? I admit when I
     looked that method over, my eyes glazed. I try to take things
     in bite-sized chunks and that one stretched my jaw. Can anyone
    give me a two-sentence statement of what is that method trying
    to do and when it's desirable/necessary to call it? Because it looks
    *very* expensive. If we just don't call it at all, we're down to < 1 sec.
    I can imagine someone thinking "check, that sounds like a good thing
    to call" without appreciating the expense. I wonder if this is the
    *first* thing to check when looking at slow tests....
2> About reformatting. When I'm working, I'll often just check in a
     reformatted file w/o any code changes to make comparisons
     easier. I grabbed the codestyle.xml file from the SOLR site
     but using that reformatted a bunch of stuff (in IntelliJ) that
     confuse the code changes (to the TestCustomScoreQuery
     file only). Just check it all in anyway? (really, submit a patch)?
     Or do two separate patches? Or back out and not reformat
     that file?
3> I'm assuming that all unit tests are also Java5, so I added
     some generics notations. Is this acceptable?

Thanks
Erick

On Thu, Oct 29, 2009 at 3:12 PM, Mark Miller <markrmiller@...> wrote:
+1 - I made a similar observation a while back and started an issue to
address Junit test speeds:

https://issues.apache.org/jira/browse/LUCENE-1844

Erick Erickson wrote:
> OK, I'm actually getting of my duff and trying to do something. And
> I'm off today feeling ill, so I have a bit of time. So the rational
> place to start is to get all the code and run the unit tests, and I've
> even got them running in IntelliJ as well as the ant task. Will
> wonders never cease.
>
> The unit tests in TestCustomScoreQuery take on the order of 80 seconds
> on my machine. Before I go off the deep end I wanted to run what I've
> found past y'all to see if it makes sense.
>
> Virtually all the time is taken up in the method logResult on the call
> to QueryUtils.check(q, s). But the logResult method is called from
> verifyResults method in a loop like:
>
> for (Iterator it = h1.keySet().iterator(); it.hasNext();) {
>       call logResult for 5 different queries.
> }
> But the queries don't change that I can see...
>
> Why couldn't we just call QueryUtils.check once for each of the 5
> queries outside the for loop? Doing so reduces the time it takes for
> the test from ~80 seconds to 9 seconds.
>
> If there're no objections, I'll make a patch. Which folks will
> probably have to be patient with me since it'll be the first one I've
> produced for this project.....
>
> While I'm at it, what are we thinking about using JUnit4? Looking
> *briefly* at the code, this actually seems like it'd be difficult.
> We'd have to deal with the LuceneTestCase superclass...
>
> Best
> Erick


--
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...



Re: unit test speed for TestCustomScoreQuery

by Erick Erickson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I *finally* looked at the Jira and lo! you specifically identified the
check method.... Someday I'll learn to look first, *then* explore.
Siiigghhh.

Anyway, looking at things a bit more, I'm more confident that
TestCustomScoreQuery wouldn't lose anything by moving the
query check outside of the loop. The queries don't change and
we're not re-querying....

TestBooleanShouldMatch is trickier. It produces 1,000 queries
that have widely varying forms. I don't see a clean way to test
one of each "form". We could decide that, say, the first 50
were enough. Or we shouldn't be running check on randomly-
generated queries. Or if we find a query that's actually a bug
we should test for it explicitly. Or... But I'm not comfortable
changing the check in the random test without some
consensus.

I'll give folks a chance to chime in between now and Monday
and generate a patch for one or both depending on the response.

Erick

On Thu, Oct 29, 2009 at 3:12 PM, Mark Miller <markrmiller@...> wrote:
+1 - I made a similar observation a while back and started an issue to
address Junit test speeds:

https://issues.apache.org/jira/browse/LUCENE-1844

Erick Erickson wrote:
> OK, I'm actually getting of my duff and trying to do something. And
> I'm off today feeling ill, so I have a bit of time. So the rational
> place to start is to get all the code and run the unit tests, and I've
> even got them running in IntelliJ as well as the ant task. Will
> wonders never cease.
>
> The unit tests in TestCustomScoreQuery take on the order of 80 seconds
> on my machine. Before I go off the deep end I wanted to run what I've
> found past y'all to see if it makes sense.
>
> Virtually all the time is taken up in the method logResult on the call
> to QueryUtils.check(q, s). But the logResult method is called from
> verifyResults method in a loop like:
>
> for (Iterator it = h1.keySet().iterator(); it.hasNext();) {
>       call logResult for 5 different queries.
> }
> But the queries don't change that I can see...
>
> Why couldn't we just call QueryUtils.check once for each of the 5
> queries outside the for loop? Doing so reduces the time it takes for
> the test from ~80 seconds to 9 seconds.
>
> If there're no objections, I'll make a patch. Which folks will
> probably have to be patient with me since it'll be the first one I've
> produced for this project.....
>
> While I'm at it, what are we thinking about using JUnit4? Looking
> *briefly* at the code, this actually seems like it'd be difficult.
> We'd have to deal with the LuceneTestCase superclass...
>
> Best
> Erick


--
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...



Re: unit test speed for TestCustomScoreQuery

by markrmiller :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Erick Erickson wrote:
> I *finally* looked at the Jira and lo! you specifically identified the
> check method.... Someday I'll learn to look first, *then* explore.
> Siiigghhh.

No worries ;) So many half baked JIRA issues out there, it happens to
everyone - no way around it.
This way worked just as well I think - and sometimes coming at it fresh
gives a different insight anyway.

>
> Anyway, looking at things a bit more, I'm more confident that
> TestCustomScoreQuery wouldn't lose anything by moving the
> query check outside of the loop. The queries don't change and
> we're not re-querying....
>
> TestBooleanShouldMatch is trickier. It produces 1,000 queries
> that have widely varying forms. I don't see a clean way to test
> one of each "form". We could decide that, say, the first 50
> were enough. Or we shouldn't be running check on randomly-
> generated queries. Or if we find a query that's actually a bug
> we should test for it explicitly. Or... But I'm not comfortable
> changing the check in the random test without some
> consensus.
>
> I'll give folks a chance to chime in between now and Monday
> and generate a patch for one or both depending on the response.
>
> Erick
>
> On Thu, Oct 29, 2009 at 3:12 PM, Mark Miller <markrmiller@...
> <mailto:markrmiller@...>> wrote:
>
>     +1 - I made a similar observation a while back and started an issue to
>     address Junit test speeds:
>
>     https://issues.apache.org/jira/browse/LUCENE-1844
>
>     Erick Erickson wrote:
>     > OK, I'm actually getting of my duff and trying to do something. And
>     > I'm off today feeling ill, so I have a bit of time. So the rational
>     > place to start is to get all the code and run the unit tests,
>     and I've
>     > even got them running in IntelliJ as well as the ant task. Will
>     > wonders never cease.
>     >
>     > The unit tests in TestCustomScoreQuery take on the order of 80
>     seconds
>     > on my machine. Before I go off the deep end I wanted to run what
>     I've
>     > found past y'all to see if it makes sense.
>     >
>     > Virtually all the time is taken up in the method logResult on
>     the call
>     > to QueryUtils.check(q, s). But the logResult method is called from
>     > verifyResults method in a loop like:
>     >
>     > for (Iterator it = h1.keySet().iterator(); it.hasNext();) {
>     >       call logResult for 5 different queries.
>     > }
>     > But the queries don't change that I can see...
>     >
>     > Why couldn't we just call QueryUtils.check once for each of the 5
>     > queries outside the for loop? Doing so reduces the time it takes for
>     > the test from ~80 seconds to 9 seconds.
>     >
>     > If there're no objections, I'll make a patch. Which folks will
>     > probably have to be patient with me since it'll be the first one
>     I've
>     > produced for this project.....
>     >
>     > While I'm at it, what are we thinking about using JUnit4? Looking
>     > *briefly* at the code, this actually seems like it'd be difficult.
>     > We'd have to deal with the LuceneTestCase superclass...
>     >
>     > Best
>     > Erick
>
>
>     --
>     - Mark
>
>     http://www.lucidimagination.com
>
>
>
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: java-dev-unsubscribe@...
>     <mailto:java-dev-unsubscribe@...>
>     For additional commands, e-mail: java-dev-help@...
>     <mailto:java-dev-help@...>
>
>


--
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...


Re: unit test speed for TestCustomScoreQuery

by Erick Erickson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I'll be attaching a new patch to JIRA-1844 after all the tests run, a bit later this
evening. The two files I changed move/remove calls to CheckUtils.check(query);

I arbitrarily limited the random query test in TestBooleanShouldMatch
to only checking the first 100, on the theory that we were at diminishing
returns after 100, whoever commits this feel free to disagree.

But a couple of process questions.

1> I almost reflexively reformat files I'm working on. Which makes diffing
     them really a pain. But the "how to contribute" page can be read as this
     is OK as long as one is working on the file. What's preferred? Separate
     checkins for reformatting-only? And how is that accomplished
2> I cleaned up the warnings from IntelliJs inspections, I assume that's reasonable.
3> Did some upgrading-to-java-5 modifications....

Let me know what I should do differently next time, or what needs to be changed.

Thanks
Erick

On Thu, Oct 29, 2009 at 8:05 PM, Mark Miller <markrmiller@...> wrote:
Erick Erickson wrote:
> I *finally* looked at the Jira and lo! you specifically identified the
> check method.... Someday I'll learn to look first, *then* explore.
> Siiigghhh.

No worries ;) So many half baked JIRA issues out there, it happens to
everyone - no way around it.
This way worked just as well I think - and sometimes coming at it fresh
gives a different insight anyway.
>
> Anyway, looking at things a bit more, I'm more confident that
> TestCustomScoreQuery wouldn't lose anything by moving the
> query check outside of the loop. The queries don't change and
> we're not re-querying....
>
> TestBooleanShouldMatch is trickier. It produces 1,000 queries
> that have widely varying forms. I don't see a clean way to test
> one of each "form". We could decide that, say, the first 50
> were enough. Or we shouldn't be running check on randomly-
> generated queries. Or if we find a query that's actually a bug
> we should test for it explicitly. Or... But I'm not comfortable
> changing the check in the random test without some
> consensus.
>
> I'll give folks a chance to chime in between now and Monday
> and generate a patch for one or both depending on the response.
>
> Erick
>
> On Thu, Oct 29, 2009 at 3:12 PM, Mark Miller <markrmiller@...
> <mailto:markrmiller@...>> wrote:
>
>     +1 - I made a similar observation a while back and started an issue to
>     address Junit test speeds:
>
>     https://issues.apache.org/jira/browse/LUCENE-1844
>
>     Erick Erickson wrote:
>     > OK, I'm actually getting of my duff and trying to do something. And
>     > I'm off today feeling ill, so I have a bit of time. So the rational
>     > place to start is to get all the code and run the unit tests,
>     and I've
>     > even got them running in IntelliJ as well as the ant task. Will
>     > wonders never cease.
>     >
>     > The unit tests in TestCustomScoreQuery take on the order of 80
>     seconds
>     > on my machine. Before I go off the deep end I wanted to run what
>     I've
>     > found past y'all to see if it makes sense.
>     >
>     > Virtually all the time is taken up in the method logResult on
>     the call
>     > to QueryUtils.check(q, s). But the logResult method is called from
>     > verifyResults method in a loop like:
>     >
>     > for (Iterator it = h1.keySet().iterator(); it.hasNext();) {
>     >       call logResult for 5 different queries.
>     > }
>     > But the queries don't change that I can see...
>     >
>     > Why couldn't we just call QueryUtils.check once for each of the 5
>     > queries outside the for loop? Doing so reduces the time it takes for
>     > the test from ~80 seconds to 9 seconds.
>     >
>     > If there're no objections, I'll make a patch. Which folks will
>     > probably have to be patient with me since it'll be the first one
>     I've
>     > produced for this project.....
>     >
>     > While I'm at it, what are we thinking about using JUnit4? Looking
>     > *briefly* at the code, this actually seems like it'd be difficult.
>     > We'd have to deal with the LuceneTestCase superclass...
>     >
>     > Best
>     > Erick
>
>
>     --
>     - Mark
>
>     http://www.lucidimagination.com
>
>
>
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: java-dev-unsubscribe@...
>     <mailto:java-dev-unsubscribe@...>
>     For additional commands, e-mail: java-dev-help@...
>     <mailto:java-dev-help@...>
>
>


--
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...



Re: unit test speed for TestCustomScoreQuery

by Michael McCandless-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Nov 4, 2009 at 6:01 PM, Erick Erickson <erickerickson@...> wrote:

> But a couple of process questions.
>
> 1> I almost reflexively reformat files I'm working on. Which makes
>    diffing them really a pain. But the "how to contribute" page
>    can be read as this is OK as long as one is working on the
>    file. What's preferred? Separate checkins for
>    reformatting-only? And how is that accomplished
> 2> I cleaned up the warnings from IntelliJs inspections, I assume
>    that's reasonable.
> 3> Did some upgrading-to-java-5 modifications....

Good questions!

I think you should in fact clean-as-you-go, and include it in the
patch, as you've done....

I've actually flip-flopped on this: I used to feel we should try hard
to NOT do formatting, cleanups, etc., while making a "real" change,
but in practice I think that's too severe and it'll mean
far less code cleaning (which all source code sorely needs) will
happen.  So now I think clean-as-you-go is preferred.

One useful command (on unix's) is something like this:

  svn diff --diff-cmd diff -x -w

It generates a diff that ignores whitespace differences.  It's not
something you can feed back to patch, but, it's great for seeing
mostly "real" diffs when you have whitespace-only formatting
differences.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...