Badly performing WHERE caluse

View: New views
6 Messages — Rating Filter:   Alert me  

Badly performing WHERE caluse

by Gavin Matthews :: Rate this Message:

| View Threaded | Show Only this Message

Hi,

We have a WHERE clause:

WHERE
2>1 AND
2>1 AND
DateTimestamp >= ? AND
DateTimestamp <= ? AND
ORDER BY DateTimestamp DESC

which reasonably takes 2ms to execute on a table that has 357254 rows, and does so by selecting the most appropriate index:

Index Scan ResultSet for TBLEVENTS using index TBLEVENTS_DATETIMESTAMP_DESC at serializable isolation level using share row locking chosen by the optimizer

However, when we change the WHERE clause to:

WHERE
1=1 AND
1=1 AND
DateTimestamp >= ? AND
DateTimestamp <= ? AND
ORDER BY DateTimestamp DESC

the execution time increases to 6000ms, and appears to do so because the query planner hasn't selected the most appropriate index:

Index Scan ResultSet for TBLEVENTS using index TBLEVENTS_DATETIMESTAMP_ASC at serializable isolation level using share row locking chosen by the optimizer

The indexes are specified as follows:

CREATE INDEX tblEventsDateTimestampASC ON tblEvents (DateTimestamp ASC);
CREATE INDEX tblEventsDateTimestampDESC ON tblEvents (DateTimestamp DESC);

Do people agree that this appears to be a bug with the query planner, as the selection of the most appropriate index should not be influenced by static conditions such as 1=1 or 2>1?

Regards,

Gavin

Re: Badly performing WHERE caluse

by Bryan Pendleton :: Rate this Message:

| View Threaded | Show Only this Message

> Do people agree that this appears to be a bug with the query planner, as the
> selection of the most appropriate index should not be influenced by static
> conditions such as 1=1 or 2>1?

I don't know of any reason why the 1=1 versus 2>1 should have affected the
query plan chosen.

If you can provide a reproducible test case that demonstrates it, I'd be
very interested in pursuing this further.

thanks,

bryan



Re: Badly performing WHERE caluse

by Brett Wooldridge-2 :: Rate this Message:

| View Threaded | Show Only this Message

One could argue that in the first case (2>1), the optimizer is free to choose whatever index it wants because the query can provably return no rows.  However, in the second case (1=1), were the DateTimestamp condition must be evaluated, it would appear that the optimizer indeed selected the wrong index.

However, I don't think you can point to 2ms vs. 6000ms as proof of a bad query plan.  In the first case, the optimizer can prove (to itself) that the query can return no rows (because of the 2>1 condition along with all AND predicates) -- and therefore it probably doesn't do any index scan at all.  Therefore, 2ms.

In the second case, because the 1=1 condition is satisfied, Derby must *actually* do the index scan as per plan.  Therefore, 6000ms.  Whether using the DESC index actually provides significant improvement over ASC remains to be seen, as the comparison currently isn't apples-to-apples.

I would be interested in seeing the result of the DESC vs. the ASC index in the (1=1) scenario.  You can override the optimizer with a hint, like so:

SELECT * FROM table --DERBY-PROPERTIES index=TBLEVENTS_DATETIMESTAMP_DESC
WHERE
1=1 AND
1=1 AND
DateTimestamp >= ? AND
DateTimestamp <= ? AND
ORDER BY DateTimestamp DESC

Note, the --DERBY-PROPERTIES must come at the end of a literal line (i.e. there must be a linefeed after that cause) otherwise everything after that will be taken as a comment.

Let us know the result.  Either way it may be considered a bug in the optimizer, but depending on the number of rows returned, may not make a significant difference in performance.

-Brett

On Wed, Sep 23, 2009 at 11:18 PM, Gavin Matthews <gjm@...> wrote:

Hi,

We have a WHERE clause:

WHERE
2>1 AND
2>1 AND
DateTimestamp >= ? AND
DateTimestamp <= ? AND
ORDER BY DateTimestamp DESC

which reasonably takes 2ms to execute on a table that has 357254 rows, and
does so by selecting the most appropriate index:

Index Scan ResultSet for TBLEVENTS using index TBLEVENTS_DATETIMESTAMP_DESC
at serializable isolation level using share row locking chosen by the
optimizer

However, when we change the WHERE clause to:

WHERE
1=1 AND
1=1 AND
DateTimestamp >= ? AND
DateTimestamp <= ? AND
ORDER BY DateTimestamp DESC

the execution time increases to 6000ms, and appears to do so because the
query planner hasn't selected the most appropriate index:

Index Scan ResultSet for TBLEVENTS using index TBLEVENTS_DATETIMESTAMP_ASC
at serializable isolation level using share row locking chosen by the
optimizer

The indexes are specified as follows:

CREATE INDEX tblEventsDateTimestampASC ON tblEvents (DateTimestamp ASC);
CREATE INDEX tblEventsDateTimestampDESC ON tblEvents (DateTimestamp DESC);

Do people agree that this appears to be a bug with the query planner, as the
selection of the most appropriate index should not be influenced by static
conditions such as 1=1 or 2>1?

Regards,

Gavin

--
View this message in context: http://www.nabble.com/Badly-performing-WHERE-caluse-tp25531166p25531166.html
Sent from the Apache Derby Users mailing list archive at Nabble.com.



Re: Badly performing WHERE caluse

by Matt Doran-3 :: Rate this Message:

| View Threaded | Show Only this Message

Brett Wooldridge wrote:
> One could argue that in the first case (2>1), the optimizer is free to
> choose whatever index it wants because the query can provably return
> no rows.  However, in the second case (1=1), were the DateTimestamp
> condition must be evaluated, it would appear that the optimizer indeed
> selected the wrong index.
>
>
Ummm, last time I checked, 2 was greater than 1.  :)

Re: Badly performing WHERE caluse

by Brett Wooldridge-2 :: Rate this Message:

| View Threaded | Show Only this Message

That'll teach me to post too early in the morning (I'm in Tokyo) and before coffee.  :-)



On Thu, Sep 24, 2009 at 10:19 AM, Matt Doran <matt.doran@...> wrote:
Brett Wooldridge wrote:
One could argue that in the first case (2>1), the optimizer is free to choose whatever index it wants because the query can provably return no rows.  However, in the second case (1=1), were the DateTimestamp condition must be evaluated, it would appear that the optimizer indeed selected the wrong index.


Ummm, last time I checked, 2 was greater than 1.  :)


Re: Badly performing WHERE caluse

by Knut Anders Hatlen :: Rate this Message:

| View Threaded | Show Only this Message

Bryan Pendleton <bpendleton@...> writes:

>> Do people agree that this appears to be a bug with the query planner, as the
>> selection of the most appropriate index should not be influenced by static
>> conditions such as 1=1 or 2>1?
>
> I don't know of any reason why the 1=1 versus 2>1 should have affected the
> query plan chosen.

Sorry for the late response. I just came across some code in the engine
that made me remember this thread. It looks like Derby doesn't treat 1=1
and 2>1 as boolean constants during optimization, so it takes the
selectivity from the = operator and the > operator, respectively.

The = operator has selectivity 0.1, and > has selectivity 0.33. This
means that the optimizer thinks 1=1 will match 10% of the rows and
2>1 will match 33% of the rows. This is apparently enough to make the
optimizer choose another plan in this case. Of course, both of the
predicates will match 100% of the rows, so the the information that's
given to the optimizer is inaccurate in any case.

Logged here: https://issues.apache.org/jira/browse/DERBY-4416

--
Knut Anders