SELECT query takes 5 secs, what can I do?

View: New views
18 Messages — Rating Filter:   Alert me  

SELECT query takes 5 secs, what can I do?

by Andrew Bruno :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello,

I have a query that used to take 10secs to run, i.e.

select * from Message m where
m.messageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
and m.id != 933927 and m.archived=1

The Message table has around one million rows.

I added the following index

CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId)

and now it takes 5secs.

Is there anything else I can do?

Should I add an index on the boolean "archived" column too?

Any performance hints appreciated.

Thanks
Andrew

Re: SELECT query takes 5 secs, what can I do?

by Brett Wooldridge-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Consider making a separate archive table and live table.  Looking at  
that query, I doubt any other index will help.

Sent from my iPhone

On Sep 15, 2009, at 17:07, Andrew Bruno <andrew.bruno@...> wrote:

> Hello,
>
> I have a query that used to take 10secs to run, i.e.
>
> select * from Message m where
> m.m
> essageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
> and m.id != 933927 and m.archived=1
>
> The Message table has around one million rows.
>
> I added the following index
>
> CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId)
>
> and now it takes 5secs.
>
> Is there anything else I can do?
>
> Should I add an index on the boolean "archived" column too?
>
> Any performance hints appreciated.
>
> Thanks
> Andrew

RE: SELECT query takes 5 secs, what can I do?

by Derby-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Sorry to top post, just rushing out the door.

Adding a second index most likely will not help.
If JavaDB/Derby/Cloudscape is anything like most other RDBMSes, you will
only filter your selection set on one index per table. So since you're
selecting from one table, you can only use one index.

(I'm sure someone from IBM or Sun will correct me if I'm wrong. ;-)

I would suggest if you know that you usually use a specific field as a
second filter clause that you consider creating a compound index. (Index
that contains multiple columns for uniqueness.)

In your select statement below, I'm guessing that archived is a binary
field, but I don't know how unique the id is.

I would suggest that you create a single index based on messageId and then
id. However... I don't know how, if any performance will be gained.

The issue is that your messageId appears to be unique so that by itself you
should return a single row. (If this is true, then why do you have the
second and third filter?)

HTH

-Mikey


> -----Original Message-----
> From: Brett Wooldridge [mailto:brett.wooldridge@...]
> Sent: Tuesday, September 15, 2009 4:37 AM
> To: Derby Discussion
> Subject: Re: SELECT query takes 5 secs, what can I do?
>
> Consider making a separate archive table and live table.  Looking at
> that query, I doubt any other index will help.
>
> Sent from my iPhone
>
> On Sep 15, 2009, at 17:07, Andrew Bruno <andrew.bruno@...> wrote:
>
> > Hello,
> >
> > I have a query that used to take 10secs to run, i.e.
> >
> > select * from Message m where
> > m.m
> > essageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
> > and m.id != 933927 and m.archived=1
> >
> > The Message table has around one million rows.
> >
> > I added the following index
> >
> > CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId)
> >
> > and now it takes 5secs.
> >
> > Is there anything else I can do?
> >
> > Should I add an index on the boolean "archived" column too?
> >
> > Any performance hints appreciated.
> >
> > Thanks
> > Andrew


RE: SELECT query takes 5 secs, what can I do?

by Michael Segel :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

This is also true.

Can you 'detatch' your index? By this I mean store the index on a different
disk?

Also what type/size machine are you running your query on? How much memory?
What else is in the table? Are the rows fat?

HTH to point you down a possible decision path.

-Mikey

> -----Original Message-----
> From: Brett Wooldridge [mailto:brett.wooldridge@...]
> Sent: Tuesday, September 15, 2009 4:37 AM
> To: Derby Discussion
> Subject: Re: SELECT query takes 5 secs, what can I do?
>
> Consider making a separate archive table and live table.  Looking at
> that query, I doubt any other index will help.
>
> Sent from my iPhone
>
> On Sep 15, 2009, at 17:07, Andrew Bruno <andrew.bruno@...> wrote:
>
> > Hello,
> >
> > I have a query that used to take 10secs to run, i.e.
> >
> > select * from Message m where
> > m.m
> > essageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
> > and m.id != 933927 and m.archived=1
> >
> > The Message table has around one million rows.
> >
> > I added the following index
> >
> > CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId)
> >
> > and now it takes 5secs.
> >
> > Is there anything else I can do?
> >
> > Should I add an index on the boolean "archived" column too?
> >
> > Any performance hints appreciated.
> >
> > Thanks
> > Andrew


Re: SELECT query takes 5 secs, what can I do?

by Rick Hillegas-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Andrew,

You might try adding more columns to your index so that it covers the
whole WHERE clause:

CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId,
archived, id)

Hope this helps,
-Rick



Andrew Bruno wrote:

> Hello,
>
> I have a query that used to take 10secs to run, i.e.
>
> select * from Message m where
> m.messageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
> and m.id != 933927 and m.archived=1
>
> The Message table has around one million rows.
>
> I added the following index
>
> CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId)
>
> and now it takes 5secs.
>
> Is there anything else I can do?
>
> Should I add an index on the boolean "archived" column too?
>
> Any performance hints appreciated.
>
> Thanks
> Andrew
>  


Re: SELECT query takes 5 secs, what can I do?

by dev@xx :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>Can you 'detatch' your index? By this I mean store the index on a different
>disk?

Is this possible with Derby ? I did look for this feature but didn't find a
way to do it ...

I found that log could be set on different device but not index...

JY Linet

----- Original Message -----
From: "Michael Segel" <msegel@...>
To: "'Derby Discussion'" <derby-user@...>
Sent: Tuesday, September 15, 2009 2:21 PM
Subject: RE: SELECT query takes 5 secs, what can I do?


> This is also true.
>
> Can you 'detatch' your index? By this I mean store the index on a
> different
> disk?
>
> Also what type/size machine are you running your query on? How much
> memory?
> What else is in the table? Are the rows fat?
>
> HTH to point you down a possible decision path.
>
> -Mikey
>
>> -----Original Message-----
>> From: Brett Wooldridge [mailto:brett.wooldridge@...]
>> Sent: Tuesday, September 15, 2009 4:37 AM
>> To: Derby Discussion
>> Subject: Re: SELECT query takes 5 secs, what can I do?
>>
>> Consider making a separate archive table and live table.  Looking at
>> that query, I doubt any other index will help.
>>
>> Sent from my iPhone
>>
>> On Sep 15, 2009, at 17:07, Andrew Bruno <andrew.bruno@...> wrote:
>>
>> > Hello,
>> >
>> > I have a query that used to take 10secs to run, i.e.
>> >
>> > select * from Message m where
>> > m.m
>> > essageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
>> > and m.id != 933927 and m.archived=1
>> >
>> > The Message table has around one million rows.
>> >
>> > I added the following index
>> >
>> > CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId)
>> >
>> > and now it takes 5secs.
>> >
>> > Is there anything else I can do?
>> >
>> > Should I add an index on the boolean "archived" column too?
>> >
>> > Any performance hints appreciated.
>> >
>> > Thanks
>> > Andrew
>
>


RE: SELECT query takes 5 secs, what can I do?

by Mike Baranski-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

1 comment inline.

>-----Original Message-----
>From: Michael Segel [mailto:msegel@...] On Behalf Of
>derby@...
>Sent: Tuesday, September 15, 2009 8:19 AM
>To: 'Derby Discussion'
>Subject: RE: SELECT query takes 5 secs, what can I do?
>
>Sorry to top post, just rushing out the door.
>
>Adding a second index most likely will not help.
>If JavaDB/Derby/Cloudscape is anything like most other RDBMSes, you will
>only filter your selection set on one index per table. So since you're
>selecting from one table, you can only use one index.
>
>(I'm sure someone from IBM or Sun will correct me if I'm wrong. ;-)
>
>I would suggest if you know that you usually use a specific field as a
>second filter clause that you consider creating a compound index. (Index
>that contains multiple columns for uniqueness.)
>
>In your select statement below, I'm guessing that archived is a binary
>field, but I don't know how unique the id is.
>
>I would suggest that you create a single index based on messageId and
>then
>id. However... I don't know how, if any performance will be gained.

Also, you can try to create a compound index on *only* the columns that you
are selecting (instead of selecting *).  Then, the DB should not have to
open the table, it can just find all the information in the index.  I don't
know if Derby has this optimization, but it might be worth checking.

>The issue is that your messageId appears to be unique so that by itself
>you
>should return a single row. (If this is true, then why do you have the
>second and third filter?)
>
>HTH
>
>-Mikey
>
>
>> -----Original Message-----
>> From: Brett Wooldridge [mailto:brett.wooldridge@...]
>> Sent: Tuesday, September 15, 2009 4:37 AM
>> To: Derby Discussion
>> Subject: Re: SELECT query takes 5 secs, what can I do?
>>
>> Consider making a separate archive table and live table.  Looking at
>> that query, I doubt any other index will help.
>>
>> Sent from my iPhone
>>
>> On Sep 15, 2009, at 17:07, Andrew Bruno <andrew.bruno@...>
>wrote:
>>
>> > Hello,
>> >
>> > I have a query that used to take 10secs to run, i.e.
>> >
>> > select * from Message m where
>> > m.m
>> > essageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
>> > and m.id != 933927 and m.archived=1
>> >
>> > The Message table has around one million rows.
>> >
>> > I added the following index
>> >
>> > CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message
>(messageId)
>> >
>> > and now it takes 5secs.
>> >
>> > Is there anything else I can do?
>> >
>> > Should I add an index on the boolean "archived" column too?
>> >
>> > Any performance hints appreciated.
>> >
>> > Thanks
>> > Andrew


Parent Message unknown RE: SELECT query takes 5 secs, what can I do?

by Andrew Bruno :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thank you all for the responses.  

The db runs as part of an applicatioon that sits on  a windows box, and we have many installed version around the world, so we don't control hardware,etc..  It only has a few db connections, and at the momement this particular instancee is looping through all the messages, and seeing whether they have been archived in the  cloud yet... And as you can imagine, at 5secs a message its taking  a long time.

The message table has a bunch of other columns, and since I am using hibernate I need all columns... But I wonder if it would be quicker to select just the id column, and then load/select all columns using where id=.. Sounds crazy.. Kinda like 1+N selects...But any thoughts?

As for the compund index, I thought it was not possible to add a pk index I.e. Id as part of a compound, but either way I will try.

Also, if I do a count(messageid) instead, are there any other optimization tricks?

Keep all thoughts coming, crazy or sound :)

Appreciated
Andrew



-----Original Message-----
From: Rick Hillegas <Richard.Hillegas@...>
Sent: Tuesday, 15 September 2009 10:32 PM
To: Derby Discussion <derby-user@...>
Subject: Re: SELECT query takes 5 secs, what can I do?

Hi Andrew,

You might try adding more columns to your index so that it covers the
whole WHERE clause:

CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId,
archived, id)

Hope this helps,
-Rick



Andrew Bruno wrote:

> Hello,
>
> I have a query that used to take 10secs to run, i.e.
>
> select * from Message m where
> m.messageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
> and m.id != 933927 and m.archived=1
>
> The Message table has around one million rows.
>
> I added the following index
>
> CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId)
>
> and now it takes 5secs.
>
> Is there anything else I can do?
>
> Should I add an index on the boolean "archived" column too?
>
> Any performance hints appreciated.
>
> Thanks
> Andrew
>  



Re: SELECT query takes 5 secs, what can I do?

by Brett Wooldridge-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Because the query specified an id != ... (NOT equal), your specificity will be extremely low (almost every row meets that criteria), therefore adding id to the index will do little but increase the overhead of the index.  Similarly, it seems the archived flag would offer little in the way of narrowing the result set -- unless the number of archived items is extremely small compared to the total row count.

But you said something interesting, you said it is "looping through all the messages".  That sounds like more of an algorithmic issue than a DB one.  For example, rather than looping and firing off 10000 select statements at 5 seconds each -- processing each message -- is there a way to devise 1 select (or some similary small number) that returns 10000 rows that you can cursor through and process?

Anytime I see a pattern like that -- looping over some X programatically and running a query for each -- it is a red flag.  In the past I've converted similar onesy-twosy patterns to bulk operations with generally one or two orders of magnitude improvement.  I worked on a project where a "clean-up" job was deleting data one row at a time based on some algorithm, but doing it for thousands of rows.  It was taking hours.  After some thinking and creative querying (and sub-querying) we were able to generate a bulk delete that took less than two minutes.  It involved creating a temporary table (this was mysql) with ids to be deleted (populated by a few choice queries) and then doing a bulk delete based on a join with that table.

Anyway, the point is, rather than trying to optimize a query that is run thousands of times, try to optimize the algorithm so that it doesn't need to do that.

Just my thoughts.

-Brett


On Tue, Sep 15, 2009 at 10:02 PM, Andrew Bruno <andrew.bruno@...> wrote:
Thank you all for the responses.

The db runs as part of an applicatioon that sits on  a windows box, and we have many installed version around the world, so we don't control hardware,etc..  It only has a few db connections, and at the momement this particular instancee is looping through all the messages, and seeing whether they have been archived in the  cloud yet... And as you can imagine, at 5secs a message its taking  a long time.

The message table has a bunch of other columns, and since I am using hibernate I need all columns... But I wonder if it would be quicker to select just the id column, and then load/select all columns using where id=.. Sounds crazy.. Kinda like 1+N selects...But any thoughts?

As for the compund index, I thought it was not possible to add a pk index I.e. Id as part of a compound, but either way I will try.

Also, if I do a count(messageid) instead, are there any other optimization tricks?

Keep all thoughts coming, crazy or sound :)

Appreciated
Andrew



-----Original Message-----
From: Rick Hillegas <Richard.Hillegas@...>
Sent: Tuesday, 15 September 2009 10:32 PM
To: Derby Discussion <derby-user@...>
Subject: Re: SELECT query takes 5 secs, what can I do?

Hi Andrew,

You might try adding more columns to your index so that it covers the
whole WHERE clause:

CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId,
archived, id)

Hope this helps,
-Rick



Andrew Bruno wrote:
> Hello,
>
> I have a query that used to take 10secs to run, i.e.
>
> select * from Message m where
> m.messageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
> and m.id != 933927 and m.archived=1
>
> The Message table has around one million rows.
>
> I added the following index
>
> CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId)
>
> and now it takes 5secs.
>
> Is there anything else I can do?
>
> Should I add an index on the boolean "archived" column too?
>
> Any performance hints appreciated.
>
> Thanks
> Andrew
>




Re: SELECT query takes 5 secs, what can I do?

by Brett Wooldridge-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

By the way, it was Hibernate that led the original developer to be looping over a large collection and executing a delete of each object.  Hibernate can be very handy, but even Gavin and the other Hibernate guys will tell you it is not designed for (and is not a substitute for) bulk operations.  We had to "go around" Hibernate in that case and several others to execute true bulk operations against the database.  Use Hibernate where it is useful, but go to the bare metal when you need performance.

Brett

On Wed, Sep 16, 2009 at 12:16 AM, Brett Wooldridge <brett.wooldridge@...> wrote:
Because the query specified an id != ... (NOT equal), your specificity will be extremely low (almost every row meets that criteria), therefore adding id to the index will do little but increase the overhead of the index.  Similarly, it seems the archived flag would offer little in the way of narrowing the result set -- unless the number of archived items is extremely small compared to the total row count.

But you said something interesting, you said it is "looping through all the messages".  That sounds like more of an algorithmic issue than a DB one.  For example, rather than looping and firing off 10000 select statements at 5 seconds each -- processing each message -- is there a way to devise 1 select (or some similary small number) that returns 10000 rows that you can cursor through and process?

Anytime I see a pattern like that -- looping over some X programatically and running a query for each -- it is a red flag.  In the past I've converted similar onesy-twosy patterns to bulk operations with generally one or two orders of magnitude improvement.  I worked on a project where a "clean-up" job was deleting data one row at a time based on some algorithm, but doing it for thousands of rows.  It was taking hours.  After some thinking and creative querying (and sub-querying) we were able to generate a bulk delete that took less than two minutes.  It involved creating a temporary table (this was mysql) with ids to be deleted (populated by a few choice queries) and then doing a bulk delete based on a join with that table.

Anyway, the point is, rather than trying to optimize a query that is run thousands of times, try to optimize the algorithm so that it doesn't need to do that.

Just my thoughts.

-Brett



On Tue, Sep 15, 2009 at 10:02 PM, Andrew Bruno <andrew.bruno@...> wrote:
Thank you all for the responses.

The db runs as part of an applicatioon that sits on  a windows box, and we have many installed version around the world, so we don't control hardware,etc..  It only has a few db connections, and at the momement this particular instancee is looping through all the messages, and seeing whether they have been archived in the  cloud yet... And as you can imagine, at 5secs a message its taking  a long time.

The message table has a bunch of other columns, and since I am using hibernate I need all columns... But I wonder if it would be quicker to select just the id column, and then load/select all columns using where id=.. Sounds crazy.. Kinda like 1+N selects...But any thoughts?

As for the compund index, I thought it was not possible to add a pk index I.e. Id as part of a compound, but either way I will try.

Also, if I do a count(messageid) instead, are there any other optimization tricks?

Keep all thoughts coming, crazy or sound :)

Appreciated
Andrew



-----Original Message-----
From: Rick Hillegas <Richard.Hillegas@...>
Sent: Tuesday, 15 September 2009 10:32 PM
To: Derby Discussion <derby-user@...>
Subject: Re: SELECT query takes 5 secs, what can I do?

Hi Andrew,

You might try adding more columns to your index so that it covers the
whole WHERE clause:

CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId,
archived, id)

Hope this helps,
-Rick



Andrew Bruno wrote:
> Hello,
>
> I have a query that used to take 10secs to run, i.e.
>
> select * from Message m where
> m.messageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
> and m.id != 933927 and m.archived=1
>
> The Message table has around one million rows.
>
> I added the following index
>
> CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId)
>
> and now it takes 5secs.
>
> Is there anything else I can do?
>
> Should I add an index on the boolean "archived" column too?
>
> Any performance hints appreciated.
>
> Thanks
> Andrew
>





Re: SELECT query takes 5 secs, what can I do?

by Andrew Bruno :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Brett and others, great feedback all. thanks

The id != ... (NOT equal) was an awaking, didnt think about that.

The looping sounds weird, I agree.. but let me explain further.

The applications goes through many stages, and too much to explain
here in detail.  As its archiving email from an email server, the
first thing it does is index all messages for a user, this is simply
an INSERT.

When it decides to transfer the message, it checks whether that
message has already been transferred for another user.  If it has, it
does not send the message again, it sends a re-index job (our own
technology) to just index the message against this other user.  This
is to save storage and bandwidth.  In an organization, with many
users, if an email is sent out, that same email is in many people
mailboxes.  We only want one copy of it archived.

So, what I am thinking of doing is to simply query the table on only
messageId, and then filter the results myself.

i.e.

select * from Message m where
m.messageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'

All I care about is getting the first message that does not have the
same Id, and that it has already been archived.  The rest are ignored.

i.e. and m.id != 933927 and m.archived=1

Once I find it in loop, this is the one I reference for the re-indexing.

What do you think?


On Wed, Sep 16, 2009 at 1:32 AM, Brett Wooldridge
<brett.wooldridge@...> wrote:

> By the way, it was Hibernate that led the original developer to be looping
> over a large collection and executing a delete of each object.  Hibernate
> can be very handy, but even Gavin and the other Hibernate guys will tell you
> it is not designed for (and is not a substitute for) bulk operations.  We
> had to "go around" Hibernate in that case and several others to execute true
> bulk operations against the database.  Use Hibernate where it is useful, but
> go to the bare metal when you need performance.
>
> Brett
>
> On Wed, Sep 16, 2009 at 12:16 AM, Brett Wooldridge
> <brett.wooldridge@...> wrote:
>>
>> Because the query specified an id != ... (NOT equal), your specificity
>> will be extremely low (almost every row meets that criteria), therefore
>> adding id to the index will do little but increase the overhead of the
>> index.  Similarly, it seems the archived flag would offer little in the way
>> of narrowing the result set -- unless the number of archived items is
>> extremely small compared to the total row count.
>>
>> But you said something interesting, you said it is "looping through all
>> the messages".  That sounds like more of an algorithmic issue than a DB
>> one.  For example, rather than looping and firing off 10000 select
>> statements at 5 seconds each -- processing each message -- is there a way to
>> devise 1 select (or some similary small number) that returns 10000 rows that
>> you can cursor through and process?
>>
>> Anytime I see a pattern like that -- looping over some X programatically
>> and running a query for each -- it is a red flag.  In the past I've
>> converted similar onesy-twosy patterns to bulk operations with generally one
>> or two orders of magnitude improvement.  I worked on a project where a
>> "clean-up" job was deleting data one row at a time based on some algorithm,
>> but doing it for thousands of rows.  It was taking hours.  After some
>> thinking and creative querying (and sub-querying) we were able to generate a
>> bulk delete that took less than two minutes.  It involved creating a
>> temporary table (this was mysql) with ids to be deleted (populated by a few
>> choice queries) and then doing a bulk delete based on a join with that
>> table.
>>
>> Anyway, the point is, rather than trying to optimize a query that is run
>> thousands of times, try to optimize the algorithm so that it doesn't need to
>> do that.
>>
>> Just my thoughts.
>>
>> -Brett
>>
>>
>> On Tue, Sep 15, 2009 at 10:02 PM, Andrew Bruno <andrew.bruno@...>
>> wrote:
>>>
>>> Thank you all for the responses.
>>>
>>> The db runs as part of an applicatioon that sits on  a windows box, and
>>> we have many installed version around the world, so we don't control
>>> hardware,etc..  It only has a few db connections, and at the momement this
>>> particular instancee is looping through all the messages, and seeing whether
>>> they have been archived in the  cloud yet... And as you can imagine, at
>>> 5secs a message its taking  a long time.
>>>
>>> The message table has a bunch of other columns, and since I am using
>>> hibernate I need all columns... But I wonder if it would be quicker to
>>> select just the id column, and then load/select all columns using where
>>> id=.. Sounds crazy.. Kinda like 1+N selects...But any thoughts?
>>>
>>> As for the compund index, I thought it was not possible to add a pk index
>>> I.e. Id as part of a compound, but either way I will try.
>>>
>>> Also, if I do a count(messageid) instead, are there any other
>>> optimization tricks?
>>>
>>> Keep all thoughts coming, crazy or sound :)
>>>
>>> Appreciated
>>> Andrew
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Rick Hillegas <Richard.Hillegas@...>
>>> Sent: Tuesday, 15 September 2009 10:32 PM
>>> To: Derby Discussion <derby-user@...>
>>> Subject: Re: SELECT query takes 5 secs, what can I do?
>>>
>>> Hi Andrew,
>>>
>>> You might try adding more columns to your index so that it covers the
>>> whole WHERE clause:
>>>
>>> CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId,
>>> archived, id)
>>>
>>> Hope this helps,
>>> -Rick
>>>
>>>
>>>
>>> Andrew Bruno wrote:
>>> > Hello,
>>> >
>>> > I have a query that used to take 10secs to run, i.e.
>>> >
>>> > select * from Message m where
>>> > m.messageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
>>> > and m.id != 933927 and m.archived=1
>>> >
>>> > The Message table has around one million rows.
>>> >
>>> > I added the following index
>>> >
>>> > CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId)
>>> >
>>> > and now it takes 5secs.
>>> >
>>> > Is there anything else I can do?
>>> >
>>> > Should I add an index on the boolean "archived" column too?
>>> >
>>> > Any performance hints appreciated.
>>> >
>>> > Thanks
>>> > Andrew
>>> >
>>>
>>>
>>
>
>

Re: SELECT query takes 5 secs, what can I do?

by Andrew Bruno :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Actually I think a subquery is more elegant

SELECT * from Message where ID IN (SELECT ID FROM Message  where
messageId = '<7997716ED1AF3D47A35D74FA2CB61092E2A619@...>')
and archived=1 and id!=987452

The inner select will only return whatever the number of users in an
organization has, so say it has 200 users, at most I'll get 200 back..
etc.

What do you think?


On Wed, Sep 16, 2009 at 10:17 AM, Andrew Bruno <andrew.bruno@...> wrote:

> Brett and others, great feedback all. thanks
>
> The id != ... (NOT equal) was an awaking, didnt think about that.
>
> The looping sounds weird, I agree.. but let me explain further.
>
> The applications goes through many stages, and too much to explain
> here in detail.  As its archiving email from an email server, the
> first thing it does is index all messages for a user, this is simply
> an INSERT.
>
> When it decides to transfer the message, it checks whether that
> message has already been transferred for another user.  If it has, it
> does not send the message again, it sends a re-index job (our own
> technology) to just index the message against this other user.  This
> is to save storage and bandwidth.  In an organization, with many
> users, if an email is sent out, that same email is in many people
> mailboxes.  We only want one copy of it archived.
>
> So, what I am thinking of doing is to simply query the table on only
> messageId, and then filter the results myself.
>
> i.e.
>
> select * from Message m where
> m.messageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
>
> All I care about is getting the first message that does not have the
> same Id, and that it has already been archived.  The rest are ignored.
>
> i.e. and m.id != 933927 and m.archived=1
>
> Once I find it in loop, this is the one I reference for the re-indexing.
>
> What do you think?
>
>
> On Wed, Sep 16, 2009 at 1:32 AM, Brett Wooldridge
> <brett.wooldridge@...> wrote:
>> By the way, it was Hibernate that led the original developer to be looping
>> over a large collection and executing a delete of each object.  Hibernate
>> can be very handy, but even Gavin and the other Hibernate guys will tell you
>> it is not designed for (and is not a substitute for) bulk operations.  We
>> had to "go around" Hibernate in that case and several others to execute true
>> bulk operations against the database.  Use Hibernate where it is useful, but
>> go to the bare metal when you need performance.
>>
>> Brett
>>
>> On Wed, Sep 16, 2009 at 12:16 AM, Brett Wooldridge
>> <brett.wooldridge@...> wrote:
>>>
>>> Because the query specified an id != ... (NOT equal), your specificity
>>> will be extremely low (almost every row meets that criteria), therefore
>>> adding id to the index will do little but increase the overhead of the
>>> index.  Similarly, it seems the archived flag would offer little in the way
>>> of narrowing the result set -- unless the number of archived items is
>>> extremely small compared to the total row count.
>>>
>>> But you said something interesting, you said it is "looping through all
>>> the messages".  That sounds like more of an algorithmic issue than a DB
>>> one.  For example, rather than looping and firing off 10000 select
>>> statements at 5 seconds each -- processing each message -- is there a way to
>>> devise 1 select (or some similary small number) that returns 10000 rows that
>>> you can cursor through and process?
>>>
>>> Anytime I see a pattern like that -- looping over some X programatically
>>> and running a query for each -- it is a red flag.  In the past I've
>>> converted similar onesy-twosy patterns to bulk operations with generally one
>>> or two orders of magnitude improvement.  I worked on a project where a
>>> "clean-up" job was deleting data one row at a time based on some algorithm,
>>> but doing it for thousands of rows.  It was taking hours.  After some
>>> thinking and creative querying (and sub-querying) we were able to generate a
>>> bulk delete that took less than two minutes.  It involved creating a
>>> temporary table (this was mysql) with ids to be deleted (populated by a few
>>> choice queries) and then doing a bulk delete based on a join with that
>>> table.
>>>
>>> Anyway, the point is, rather than trying to optimize a query that is run
>>> thousands of times, try to optimize the algorithm so that it doesn't need to
>>> do that.
>>>
>>> Just my thoughts.
>>>
>>> -Brett
>>>
>>>
>>> On Tue, Sep 15, 2009 at 10:02 PM, Andrew Bruno <andrew.bruno@...>
>>> wrote:
>>>>
>>>> Thank you all for the responses.
>>>>
>>>> The db runs as part of an applicatioon that sits on  a windows box, and
>>>> we have many installed version around the world, so we don't control
>>>> hardware,etc..  It only has a few db connections, and at the momement this
>>>> particular instancee is looping through all the messages, and seeing whether
>>>> they have been archived in the  cloud yet... And as you can imagine, at
>>>> 5secs a message its taking  a long time.
>>>>
>>>> The message table has a bunch of other columns, and since I am using
>>>> hibernate I need all columns... But I wonder if it would be quicker to
>>>> select just the id column, and then load/select all columns using where
>>>> id=.. Sounds crazy.. Kinda like 1+N selects...But any thoughts?
>>>>
>>>> As for the compund index, I thought it was not possible to add a pk index
>>>> I.e. Id as part of a compound, but either way I will try.
>>>>
>>>> Also, if I do a count(messageid) instead, are there any other
>>>> optimization tricks?
>>>>
>>>> Keep all thoughts coming, crazy or sound :)
>>>>
>>>> Appreciated
>>>> Andrew
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Rick Hillegas <Richard.Hillegas@...>
>>>> Sent: Tuesday, 15 September 2009 10:32 PM
>>>> To: Derby Discussion <derby-user@...>
>>>> Subject: Re: SELECT query takes 5 secs, what can I do?
>>>>
>>>> Hi Andrew,
>>>>
>>>> You might try adding more columns to your index so that it covers the
>>>> whole WHERE clause:
>>>>
>>>> CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId,
>>>> archived, id)
>>>>
>>>> Hope this helps,
>>>> -Rick
>>>>
>>>>
>>>>
>>>> Andrew Bruno wrote:
>>>> > Hello,
>>>> >
>>>> > I have a query that used to take 10secs to run, i.e.
>>>> >
>>>> > select * from Message m where
>>>> > m.messageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
>>>> > and m.id != 933927 and m.archived=1
>>>> >
>>>> > The Message table has around one million rows.
>>>> >
>>>> > I added the following index
>>>> >
>>>> > CREATE INDEX IDX_Message_MessageId ON ExchangeSync.Message (messageId)
>>>> >
>>>> > and now it takes 5secs.
>>>> >
>>>> > Is there anything else I can do?
>>>> >
>>>> > Should I add an index on the boolean "archived" column too?
>>>> >
>>>> > Any performance hints appreciated.
>>>> >
>>>> > Thanks
>>>> > Andrew
>>>> >
>>>>
>>>>
>>>
>>
>>
>

Re: SELECT query takes 5 secs, what can I do?

by Brett Wooldridge-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

The original query:

select * from Message m where
   m.messageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
   and m.id != 933927 and m.archived=1

and the sub-select query:

SELECT * from Message where ID IN (SELECT ID FROM Message  where
    messageId = '<7997716ED1AF3D47A35D74FA2CB61092E2A619@...>')
    and archived=1 and id!=987452

are equivalent with respect to the database.  My comment regarding the 'id != 987452' clause was only with respect to an index.  It's inclusion in the query (original) does not impact performance in a negative way.

The optimizer is going to choose to use the index (on messageId), and will perform that select first, then it will (likely) join that sub-result against the other two clauses (archived=1 and id != 987452) -- making it equivalent to your second query.

Do you know how to get a query plan?  http://wiki.apache.org/db-derby/PerformanceDiagnosisTips

The portion of the query with the messageId should have high-specificity due to the relative uniqueness of the message within the context of all messages.  For example, you say the table has 1 million rows, and a given messsage in an organization of 200 users would consume 200 rows, so as you can see the specificity is very high.  I would not expect that kind of select to be taking 5 seconds, but rather milliseconds.  It is likely that your index statistics are seriously out of date (see http://issues.apache.org/jira/browse/DERBY-269).  One user reported that updating statistics took a 22 minute query down to less than one second!

You can force Derby to update statistics with this command:

alter table <table-name> compress [sequential]

Note this command itself might take a long time -- dozens of minutes -- but in a system like yours you could get away with running it once or month or so at some off-peak time.
 
If you have a test database in which your query is taking 5 seconds, attempt to run the above alter table, and then re-run your query.  If your query times are still low, post the query plan for the query here and we can take a look at it.

Brett


RE: SELECT query takes 5 secs, what can I do?

by Derby-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Some parts of this message have been removed. Learn more about Nabble's security policy.

Andrew,

 

Whoa! It sounds like you’re making a mountain out of a mole hill.

 

Ok, after a couple of rounds of messages, it becomes a little clearer on what you’re attempting to do…

 

If I understand the problem…

 

You have a mail server that you want to archive the mail messages in to a second database.

Based on your description, the archive process is an automated process that runs periodically on the system.

Your constraint is that you will want to save only one copy of the message in the archive. Note you may want to store a copy of the messageId by user  (denormalized) for faster searches of the archive.

 

Is this not the case?

What am I missing?

 

Assuming that this is, the solution is pretty trivial….

 

First on your archive database, you put a unique index on messageId.  This forces the constraint that you can only store one copy of the message in the archive database.

 

Then in a simple java program you do the following:

 

Create two jdbc connections, one to each database.

Connection A is to your mail server

Connection B is to your archive database.

 

In connection A, you want to run a simple select statement to fetch all records where the archive flag is not set to 1.

You want to do this in an UPDATE CURSOR. (Assuming Derby supports this terminology/syntax).

 

In connection B, you want to prepare your INSERT Statements. I’m assuming that you’re saving the headers in one table and then the body in a separate table/blob space?

 

Open the cursor in A to fetch the records.

As you loop through the records, in a try/catch block you try to insert the record via the cursors for B.

If you succeed, you update the record from A (CURRENT of A’s cursor) and set the archive flag to true (1).

If you failed because the messageId is already in the database, you update the record’s archive flag of the current record to true (1).

( You can make this more efficient if you order select on messageId and add some logic to your processing.)

 

In a nutshell, that’s pretty much it.

 

You mentioned that you had to do your other query because of hibernate?

If hibernate is making this task more difficult, why then are you using it?

 

 

HTH

 

-Mike

 

 

 


From: Brett Wooldridge [mailto:brett.wooldridge@...]
Sent: Tuesday, September 15, 2009 8:43 PM
To: Derby Discussion
Subject: Re: SELECT query takes 5 secs, what can I do?

 

The original query:

select * from Message m where
   m.messageId='<7997716ED1AF3D47A35D74FA2CB610920255303F@...>'
   and m.id != 933927 and m.archived=1

and the sub-select query:

SELECT * from Message where ID IN (SELECT ID FROM Message  where
    messageId = '<7997716ED1AF3D47A35D74FA2CB61092E2A619@...>')
    and archived=1 and id!=987452

are equivalent with respect to the database.  My comment regarding the 'id != 987452' clause was only with respect to an index.  It's inclusion in the query (original) does not impact performance in a negative way.

The optimizer is going to choose to use the index (on messageId), and will perform that select first, then it will (likely) join that sub-result against the other two clauses (archived=1 and id != 987452) -- making it equivalent to your second query.

Do you know how to get a query plan?  http://wiki.apache.org/db-derby/PerformanceDiagnosisTips

The portion of the query with the messageId should have high-specificity due to the relative uniqueness of the message within the context of all messages.  For example, you say the table has 1 million rows, and a given messsage in an organization of 200 users would consume 200 rows, so as you can see the specificity is very high.  I would not expect that kind of select to be taking 5 seconds, but rather milliseconds.  It is likely that your index statistics are seriously out of date (see http://issues.apache.org/jira/browse/DERBY-269).  One user reported that updating statistics took a 22 minute query down to less than one second!

You can force Derby to update statistics with this command:

alter table <table-name> compress [sequential]

Note this command itself might take a long time -- dozens of minutes -- but in a system like yours you could get away with running it once or month or so at some off-peak time.
 
If you have a test database in which your query is taking 5 seconds, attempt to run the above alter table, and then re-run your query.  If your query times are still low, post the query plan for the query here and we can take a look at it.

Brett


Re: SELECT query takes 5 secs, what can I do?

by Knut Anders Hatlen :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Brett Wooldridge <brett.wooldridge@...> writes:

> You can force Derby to update statistics with this command:
>
> alter table <table-name> compress [sequential]
>
> Note this command itself might take a long time -- dozens of minutes -- but in
> a system like yours you could get away with running it once or month or so at
> some off-peak time.

In Derby 10.5 you have a cheaper way of updating the index cardinality
statistics. This statement will update the statistics for all the
indexes on columns in MYSCHEMA.MYTABLE without doing an expensive
compress:

  CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS('MYSCHEMA', 'MYTABLE', NULL)

http://db.apache.org/derby/docs/10.5/ref/rrefupdatestatsproc.html

--
Knut Anders

Re: SELECT query takes 5 secs, what can I do?

by Andrew Bruno :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hey all,

Sorry been flat chat for a while, but I gotta say that the query
improved dramatically when the stats updated.

When I created the index, I didnt wait for the stats to be updated.

I came back the next day, and the query was running in sub seconds.
Now ExchangeSync can run well again on this 3G database.

I want to thank everyone for the awesome support, including some of
those left or right field ideas!

Cheers
Andrew


On Thu, Sep 17, 2009 at 7:26 PM, Knut Anders Hatlen <Knut.Hatlen@...> wrote:

> Brett Wooldridge <brett.wooldridge@...> writes:
>
>> You can force Derby to update statistics with this command:
>>
>> alter table <table-name> compress [sequential]
>>
>> Note this command itself might take a long time -- dozens of minutes -- but in
>> a system like yours you could get away with running it once or month or so at
>> some off-peak time.
>
> In Derby 10.5 you have a cheaper way of updating the index cardinality
> statistics. This statement will update the statistics for all the
> indexes on columns in MYSCHEMA.MYTABLE without doing an expensive
> compress:
>
>  CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS('MYSCHEMA', 'MYTABLE', NULL)
>
> http://db.apache.org/derby/docs/10.5/ref/rrefupdatestatsproc.html
>
> --
> Knut Anders
>

Re: SELECT query takes 5 secs, what can I do?

by Brett Wooldridge-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Glad the updated statistics worked for you.  The specificity of messageId seemed extremely high, and given only 1 million rows should have been returning sub-second results.  You don't need indexes on either of the other two fields, they won't improve performance and will slow insertion speed.

Brett


On Fri, Sep 18, 2009 at 9:31 AM, Andrew Bruno <andrew.bruno@...> wrote:
Hey all,

Sorry been flat chat for a while, but I gotta say that the query
improved dramatically when the stats updated.

When I created the index, I didnt wait for the stats to be updated.

I came back the next day, and the query was running in sub seconds.
Now ExchangeSync can run well again on this 3G database.

I want to thank everyone for the awesome support, including some of
those left or right field ideas!

Cheers
Andrew


On Thu, Sep 17, 2009 at 7:26 PM, Knut Anders Hatlen <Knut.Hatlen@...> wrote:
> Brett Wooldridge <brett.wooldridge@...> writes:
>
>> You can force Derby to update statistics with this command:
>>
>> alter table <table-name> compress [sequential]
>>
>> Note this command itself might take a long time -- dozens of minutes -- but in
>> a system like yours you could get away with running it once or month or so at
>> some off-peak time.
>
> In Derby 10.5 you have a cheaper way of updating the index cardinality
> statistics. This statement will update the statistics for all the
> indexes on columns in MYSCHEMA.MYTABLE without doing an expensive
> compress:
>
>  CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS('MYSCHEMA', 'MYTABLE', NULL)
>
> http://db.apache.org/derby/docs/10.5/ref/rrefupdatestatsproc.html
>
> --
> Knut Anders
>


Re: SELECT query takes 5 secs, what can I do?

by Andrew Bruno :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

agree! thx

On Fri, Sep 18, 2009 at 11:07 AM, Brett Wooldridge
<brett.wooldridge@...> wrote:

> Glad the updated statistics worked for you.  The specificity of messageId
> seemed extremely high, and given only 1 million rows should have been
> returning sub-second results.  You don't need indexes on either of the other
> two fields, they won't improve performance and will slow insertion speed.
> Brett
>
> On Fri, Sep 18, 2009 at 9:31 AM, Andrew Bruno <andrew.bruno@...>
> wrote:
>>
>> Hey all,
>>
>> Sorry been flat chat for a while, but I gotta say that the query
>> improved dramatically when the stats updated.
>>
>> When I created the index, I didnt wait for the stats to be updated.
>>
>> I came back the next day, and the query was running in sub seconds.
>> Now ExchangeSync can run well again on this 3G database.
>>
>> I want to thank everyone for the awesome support, including some of
>> those left or right field ideas!
>>
>> Cheers
>> Andrew
>>
>>
>> On Thu, Sep 17, 2009 at 7:26 PM, Knut Anders Hatlen <Knut.Hatlen@...>
>> wrote:
>> > Brett Wooldridge <brett.wooldridge@...> writes:
>> >
>> >> You can force Derby to update statistics with this command:
>> >>
>> >> alter table <table-name> compress [sequential]
>> >>
>> >> Note this command itself might take a long time -- dozens of minutes --
>> >> but in
>> >> a system like yours you could get away with running it once or month or
>> >> so at
>> >> some off-peak time.
>> >
>> > In Derby 10.5 you have a cheaper way of updating the index cardinality
>> > statistics. This statement will update the statistics for all the
>> > indexes on columns in MYSCHEMA.MYTABLE without doing an expensive
>> > compress:
>> >
>> >  CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS('MYSCHEMA', 'MYTABLE', NULL)
>> >
>> > http://db.apache.org/derby/docs/10.5/ref/rrefupdatestatsproc.html
>> >
>> > --
>> > Knut Anders
>> >
>
>