|
View:
New views
5 Messages
—
Rating Filter:
Alert me
|
|
|
GIN: any ordering guarantees for the hits returned?Hello,
I have a moderately large (~10-20GB) table: CREATE TABLE msgs ( msg varchar(2048), msg_tsv tsvector, posted timestamp ); CREATE INDEX msgs_i ON msgs USING gin(msg_tsv); The table never gets updated (more specifically, it gets re-created once a day with no updates in between). I want to run queries of the following form: SELECT msg, posted FROM msgs WHERE 'blah blah'::tsquery @@ msg_tsv ORDERED BY posted DESC; (with various LIMIT/OFFSET) Which obviously may get too expensive, for it will cause reading and sorting of all rows meeting the condition, i.e. too many disk reads. On the other hand, (as far as I understand) GIN always produces hits already sorted in the insertion order. So - what if I just populate my table in the order of decreasing 'posted', remove the "ORDERED BY" clause and just hope for the best? Will the correct ordering be guaranteed? If not, are there any other ideas around? Thanks, - adrobj |
|
|
Re: GIN: any ordering guarantees for the hits returned?On Fri, Nov 16, 2007 at 07:56:45PM -0800, adrobj wrote:
> > Hello, > > I have a moderately large (~10-20GB) table: > > CREATE TABLE msgs ( > msg varchar(2048), > msg_tsv tsvector, > posted timestamp > ); > > CREATE INDEX msgs_i ON msgs USING gin(msg_tsv); > > The table never gets updated (more specifically, it gets re-created > once a day with no updates in between). > > I want to run queries of the following form: > > SELECT msg, posted FROM msgs WHERE 'blah blah'::tsquery @@ msg_tsv > ORDERED BY posted DESC; (with various LIMIT/OFFSET) > > Which obviously may get too expensive, for it will cause reading and > sorting of all rows meeting the condition, i.e. too many disk reads. > > On the other hand, (as far as I understand) GIN always produces hits > already sorted in the insertion order. > > So - what if I just populate my table in the order of decreasing > 'posted', remove the "ORDERED BY" clause and just hope for the best? > Will the correct ordering be guaranteed? Ordering is never guaranteed without an ORDER BY, except in the time between a CLUSTER and the first write operation after it. > If not, are there any other ideas around? Rather than assuming you know where problems will arise, do some profiling and find out where they actually do :) Cheers, David. -- David Fetter <david@...> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@... Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
|
|
Re: GIN: any ordering guarantees for the hits returned?On Sun, Nov 18, 2007 at 09:59:41AM -0800, David Fetter wrote:
> Ordering is never guaranteed without an ORDER BY, except in the time > between a CLUSTER and the first write operation after it. It's my understanding that with the new "seqscan piggy-backing" in 8.3 even this will go. I'm not sure if this would affect the behaviour of GIN indexes, but I'd doubt it. A quick question for the OP; if the results from these queries are going to be displayed for human consumption, does it matter much if they're in perfect sorted order? For computer algorithms it generally matters much more, but people are generally a bit more flexible. Sam ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org/ |
|
|
Re: GIN: any ordering guarantees for the hits returned?David Fetter <david@...> wrote:
Never miss a thing. Make Yahoo your homepage. |
|
|
Re: GIN: any ordering guarantees for the hits returned?Alex Drobychev wrote:
> I agree with this maybe 98% - but not 100%. :-) Unfortunately > performance can change rather unpredictably when the DB stops > fitting in memory - say, 3-4 months after a production roll-out, too > late for profiling experiments. :-( Surely you're capable of inventing random data to simulate the load you'll have in 3-4 months or even a year? David is correct in that the order is not guaranteed. It's not just a matter of which order the rows were inserted -- the executor can do a lot of things internally that would make the result appear in a different order. Even when the data is CLUSTER'ed the ordering can be lost. If you want to have a guaranteed order, use ORDER BY. -- Alvaro Herrera Developer, http://www.PostgreSQL.org/ "Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end." (2nd Commandment for C programmers) ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| Free embeddable forum powered by Nabble | Forum Help |