|
View:
New views
20 Messages
—
Rating Filter:
Alert me
|
| < Prev | 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11 | Next > |
|
|
WIP: generalized index constraintsThis is a follow up to my old proposal here:
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00404.php Top pointed out a few problems here: http://archives.postgresql.org/pgsql-hackers/2008-06/msg00427.php Here are my updated answers: 1. Not a problem with the new design, which checks the constraints from ExecInsertIndexTuples(). 2. Not a problem for similar reasons. 3. I don't have an answer here yet, but I have a few thoughts. I see it as a separate proposal. My hand-waving answer is that it should be just as possible as before to append index constraint failures to a big list, and loop through it as long as we're making progress. If I need a more solid proposal for this problem before my generalized constraints proposal is considered, let me know. To try out my patch: (1) Apply patch to 8.5-devel and Init DB (2) Install contrib/btree_gist (only necessary for this example, patch works with Btree and GIN, too). (3) => create table test(i int, c circle); => create index test_idx on test using gist(i, c); => UPDATE pg_index SET indconstrats = '3 3' WHERE indexrelid='test_idx'::regclass; In the above query, 3 is the equality strategy number for the GiST opclass for integers, and 3 is also the "overlaps" strategy number for the GiST opclass for circles, so we put a 3 for each attribute. What this will mean is that it will reject any new tuple when there is already another tuple in the table with an equal value of i AND an overlapping value of c. Concurrency should behave identically to UNIQUE on a btree. (4) Now, try some inserts (concurrent or otherwise) and see what happens. Ultimately, I think the language for this might shape up something like: CREATE INDEX test_idx ON test USING gist (i CONSTRAINT =, c CONSTRAINT &&); which would avoid the need for updating the catalog, of course. Limitations: * Still not deferrable, even 'til the end of the command. * Your constraint must be symmetric (if tuple A conflicts with tuple B, tuple B must conflict with tuple A). * The types have to match between the left and right arguments in the operator class and the type of the column in the table. This is normally true, but the GIN Array opclass works on type "anyarray", but the table has a normal type, which causes a problem. Maybe it's possible to be smarter about this, but the workaround is to just create more opclasses (I believe). Any input is appreciated (design problems, implementation, language ideas, or anything else). I'd like to get it into shape for the July 15 commitfest if no major problems are found. Regards, Jeff Davis [index-constraints-20090705.patch] diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c index 1515d9f..eedb456 100644 --- a/src/backend/access/index/indexam.c +++ b/src/backend/access/index/indexam.c @@ -26,6 +26,7 @@ * index_vacuum_cleanup - post-deletion cleanup of an index * index_getprocid - get a support procedure OID * index_getprocinfo - get a support procedure's lookup info + * index_check_constraint - check index constraints * * NOTES * This file contains the index_ routines which used @@ -64,9 +65,13 @@ #include "access/relscan.h" #include "access/transam.h" +#include "miscadmin.h" #include "pgstat.h" #include "storage/bufmgr.h" #include "storage/lmgr.h" +#include "storage/lwlock.h" +#include "storage/procarray.h" +#include "utils/lsyscache.h" #include "utils/relcache.h" #include "utils/snapmgr.h" #include "utils/tqual.h" @@ -116,6 +121,19 @@ do { \ static IndexScanDesc index_beginscan_internal(Relation indexRelation, int nkeys, ScanKey key); +typedef struct +{ + Oid relid; + TransactionId xid; + ItemPointerData tid; +} CurrentIndexInsertEntry; + +static CurrentIndexInsertEntry *CurrentIndexInsertsTable = NULL; + +static bool index_check_constraint_conflict(TupleTableSlot *slot, + HeapTuple tup, int2 *heap_attnums, + int2 index_natts, + Oid *constraint_procs); /* ---------------------------------------------------------------- * index_ interface functions @@ -846,3 +864,278 @@ index_getprocinfo(Relation irel, return locinfo; } + +void +index_check_constraint(Relation heap, Relation index, + ItemPointer tid, TupleTableSlot *slot) +{ + IndexScanDesc index_scan; + HeapTuple tup; + ScanKeyData *scankeys; + int2vector *constr_strats; + Oid *constr_procs; + int i; + int2 *heap_attnums = index->rd_index->indkey.values; + int2 index_natts = index->rd_index->indnatts; + SnapshotData DirtySnapshot; + int nkeys = 0; + + CurrentIndexInsertEntry *MyIndexInsertEntry; + CurrentIndexInsertEntry potential_conflicts[MaxBackends]; + int n_potential_conflicts = 0; + + /* Find constraint strategy numbers */ + constr_strats = RelationGetIndexConstraintStrategies(index); + + /* return if no constraint */ + if (constr_strats == NULL) + return; + + /* + * if any of the indexed columns are NULL, the constraint + * is satisfied + */ + for (i = 0; i < index_natts; i++) + if (slot_attisnull(slot, heap_attnums[i])) + return; + + /* + * Find the function that tests for a conflict based on the + * strategy number, operator family, and types. + */ + constr_procs = palloc(sizeof(Oid) * index_natts); + for (i = 0; i < index_natts; i++) + { + /* + * Find the procedure implementing the strategy for the + * index for two arguments both with the type of the + * indexed attribute. + */ + Oid oper; + Oid opfamily = index->rd_opfamily[i]; + StrategyNumber strategy = constr_strats->values[i]; + Oid typeOid = heap->rd_att->attrs[heap_attnums[i] - 1]->atttypid; + + if (strategy == InvalidStrategy) + continue; + + oper = get_opfamily_member(opfamily, typeOid, typeOid, strategy); + + if(OidIsValid(oper)) + constr_procs[i] = get_opcode(oper); + else + elog(ERROR, "cannot determine operator for type %d and " + "strategy %d", typeOid, strategy); + } + + + if (CurrentIndexInsertsTable == NULL) + { + bool found; + + CurrentIndexInsertsTable = (CurrentIndexInsertEntry *) + ShmemInitStruct("Current Index Inserts Table", + CurrentIndexInsertsShmemSize(), &found); + Assert(found); + } + + MyIndexInsertEntry = &CurrentIndexInsertsTable[MyBackendId - 1]; + + /* + * Check for conflicts with concurrent inserts. These are + * inserts that are actually in-progress now, and have not + * actually been put in the index yet. + */ + + /* TODO: this lock should be partitioned by heap relid hash. */ + LWLockAcquire(IndexConstraintLock, LW_EXCLUSIVE); + + for (i = 0; i < MaxBackends; i++) + { + CurrentIndexInsertEntry entry = CurrentIndexInsertsTable[i]; + + if (RelationGetRelid(heap) == entry.relid && + !TransactionIdIsCurrentTransactionId(entry.xid)) + { + if (TransactionIdIsInProgress(entry.xid)) + { + /* build a list of potential conflicts */ + potential_conflicts[n_potential_conflicts++] = entry; + } + else + entry.relid = InvalidOid; + } + } + + MyIndexInsertEntry->relid = heap->rd_id; + MyIndexInsertEntry->xid = GetCurrentTransactionId(); + MyIndexInsertEntry->tid = *tid; + + LWLockRelease(IndexConstraintLock); + + InitDirtySnapshot(DirtySnapshot); + + for (i = 0; i < n_potential_conflicts; i++) + { + bool does_conflict = true; + HeapTupleData tup; + Buffer buffer; + + tup.t_self = potential_conflicts[i].tid; + if (!heap_fetch(heap, &DirtySnapshot, &tup, &buffer, false, NULL)) + continue; + + does_conflict = index_check_constraint_conflict( + slot, &tup, heap_attnums, index_natts, constr_procs); + + ReleaseBuffer(buffer); + + if (does_conflict) + { + CurrentIndexInsertEntry conflict = potential_conflicts[i]; + if (TransactionIdIsCurrentTransactionId(conflict.xid)) + elog(ERROR, "conflict detected 1"); + + XactLockTableWait(conflict.xid); + if (TransactionIdDidCommit(conflict.xid)) + elog(ERROR, "conflict detected 2"); + } + } + + /* + * Now search the tuples that are actually in the index for + * any violations. + */ + + scankeys = palloc(index_natts * sizeof(ScanKeyData)); + for (i = 0; i < index_natts; i++) + { + Datum key_datum; + bool isnull; + + key_datum = slot_getattr(slot, heap_attnums[i], &isnull); + Assert(!isnull); /* already checked above */ + + if (constr_strats->values[i] == InvalidStrategy) + continue; + + ScanKeyInit(&scankeys[nkeys], i + 1, constr_strats->values[i], + constr_procs[i], key_datum); + nkeys++; + } + + /* + * We have to find all tuples, even those not visible + * yet. Other transactions may have inserted many tuples (or + * the transaction might be a prepared transaction), so there + * may be some tuples that are not in the shared memory + * structure and not visible. + */ + index_scan = index_beginscan(heap, index, &DirtySnapshot, nkeys, + scankeys); + while((tup = index_getnext(index_scan, ForwardScanDirection)) != NULL) + { + if (index_scan->xs_recheck) + { + if (!index_check_constraint_conflict( + slot, tup, heap_attnums, index_natts, constr_procs)) + continue; + } + + /* If the in-progress inserting transaction aborts, proceed. */ + if (TransactionIdIsValid(DirtySnapshot.xmin)) + { + XactLockTableWait(DirtySnapshot.xmin); + if (TransactionIdDidAbort(DirtySnapshot.xmin)) + continue; + } + + /* If the in-progress deleting transaction commits, proceed. */ + if (TransactionIdIsValid(DirtySnapshot.xmax)) + { + XactLockTableWait(DirtySnapshot.xmax); + if (TransactionIdDidCommit(DirtySnapshot.xmax)) + continue; + } + + elog(ERROR, "conflict detected 3"); + } + + index_endscan(index_scan); + pfree(scankeys); + + return; +} + +/* + * For each attribute of the index, check for a conflict between the + * slot's tuple's value and the tuple's value. Only return true if all + * values conflict. + */ +static bool +index_check_constraint_conflict(TupleTableSlot *slot, HeapTuple tup, + int2 *heap_attnums, int2 index_natts, + Oid* constraint_procs) +{ + int i; + + for (i = 0; i < index_natts; i++) + { + Datum input_datum; + Datum existing_datum; + bool isnull; + + if (!OidIsValid(constraint_procs[i])) + continue; + + input_datum = slot_getattr(slot, heap_attnums[i], &isnull); + if (isnull) + return false; + + existing_datum = heap_getattr( + tup, heap_attnums[i], slot->tts_tupleDescriptor, &isnull); + if (isnull) + return false; + + if (!DatumGetBool(OidFunctionCall2(constraint_procs[i], + input_datum, existing_datum))) + return false; + } + return true; +} + +/* + * GistShmemSize --- report amount of shared memory space needed + */ +Size +CurrentIndexInsertsShmemSize(void) +{ + return (sizeof(CurrentIndexInsertEntry) * MaxBackends); +} + +/* + * GistShmemInit --- initialize this module's shared memory + */ +void +CurrentIndexInsertsShmemInit(void) +{ + int i; + bool found; + CurrentIndexInsertEntry *current_inserts; + + current_inserts = (CurrentIndexInsertEntry *) + ShmemInitStruct("Current Index Inserts Table", + CurrentIndexInsertsShmemSize(), + &found); + + if (!IsUnderPostmaster) + { + /* Initialize shared memory area */ + Assert(!found); + + for (i = 0; i < MaxBackends; i++) + current_inserts[i].relid = InvalidOid; + } + else + Assert(found); +} diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index c4e4cab..8a136bb 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -444,6 +444,7 @@ UpdateIndexRelation(Oid indexoid, values[Anum_pg_index_indcheckxmin - 1] = BoolGetDatum(false); /* we set isvalid and isready the same way */ values[Anum_pg_index_indisready - 1] = BoolGetDatum(isvalid); + nulls[Anum_pg_index_indconstrats - 1] = true; values[Anum_pg_index_indkey - 1] = PointerGetDatum(indkey); values[Anum_pg_index_indclass - 1] = PointerGetDatum(indclass); values[Anum_pg_index_indoption - 1] = PointerGetDatum(indoption); diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c index 0ccd862..e8d704c 100644 --- a/src/backend/executor/execUtils.c +++ b/src/backend/executor/execUtils.c @@ -1067,6 +1067,15 @@ ExecInsertIndexTuples(TupleTableSlot *slot, econtext->ecxt_scantuple = slot; /* + * before actually inserting, check index constraints for each index + */ + for (i = 0; i < numIndices; i++) + { + index_check_constraint(heapRelation, relationDescs[i], + tupleid, slot); + } + + /* * for each index, form and insert the index tuple */ for (i = 0; i < numIndices; i++) diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 3022867..8c04940 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -15,6 +15,7 @@ #include "postgres.h" #include "access/clog.h" +#include "access/genam.h" #include "access/heapam.h" #include "access/multixact.h" #include "access/nbtree.h" @@ -115,6 +116,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) size = add_size(size, BgWriterShmemSize()); size = add_size(size, AutoVacuumShmemSize()); size = add_size(size, BTreeShmemSize()); + size = add_size(size, CurrentIndexInsertsShmemSize()); size = add_size(size, SyncScanShmemSize()); #ifdef EXEC_BACKEND size = add_size(size, ShmemBackendArraySize()); @@ -215,6 +217,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) * Set up other modules that need some shared memory space */ BTreeShmemInit(); + CurrentIndexInsertsShmemInit(); SyncScanShmemInit(); #ifdef EXEC_BACKEND diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 29976e7..24583d3 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -3284,6 +3284,22 @@ RelationGetIndexAttrBitmap(Relation relation) return indexattrs; } +int2vector * +RelationGetIndexConstraintStrategies(Relation relation) +{ + bool isnull; + Datum constraint_strategies; + + constraint_strategies = heap_getattr(relation->rd_indextuple, + Anum_pg_index_indconstrats, + GetPgIndexDescriptor(), + &isnull); + if (isnull) + return NULL; + + return (int2vector *) DatumGetPointer(constraint_strategies); + +} /* * load_relcache_init_file, write_relcache_init_file diff --git a/src/include/access/genam.h b/src/include/access/genam.h index a6ac5db..2aad8b4 100644 --- a/src/include/access/genam.h +++ b/src/include/access/genam.h @@ -16,6 +16,8 @@ #include "access/sdir.h" #include "access/skey.h" +#include "access/xact.h" +#include "executor/tuptable.h" #include "nodes/tidbitmap.h" #include "storage/buf.h" #include "storage/lock.h" @@ -129,6 +131,13 @@ extern RegProcedure index_getprocid(Relation irel, AttrNumber attnum, uint16 procnum); extern FmgrInfo *index_getprocinfo(Relation irel, AttrNumber attnum, uint16 procnum); +extern void index_check_constraint(Relation heapRelation, + Relation indexRelation, + ItemPointer tid, + TupleTableSlot *slot); + +extern Size CurrentIndexInsertsShmemSize(void); +extern void CurrentIndexInsertsShmemInit(void); /* * index access method support routines (in genam.c) diff --git a/src/include/catalog/pg_attribute.h b/src/include/catalog/pg_attribute.h index eaa405f..413cf0a 100644 --- a/src/include/catalog/pg_attribute.h +++ b/src/include/catalog/pg_attribute.h @@ -477,6 +477,7 @@ DATA(insert ( 1259 tableoid 26 0 4 -7 0 -1 -1 t p i t f f t 0 _null_)); { 0, {"indclass"}, 30, -1, -1, 11, 1, -1, -1, false, 'p', 'i', true, false, false, true, 0, { 0 } }, \ { 0, {"indoption"}, 22, -1, -1, 12, 1, -1, -1, false, 'p', 'i', true, false, false, true, 0, { 0 } }, \ { 0, {"indexprs"}, 25, -1, -1, 13, 0, -1, -1, false, 'x', 'i', false, false, false, true, 0, { 0 } }, \ -{ 0, {"indpred"}, 25, -1, -1, 14, 0, -1, -1, false, 'x', 'i', false, false, false, true, 0, { 0 } } +{ 0, {"indpred"}, 25, -1, -1, 14, 0, -1, -1, false, 'x', 'i', false, false, false, true, 0, { 0 } }, \ +{ 0, {"indconstrats"}, 22, -1, -1, 15, 1, -1, -1, false, 'p', 'i', false, false, false, true, 0, { 0 } } #endif /* PG_ATTRIBUTE_H */ diff --git a/src/include/catalog/pg_index.h b/src/include/catalog/pg_index.h index 19069db..9ab905c 100644 --- a/src/include/catalog/pg_index.h +++ b/src/include/catalog/pg_index.h @@ -49,6 +49,7 @@ CATALOG(pg_index,2610) BKI_WITHOUT_OIDS * each zero entry in indkey[] */ text indpred; /* expression tree for predicate, if a partial * index; else NULL */ + int2vector indconstrats; /* index constraint strategies */ } FormData_pg_index; /* ---------------- @@ -62,7 +63,7 @@ typedef FormData_pg_index *Form_pg_index; * compiler constants for pg_index * ---------------- */ -#define Natts_pg_index 14 +#define Natts_pg_index 15 #define Anum_pg_index_indexrelid 1 #define Anum_pg_index_indrelid 2 #define Anum_pg_index_indnatts 3 @@ -77,6 +78,7 @@ typedef FormData_pg_index *Form_pg_index; #define Anum_pg_index_indoption 12 #define Anum_pg_index_indexprs 13 #define Anum_pg_index_indpred 14 +#define Anum_pg_index_indconstrats 15 /* * Index AMs that support ordered scans must support these two indoption diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h index e389c61..9e6f93e 100644 --- a/src/include/storage/lwlock.h +++ b/src/include/storage/lwlock.h @@ -63,6 +63,7 @@ typedef enum LWLockId TwoPhaseStateLock, TablespaceCreateLock, BtreeVacuumLock, + IndexConstraintLock, AddinShmemInitLock, AutovacuumLock, AutovacuumScheduleLock, diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h index 7d4d914..305d1d2 100644 --- a/src/include/utils/relcache.h +++ b/src/include/utils/relcache.h @@ -43,6 +43,7 @@ extern Oid RelationGetOidIndex(Relation relation); extern List *RelationGetIndexExpressions(Relation relation); extern List *RelationGetIndexPredicate(Relation relation); extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation); +extern int2vector *RelationGetIndexConstraintStrategies(Relation relation); extern void RelationSetIndexList(Relation relation, List *indexIds, Oid oidIndex); -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Sun, 2009-07-05 at 17:28 -0700, Jeff Davis wrote: > This is a follow up to my old proposal here: > > http://archives.postgresql.org/pgsql-hackers/2008-06/msg00404.php > > Any input is appreciated (design problems, implementation, language > ideas, or anything else). I'd like to get it into shape for the July > 15 commitfest if no major problems are found. I was concerned that your definition of concurrently inserted didn't seem to match the size of the shared memory array required. How will you cope with a large COPY? Surely there can be more than one concurrent insert from any backend? It would be useful to see a real example of what this can be used for. I think it will be useful to separate the concepts of a constraint from the concept of an index. It seems possible to have a UNIQUE constraint that doesn't help at all in locating rows, just in proving that the rows are unique. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Mon, Jul 6, 2009 at 11:56 AM, Simon Riggs<simon@...> wrote:
> How will you cope with a large COPY? Surely there can be more than one > concurrent insert from any backend? He only needs to handle inserts for the period they're actively being inserted into the index. Once they're in the index he'll find them using the index scan. In other words this is all a proxy for the way btree locks index pages while it looks for a unique key violation. I'm a bit concerned about the use of tid. You might have to look at a lot of heap pages to check for conflicts. I suppose they're almost certainly all in shared memory though. Also, it sounds like you're anticipating the possibility of dead entries in the array but if you do then you need to store the xmin also to protect against a tuple that's been vacuumed and had its line pointer reused since. But I don't see the necessity for that anyways since you can just clean up the entry on abort. -- greg http://mit.edu/~gsstark/resume.pdf -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Mon, Jul 06, 2009 at 11:56:41AM +0100, Simon Riggs wrote:
> On Sun, 2009-07-05 at 17:28 -0700, Jeff Davis wrote: > > This is a follow up to my old proposal here: > > > > http://archives.postgresql.org/pgsql-hackers/2008-06/msg00404.php > > > > > Any input is appreciated (design problems, implementation, > > language ideas, or anything else). I'd like to get it into shape > > for the July 15 commitfest if no major problems are found. > > I was concerned that your definition of concurrently inserted didn't > seem to match the size of the shared memory array required. > > How will you cope with a large COPY? Surely there can be more than > one concurrent insert from any backend? > > It would be useful to see a real example of what this can be used > for. Constraints like "these intervals can't overlap" would be one. It's handy in calendaring applications, for example. > I think it will be useful to separate the concepts of a constraint > from the concept of an index. It seems possible to have a UNIQUE > constraint that doesn't help at all in locating rows, just in > proving that the rows are unique. Interesting idea. Are you thinking of this in terms of things the planner can do once it knows a set is all distinct values, or...? Cheers, David. -- David Fetter <david@...> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@... Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Mon, 2009-07-06 at 12:28 +0100, Greg Stark wrote:
> He only needs to handle inserts for the period they're actively being > inserted into the index. Once they're in the index he'll find them > using the index scan. In other words this is all a proxy for the way > btree locks index pages while it looks for a unique key violation. Exactly, that was my design: /* * We have to find all tuples, even those not visible * yet. Other transactions may have inserted many tuples (or * the transaction might be a prepared transaction), so there * may be some tuples that are not in the shared memory * structure and not visible. */ > I'm a bit concerned about the use of tid. You might have to look at a > lot of heap pages to check for conflicts. I suppose they're almost > certainly all in shared memory though. That was my hope. The 8.4 bulk insert code might defeat that to some degree, however. Maybe that could be disabled when inserting into an index with constraints? I didn't think about it before, but the bulk insert buffer ring would affect unique btrees, too, right? > Also, it sounds like you're > anticipating the possibility of dead entries in the array but if you > do then you need to store the xmin also to protect against a tuple > that's been vacuumed and had its line pointer reused since. But I > don't see the necessity for that anyways since you can just clean up > the entry on abort. Can you tell me a little more specifically the problem you're worried about? If the tuple has been VACUUMed and removed, then the TID search will either find a tuple, and do a spurious constraint check against it; or not find a tuple, and just move on. I could have the commit and abort paths clear the entry, which might optimize away some of the TransactionIdIsInProgress() calls for transactions that ended normally. But that didn't strike me as a big cost compared to the index scan. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Mon, 2009-07-06 at 11:56 +0100, Simon Riggs wrote:
> I think it will be useful to separate the concepts of a constraint from > the concept of an index. It seems possible to have a UNIQUE constraint > that doesn't help at all in locating rows, just in proving that the rows > are unique. That would be interesting. Do you have a use case? Checking the constraint would surely be slower in a lot of cases. I could imagine different constraint-checking schemes that could be fast against a heap. For instance, if it's greater than the max or less than the min value, that would be cheap to check. That might be an interesting way to handle the constraint for a sequence-generated column, or timestamp column that is always ascending. However, the problem is I don't see a lot of room for a practical use case. In the above situations, you'd almost certainly want indexes anyway: what's the point of a sequence number unless you're going to do lookups? And if you have an ascending timestamp column, I would think that you might do range lookups occasionally (which will be even better because the heap will be clustered). Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Mon, 2009-07-06 at 07:30 -0700, David Fetter wrote:
> > It would be useful to see a real example of what this can be used > > for. > > Constraints like "these intervals can't overlap" would be one. It's > handy in calendaring applications, for example. Exactly, you already know my use case ;) My goal is a "temporal key", where you can't have overlapping intervals of time, e.g. the constraint "nobody can be two places at the same time". > > I think it will be useful to separate the concepts of a constraint > > from the concept of an index. It seems possible to have a UNIQUE > > constraint that doesn't help at all in locating rows, just in > > proving that the rows are unique. > > Interesting idea. Are you thinking of this in terms of things the > planner can do once it knows a set is all distinct values, or...? I think that's an orthogonal idea. It's a good idea though, I would like the planner to be smarter about those kinds of things. A simple example is that if a table has a non-partial unique constraint anywhere, then "select * from foo union select * from foo" can be transformed into "select * from foo" (eliminating the expensive union). Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Mon, Jul 6, 2009 at 4:57 PM, Jeff Davis<pgsql@...> wrote:
> > Exactly, you already know my use case ;) My goal is a "temporal key", > where you can't have overlapping intervals of time, e.g. the constraint > "nobody can be two places at the same time". Incidentally to handle non-overlapping ranges you don't need GIST, you can actually use a plain btree. Since there are no overlapping ranges the ranges have a complete ordering and you can get that by just sorting by either endpoint. To enforce the constraint you only have to compare with the previous and following element in the btree. -- greg http://mit.edu/~gsstark/resume.pdf -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Mon, 2009-07-06 at 17:02 +0100, Greg Stark wrote:
> On Mon, Jul 6, 2009 at 4:57 PM, Jeff Davis<pgsql@...> wrote: > > > > Exactly, you already know my use case ;) My goal is a "temporal key", > > where you can't have overlapping intervals of time, e.g. the constraint > > "nobody can be two places at the same time". > > Incidentally to handle non-overlapping ranges you don't need GIST, you > can actually use a plain btree. Since there are no overlapping ranges > the ranges have a complete ordering and you can get that by just > sorting by either endpoint. To enforce the constraint you only have to > compare with the previous and following element in the btree. What if you have an entire index full of overlapping dead tuples, and a few live ones? How would search work? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Mon, 2009-07-06 at 08:50 -0700, Jeff Davis wrote: > On Mon, 2009-07-06 at 11:56 +0100, Simon Riggs wrote: > > I think it will be useful to separate the concepts of a constraint from > > the concept of an index. It seems possible to have a UNIQUE constraint > > that doesn't help at all in locating rows, just in proving that the rows > > are unique. > > That would be interesting. Do you have a use case? Checking the > constraint would surely be slower in a lot of cases. > > I could imagine different constraint-checking schemes that could be fast > against a heap. For instance, if it's greater than the max or less than > the min value, that would be cheap to check. That might be an > interesting way to handle the constraint for a sequence-generated > column, or timestamp column that is always ascending. Yes. > However, the problem is I don't see a lot of room for a practical use > case. In the above situations, you'd almost certainly want indexes > anyway: what's the point of a sequence number unless you're going to do > lookups? And if you have an ascending timestamp column, I would think > that you might do range lookups occasionally (which will be even better > because the heap will be clustered). In many cases, people add unique indexes solely to allow replication to work correctly. The index itself may never be used, especially in high volume applications. How do you handle uniqueness within a stream? Presumably it is possible and useful to have a stream of data that can be guaranteed unique, yet a stream would never be uniquely targeted for lookups because of the volume of data involved. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Mon, Jul 6, 2009 at 6:20 PM, Jeff Davis<pgsql@...> wrote:
> On Mon, 2009-07-06 at 17:02 +0100, Greg Stark wrote: >> On Mon, Jul 6, 2009 at 4:57 PM, Jeff Davis<pgsql@...> wrote: >> > >> > Exactly, you already know my use case ;) My goal is a "temporal key", >> > where you can't have overlapping intervals of time, e.g. the constraint >> > "nobody can be two places at the same time". >> >> Incidentally to handle non-overlapping ranges you don't need GIST, you >> can actually use a plain btree. Since there are no overlapping ranges >> the ranges have a complete ordering and you can get that by just >> sorting by either endpoint. To enforce the constraint you only have to >> compare with the previous and following element in the btree. > > What if you have an entire index full of overlapping dead tuples, and a > few live ones? How would search work? I should clarify I didn't mean you could implement it in SQL using Postgres btrees. I just meant that a tree data structure was sufficient, you don't need the power of GIST. It's probably easier to implement it in GIST in Postgres since it's there though. So it would work just like regular btrees, you only consider it a conflict if there's a live value that conflicts. -- greg http://mit.edu/~gsstark/resume.pdf -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Mon, 2009-07-06 at 18:27 +0100, Simon Riggs wrote:
> In many cases, people add unique indexes solely to allow replication to > work correctly. The index itself may never be used, especially in high > volume applications. Interesting. Maybe we should at least try to leave room for this feature to be added later. I agree that, from a theoretical perspective, requiring a UNIQUE constraint to use an index is wrong. For one thing, you can't ensure the uniqueness without defining some total order (although you can define an arbitrary total order for cases with no meaningful total order). > How do you handle uniqueness within a stream? Presumably it is possible > and useful to have a stream of data that can be guaranteed unique, yet a > stream would never be uniquely targeted for lookups because of the > volume of data involved. [ Simon is asking me because I work for Truviso, but my response is not officially from Truviso ] There are a few cases worth mentioning here. First, if you have a stream that's backed by a table, you can use a table constraint. Second, you might choose to have an "in-order" constraint (not necessary, the system can fix out-of-order data), which could be a unique constraint that's very cheap to test. Additionally, this is not strictly a constraint, but if you have downstream operators, like COUNT(DISTINCT...), that can be seen as being similar to a constraint. These will often be over a limited span of time, say, a minute or an hour, and we can keep the necessary state. If there are a huge number of distinct values there, then it's a challenge to avoid keeping a lot of state. There are a few other specialized methods that we can use for specific use-cases. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraints> CREATE INDEX test_idx ON test USING gist > (i CONSTRAINT =, c CONSTRAINT &&); > > which would avoid the need for updating the catalog, of course. Hmm, looks like "index"-fied table's constrains -- Teodor Sigaev E-mail: teodor@... WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsJeff Davis <pgsql@...> writes:
> On Mon, 2009-07-06 at 18:27 +0100, Simon Riggs wrote: >> In many cases, people add unique indexes solely to allow replication to >> work correctly. The index itself may never be used, especially in high >> volume applications. > Interesting. Maybe we should at least try to leave room for this feature > to be added later. I agree that, from a theoretical perspective, > requiring a UNIQUE constraint to use an index is wrong. For one thing, > you can't ensure the uniqueness without defining some total order > (although you can define an arbitrary total order for cases with no > meaningful total order). This seems a bit pointless. There is certainly not any use case for a constraint without an enforcement mechanism (or at least none the PG community is likely to consider legitimate ;-)). And it's not very realistic to suppose that you'd check a constraint by doing a seqscan every time. Therefore there has to be an index underlying the constraint somehow. Jeff's complaint about total order is not an argument against having an index, it's just pointing out that btree is not the only possible type of index. It's perfectly legitimate to imagine using a hash index to enforce uniqueness, for example. If hash indexes had better performance we'd probably already have been looking for a way to do that, and wanting some outside-the-AM mechanism for it so we didn't have to duplicate code from btree. Also, if hash indexes were a realistic alternative to btree for this, we'd already have come up against the problem that the CONSTRAINT syntax doesn't provide any way to specify what kind of index you want to use underneath the constraint. I think it might be interesting to turn around Jeff's syntax sketch and provide a way to say that a CONSTRAINT declaration should depend on some previously added index, eg something like ALTER TABLE tab ADD CONSTRAINT UNIQUE (col1, col2) USING index Not sure how that squares exactly with the question of variant definitions of uniqueness semantics (as opposed to purely implementation decisions like hash vs btree). But it's a different way to come at it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Tue, 2009-07-07 at 13:22 -0400, Tom Lane wrote: > ALTER TABLE tab ADD CONSTRAINT UNIQUE (col1, col2) USING index This would be very useful, though perhaps only because we do not have REINDEX CONCURRENTLY. It is likely to be useful in the future to allow an index with N columns, yet which can provide uniqueness with < N of those columns. This capability is known as covered indexes and will be important if Heikki writes his index-only scan code. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Tue, 2009-07-07 at 13:22 -0400, Tom Lane wrote: > Jeff Davis <pgsql@...> writes: > > On Mon, 2009-07-06 at 18:27 +0100, Simon Riggs wrote: > >> In many cases, people add unique indexes solely to allow replication to > >> work correctly. The index itself may never be used, especially in high > >> volume applications. > > > Interesting. Maybe we should at least try to leave room for this feature > > to be added later. I agree that, from a theoretical perspective, > > requiring a UNIQUE constraint to use an index is wrong. For one thing, > > you can't ensure the uniqueness without defining some total order > > (although you can define an arbitrary total order for cases with no > > meaningful total order). > > This seems a bit pointless. There is certainly not any use case for a > constraint without an enforcement mechanism (or at least none the PG > community is likely to consider legitimate ;-)). And it's not very > realistic to suppose that you'd check a constraint by doing a seqscan > every time. Therefore there has to be an index underlying the > constraint somehow. I think the idea has been misconstrued. Obviously a constraint requires an enforcement mechanism. That doesn't imply that the enforcement mechanism must be fully usable as an index. The example being discussed was enforcing uniqueness on monotonically increasing columns. If we knew that a column value was GENERATED ALWAYS using a sequence, then we could simply skip the uniqueness check altogether. No index, yet an enforced unique constraint. Yes, we would need to understand the relationship between the sequence and the table and throw an error in certain sequence update cases (and we may need to check those with a seq scan). But that seems a small price to pay for the avoidance of a potentially very large index that may have no purpose. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Tue, Jul 7, 2009 at 6:22 PM, Tom Lane<tgl@...> wrote:
> > This seems a bit pointless. There is certainly not any use case for a > constraint without an enforcement mechanism (or at least none the PG > community is likely to consider legitimate ;-)). And it's not very > realistic to suppose that you'd check a constraint by doing a seqscan > every time. Therefore there has to be an index underlying the > constraint somehow. I'm not entirely convinced that running a full scan to enforce constraints is necessarily such a crazy idea. It may well be the most efficient approach after a major bulk load. And consider a read-only database where the only purpose of the constraint is to inform the optimizer that it can trust the property to hold. That said this seems like an orthogonal issue to me. > Jeff's complaint about total order is not an > argument against having an index, it's just pointing out that btree is > not the only possible type of index. It's perfectly legitimate to > imagine using a hash index to enforce uniqueness, for example. If hash > indexes had better performance we'd probably already have been looking > for a way to do that, and wanting some outside-the-AM mechanism for it > so we didn't have to duplicate code from btree. I'm a bit at a loss why we need this extra data structure though. The whole duplicated code issue seems to me to be one largely of code structure. If we hoisted the heap-value rechecking code out of the btree AM then the hash AM could reuse it just fine. Both the hash and btree AMs would have to implement some kind of "insert-unique-key" operation which would hold some kind of lock preventing duplicate unique keys from being inserted but both btree and hash could implement that efficiently by locking one page or one hash value. GIST would need something like this "store the key value or tid in shared memory" mechanism. But that could be implemented as an external facility which GIST then made use of -- just the way every part of the system makes use of other parts. It doesn't mean we have to make "prevent concurrent unique inserts" not the responsibility of the AM which knows best how to handle that. -- greg http://mit.edu/~gsstark/resume.pdf -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Tue, 2009-07-07 at 18:36 +0100, Simon Riggs wrote:
> On Tue, 2009-07-07 at 13:22 -0400, Tom Lane wrote: > It is likely to be useful in the future to allow an index with N > columns, yet which can provide uniqueness with < N of those columns. > This capability is known as covered indexes and will be important if > Heikki writes his index-only scan code. My patch offers this capability, and the language I suggested would support it. In the current version of the patch, just use InvalidStrategy (0) instead of, say, BTEqualStrategyNumber (3) for the attributes that you don't want to be a part of the constraint. Some of the proper error checking is not done yet, but it will work. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsJeff Davis <pgsql@...> writes:
> On Tue, 2009-07-07 at 18:36 +0100, Simon Riggs wrote: >> On Tue, 2009-07-07 at 13:22 -0400, Tom Lane wrote: >> It is likely to be useful in the future to allow an index with N >> columns, yet which can provide uniqueness with < N of those columns. >> This capability is known as covered indexes and will be important if >> Heikki writes his index-only scan code. > My patch offers this capability, and the language I suggested would > support it. > In the current version of the patch, just use InvalidStrategy (0) > instead of, say, BTEqualStrategyNumber (3) for the attributes that you > don't want to be a part of the constraint. Some of the proper error > checking is not done yet, but it will work. I don't think this even approximates the need --- in particular it's not clear what the semantics of combination across different index columns are. I assume you've hot-wired it so that several BTEqualStrategyNumber columns will work like a normal multicolumn uniqueness constraint (IOW it's okay as long as at least one column is NULL or isn't equal). But I'm not at all sure that's what I'd want for some other operator type. Also, what happens if you want to use the same index to support more than one logical constraint? This is impossible if you put the information into pg_index, I think. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
|
Re: WIP: generalized index constraintsOn Tue, 2009-07-07 at 14:57 -0400, Tom Lane wrote:
> I don't think this even approximates the need --- in particular it's not > clear what the semantics of combination across different index columns > are. I assume you've hot-wired it so that several BTEqualStrategyNumber > columns will work like a normal multicolumn uniqueness constraint (IOW > it's okay as long as at least one column is NULL or isn't equal). But > I'm not at all sure that's what I'd want for some other operator type. If any input attributes are NULL, the constraint automatically succeeds (I think that follows the SQL philosophy). Perhaps we should be able to override this behavior somehow? Now, for each attribute where a constraint is defined, it becomes a new scan key used in the index lookup to enforce the constraint. So, the more attributes in the constraint, the more permissive the constraint (just like UNIQUE). I can't think of another good interpretation of the constraint. If a constraint on (x,y) meant "x is unique, and y is unique", it would be hard to scan the index for y's uniqueness. If you want to say that both are independently unique, I believe that requires two indexes, and so it would probably be best to just require the constraints to be specified separately. Thoughts? > Also, what happens if you want to use the same index to support more > than one logical constraint? This is impossible if you put the > information into pg_index, I think. > I like the idea to store the information outside of pg_index. It means that I don't have to build the index and check the constraint at the same time. It would be more like adding a CHECK constraint. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| < Prev | 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11 | Next > |
| Free embeddable forum powered by Nabble | Forum Help |