|
View:
New views
20 Messages
—
Rating Filter:
Alert me
|
| < Prev | 1 - 2 - 3 | Next > |
|
|
pg_dump additional options for performanceSimon,
I agree with adding these options in general, since I find myself frustrated by having to vi huge dumps to change simple schema things. A couple of comments on the patch though: - Conflicting option handling I think we are doing our users a disservice by putting it on them to figure out exactly what: multiple object groups cannot be used together means to them. You and I may understand what an "object group" is, and why there can be only one, but it's a great deal less clear than the prior message of options -s/--schema-only and -a/--data-only cannot be used together My suggestion would be to either list out the specific options which can't be used together, as was done previously, or add a bit of (I realize, boring) code and actually tell the user which of the conflicting options were used. - Documentation When writing the documentation I would stress that "pre-schema" and "post-schema" be defined in terms of PostgreSQL objects and why they are pre vs. post. - Technically, the patch needs to be updated slightly since another pg_dump-related patch was committed recently which also added options and thus causes a conflict. Beyond those minor points, the patch looks good to me. Thanks, Stephen |
|
|
Re: pg_dump additional options for performanceOn Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote: > Simon, > > I agree with adding these options in general, since I find myself > frustrated by having to vi huge dumps to change simple schema things. > A couple of comments on the patch though: > > - Conflicting option handling > I think we are doing our users a disservice by putting it on them to > figure out exactly what: > multiple object groups cannot be used together > means to them. You and I may understand what an "object group" is, > and why there can be only one, but it's a great deal less clear than > the prior message of > options -s/--schema-only and -a/--data-only cannot be used together > My suggestion would be to either list out the specific options which > can't be used together, as was done previously, or add a bit of (I > realize, boring) code and actually tell the user which of the > conflicting options were used. > > - Documentation > When writing the documentation I would stress that "pre-schema" and > "post-schema" be defined in terms of PostgreSQL objects and why they > are pre vs. post. > > - Technically, the patch needs to be updated slightly since another > pg_dump-related patch was committed recently which also added > options and thus causes a conflict. > > Beyond those minor points, the patch looks good to me. Thanks for the review. I'll make the changes you suggest. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches |
|
|
Re: pg_dump additional options for performanceOn Sun, 2008-07-20 at 05:47 +0100, Simon Riggs wrote: > On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote: > > Simon, > > > > I agree with adding these options in general, since I find myself > > frustrated by having to vi huge dumps to change simple schema things. > > A couple of comments on the patch though: > > > > - Conflicting option handling > > I think we are doing our users a disservice by putting it on them to > > figure out exactly what: > > multiple object groups cannot be used together > > means to them. You and I may understand what an "object group" is, > > and why there can be only one, but it's a great deal less clear than > > the prior message of > > options -s/--schema-only and -a/--data-only cannot be used together > > My suggestion would be to either list out the specific options which > > can't be used together, as was done previously, or add a bit of (I > > realize, boring) code and actually tell the user which of the > > conflicting options were used. > > > > - Documentation > > When writing the documentation I would stress that "pre-schema" and > > "post-schema" be defined in terms of PostgreSQL objects and why they > > are pre vs. post. > > > > - Technically, the patch needs to be updated slightly since another > > pg_dump-related patch was committed recently which also added > > options and thus causes a conflict. > > > > Beyond those minor points, the patch looks good to me. > > Thanks for the review. I'll make the changes you suggest. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support [pg_dump_prepost.v3.patch] Index: doc/src/sgml/ref/pg_dump.sgml =================================================================== RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_dump.sgml,v retrieving revision 1.102 diff -c -r1.102 pg_dump.sgml *** doc/src/sgml/ref/pg_dump.sgml 13 Apr 2008 03:49:21 -0000 1.102 --- doc/src/sgml/ref/pg_dump.sgml 20 Jul 2008 06:33:30 -0000 *************** *** 133,139 **** <para> Include large objects in the dump. This is the default behavior except when <option>--schema</>, <option>--table</>, or ! <option>--schema-only</> is specified, so the <option>-b</> switch is only useful to add large objects to selective dumps. </para> </listitem> --- 133,140 ---- <para> Include large objects in the dump. This is the default behavior except when <option>--schema</>, <option>--table</>, or ! <option>--schema-only</> or <option>--schema-pre-load</> or ! <option>--schema-post-load</> is specified, so the <option>-b</> switch is only useful to add large objects to selective dumps. </para> </listitem> *************** *** 443,448 **** --- 444,471 ---- </varlistentry> <varlistentry> + <term><option>--schema-pre-load</option></term> + <listitem> + <para> + Dump only the object definitions (schema) required to load data. Dumps + exactly what <option>--schema-only</> would dump, but only those + statements before the data load. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>--schema-post-load</option></term> + <listitem> + <para> + Dump only the object definitions (schema) required after data has been + loaded. Dumps exactly what <option>--schema-only</> would dump, but + only those statements after the data load. + </para> + </listitem> + </varlistentry> + + <varlistentry> <term><option>-S <replaceable class="parameter">username</replaceable></option></term> <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term> <listitem> *************** *** 774,779 **** --- 797,830 ---- </para> <para> + The output of pg_dump can be notionally divided into three parts: + <itemizedlist> + <listitem> + <para> + Pre-Schema - objects required before data loading, such as + <command>CREATE TABLE</command>. + This part can be requested using <option>--schema-pre-load</>. + </para> + </listitem> + <listitem> + <para> + Table Data - data can be requested using <option>--data-only</>. + </para> + </listitem> + <listitem> + <para> + Post-Schema - objects required after data loading, such as + <command>ALTER TABLE</command> and <command>CREATE INDEX</command>. + This part can be requested using <option>--schema-post-load</>. + </para> + </listitem> + </itemizedlist> + This allows us to work more easily with large data dump files when + there is some need to edit commands or resequence their execution for + performance. + </para> + + <para> Because <application>pg_dump</application> is used to transfer data to newer versions of <productname>PostgreSQL</>, the output of <application>pg_dump</application> can be loaded into Index: doc/src/sgml/ref/pg_restore.sgml =================================================================== RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_restore.sgml,v retrieving revision 1.75 diff -c -r1.75 pg_restore.sgml *** doc/src/sgml/ref/pg_restore.sgml 13 Apr 2008 03:49:21 -0000 1.75 --- doc/src/sgml/ref/pg_restore.sgml 20 Jul 2008 06:33:18 -0000 *************** *** 321,326 **** --- 321,350 ---- </varlistentry> <varlistentry> + <term><option>--schema-post-load</option></term> + <listitem> + <para> + Dump only the object definitions (schema) required after data has been + loaded. Dumps exactly what <option>--schema-only</> would dump, but + only those statements after the data load. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-S <replaceable class="parameter">username</replaceable></option></term> + <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term> + <listitem> + <para> + Specify the superuser user name to use when disabling triggers. + This is only relevant if <option>--disable-triggers</> is used. + (Usually, it's better to leave this out, and instead start the + resulting script as superuser.) + </para> + </listitem> + </varlistentry> + + <varlistentry> <term><option>-S <replaceable class="parameter">username</replaceable></option></term> <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term> <listitem> *************** *** 572,577 **** --- 596,629 ---- </para> <para> + The actions of pg_restore can be notionally divided into three parts: + <itemizedlist> + <listitem> + <para> + Pre-Schema - objects required before data loading, such as + <command>CREATE TABLE</command>. + This part can be requested using <option>--schema-pre-load</>. + </para> + </listitem> + <listitem> + <para> + Table Data - data can be requested using <option>--data-only</>. + </para> + </listitem> + <listitem> + <para> + Post-Schema - objects required after data loading, such as + <command>ALTER TABLE</command> and <command>CREATE INDEX</command>. + This part can be requested using <option>--schema-post-load</>. + </para> + </listitem> + </itemizedlist> + This allows us to work more easily with large data dump files when + there is some need to edit commands or resequence their execution for + performance. + </para> + + <para> The limitations of <application>pg_restore</application> are detailed below. <itemizedlist> Index: src/bin/pg_dump/pg_backup.h =================================================================== RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup.h,v retrieving revision 1.47 diff -c -r1.47 pg_backup.h *** src/bin/pg_dump/pg_backup.h 13 Apr 2008 03:49:21 -0000 1.47 --- src/bin/pg_dump/pg_backup.h 20 Jul 2008 05:19:34 -0000 *************** *** 89,95 **** int use_setsessauth;/* Use SET SESSION AUTHORIZATION commands * instead of OWNER TO */ char *superuser; /* Username to use as superuser */ ! int dataOnly; int dropSchema; char *filename; int schemaOnly; --- 89,95 ---- int use_setsessauth;/* Use SET SESSION AUTHORIZATION commands * instead of OWNER TO */ char *superuser; /* Username to use as superuser */ ! int dumpObjFlags; /* which objects types to dump */ int dropSchema; char *filename; int schemaOnly; Index: src/bin/pg_dump/pg_backup_archiver.c =================================================================== RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v retrieving revision 1.157 diff -c -r1.157 pg_backup_archiver.c *** src/bin/pg_dump/pg_backup_archiver.c 4 May 2008 08:32:21 -0000 1.157 --- src/bin/pg_dump/pg_backup_archiver.c 20 Jul 2008 05:19:34 -0000 *************** *** 56,62 **** static void _selectTablespace(ArchiveHandle *AH, const char *tablespace); static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te); static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te); ! static teReqs _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls); static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt); static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt); static TocEntry *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id); --- 56,62 ---- static void _selectTablespace(ArchiveHandle *AH, const char *tablespace); static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te); static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te); ! static int _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls); static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt); static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt); static TocEntry *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id); *************** *** 129,135 **** { ArchiveHandle *AH = (ArchiveHandle *) AHX; TocEntry *te; ! teReqs reqs; OutputContext sav; bool defnDumped; --- 129,135 ---- { ArchiveHandle *AH = (ArchiveHandle *) AHX; TocEntry *te; ! int reqs; OutputContext sav; bool defnDumped; *************** *** 175,193 **** * Work out if we have an implied data-only restore. This can happen if * the dump was data only or if the user has used a toc list to exclude * all of the schema data. All we do is look for schema entries - if none ! * are found then we set the dataOnly flag. * ! * We could scan for wanted TABLE entries, but that is not the same as ! * dataOnly. At this stage, it seems unnecessary (6-Mar-2001). */ ! if (!ropt->dataOnly) { int impliedDataOnly = 1; for (te = AH->toc->next; te != AH->toc; te = te->next) { reqs = _tocEntryRequired(te, ropt, true); ! if ((reqs & REQ_SCHEMA) != 0) { /* It's schema, and it's wanted */ impliedDataOnly = 0; break; --- 175,193 ---- * Work out if we have an implied data-only restore. This can happen if * the dump was data only or if the user has used a toc list to exclude * all of the schema data. All we do is look for schema entries - if none ! * are found then say we only want DATA type objects. * ! * We could scan for wanted TABLE entries, but that is not the same. ! * At this stage, it seems unnecessary (6-Mar-2001). */ ! if (!WANT_DATA(ropt->dumpObjFlags)) { int impliedDataOnly = 1; for (te = AH->toc->next; te != AH->toc; te = te->next) { reqs = _tocEntryRequired(te, ropt, true); ! if (WANT_PRE_SCHEMA(reqs) || WANT_POST_SCHEMA(reqs)) { /* It's schema, and it's wanted */ impliedDataOnly = 0; break; *************** *** 195,201 **** } if (impliedDataOnly) { ! ropt->dataOnly = impliedDataOnly; ahlog(AH, 1, "implied data-only restore\n"); } } --- 195,201 ---- } if (impliedDataOnly) { ! ropt->dumpObjFlags = REQ_DATA; ahlog(AH, 1, "implied data-only restore\n"); } } *************** *** 236,242 **** AH->currentTE = te; reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ ); ! if (((reqs & REQ_SCHEMA) != 0) && te->dropStmt) { /* We want the schema */ ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag); --- 236,242 ---- AH->currentTE = te; reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ ); ! if (((reqs & REQ_PRE_SCHEMA) != 0) && te->dropStmt) { /* We want the schema */ ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag); *************** *** 278,284 **** /* Dump any relevant dump warnings to stderr */ if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0) { ! if (!ropt->dataOnly && te->defn != NULL && strlen(te->defn) != 0) write_msg(modulename, "warning from original dump file: %s\n", te->defn); else if (te->copyStmt != NULL && strlen(te->copyStmt) != 0) write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt); --- 278,284 ---- /* Dump any relevant dump warnings to stderr */ if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0) { ! if (!WANT_DATA(ropt->dumpObjFlags) && te->defn != NULL && strlen(te->defn) != 0) write_msg(modulename, "warning from original dump file: %s\n", te->defn); else if (te->copyStmt != NULL && strlen(te->copyStmt) != 0) write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt); *************** *** 286,292 **** defnDumped = false; ! if ((reqs & REQ_SCHEMA) != 0) /* We want the schema */ { ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag); --- 286,293 ---- defnDumped = false; ! if ((WANT_PRE_SCHEMA(reqs) && WANT_PRE_SCHEMA(ropt->dumpObjFlags)) || ! (WANT_POST_SCHEMA(reqs) && WANT_POST_SCHEMA(ropt->dumpObjFlags))) /* We want the schema */ { ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag); *************** *** 331,337 **** /* * If we have a data component, then process it */ ! if ((reqs & REQ_DATA) != 0) { /* * hadDumper will be set if there is genuine data component for --- 332,338 ---- /* * If we have a data component, then process it */ ! if (WANT_DATA(reqs)) { /* * hadDumper will be set if there is genuine data component for *************** *** 343,349 **** /* * If we can output the data, then restore it. */ ! if (AH->PrintTocDataPtr !=NULL && (reqs & REQ_DATA) != 0) { #ifndef HAVE_LIBZ if (AH->compression != 0) --- 344,350 ---- /* * If we can output the data, then restore it. */ ! if (AH->PrintTocDataPtr !=NULL && WANT_DATA(reqs)) { #ifndef HAVE_LIBZ if (AH->compression != 0) *************** *** 415,421 **** /* Work out what, if anything, we want from this entry */ reqs = _tocEntryRequired(te, ropt, true); ! if ((reqs & REQ_SCHEMA) != 0) /* We want the schema */ { ahlog(AH, 1, "setting owner and privileges for %s %s\n", te->desc, te->tag); --- 416,422 ---- /* Work out what, if anything, we want from this entry */ reqs = _tocEntryRequired(te, ropt, true); ! if (WANT_PRE_SCHEMA(reqs)) /* We want the schema */ { ahlog(AH, 1, "setting owner and privileges for %s %s\n", te->desc, te->tag); *************** *** 473,479 **** _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt) { /* This hack is only needed in a data-only restore */ ! if (!ropt->dataOnly || !ropt->disable_triggers) return; ahlog(AH, 1, "disabling triggers for %s\n", te->tag); --- 474,480 ---- _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt) { /* This hack is only needed in a data-only restore */ ! if (!WANT_DATA(ropt->dumpObjFlags) || !ropt->disable_triggers) return; ahlog(AH, 1, "disabling triggers for %s\n", te->tag); *************** *** 499,505 **** _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt) { /* This hack is only needed in a data-only restore */ ! if (!ropt->dataOnly || !ropt->disable_triggers) return; ahlog(AH, 1, "enabling triggers for %s\n", te->tag); --- 500,506 ---- _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt) { /* This hack is only needed in a data-only restore */ ! if (!WANT_DATA(ropt->dumpObjFlags) || !ropt->disable_triggers) return; ahlog(AH, 1, "enabling triggers for %s\n", te->tag); *************** *** 1321,1327 **** return NULL; } ! teReqs TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt) { TocEntry *te = getTocEntryByDumpId(AH, id); --- 1322,1328 ---- return NULL; } ! int TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt) { TocEntry *te = getTocEntryByDumpId(AH, id); *************** *** 2026,2035 **** te->defn); } ! static teReqs _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls) { ! teReqs res = REQ_ALL; /* ENCODING and STDSTRINGS items are dumped specially, so always reject */ if (strcmp(te->desc, "ENCODING") == 0 || --- 2027,2036 ---- te->defn); } ! static int _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls) { ! int res = ropt->dumpObjFlags; /* ENCODING and STDSTRINGS items are dumped specially, so always reject */ if (strcmp(te->desc, "ENCODING") == 0 || *************** *** 2109,2125 **** if ((strcmp(te->desc, "<Init>") == 0) && (strcmp(te->tag, "Max OID") == 0)) return 0; - /* Mask it if we only want schema */ - if (ropt->schemaOnly) - res = res & REQ_SCHEMA; - - /* Mask it we only want data */ - if (ropt->dataOnly) - res = res & REQ_DATA; - /* Mask it if we don't have a schema contribution */ if (!te->defn || strlen(te->defn) == 0) ! res = res & ~REQ_SCHEMA; /* Finally, if there's a per-ID filter, limit based on that as well */ if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1]) --- 2110,2118 ---- if ((strcmp(te->desc, "<Init>") == 0) && (strcmp(te->tag, "Max OID") == 0)) return 0; /* Mask it if we don't have a schema contribution */ if (!te->defn || strlen(te->defn) == 0) ! res = res & ~(REQ_PRE_SCHEMA | REQ_POST_SCHEMA); /* Finally, if there's a per-ID filter, limit based on that as well */ if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1]) Index: src/bin/pg_dump/pg_backup_archiver.h =================================================================== RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup_archiver.h,v retrieving revision 1.76 diff -c -r1.76 pg_backup_archiver.h *** src/bin/pg_dump/pg_backup_archiver.h 7 Nov 2007 12:24:24 -0000 1.76 --- src/bin/pg_dump/pg_backup_archiver.h 20 Jul 2008 05:19:34 -0000 *************** *** 158,169 **** STAGE_FINALIZING } ArchiverStage; ! typedef enum ! { ! REQ_SCHEMA = 1, ! REQ_DATA = 2, ! REQ_ALL = REQ_SCHEMA + REQ_DATA ! } teReqs; typedef struct _archiveHandle { --- 158,173 ---- STAGE_FINALIZING } ArchiverStage; ! #define REQ_PRE_SCHEMA (1 << 0) ! #define REQ_DATA (1 << 1) ! #define REQ_POST_SCHEMA (1 << 2) ! #define REQ_ALL (REQ_PRE_SCHEMA + REQ_DATA + REQ_POST_SCHEMA) ! ! #define WANT_PRE_SCHEMA(req) ((req & REQ_PRE_SCHEMA) == REQ_PRE_SCHEMA) ! #define WANT_DATA(req) ((req & REQ_DATA) == REQ_DATA) ! #define WANT_POST_SCHEMA(req) ((req & REQ_POST_SCHEMA) == REQ_POST_SCHEMA) ! #define WANT_ALL(req) ((req & REQ_ALL) == REQ_ALL) ! typedef struct _archiveHandle { *************** *** 317,323 **** extern void ReadToc(ArchiveHandle *AH); extern void WriteDataChunks(ArchiveHandle *AH); ! extern teReqs TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt); extern bool checkSeek(FILE *fp); #define appendStringLiteralAHX(buf,str,AH) \ --- 321,327 ---- extern void ReadToc(ArchiveHandle *AH); extern void WriteDataChunks(ArchiveHandle *AH); ! extern int TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt); extern bool checkSeek(FILE *fp); #define appendStringLiteralAHX(buf,str,AH) \ Index: src/bin/pg_dump/pg_dump.c =================================================================== RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v retrieving revision 1.496 diff -c -r1.496 pg_dump.c *** src/bin/pg_dump/pg_dump.c 18 Jul 2008 03:32:52 -0000 1.496 --- src/bin/pg_dump/pg_dump.c 20 Jul 2008 05:57:58 -0000 *************** *** 72,77 **** --- 72,81 ---- bool dataOnly; bool aclsSkip; + /* groups of objects: default is we dump all groups */ + + int dumpObjFlags; + /* subquery used to convert user ID (eg, datdba) to user name */ static const char *username_subquery; *************** *** 224,231 **** RestoreOptions *ropt; static int disable_triggers = 0; ! static int outputNoTablespaces = 0; static int use_setsessauth = 0; static struct option long_options[] = { {"data-only", no_argument, NULL, 'a'}, --- 228,237 ---- RestoreOptions *ropt; static int disable_triggers = 0; ! static int outputNoTablespaces = 0; static int use_setsessauth = 0; + static int use_schemaPreLoadOnly; + static int use_schemaPostLoadOnly; static struct option long_options[] = { {"data-only", no_argument, NULL, 'a'}, *************** *** 265,270 **** --- 271,278 ---- {"disable-dollar-quoting", no_argument, &disable_dollar_quoting, 1}, {"disable-triggers", no_argument, &disable_triggers, 1}, {"no-tablespaces", no_argument, &outputNoTablespaces, 1}, + {"schema-pre-load", no_argument, &use_schemaPreLoadOnly, 1}, + {"schema-post-load", no_argument, &use_schemaPostLoadOnly, 1}, {"use-set-session-authorization", no_argument, &use_setsessauth, 1}, {NULL, 0, NULL, 0} *************** *** 456,467 **** if (optind < argc) dbname = argv[optind]; ! if (dataOnly && schemaOnly) { ! write_msg(NULL, "options -s/--schema-only and -a/--data-only cannot be used together\n"); exit(1); } if (dataOnly && outputClean) { write_msg(NULL, "options -c/--clean and -a/--data-only cannot be used together\n"); --- 464,504 ---- if (optind < argc) dbname = argv[optind]; ! /* ! * Look for conflicting options relating to object groupings ! */ ! if (schemaOnly && dataOnly) ! { ! write_msg(NULL, "options %s and %s cannot be used together\n", ! "-s/--schema-only", "-a/--data-only"); ! exit(1); ! } ! else if ((schemaOnly || dataOnly) && ! (use_schemaPreLoadOnly == 1 || use_schemaPostLoadOnly == 1)) { ! write_msg(NULL, "options %s and %s cannot be used together\n", ! schemaOnly ? "-s/--schema-only" : "-a/--data-only", ! use_schemaPostLoadOnly == 1 ? "--schema-post-load" : "--schema-pre-load "); exit(1); } + /* + * Decide which of the object groups we will dump + */ + dumpObjFlags = REQ_ALL; + + if (dataOnly) + dumpObjFlags = REQ_DATA; + + if (use_schemaPreLoadOnly == 1) + dumpObjFlags = REQ_PRE_SCHEMA; + + if (use_schemaPostLoadOnly == 1) + dumpObjFlags = REQ_POST_SCHEMA; + + if (schemaOnly) + dumpObjFlags = (REQ_PRE_SCHEMA | REQ_POST_SCHEMA); + if (dataOnly && outputClean) { write_msg(NULL, "options -c/--clean and -a/--data-only cannot be used together\n"); *************** *** 638,644 **** * Dumping blobs is now default unless we saw an inclusion switch or -s * ... but even if we did see one of these, -b turns it back on. */ ! if (include_everything && !schemaOnly) outputBlobs = true; /* --- 675,681 ---- * Dumping blobs is now default unless we saw an inclusion switch or -s * ... but even if we did see one of these, -b turns it back on. */ ! if (include_everything && WANT_PRE_SCHEMA(dumpObjFlags)) outputBlobs = true; /* *************** *** 650,656 **** if (g_fout->remoteVersion < 80400) guessConstraintInheritance(tblinfo, numTables); ! if (!schemaOnly) getTableData(tblinfo, numTables, oids); if (outputBlobs && hasBlobs(g_fout)) --- 687,693 ---- if (g_fout->remoteVersion < 80400) guessConstraintInheritance(tblinfo, numTables); ! if (WANT_DATA(dumpObjFlags)) getTableData(tblinfo, numTables, oids); if (outputBlobs && hasBlobs(g_fout)) *************** *** 704,710 **** dumpStdStrings(g_fout); /* The database item is always next, unless we don't want it at all */ ! if (include_everything && !dataOnly) dumpDatabase(g_fout); /* Now the rearrangeable objects. */ --- 741,747 ---- dumpStdStrings(g_fout); /* The database item is always next, unless we don't want it at all */ ! if (include_everything && WANT_DATA(dumpObjFlags)) dumpDatabase(g_fout); /* Now the rearrangeable objects. */ *************** *** 726,732 **** ropt->noTablespace = outputNoTablespaces; ropt->disable_triggers = disable_triggers; ropt->use_setsessauth = use_setsessauth; ! ropt->dataOnly = dataOnly; if (compressLevel == -1) ropt->compression = 0; --- 763,769 ---- ropt->noTablespace = outputNoTablespaces; ropt->disable_triggers = disable_triggers; ropt->use_setsessauth = use_setsessauth; ! ropt->dumpObjFlags = dumpObjFlags; if (compressLevel == -1) ropt->compression = 0; *************** *** 3389,3395 **** continue; /* Ignore indexes of tables not to be dumped */ ! if (!tbinfo->dobj.dump) continue; if (g_verbose) --- 3426,3432 ---- continue; /* Ignore indexes of tables not to be dumped */ ! if (!tbinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags)) continue; if (g_verbose) *************** *** 5140,5146 **** int ncomments; /* Comments are SCHEMA not data */ ! if (dataOnly) return; /* Search for comments associated with catalogId, using table */ --- 5177,5183 ---- int ncomments; /* Comments are SCHEMA not data */ ! if (!WANT_PRE_SCHEMA(dumpObjFlags)) return; /* Search for comments associated with catalogId, using table */ *************** *** 5191,5197 **** PQExpBuffer target; /* Comments are SCHEMA not data */ ! if (dataOnly) return; /* Search for comments associated with relation, using table */ --- 5228,5234 ---- PQExpBuffer target; /* Comments are SCHEMA not data */ ! if (!WANT_PRE_SCHEMA(dumpObjFlags)) return; /* Search for comments associated with relation, using table */ *************** *** 5543,5549 **** char *qnspname; /* Skip if not to be dumped */ ! if (!nspinfo->dobj.dump || dataOnly) return; /* don't dump dummy namespace from pre-7.3 source */ --- 5580,5586 ---- char *qnspname; /* Skip if not to be dumped */ ! if (!nspinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags)) return; /* don't dump dummy namespace from pre-7.3 source */ *************** *** 5592,5598 **** dumpType(Archive *fout, TypeInfo *tinfo) { /* Skip if not to be dumped */ ! if (!tinfo->dobj.dump || dataOnly) return; /* Dump out in proper style */ --- 5629,5635 ---- dumpType(Archive *fout, TypeInfo *tinfo) { /* Skip if not to be dumped */ ! if (!tinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags)) return; /* Dump out in proper style */ *************** *** 6237,6243 **** PQExpBuffer q; /* Skip if not to be dumped */ ! if (!stinfo->dobj.dump || dataOnly) return; q = createPQExpBuffer(); --- 6274,6280 ---- PQExpBuffer q; /* Skip if not to be dumped */ ! if (!stinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags)) return; q = createPQExpBuffer(); *************** *** 6284,6290 **** if (!include_everything) return false; /* And they're schema not data */ ! if (dataOnly) return false; return true; } --- 6321,6327 ---- if (!include_everything) return false; /* And they're schema not data */ ! if (!WANT_PRE_SCHEMA(dumpObjFlags)) return false; return true; } *************** *** 6305,6311 **** FuncInfo *funcInfo; FuncInfo *validatorInfo = NULL; ! if (dataOnly) return; /* --- 6342,6348 ---- FuncInfo *funcInfo; FuncInfo *validatorInfo = NULL; ! if (!WANT_PRE_SCHEMA(dumpObjFlags)) return; /* *************** *** 6565,6571 **** int i; /* Skip if not to be dumped */ ! if (!finfo->dobj.dump || dataOnly) return; query = createPQExpBuffer(); --- 6602,6608 ---- int i; /* Skip if not to be dumped */ ! if (!finfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags)) return; query = createPQExpBuffer(); *************** *** 6960,6966 **** TypeInfo *sourceInfo; TypeInfo *targetInfo; ! if (dataOnly) return; if (OidIsValid(cast->castfunc)) --- 6997,7003 ---- TypeInfo *sourceInfo; TypeInfo *targetInfo; ! if (!WANT_PRE_SCHEMA(dumpObjFlags)) return; if (OidIsValid(cast->castfunc)) *************** *** 7110,7116 **** char *oprcanhash; /* Skip if not to be dumped */ ! if (!oprinfo->dobj.dump || dataOnly) return; /* --- 7147,7153 ---- char *oprcanhash; /* Skip if not to be dumped */ ! if (!oprinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags)) return; /* *************** *** 7494,7500 **** int i; /* Skip if not to be dumped */ ! if (!opcinfo->dobj.dump || dataOnly) return; /* --- 7531,7537 ---- int i; /* Skip if not to be dumped */ ! if (!opcinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags)) return; /* *************** *** 7802,7808 **** int i; /* Skip if not to be dumped */ ! if (!opfinfo->dobj.dump || dataOnly) return; /* --- 7839,7845 ---- int i; /* Skip if not to be dumped */ ! if (!opfinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags)) return; /* *************** *** 8071,8077 **** bool condefault; /* Skip if not to be dumped */ ! if (!convinfo->dobj.dump || dataOnly) return; query = createPQExpBuffer(); --- 8108,8114 ---- bool condefault; /* Skip if not to be dumped */ ! if (!convinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags)) return; query = createPQExpBuffer(); *************** *** 8225,8231 **** bool convertok; /* Skip if not to be dumped */ ! if (!agginfo->aggfn.dobj.dump || dataOnly) return; query = createPQExpBuffer(); --- 8262,8268 ---- bool convertok; /* Skip if not to be dumped */ ! if (!agginfo->aggfn.dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags)) return; query = createPQExpBuffer(); *************** *** 8428,8434 **** PQExpBuffer delq; /* Skip if not to be dumped */ ! if (!prsinfo->dobj.dump || dataOnly) return; q = createPQExpBuffer(); --- 8465,8471 ---- PQExpBuffer delq; /* Skip if not to be dumped */ ! if (!prsinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags)) return; q = createPQExpBuffer(); *************** *** 8497,8503 **** char *tmplname; /* Skip if not to be dumped */ ! if (!dictinfo->dobj.dump || dataOnly) return; q = createPQExpBuffer(); --- 8534,8540 ---- char *tmplname; /* Skip if not to be dumped */ ! if (!dictinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags)) return; q = createPQExpBuffer(); *************** *** 8582,8588 **** PQExpBuffer delq; /* Skip if not to be dumped */ ! if (!tmplinfo->dobj.dump || dataOnly) return; q = createPQExpBuffer(); --- 8619,8625 ---- PQExpBuffer delq; /* Skip if not to be dumped */ ! if (!tmplinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags)) return; q = createPQExpBuffer(); *************** *** 8648,8654 **** int i_dictname; /* Skip if not to be dumped */ ! if (!cfginfo->dobj.dump || dataOnly) return; q = createPQExpBuffer(); --- 8685,8691 ---- int i_dictname; /* Skip if not to be dumped */ ! if (!cfginfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags)) return; q = createPQExpBuffer(); *************** *** 8784,8790 **** PQExpBuffer sql; /* Do nothing if ACL dump is not enabled */ ! if (dataOnly || aclsSkip) return; sql = createPQExpBuffer(); --- 8821,8827 ---- PQExpBuffer sql; /* Do nothing if ACL dump is not enabled */ ! if (!WANT_PRE_SCHEMA(dumpObjFlags) || aclsSkip) return; sql = createPQExpBuffer(); *************** *** 8821,8827 **** { if (tbinfo->relkind == RELKIND_SEQUENCE) dumpSequence(fout, tbinfo); ! else if (!dataOnly) dumpTableSchema(fout, tbinfo); /* Handle the ACL here */ --- 8858,8864 ---- { if (tbinfo->relkind == RELKIND_SEQUENCE) dumpSequence(fout, tbinfo); ! else if (WANT_PRE_SCHEMA(dumpObjFlags)) dumpTableSchema(fout, tbinfo); /* Handle the ACL here */ *************** *** 9128,9134 **** PQExpBuffer delq; /* Only print it if "separate" mode is selected */ ! if (!tbinfo->dobj.dump || !adinfo->separate || dataOnly) return; /* Don't print inherited defaults, either */ --- 9165,9171 ---- PQExpBuffer delq; /* Only print it if "separate" mode is selected */ ! if (!tbinfo->dobj.dump || !adinfo->separate || !WANT_PRE_SCHEMA(dumpObjFlags)) return; /* Don't print inherited defaults, either */ *************** *** 9213,9219 **** PQExpBuffer q; PQExpBuffer delq; ! if (dataOnly) return; q = createPQExpBuffer(); --- 9250,9256 ---- PQExpBuffer q; PQExpBuffer delq; ! if (!WANT_POST_SCHEMA(dumpObjFlags)) return; q = createPQExpBuffer(); *************** *** 9282,9288 **** PQExpBuffer delq; /* Skip if not to be dumped */ ! if (!coninfo->dobj.dump || dataOnly) return; q = createPQExpBuffer(); --- 9319,9325 ---- PQExpBuffer delq; /* Skip if not to be dumped */ ! if (!coninfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags)) return; q = createPQExpBuffer(); *************** *** 9675,9681 **** * * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump. */ ! if (!dataOnly) { resetPQExpBuffer(delqry); --- 9712,9718 ---- * * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump. */ ! if (WANT_PRE_SCHEMA(dumpObjFlags)) { resetPQExpBuffer(delqry); *************** *** 9778,9784 **** tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId); } ! if (!schemaOnly) { resetPQExpBuffer(query); appendPQExpBuffer(query, "SELECT pg_catalog.setval("); --- 9815,9821 ---- tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId); } ! if (WANT_PRE_SCHEMA(dumpObjFlags)) { resetPQExpBuffer(query); appendPQExpBuffer(query, "SELECT pg_catalog.setval("); *************** *** 9811,9817 **** const char *p; int findx; ! if (dataOnly) return; query = createPQExpBuffer(); --- 9848,9854 ---- const char *p; int findx; ! if (!WANT_POST_SCHEMA(dumpObjFlags)) return; query = createPQExpBuffer(); *************** *** 10019,10025 **** PGresult *res; /* Skip if not to be dumped */ ! if (!rinfo->dobj.dump || dataOnly) return; /* --- 10056,10062 ---- PGresult *res; /* Skip if not to be dumped */ ! if (!rinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags)) return; /* Index: src/bin/pg_dump/pg_restore.c =================================================================== RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_restore.c,v retrieving revision 1.88 diff -c -r1.88 pg_restore.c *** src/bin/pg_dump/pg_restore.c 13 Apr 2008 03:49:22 -0000 1.88 --- src/bin/pg_dump/pg_restore.c 20 Jul 2008 05:57:51 -0000 *************** *** 78,83 **** --- 78,90 ---- static int no_data_for_failed_tables = 0; static int outputNoTablespaces = 0; static int use_setsessauth = 0; + bool dataOnly = false; + bool schemaOnly = false; + + static int use_schemaPreLoadOnly; + static int use_schemaPostLoadOnly; + + int dumpObjFlags; struct option cmdopts[] = { {"clean", 0, NULL, 'c'}, *************** *** 114,119 **** --- 121,128 ---- {"disable-triggers", no_argument, &disable_triggers, 1}, {"no-data-for-failed-tables", no_argument, &no_data_for_failed_tables, 1}, {"no-tablespaces", no_argument, &outputNoTablespaces, 1}, + {"schema-pre-load", no_argument, &use_schemaPreLoadOnly, 1}, + {"schema-post-load", no_argument, &use_schemaPostLoadOnly, 1}, {"use-set-session-authorization", no_argument, &use_setsessauth, 1}, {NULL, 0, NULL, 0} *************** *** 145,151 **** switch (c) { case 'a': /* Dump data only */ ! opts->dataOnly = 1; break; case 'c': /* clean (i.e., drop) schema prior to create */ opts->dropSchema = 1; --- 154,160 ---- switch (c) { case 'a': /* Dump data only */ ! dataOnly = true; break; case 'c': /* clean (i.e., drop) schema prior to create */ opts->dropSchema = 1; *************** *** 213,219 **** opts->triggerNames = strdup(optarg); break; case 's': /* dump schema only */ ! opts->schemaOnly = 1; break; case 'S': /* Superuser username */ if (strlen(optarg) != 0) --- 222,228 ---- opts->triggerNames = strdup(optarg); break; case 's': /* dump schema only */ ! schemaOnly = true; break; case 'S': /* Superuser username */ if (strlen(optarg) != 0) *************** *** 295,300 **** --- 304,344 ---- opts->useDB = 1; } + /* + * Look for conflicting options relating to object groupings + */ + if (schemaOnly && dataOnly) + { + write_msg(NULL, "options %s and %s cannot be used together\n", + "-s/--schema-only", "-a/--data-only"); + exit(1); + } + else if ((schemaOnly || dataOnly) && + (use_schemaPreLoadOnly == 1 || use_schemaPostLoadOnly == 1)) + { + write_msg(NULL, "options %s and %s cannot be used together\n", + schemaOnly ? "-s/--schema-only" : "-a/--data-only", + use_schemaPostLoadOnly == 1 ? "--schema-post-load" : "--schema-pre-load "); + exit(1); + } + + /* + * Decide which of the object groups we will dump + */ + dumpObjFlags = REQ_ALL; + + if (dataOnly) + dumpObjFlags = REQ_DATA; + + if (use_schemaPreLoadOnly == 1) + dumpObjFlags = REQ_PRE_SCHEMA; + + if (use_schemaPostLoadOnly == 1) + dumpObjFlags = REQ_POST_SCHEMA; + + if (schemaOnly) + dumpObjFlags = (REQ_PRE_SCHEMA | REQ_POST_SCHEMA); + opts->disable_triggers = disable_triggers; opts->noDataForFailedTables = no_data_for_failed_tables; opts->noTablespace = outputNoTablespaces; -- Sent via pgsql-patches mailing list (pgsql-patches@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches |
|
|
Re: pg_dump additional options for performanceSimon,
* Simon Riggs (simon@...) wrote: > On Sun, 2008-07-20 at 05:47 +0100, Simon Riggs wrote: > > On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote: [...] > > > - Conflicting option handling Thanks for putting in the extra code to explicitly indicate which conflicting options were used. > > > - Documentation > > > When writing the documentation I would stress that "pre-schema" and > > > "post-schema" be defined in terms of PostgreSQL objects and why they > > > are pre vs. post. Perhaps this is up for some debate, but I find the documentation added for these options to be lacking the definitions I was looking for, and the explanation of why they are what they are. I'm also not sure I agree with the "Pre-Schema" and "Post-Schema" nomenclature as it doesn't really fit with the option names or what they do. Would you consider: <term><option>--schema-pre-load</option></term> <listitem> <para> Pre-Data Load - Minimum amount of the schema required before data loading can begin. This consists mainly of creating the tables using the <command>CREATE TABLE</command>. This part can be requested using <option>--schema-pre-load</>. </para> </listitem> <term><option>--schema-post-load</option></term> <listitem> <para> Post-Data Load - The rest of the schema definition, including keys, indexes, etc. By putting keys and indexes after the data has been loaded the whole process of restoring data is much faster. This is because it is faster to build indexes and check keys in bulk than piecemeal as the data is loaded. This part can be requested using <option>--schema-post-load</>. </para> </listitem> Even this doesn't cover everything though- it's too focused on tables and data loading. Where do functions go? What about types? A couple of additional points: - The command-line help hasn't been updated. Clearly, that also needs to be done to consider the documentation aspect complete. - There appears to be a bit of mistakenly included additions. The patch to pg_restore.sgml attempts to add in documentation for --superuser. I'm guessing that was unintentional, and looks like just a mistaken extra copy&paste. > > > - Technically, the patch needs to be updated slightly since another > > > pg_dump-related patch was committed recently which also added > > > options and thus causes a conflict. I think this might have just happened again, funny enough. It's something that a committer could perhaps fix, but if you're reworking the patch anyway... Thanks, Stephen |
|
|
Re: pg_dump additional options for performanceOn Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote: > Perhaps this is up for some debate, but I find the documentation added > for these options to be lacking the definitions I was looking for, and > the explanation of why they are what they are. I'm also not sure I > agree with the "Pre-Schema" and "Post-Schema" nomenclature as it doesn't > really fit with the option names or what they do. Would you consider: Will reword. > Even this doesn't cover everything though- it's too focused on tables > and data loading. Where do functions go? What about types? Yes, it is focused on tables and data loading. What about functions/types? No relevance here. > - The command-line help hasn't been updated. Clearly, that also needs > to be done to consider the documentation aspect complete. > > - There appears to be a bit of mistakenly included additions. The > patch to pg_restore.sgml attempts to add in documentation for > --superuser. I'm guessing that was unintentional, and looks like > just a mistaken extra copy&paste. Thanks, will do. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches |
|
|
Re: pg_dump additional options for performance* Simon Riggs (simon@...) wrote:
> On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote: > > Even this doesn't cover everything though- it's too focused on tables > > and data loading. Where do functions go? What about types? > > Yes, it is focused on tables and data loading. What about > functions/types? No relevance here. I don't see how they're not relevant, it's not like they're being excluded and in fact they show up in the pre-load output. Heck, even if they *were* excluded, that should be made clear in the documentation (either be an explicit include list, or saying they're excluded). Part of what's driving this is making sure we have a plan for future objects and where they'll go. Perhaps it would be enough to just say "pre-load is everything in the schema, except things which are faster done in bulk (eg: indexes, keys)". I don't think it's right to say pre-load is "only object definitions required to load data" when it includes functions and ACLs though. Hopefully my suggestion and these comments will get us to a happy middle-ground. Thanks, Stephen |
|
|
Re: pg_dump additional options for performanceOn Sun, Jul 20, 2008 at 09:18:29PM -0400, Stephen Frost wrote:
> * Simon Riggs (simon@...) wrote: > > On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote: > > > Even this doesn't cover everything though- it's too focused on tables > > > and data loading. Where do functions go? What about types? > > > > Yes, it is focused on tables and data loading. What about > > functions/types? No relevance here. > > I don't see how they're not relevant, it's not like they're being > excluded and in fact they show up in the pre-load output. Heck, even if > they *were* excluded, that should be made clear in the documentation > (either be an explicit include list, or saying they're excluded). > > Part of what's driving this is making sure we have a plan for future > objects and where they'll go. Perhaps it would be enough to just say > "pre-load is everything in the schema, except things which are faster > done in bulk (eg: indexes, keys)". I don't think it's right to say > pre-load is "only object definitions required to load data" when it > includes functions and ACLs though. > > Hopefully my suggestion and these comments will get us to a happy > middle-ground. One observation, indexes should be built right after the table data is loaded for each table, this way, the index build gets a hot cache for the table data instead of having to re-read it later as we do now. -dg -- David Gould daveg@... 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-patches mailing list (pgsql-patches@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches |
|
|
Re: pg_dump additional options for performance* daveg (daveg@...) wrote:
> One observation, indexes should be built right after the table data > is loaded for each table, this way, the index build gets a hot cache > for the table data instead of having to re-read it later as we do now. That's not how pg_dump has traditionally worked, and the point of this patch is to add options to easily segregate the main pieces of the existing pg_dump output (main schema definition, data dump, key/index building). You suggestion brings up an interesting point that should pg_dump's traditional output structure change the "--schema-post-load" set of objects wouldn't be as clear to newcomers since the load and the indexes would be interleaved in the regular output. I'd be curious about the performance impact this has on an actual load too. It would probably be more valuable on smaller loads where it would have less of an impact anyway than on loads larger than the cache size. Still, not an issue for this patch, imv. Thanks, Stephen |
|
|
Re: pg_dump additional options for performanceStephen Frost <sfrost@...> writes:
> * daveg (daveg@...) wrote: >> One observation, indexes should be built right after the table data >> is loaded for each table, this way, the index build gets a hot cache >> for the table data instead of having to re-read it later as we do now. > That's not how pg_dump has traditionally worked, and the point of this > patch is to add options to easily segregate the main pieces of the > existing pg_dump output (main schema definition, data dump, key/index > building). You suggestion brings up an interesting point that should > pg_dump's traditional output structure change the "--schema-post-load" > set of objects wouldn't be as clear to newcomers since the load and the > indexes would be interleaved in the regular output. Yeah. Also, that is pushing into an entirely different line of development, which is to enable multithreaded pg_restore. The patch at hand is necessarily incompatible with that type of operation, and wouldn't be used together with it. As far as the documentation/definition aspect goes, I think it should just say the parts are * stuff needed before you can load the data * the data * stuff needed after loading the data and not try to be any more specific than that. There are corner cases that will turn any simple breakdown into a lie, and I doubt that it's worth trying to explain them all. (Take a close look at the dependency loop breaking logic in pg_dump if you doubt this.) I hadn't realized that Simon was using "pre-schema" and "post-schema" to name the first and third parts. I'd agree that this is confusing nomenclature: it looks like it's trying to say that the data is the schema, and the schema is not! How about "pre-data and "post-data"? regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches |
|
|
Re: pg_dump additional options for performanceOn Sun, 2008-07-20 at 21:18 -0400, Stephen Frost wrote: > * Simon Riggs (simon@...) wrote: > > On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote: > > > Even this doesn't cover everything though- it's too focused on tables > > > and data loading. Where do functions go? What about types? > > > > Yes, it is focused on tables and data loading. What about > > functions/types? No relevance here. > > I don't see how they're not relevant, it's not like they're being > excluded and in fact they show up in the pre-load output. Heck, even if > they *were* excluded, that should be made clear in the documentation > (either be an explicit include list, or saying they're excluded). > > Part of what's driving this is making sure we have a plan for future > objects and where they'll go. Perhaps it would be enough to just say > "pre-load is everything in the schema, except things which are faster > done in bulk (eg: indexes, keys)". I don't think it's right to say > pre-load is "only object definitions required to load data" when it > includes functions and ACLs though. > > Hopefully my suggestion and these comments will get us to a happy > middle-ground. I don't really understand what you're saying. The options split the dump into 3 parts that's all: before the load, the load and after the load. --schema-pre-load says "Dumps exactly what <option>--schema-only</> would dump, but only those statements before the data load." What is it you are suggesting? I'm unclear. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches |
|
|
Re: pg_dump additional options for performanceOn Sun, 2008-07-20 at 23:34 -0400, Tom Lane wrote: > Stephen Frost <sfrost@...> writes: > > * daveg (daveg@...) wrote: > >> One observation, indexes should be built right after the table data > >> is loaded for each table, this way, the index build gets a hot cache > >> for the table data instead of having to re-read it later as we do now. > > > That's not how pg_dump has traditionally worked, and the point of this > > patch is to add options to easily segregate the main pieces of the > > existing pg_dump output (main schema definition, data dump, key/index > > building). You suggestion brings up an interesting point that should > > pg_dump's traditional output structure change the "--schema-post-load" > > set of objects wouldn't be as clear to newcomers since the load and the > > indexes would be interleaved in the regular output. Stephen: Agreed. > Yeah. Also, that is pushing into an entirely different line of > development, which is to enable multithreaded pg_restore. The patch > at hand is necessarily incompatible with that type of operation, and > wouldn't be used together with it. > > As far as the documentation/definition aspect goes, I think it should > just say the parts are > * stuff needed before you can load the data > * the data > * stuff needed after loading the data > and not try to be any more specific than that. There are corner cases > that will turn any simple breakdown into a lie, and I doubt that it's > worth trying to explain them all. (Take a close look at the dependency > loop breaking logic in pg_dump if you doubt this.) Tom: Agreed. > I hadn't realized that Simon was using "pre-schema" and "post-schema" > to name the first and third parts. I'd agree that this is confusing > nomenclature: it looks like it's trying to say that the data is the > schema, and the schema is not! How about "pre-data and "post-data"? OK by me. Any other takers? I also suggested having three options --want-pre-schema --want-data --want-post-schema so we could ask for any or all parts in the one dump. --data-only and --schema-only are negative options so don't allow this. (I don't like those names either, just thinking about capabilities) -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches |
|
|
Re: pg_dump additional options for performanceSimon Riggs <simon@...> writes:
> I also suggested having three options > --want-pre-schema > --want-data > --want-post-schema > so we could ask for any or all parts in the one dump. --data-only and > --schema-only are negative options so don't allow this. > (I don't like those names either, just thinking about capabilities) Maybe invert the logic? --omit-pre-data --omit-data --omit-post-data Not wedded to these either, just tossing out an idea... regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches |
|
|
Re: pg_dump additional options for performance* Simon Riggs (simon@...) wrote:
> The options split the dump into 3 parts that's all: before the load, the > load and after the load. > > --schema-pre-load says > "Dumps exactly what <option>--schema-only</> would dump, but only those > statements before the data load." > > What is it you are suggesting? I'm unclear. That part is fine, the problem is that elsewhere in the documentation (patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you change it to be "objects required before data loading", which isn't the same. Thanks, Stephen |
|
|
Re: pg_dump additional options for performanceTom,
* Tom Lane (tgl@...) wrote: > As far as the documentation/definition aspect goes, I think it should > just say the parts are > * stuff needed before you can load the data > * the data > * stuff needed after loading the data > and not try to be any more specific than that. There are corner cases > that will turn any simple breakdown into a lie, and I doubt that it's > worth trying to explain them all. (Take a close look at the dependency > loop breaking logic in pg_dump if you doubt this.) Even that is a lie though, which I guess is what my problem is. It's really "everything for the schema, except stuff that is better done in bulk", I believe. Also, I'm a bit concerned about people who would argue that you need PKs and FKs before you can load the data. Probably couldn't be avoided tho. > I hadn't realized that Simon was using "pre-schema" and "post-schema" > to name the first and third parts. I'd agree that this is confusing > nomenclature: it looks like it's trying to say that the data is the > schema, and the schema is not! How about "pre-data and "post-data"? Argh. The command-line options follow the 'data'/'load' line (--schema-pre-load and --schema-post-load), and so I think those are fine. The problem was that in the documentation he switched to saying they were "Pre-Schema" and "Post-Schema", which could lead to confusion. Thanks, Stephen |
|
|
Re: pg_dump additional options for performanceSimon,
* Simon Riggs (simon@...) wrote: > > I hadn't realized that Simon was using "pre-schema" and "post-schema" > > to name the first and third parts. I'd agree that this is confusing > > nomenclature: it looks like it's trying to say that the data is the > > schema, and the schema is not! How about "pre-data and "post-data"? > > OK by me. Any other takers? Having the command-line options be "--schema-pre-data" and "--schema-post-data" is fine with me. Leaving them the way they are is also fine by me. It's the documentation (back to pg_dump.sgml, ~774/~797) that starts talking about Pre-Schema and Post-Schema. Thanks, Stephen |
|
|
Re: pg_dump additional options for performanceTom Lane wrote: > Simon Riggs <simon@...> writes: > >> I also suggested having three options >> --want-pre-schema >> --want-data >> --want-post-schema >> so we could ask for any or all parts in the one dump. --data-only and >> --schema-only are negative options so don't allow this. >> (I don't like those names either, just thinking about capabilities) >> > > Maybe invert the logic? > > --omit-pre-data > --omit-data > --omit-post-data > > Not wedded to these either, just tossing out an idea... > > > Please, no. Negative logic seems likely to cause endless confusion. I'd even be happier with --schema-part-1 and --schema-part-2 if we can't find some more expressive way of designating them. cheers andrew -- Sent via pgsql-patches mailing list (pgsql-patches@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches |
|
|
Re: pg_dump additional options for performanceAndrew Dunstan <andrew@...> writes:
> Tom Lane wrote: >> Maybe invert the logic? >> --omit-pre-data >> --omit-data >> --omit-post-data > Please, no. Negative logic seems likely to cause endless confusion. I think it might actually be less confusing, because with this approach, each switch has an identifiable default (no) and setting it doesn't cause side-effects on settings of other switches. The interactions of the switches as Simon presents 'em seem less than obvious. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches |
|
|
Re: pg_dump additional options for performanceOn Mon, 2008-07-21 at 07:46 -0400, Stephen Frost wrote: > * Simon Riggs (simon@...) wrote: > > The options split the dump into 3 parts that's all: before the load, the > > load and after the load. > > > > --schema-pre-load says > > "Dumps exactly what <option>--schema-only</> would dump, but only those > > statements before the data load." > > > > What is it you are suggesting? I'm unclear. > > That part is fine, the problem is that elsewhere in the documentation > (patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you > change it to be "objects required before data loading", which isn't the > same. OK, gotcha now - will change that. I thought you might mean something about changing the output itself. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches |
|
|
Re: pg_dump additional options for performanceStephen Frost <sfrost@...> writes:
> * Tom Lane (tgl@...) wrote: >> As far as the documentation/definition aspect goes, I think it should >> just say the parts are >> * stuff needed before you can load the data >> * the data >> * stuff needed after loading the data > Even that is a lie though, which I guess is what my problem is. True; the stuff done after is done that way at least in part for performance reasons rather than because it has to be done that way. (I think it's not only performance issues, though --- for circular FKs you pretty much have to load the data first.) >> I hadn't realized that Simon was using "pre-schema" and "post-schema" >> to name the first and third parts. I'd agree that this is confusing >> nomenclature: it looks like it's trying to say that the data is the >> schema, and the schema is not! How about "pre-data and "post-data"? > Argh. The command-line options follow the 'data'/'load' line > (--schema-pre-load and --schema-post-load), and so I think those are > fine. The problem was that in the documentation he switched to saying > they were "Pre-Schema" and "Post-Schema", which could lead to confusion. Ah, I see. No objection to those switch names, at least assuming we want to stick to positive-logic switches. What did you think of the negative-logic suggestion (--omit-xxx)? regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches |
|
|
Re: pg_dump additional options for performanceTom, et al,
* Tom Lane (tgl@...) wrote: > Ah, I see. No objection to those switch names, at least assuming we > want to stick to positive-logic switches. What did you think of the > negative-logic suggestion (--omit-xxx)? My preference is for positive-logic switches in general. The place where I would use this patch would lend itself to being more options if --omit-xxxx were used. I expect that would hold true for most people. It would be: --omit-data --omit-post-load --omit-pre-load --omit-post-load --omit-pre-load --omit-data vs. --schema-pre-load --data-only --schema-post-load Point being that I'd be dumping these into seperate files where I could more easily manipulate the pre-load or post-load files. I'd still want pre/post load to be seperate though since this would be used in cases where there's alot of data (hence the reason for the split) and putting pre and post together and running them before data would slow things down quite a bit. Are there use cases for just --omit-post-load or --omit-pre-load? Probably, but I just don't see any situation where I'd use them like that. Thanks, Stephen |
| < Prev | 1 - 2 - 3 | Next > |
| Free embeddable forum powered by Nabble | Forum Help |