pg_dump additional options for performance

View: New views
20 Messages — Rating Filter:   Alert me  
< Prev | 1 - 2 - 3 | Next >

pg_dump additional options for performance

by Stephen Frost :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Simon,

  I agree with adding these options in general, since I find myself
  frustrated by having to vi huge dumps to change simple schema things.
  A couple of comments on the patch though:

  - Conflicting option handling
    I think we are doing our users a disservice by putting it on them to
        figure out exactly what:
        multiple object groups cannot be used together
        means to them.  You and I may understand what an "object group" is,
        and why there can be only one, but it's a great deal less clear than
        the prior message of
        options -s/--schema-only and -a/--data-only cannot be used together
        My suggestion would be to either list out the specific options which
        can't be used together, as was done previously, or add a bit of (I
        realize, boring) code and actually tell the user which of the
        conflicting options were used.

  - Documentation
        When writing the documentation I would stress that "pre-schema" and
        "post-schema" be defined in terms of PostgreSQL objects and why they
        are pre vs. post.

  - Technically, the patch needs to be updated slightly since another
        pg_dump-related patch was committed recently which also added
        options and thus causes a conflict.

  Beyond those minor points, the patch looks good to me.

  Thanks,

                Stephen


signature.asc (204 bytes) Download Attachment

Re: pg_dump additional options for performance

by Simon Riggs :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote:

> Simon,
>
>   I agree with adding these options in general, since I find myself
>   frustrated by having to vi huge dumps to change simple schema things.
>   A couple of comments on the patch though:
>
>   - Conflicting option handling
>     I think we are doing our users a disservice by putting it on them to
> figure out exactly what:
> multiple object groups cannot be used together
> means to them.  You and I may understand what an "object group" is,
> and why there can be only one, but it's a great deal less clear than
> the prior message of
> options -s/--schema-only and -a/--data-only cannot be used together
> My suggestion would be to either list out the specific options which
> can't be used together, as was done previously, or add a bit of (I
> realize, boring) code and actually tell the user which of the
> conflicting options were used.
>
>   - Documentation
> When writing the documentation I would stress that "pre-schema" and
> "post-schema" be defined in terms of PostgreSQL objects and why they
> are pre vs. post.
>
>   - Technically, the patch needs to be updated slightly since another
> pg_dump-related patch was committed recently which also added
> options and thus causes a conflict.
>
>   Beyond those minor points, the patch looks good to me.

Thanks for the review. I'll make the changes you suggest.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list (pgsql-patches@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: pg_dump additional options for performance

by Simon Riggs :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Sun, 2008-07-20 at 05:47 +0100, Simon Riggs wrote:

> On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote:
> > Simon,
> >
> >   I agree with adding these options in general, since I find myself
> >   frustrated by having to vi huge dumps to change simple schema things.
> >   A couple of comments on the patch though:
> >
> >   - Conflicting option handling
> >     I think we are doing our users a disservice by putting it on them to
> > figure out exactly what:
> > multiple object groups cannot be used together
> > means to them.  You and I may understand what an "object group" is,
> > and why there can be only one, but it's a great deal less clear than
> > the prior message of
> > options -s/--schema-only and -a/--data-only cannot be used together
> > My suggestion would be to either list out the specific options which
> > can't be used together, as was done previously, or add a bit of (I
> > realize, boring) code and actually tell the user which of the
> > conflicting options were used.
> >
> >   - Documentation
> > When writing the documentation I would stress that "pre-schema" and
> > "post-schema" be defined in terms of PostgreSQL objects and why they
> > are pre vs. post.
> >
> >   - Technically, the patch needs to be updated slightly since another
> > pg_dump-related patch was committed recently which also added
> > options and thus causes a conflict.
> >
> >   Beyond those minor points, the patch looks good to me.
>
> Thanks for the review. I'll make the changes you suggest.
Patch updated to head, plus changes/docs requested.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

[pg_dump_prepost.v3.patch]

Index: doc/src/sgml/ref/pg_dump.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.102
diff -c -r1.102 pg_dump.sgml
*** doc/src/sgml/ref/pg_dump.sgml 13 Apr 2008 03:49:21 -0000 1.102
--- doc/src/sgml/ref/pg_dump.sgml 20 Jul 2008 06:33:30 -0000
***************
*** 133,139 ****
         <para>
          Include large objects in the dump.  This is the default behavior
          except when <option>--schema</>, <option>--table</>, or
!         <option>--schema-only</> is specified, so the <option>-b</>
          switch is only useful to add large objects to selective dumps.
         </para>
        </listitem>
--- 133,140 ----
         <para>
          Include large objects in the dump.  This is the default behavior
          except when <option>--schema</>, <option>--table</>, or
!         <option>--schema-only</> or <option>--schema-pre-load</> or
! <option>--schema-post-load</> is specified, so the <option>-b</>
          switch is only useful to add large objects to selective dumps.
         </para>
        </listitem>
***************
*** 443,448 ****
--- 444,471 ----
       </varlistentry>
 
       <varlistentry>
+       <term><option>--schema-pre-load</option></term>
+       <listitem>
+        <para>
+ Dump only the object definitions (schema) required to load data. Dumps
+ exactly what <option>--schema-only</> would dump, but only those
+ statements before the data load.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term><option>--schema-post-load</option></term>
+       <listitem>
+        <para>
+ Dump only the object definitions (schema) required after data has been
+ loaded. Dumps exactly what <option>--schema-only</> would dump, but
+ only those statements after the data load.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
        <term><option>-S <replaceable class="parameter">username</replaceable></option></term>
        <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term>
        <listitem>
***************
*** 774,779 ****
--- 797,830 ----
    </para>
 
    <para>
+    The output of pg_dump can be notionally divided into three parts:
+    <itemizedlist>
+     <listitem>
+      <para>
+  Pre-Schema - objects required before data loading, such as
+  <command>CREATE TABLE</command>.
+  This part can be requested using <option>--schema-pre-load</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+  Table Data - data can be requested using <option>--data-only</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+  Post-Schema - objects required after data loading, such as
+  <command>ALTER TABLE</command> and <command>CREATE INDEX</command>.
+  This part can be requested using <option>--schema-post-load</>.
+      </para>
+     </listitem>
+    </itemizedlist>
+    This allows us to work more easily with large data dump files when
+    there is some need to edit commands or resequence their execution for
+    performance.
+   </para>
+
+   <para>
     Because <application>pg_dump</application> is used to transfer data
     to newer versions of <productname>PostgreSQL</>, the output of
     <application>pg_dump</application> can be loaded into
Index: doc/src/sgml/ref/pg_restore.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_restore.sgml,v
retrieving revision 1.75
diff -c -r1.75 pg_restore.sgml
*** doc/src/sgml/ref/pg_restore.sgml 13 Apr 2008 03:49:21 -0000 1.75
--- doc/src/sgml/ref/pg_restore.sgml 20 Jul 2008 06:33:18 -0000
***************
*** 321,326 ****
--- 321,350 ----
       </varlistentry>
 
       <varlistentry>
+       <term><option>--schema-post-load</option></term>
+       <listitem>
+        <para>
+ Dump only the object definitions (schema) required after data has been
+ loaded. Dumps exactly what <option>--schema-only</> would dump, but
+ only those statements after the data load.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term><option>-S <replaceable class="parameter">username</replaceable></option></term>
+       <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term>
+       <listitem>
+        <para>
+         Specify the superuser user name to use when disabling triggers.
+         This is only relevant if <option>--disable-triggers</> is used.
+         (Usually, it's better to leave this out, and instead start the
+         resulting script as superuser.)
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
        <term><option>-S <replaceable class="parameter">username</replaceable></option></term>
        <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term>
        <listitem>
***************
*** 572,577 ****
--- 596,629 ----
    </para>
 
    <para>
+    The actions of pg_restore can be notionally divided into three parts:
+    <itemizedlist>
+     <listitem>
+      <para>
+  Pre-Schema - objects required before data loading, such as
+  <command>CREATE TABLE</command>.
+  This part can be requested using <option>--schema-pre-load</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+  Table Data - data can be requested using <option>--data-only</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+  Post-Schema - objects required after data loading, such as
+  <command>ALTER TABLE</command> and <command>CREATE INDEX</command>.
+  This part can be requested using <option>--schema-post-load</>.
+      </para>
+     </listitem>
+    </itemizedlist>
+    This allows us to work more easily with large data dump files when
+    there is some need to edit commands or resequence their execution for
+    performance.
+   </para>
+
+   <para>
     The limitations of <application>pg_restore</application> are detailed below.
 
     <itemizedlist>
Index: src/bin/pg_dump/pg_backup.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup.h,v
retrieving revision 1.47
diff -c -r1.47 pg_backup.h
*** src/bin/pg_dump/pg_backup.h 13 Apr 2008 03:49:21 -0000 1.47
--- src/bin/pg_dump/pg_backup.h 20 Jul 2008 05:19:34 -0000
***************
*** 89,95 ****
  int use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  * instead of OWNER TO */
  char   *superuser; /* Username to use as superuser */
! int dataOnly;
  int dropSchema;
  char   *filename;
  int schemaOnly;
--- 89,95 ----
  int use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  * instead of OWNER TO */
  char   *superuser; /* Username to use as superuser */
! int dumpObjFlags; /* which objects types to dump */
  int dropSchema;
  char   *filename;
  int schemaOnly;
Index: src/bin/pg_dump/pg_backup_archiver.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.157
diff -c -r1.157 pg_backup_archiver.c
*** src/bin/pg_dump/pg_backup_archiver.c 4 May 2008 08:32:21 -0000 1.157
--- src/bin/pg_dump/pg_backup_archiver.c 20 Jul 2008 05:19:34 -0000
***************
*** 56,62 ****
  static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
  static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
  static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
! static teReqs _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls);
  static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static TocEntry *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
--- 56,62 ----
  static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
  static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
  static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
! static int _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls);
  static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static TocEntry *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
***************
*** 129,135 ****
  {
  ArchiveHandle *AH = (ArchiveHandle *) AHX;
  TocEntry   *te;
! teReqs reqs;
  OutputContext sav;
  bool defnDumped;
 
--- 129,135 ----
  {
  ArchiveHandle *AH = (ArchiveHandle *) AHX;
  TocEntry   *te;
! int reqs;
  OutputContext sav;
  bool defnDumped;
 
***************
*** 175,193 ****
  * Work out if we have an implied data-only restore. This can happen if
  * the dump was data only or if the user has used a toc list to exclude
  * all of the schema data. All we do is look for schema entries - if none
! * are found then we set the dataOnly flag.
  *
! * We could scan for wanted TABLE entries, but that is not the same as
! * dataOnly. At this stage, it seems unnecessary (6-Mar-2001).
  */
! if (!ropt->dataOnly)
  {
  int impliedDataOnly = 1;
 
  for (te = AH->toc->next; te != AH->toc; te = te->next)
  {
  reqs = _tocEntryRequired(te, ropt, true);
! if ((reqs & REQ_SCHEMA) != 0)
  { /* It's schema, and it's wanted */
  impliedDataOnly = 0;
  break;
--- 175,193 ----
  * Work out if we have an implied data-only restore. This can happen if
  * the dump was data only or if the user has used a toc list to exclude
  * all of the schema data. All we do is look for schema entries - if none
! * are found then say we only want DATA type objects.
  *
! * We could scan for wanted TABLE entries, but that is not the same.
! * At this stage, it seems unnecessary (6-Mar-2001).
  */
! if (!WANT_DATA(ropt->dumpObjFlags))
  {
  int impliedDataOnly = 1;
 
  for (te = AH->toc->next; te != AH->toc; te = te->next)
  {
  reqs = _tocEntryRequired(te, ropt, true);
! if (WANT_PRE_SCHEMA(reqs) || WANT_POST_SCHEMA(reqs))
  { /* It's schema, and it's wanted */
  impliedDataOnly = 0;
  break;
***************
*** 195,201 ****
  }
  if (impliedDataOnly)
  {
! ropt->dataOnly = impliedDataOnly;
  ahlog(AH, 1, "implied data-only restore\n");
  }
  }
--- 195,201 ----
  }
  if (impliedDataOnly)
  {
! ropt->dumpObjFlags = REQ_DATA;
  ahlog(AH, 1, "implied data-only restore\n");
  }
  }
***************
*** 236,242 ****
  AH->currentTE = te;
 
  reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! if (((reqs & REQ_SCHEMA) != 0) && te->dropStmt)
  {
  /* We want the schema */
  ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
--- 236,242 ----
  AH->currentTE = te;
 
  reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! if (((reqs & REQ_PRE_SCHEMA) != 0) && te->dropStmt)
  {
  /* We want the schema */
  ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
***************
*** 278,284 ****
  /* Dump any relevant dump warnings to stderr */
  if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0)
  {
! if (!ropt->dataOnly && te->defn != NULL && strlen(te->defn) != 0)
  write_msg(modulename, "warning from original dump file: %s\n", te->defn);
  else if (te->copyStmt != NULL && strlen(te->copyStmt) != 0)
  write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt);
--- 278,284 ----
  /* Dump any relevant dump warnings to stderr */
  if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0)
  {
! if (!WANT_DATA(ropt->dumpObjFlags) && te->defn != NULL && strlen(te->defn) != 0)
  write_msg(modulename, "warning from original dump file: %s\n", te->defn);
  else if (te->copyStmt != NULL && strlen(te->copyStmt) != 0)
  write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt);
***************
*** 286,292 ****
 
  defnDumped = false;
 
! if ((reqs & REQ_SCHEMA) != 0) /* We want the schema */
  {
  ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
 
--- 286,293 ----
 
  defnDumped = false;
 
! if ((WANT_PRE_SCHEMA(reqs) && WANT_PRE_SCHEMA(ropt->dumpObjFlags)) ||
! (WANT_POST_SCHEMA(reqs) && WANT_POST_SCHEMA(ropt->dumpObjFlags))) /* We want the schema */
  {
  ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
 
***************
*** 331,337 ****
  /*
  * If we have a data component, then process it
  */
! if ((reqs & REQ_DATA) != 0)
  {
  /*
  * hadDumper will be set if there is genuine data component for
--- 332,338 ----
  /*
  * If we have a data component, then process it
  */
! if (WANT_DATA(reqs))
  {
  /*
  * hadDumper will be set if there is genuine data component for
***************
*** 343,349 ****
  /*
  * If we can output the data, then restore it.
  */
! if (AH->PrintTocDataPtr !=NULL && (reqs & REQ_DATA) != 0)
  {
  #ifndef HAVE_LIBZ
  if (AH->compression != 0)
--- 344,350 ----
  /*
  * If we can output the data, then restore it.
  */
! if (AH->PrintTocDataPtr !=NULL && WANT_DATA(reqs))
  {
  #ifndef HAVE_LIBZ
  if (AH->compression != 0)
***************
*** 415,421 ****
  /* Work out what, if anything, we want from this entry */
  reqs = _tocEntryRequired(te, ropt, true);
 
! if ((reqs & REQ_SCHEMA) != 0) /* We want the schema */
  {
  ahlog(AH, 1, "setting owner and privileges for %s %s\n",
   te->desc, te->tag);
--- 416,422 ----
  /* Work out what, if anything, we want from this entry */
  reqs = _tocEntryRequired(te, ropt, true);
 
! if (WANT_PRE_SCHEMA(reqs)) /* We want the schema */
  {
  ahlog(AH, 1, "setting owner and privileges for %s %s\n",
   te->desc, te->tag);
***************
*** 473,479 ****
  _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  /* This hack is only needed in a data-only restore */
! if (!ropt->dataOnly || !ropt->disable_triggers)
  return;
 
  ahlog(AH, 1, "disabling triggers for %s\n", te->tag);
--- 474,480 ----
  _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  /* This hack is only needed in a data-only restore */
! if (!WANT_DATA(ropt->dumpObjFlags) || !ropt->disable_triggers)
  return;
 
  ahlog(AH, 1, "disabling triggers for %s\n", te->tag);
***************
*** 499,505 ****
  _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  /* This hack is only needed in a data-only restore */
! if (!ropt->dataOnly || !ropt->disable_triggers)
  return;
 
  ahlog(AH, 1, "enabling triggers for %s\n", te->tag);
--- 500,506 ----
  _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  /* This hack is only needed in a data-only restore */
! if (!WANT_DATA(ropt->dumpObjFlags) || !ropt->disable_triggers)
  return;
 
  ahlog(AH, 1, "enabling triggers for %s\n", te->tag);
***************
*** 1321,1327 ****
  return NULL;
  }
 
! teReqs
  TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt)
  {
  TocEntry   *te = getTocEntryByDumpId(AH, id);
--- 1322,1328 ----
  return NULL;
  }
 
! int
  TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt)
  {
  TocEntry   *te = getTocEntryByDumpId(AH, id);
***************
*** 2026,2035 ****
  te->defn);
  }
 
! static teReqs
  _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
  {
! teReqs res = REQ_ALL;
 
  /* ENCODING and STDSTRINGS items are dumped specially, so always reject */
  if (strcmp(te->desc, "ENCODING") == 0 ||
--- 2027,2036 ----
  te->defn);
  }
 
! static int
  _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
  {
! int res = ropt->dumpObjFlags;
 
  /* ENCODING and STDSTRINGS items are dumped specially, so always reject */
  if (strcmp(te->desc, "ENCODING") == 0 ||
***************
*** 2109,2125 ****
  if ((strcmp(te->desc, "<Init>") == 0) && (strcmp(te->tag, "Max OID") == 0))
  return 0;
 
- /* Mask it if we only want schema */
- if (ropt->schemaOnly)
- res = res & REQ_SCHEMA;
-
- /* Mask it we only want data */
- if (ropt->dataOnly)
- res = res & REQ_DATA;
-
  /* Mask it if we don't have a schema contribution */
  if (!te->defn || strlen(te->defn) == 0)
! res = res & ~REQ_SCHEMA;
 
  /* Finally, if there's a per-ID filter, limit based on that as well */
  if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
--- 2110,2118 ----
  if ((strcmp(te->desc, "<Init>") == 0) && (strcmp(te->tag, "Max OID") == 0))
  return 0;
 
  /* Mask it if we don't have a schema contribution */
  if (!te->defn || strlen(te->defn) == 0)
! res = res & ~(REQ_PRE_SCHEMA | REQ_POST_SCHEMA);
 
  /* Finally, if there's a per-ID filter, limit based on that as well */
  if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
Index: src/bin/pg_dump/pg_backup_archiver.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup_archiver.h,v
retrieving revision 1.76
diff -c -r1.76 pg_backup_archiver.h
*** src/bin/pg_dump/pg_backup_archiver.h 7 Nov 2007 12:24:24 -0000 1.76
--- src/bin/pg_dump/pg_backup_archiver.h 20 Jul 2008 05:19:34 -0000
***************
*** 158,169 ****
  STAGE_FINALIZING
  } ArchiverStage;
 
! typedef enum
! {
! REQ_SCHEMA = 1,
! REQ_DATA = 2,
! REQ_ALL = REQ_SCHEMA + REQ_DATA
! } teReqs;
 
  typedef struct _archiveHandle
  {
--- 158,173 ----
  STAGE_FINALIZING
  } ArchiverStage;
 
! #define REQ_PRE_SCHEMA (1 << 0)
! #define REQ_DATA (1 << 1)
! #define REQ_POST_SCHEMA (1 << 2)
! #define REQ_ALL (REQ_PRE_SCHEMA + REQ_DATA + REQ_POST_SCHEMA)
!
! #define WANT_PRE_SCHEMA(req) ((req & REQ_PRE_SCHEMA) == REQ_PRE_SCHEMA)
! #define WANT_DATA(req) ((req & REQ_DATA) == REQ_DATA)
! #define WANT_POST_SCHEMA(req) ((req & REQ_POST_SCHEMA) == REQ_POST_SCHEMA)
! #define WANT_ALL(req) ((req & REQ_ALL) == REQ_ALL)
!
 
  typedef struct _archiveHandle
  {
***************
*** 317,323 ****
  extern void ReadToc(ArchiveHandle *AH);
  extern void WriteDataChunks(ArchiveHandle *AH);
 
! extern teReqs TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt);
  extern bool checkSeek(FILE *fp);
 
  #define appendStringLiteralAHX(buf,str,AH) \
--- 321,327 ----
  extern void ReadToc(ArchiveHandle *AH);
  extern void WriteDataChunks(ArchiveHandle *AH);
 
! extern int TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt);
  extern bool checkSeek(FILE *fp);
 
  #define appendStringLiteralAHX(buf,str,AH) \
Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.496
diff -c -r1.496 pg_dump.c
*** src/bin/pg_dump/pg_dump.c 18 Jul 2008 03:32:52 -0000 1.496
--- src/bin/pg_dump/pg_dump.c 20 Jul 2008 05:57:58 -0000
***************
*** 72,77 ****
--- 72,81 ----
  bool dataOnly;
  bool aclsSkip;
 
+ /* groups of objects: default is we dump all groups */
+
+ int dumpObjFlags;
+
  /* subquery used to convert user ID (eg, datdba) to user name */
  static const char *username_subquery;
 
***************
*** 224,231 ****
  RestoreOptions *ropt;
 
  static int disable_triggers = 0;
! static int  outputNoTablespaces = 0;
  static int use_setsessauth = 0;
 
  static struct option long_options[] = {
  {"data-only", no_argument, NULL, 'a'},
--- 228,237 ----
  RestoreOptions *ropt;
 
  static int disable_triggers = 0;
! static int outputNoTablespaces = 0;
  static int use_setsessauth = 0;
+ static int use_schemaPreLoadOnly;
+ static int use_schemaPostLoadOnly;
 
  static struct option long_options[] = {
  {"data-only", no_argument, NULL, 'a'},
***************
*** 265,270 ****
--- 271,278 ----
  {"disable-dollar-quoting", no_argument, &disable_dollar_quoting, 1},
  {"disable-triggers", no_argument, &disable_triggers, 1},
  {"no-tablespaces", no_argument, &outputNoTablespaces, 1},
+   {"schema-pre-load", no_argument, &use_schemaPreLoadOnly, 1},
+   {"schema-post-load", no_argument, &use_schemaPostLoadOnly, 1},
  {"use-set-session-authorization", no_argument, &use_setsessauth, 1},
 
  {NULL, 0, NULL, 0}
***************
*** 456,467 ****
  if (optind < argc)
  dbname = argv[optind];
 
! if (dataOnly && schemaOnly)
  {
! write_msg(NULL, "options -s/--schema-only and -a/--data-only cannot be used together\n");
  exit(1);
  }
 
  if (dataOnly && outputClean)
  {
  write_msg(NULL, "options -c/--clean and -a/--data-only cannot be used together\n");
--- 464,504 ----
  if (optind < argc)
  dbname = argv[optind];
 
! /*
! * Look for conflicting options relating to object groupings
! */
! if (schemaOnly && dataOnly)
! {
! write_msg(NULL, "options %s and %s cannot be used together\n",
! "-s/--schema-only", "-a/--data-only");
! exit(1);
! }
! else if ((schemaOnly || dataOnly) &&
! (use_schemaPreLoadOnly == 1 || use_schemaPostLoadOnly == 1))
  {
! write_msg(NULL, "options %s and %s cannot be used together\n",
! schemaOnly ? "-s/--schema-only" : "-a/--data-only",
! use_schemaPostLoadOnly == 1 ? "--schema-post-load" : "--schema-pre-load ");
  exit(1);
  }
 
+ /*
+ * Decide which of the object groups we will dump
+ */
+ dumpObjFlags = REQ_ALL;
+
+ if (dataOnly)
+ dumpObjFlags = REQ_DATA;
+
+ if (use_schemaPreLoadOnly == 1)
+ dumpObjFlags = REQ_PRE_SCHEMA;
+
+ if (use_schemaPostLoadOnly == 1)
+ dumpObjFlags = REQ_POST_SCHEMA;
+
+ if (schemaOnly)
+ dumpObjFlags = (REQ_PRE_SCHEMA | REQ_POST_SCHEMA);
+
  if (dataOnly && outputClean)
  {
  write_msg(NULL, "options -c/--clean and -a/--data-only cannot be used together\n");
***************
*** 638,644 ****
  * Dumping blobs is now default unless we saw an inclusion switch or -s
  * ... but even if we did see one of these, -b turns it back on.
  */
! if (include_everything && !schemaOnly)
  outputBlobs = true;
 
  /*
--- 675,681 ----
  * Dumping blobs is now default unless we saw an inclusion switch or -s
  * ... but even if we did see one of these, -b turns it back on.
  */
! if (include_everything && WANT_PRE_SCHEMA(dumpObjFlags))
  outputBlobs = true;
 
  /*
***************
*** 650,656 ****
  if (g_fout->remoteVersion < 80400)
  guessConstraintInheritance(tblinfo, numTables);
 
! if (!schemaOnly)
  getTableData(tblinfo, numTables, oids);
 
  if (outputBlobs && hasBlobs(g_fout))
--- 687,693 ----
  if (g_fout->remoteVersion < 80400)
  guessConstraintInheritance(tblinfo, numTables);
 
! if (WANT_DATA(dumpObjFlags))
  getTableData(tblinfo, numTables, oids);
 
  if (outputBlobs && hasBlobs(g_fout))
***************
*** 704,710 ****
  dumpStdStrings(g_fout);
 
  /* The database item is always next, unless we don't want it at all */
! if (include_everything && !dataOnly)
  dumpDatabase(g_fout);
 
  /* Now the rearrangeable objects. */
--- 741,747 ----
  dumpStdStrings(g_fout);
 
  /* The database item is always next, unless we don't want it at all */
! if (include_everything && WANT_DATA(dumpObjFlags))
  dumpDatabase(g_fout);
 
  /* Now the rearrangeable objects. */
***************
*** 726,732 ****
  ropt->noTablespace = outputNoTablespaces;
  ropt->disable_triggers = disable_triggers;
  ropt->use_setsessauth = use_setsessauth;
! ropt->dataOnly = dataOnly;
 
  if (compressLevel == -1)
  ropt->compression = 0;
--- 763,769 ----
  ropt->noTablespace = outputNoTablespaces;
  ropt->disable_triggers = disable_triggers;
  ropt->use_setsessauth = use_setsessauth;
! ropt->dumpObjFlags = dumpObjFlags;
 
  if (compressLevel == -1)
  ropt->compression = 0;
***************
*** 3389,3395 ****
  continue;
 
  /* Ignore indexes of tables not to be dumped */
! if (!tbinfo->dobj.dump)
  continue;
 
  if (g_verbose)
--- 3426,3432 ----
  continue;
 
  /* Ignore indexes of tables not to be dumped */
! if (!tbinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
  continue;
 
  if (g_verbose)
***************
*** 5140,5146 ****
  int ncomments;
 
  /* Comments are SCHEMA not data */
! if (dataOnly)
  return;
 
  /* Search for comments associated with catalogId, using table */
--- 5177,5183 ----
  int ncomments;
 
  /* Comments are SCHEMA not data */
! if (!WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  /* Search for comments associated with catalogId, using table */
***************
*** 5191,5197 ****
  PQExpBuffer target;
 
  /* Comments are SCHEMA not data */
! if (dataOnly)
  return;
 
  /* Search for comments associated with relation, using table */
--- 5228,5234 ----
  PQExpBuffer target;
 
  /* Comments are SCHEMA not data */
! if (!WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  /* Search for comments associated with relation, using table */
***************
*** 5543,5549 ****
  char   *qnspname;
 
  /* Skip if not to be dumped */
! if (!nspinfo->dobj.dump || dataOnly)
  return;
 
  /* don't dump dummy namespace from pre-7.3 source */
--- 5580,5586 ----
  char   *qnspname;
 
  /* Skip if not to be dumped */
! if (!nspinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  /* don't dump dummy namespace from pre-7.3 source */
***************
*** 5592,5598 ****
  dumpType(Archive *fout, TypeInfo *tinfo)
  {
  /* Skip if not to be dumped */
! if (!tinfo->dobj.dump || dataOnly)
  return;
 
  /* Dump out in proper style */
--- 5629,5635 ----
  dumpType(Archive *fout, TypeInfo *tinfo)
  {
  /* Skip if not to be dumped */
! if (!tinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  /* Dump out in proper style */
***************
*** 6237,6243 ****
  PQExpBuffer q;
 
  /* Skip if not to be dumped */
! if (!stinfo->dobj.dump || dataOnly)
  return;
 
  q = createPQExpBuffer();
--- 6274,6280 ----
  PQExpBuffer q;
 
  /* Skip if not to be dumped */
! if (!stinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  q = createPQExpBuffer();
***************
*** 6284,6290 ****
  if (!include_everything)
  return false;
  /* And they're schema not data */
! if (dataOnly)
  return false;
  return true;
  }
--- 6321,6327 ----
  if (!include_everything)
  return false;
  /* And they're schema not data */
! if (!WANT_PRE_SCHEMA(dumpObjFlags))
  return false;
  return true;
  }
***************
*** 6305,6311 ****
  FuncInfo   *funcInfo;
  FuncInfo   *validatorInfo = NULL;
 
! if (dataOnly)
  return;
 
  /*
--- 6342,6348 ----
  FuncInfo   *funcInfo;
  FuncInfo   *validatorInfo = NULL;
 
! if (!WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  /*
***************
*** 6565,6571 ****
  int i;
 
  /* Skip if not to be dumped */
! if (!finfo->dobj.dump || dataOnly)
  return;
 
  query = createPQExpBuffer();
--- 6602,6608 ----
  int i;
 
  /* Skip if not to be dumped */
! if (!finfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  query = createPQExpBuffer();
***************
*** 6960,6966 ****
  TypeInfo   *sourceInfo;
  TypeInfo   *targetInfo;
 
! if (dataOnly)
  return;
 
  if (OidIsValid(cast->castfunc))
--- 6997,7003 ----
  TypeInfo   *sourceInfo;
  TypeInfo   *targetInfo;
 
! if (!WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  if (OidIsValid(cast->castfunc))
***************
*** 7110,7116 ****
  char   *oprcanhash;
 
  /* Skip if not to be dumped */
! if (!oprinfo->dobj.dump || dataOnly)
  return;
 
  /*
--- 7147,7153 ----
  char   *oprcanhash;
 
  /* Skip if not to be dumped */
! if (!oprinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  /*
***************
*** 7494,7500 ****
  int i;
 
  /* Skip if not to be dumped */
! if (!opcinfo->dobj.dump || dataOnly)
  return;
 
  /*
--- 7531,7537 ----
  int i;
 
  /* Skip if not to be dumped */
! if (!opcinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  /*
***************
*** 7802,7808 ****
  int i;
 
  /* Skip if not to be dumped */
! if (!opfinfo->dobj.dump || dataOnly)
  return;
 
  /*
--- 7839,7845 ----
  int i;
 
  /* Skip if not to be dumped */
! if (!opfinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  /*
***************
*** 8071,8077 ****
  bool condefault;
 
  /* Skip if not to be dumped */
! if (!convinfo->dobj.dump || dataOnly)
  return;
 
  query = createPQExpBuffer();
--- 8108,8114 ----
  bool condefault;
 
  /* Skip if not to be dumped */
! if (!convinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  query = createPQExpBuffer();
***************
*** 8225,8231 ****
  bool convertok;
 
  /* Skip if not to be dumped */
! if (!agginfo->aggfn.dobj.dump || dataOnly)
  return;
 
  query = createPQExpBuffer();
--- 8262,8268 ----
  bool convertok;
 
  /* Skip if not to be dumped */
! if (!agginfo->aggfn.dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  query = createPQExpBuffer();
***************
*** 8428,8434 ****
  PQExpBuffer delq;
 
  /* Skip if not to be dumped */
! if (!prsinfo->dobj.dump || dataOnly)
  return;
 
  q = createPQExpBuffer();
--- 8465,8471 ----
  PQExpBuffer delq;
 
  /* Skip if not to be dumped */
! if (!prsinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  q = createPQExpBuffer();
***************
*** 8497,8503 ****
  char   *tmplname;
 
  /* Skip if not to be dumped */
! if (!dictinfo->dobj.dump || dataOnly)
  return;
 
  q = createPQExpBuffer();
--- 8534,8540 ----
  char   *tmplname;
 
  /* Skip if not to be dumped */
! if (!dictinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  q = createPQExpBuffer();
***************
*** 8582,8588 ****
  PQExpBuffer delq;
 
  /* Skip if not to be dumped */
! if (!tmplinfo->dobj.dump || dataOnly)
  return;
 
  q = createPQExpBuffer();
--- 8619,8625 ----
  PQExpBuffer delq;
 
  /* Skip if not to be dumped */
! if (!tmplinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  q = createPQExpBuffer();
***************
*** 8648,8654 ****
  int i_dictname;
 
  /* Skip if not to be dumped */
! if (!cfginfo->dobj.dump || dataOnly)
  return;
 
  q = createPQExpBuffer();
--- 8685,8691 ----
  int i_dictname;
 
  /* Skip if not to be dumped */
! if (!cfginfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  q = createPQExpBuffer();
***************
*** 8784,8790 ****
  PQExpBuffer sql;
 
  /* Do nothing if ACL dump is not enabled */
! if (dataOnly || aclsSkip)
  return;
 
  sql = createPQExpBuffer();
--- 8821,8827 ----
  PQExpBuffer sql;
 
  /* Do nothing if ACL dump is not enabled */
! if (!WANT_PRE_SCHEMA(dumpObjFlags) || aclsSkip)
  return;
 
  sql = createPQExpBuffer();
***************
*** 8821,8827 ****
  {
  if (tbinfo->relkind == RELKIND_SEQUENCE)
  dumpSequence(fout, tbinfo);
! else if (!dataOnly)
  dumpTableSchema(fout, tbinfo);
 
  /* Handle the ACL here */
--- 8858,8864 ----
  {
  if (tbinfo->relkind == RELKIND_SEQUENCE)
  dumpSequence(fout, tbinfo);
! else if (WANT_PRE_SCHEMA(dumpObjFlags))
  dumpTableSchema(fout, tbinfo);
 
  /* Handle the ACL here */
***************
*** 9128,9134 ****
  PQExpBuffer delq;
 
  /* Only print it if "separate" mode is selected */
! if (!tbinfo->dobj.dump || !adinfo->separate || dataOnly)
  return;
 
  /* Don't print inherited defaults, either */
--- 9165,9171 ----
  PQExpBuffer delq;
 
  /* Only print it if "separate" mode is selected */
! if (!tbinfo->dobj.dump || !adinfo->separate || !WANT_PRE_SCHEMA(dumpObjFlags))
  return;
 
  /* Don't print inherited defaults, either */
***************
*** 9213,9219 ****
  PQExpBuffer q;
  PQExpBuffer delq;
 
! if (dataOnly)
  return;
 
  q = createPQExpBuffer();
--- 9250,9256 ----
  PQExpBuffer q;
  PQExpBuffer delq;
 
! if (!WANT_POST_SCHEMA(dumpObjFlags))
  return;
 
  q = createPQExpBuffer();
***************
*** 9282,9288 ****
  PQExpBuffer delq;
 
  /* Skip if not to be dumped */
! if (!coninfo->dobj.dump || dataOnly)
  return;
 
  q = createPQExpBuffer();
--- 9319,9325 ----
  PQExpBuffer delq;
 
  /* Skip if not to be dumped */
! if (!coninfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
  return;
 
  q = createPQExpBuffer();
***************
*** 9675,9681 ****
  *
  * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump.
  */
! if (!dataOnly)
  {
  resetPQExpBuffer(delqry);
 
--- 9712,9718 ----
  *
  * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump.
  */
! if (WANT_PRE_SCHEMA(dumpObjFlags))
  {
  resetPQExpBuffer(delqry);
 
***************
*** 9778,9784 ****
  tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  }
 
! if (!schemaOnly)
  {
  resetPQExpBuffer(query);
  appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
--- 9815,9821 ----
  tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  }
 
! if (WANT_PRE_SCHEMA(dumpObjFlags))
  {
  resetPQExpBuffer(query);
  appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
***************
*** 9811,9817 ****
  const char *p;
  int findx;
 
! if (dataOnly)
  return;
 
  query = createPQExpBuffer();
--- 9848,9854 ----
  const char *p;
  int findx;
 
! if (!WANT_POST_SCHEMA(dumpObjFlags))
  return;
 
  query = createPQExpBuffer();
***************
*** 10019,10025 ****
  PGresult   *res;
 
  /* Skip if not to be dumped */
! if (!rinfo->dobj.dump || dataOnly)
  return;
 
  /*
--- 10056,10062 ----
  PGresult   *res;
 
  /* Skip if not to be dumped */
! if (!rinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
  return;
 
  /*
Index: src/bin/pg_dump/pg_restore.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_restore.c,v
retrieving revision 1.88
diff -c -r1.88 pg_restore.c
*** src/bin/pg_dump/pg_restore.c 13 Apr 2008 03:49:22 -0000 1.88
--- src/bin/pg_dump/pg_restore.c 20 Jul 2008 05:57:51 -0000
***************
*** 78,83 ****
--- 78,90 ----
  static int no_data_for_failed_tables = 0;
  static int  outputNoTablespaces = 0;
  static int use_setsessauth = 0;
+   bool dataOnly = false;
+   bool schemaOnly = false;
+  
+   static int use_schemaPreLoadOnly;
+   static int use_schemaPostLoadOnly;
+  
+   int dumpObjFlags;
 
  struct option cmdopts[] = {
  {"clean", 0, NULL, 'c'},
***************
*** 114,119 ****
--- 121,128 ----
  {"disable-triggers", no_argument, &disable_triggers, 1},
  {"no-data-for-failed-tables", no_argument, &no_data_for_failed_tables, 1},
  {"no-tablespaces", no_argument, &outputNoTablespaces, 1},
+   {"schema-pre-load", no_argument, &use_schemaPreLoadOnly, 1},
+   {"schema-post-load", no_argument, &use_schemaPostLoadOnly, 1},
  {"use-set-session-authorization", no_argument, &use_setsessauth, 1},
 
  {NULL, 0, NULL, 0}
***************
*** 145,151 ****
  switch (c)
  {
  case 'a': /* Dump data only */
! opts->dataOnly = 1;
  break;
  case 'c': /* clean (i.e., drop) schema prior to create */
  opts->dropSchema = 1;
--- 154,160 ----
  switch (c)
  {
  case 'a': /* Dump data only */
! dataOnly = true;
  break;
  case 'c': /* clean (i.e., drop) schema prior to create */
  opts->dropSchema = 1;
***************
*** 213,219 ****
  opts->triggerNames = strdup(optarg);
  break;
  case 's': /* dump schema only */
! opts->schemaOnly = 1;
  break;
  case 'S': /* Superuser username */
  if (strlen(optarg) != 0)
--- 222,228 ----
  opts->triggerNames = strdup(optarg);
  break;
  case 's': /* dump schema only */
! schemaOnly = true;
  break;
  case 'S': /* Superuser username */
  if (strlen(optarg) != 0)
***************
*** 295,300 ****
--- 304,344 ----
  opts->useDB = 1;
  }
 
+ /*
+ * Look for conflicting options relating to object groupings
+ */
+ if (schemaOnly && dataOnly)
+ {
+ write_msg(NULL, "options %s and %s cannot be used together\n",
+ "-s/--schema-only", "-a/--data-only");
+ exit(1);
+ }
+ else if ((schemaOnly || dataOnly) &&
+ (use_schemaPreLoadOnly == 1 || use_schemaPostLoadOnly == 1))
+ {
+ write_msg(NULL, "options %s and %s cannot be used together\n",
+ schemaOnly ? "-s/--schema-only" : "-a/--data-only",
+ use_schemaPostLoadOnly == 1 ? "--schema-post-load" : "--schema-pre-load ");
+ exit(1);
+ }
+
+ /*
+ * Decide which of the object groups we will dump
+ */
+ dumpObjFlags = REQ_ALL;
+
+ if (dataOnly)
+ dumpObjFlags = REQ_DATA;
+
+ if (use_schemaPreLoadOnly == 1)
+ dumpObjFlags = REQ_PRE_SCHEMA;
+
+ if (use_schemaPostLoadOnly == 1)
+ dumpObjFlags = REQ_POST_SCHEMA;
+
+ if (schemaOnly)
+ dumpObjFlags = (REQ_PRE_SCHEMA | REQ_POST_SCHEMA);
+
  opts->disable_triggers = disable_triggers;
  opts->noDataForFailedTables = no_data_for_failed_tables;
  opts->noTablespace = outputNoTablespaces;



--
Sent via pgsql-patches mailing list (pgsql-patches@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: pg_dump additional options for performance

by Stephen Frost :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Simon,

* Simon Riggs (simon@...) wrote:
> On Sun, 2008-07-20 at 05:47 +0100, Simon Riggs wrote:
> > On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote:
[...]
> > >   - Conflicting option handling

Thanks for putting in the extra code to explicitly indicate which
conflicting options were used.

> > >   - Documentation
> > > When writing the documentation I would stress that "pre-schema" and
> > > "post-schema" be defined in terms of PostgreSQL objects and why they
> > > are pre vs. post.

Perhaps this is up for some debate, but I find the documentation added
for these options to be lacking the definitions I was looking for, and
the explanation of why they are what they are.  I'm also not sure I
agree with the "Pre-Schema" and "Post-Schema" nomenclature as it doesn't
really fit with the option names or what they do.  Would you consider:

<term><option>--schema-pre-load</option></term>
<listitem>
 <para>
   Pre-Data Load - Minimum amount of the schema required before data
   loading can begin.  This consists mainly of creating the tables
   using the <command>CREATE TABLE</command>.
   This part can be requested using <option>--schema-pre-load</>.
 </para>
</listitem>

<term><option>--schema-post-load</option></term>
<listitem>
 <para>
   Post-Data Load - The rest of the schema definition, including keys,
   indexes, etc.  By putting keys and indexes after the data has been
   loaded the whole process of restoring data is much faster.  This is
   because it is faster to build indexes and check keys in bulk than
   piecemeal as the data is loaded.
   This part can be requested using <option>--schema-post-load</>.
 </para>
</listitem>

Even this doesn't cover everything though- it's too focused on tables
and data loading.  Where do functions go?  What about types?

A couple of additional points:

  - The command-line help hasn't been updated.  Clearly, that also needs
        to be done to consider the documentation aspect complete.

  - There appears to be a bit of mistakenly included additions.  The
    patch to pg_restore.sgml attempts to add in documentation for
        --superuser.  I'm guessing that was unintentional, and looks like
        just a mistaken extra copy&paste.

> > >   - Technically, the patch needs to be updated slightly since another
> > > pg_dump-related patch was committed recently which also added
> > > options and thus causes a conflict.

I think this might have just happened again, funny enough.  It's
something that a committer could perhaps fix, but if you're reworking
the patch anyway...

        Thanks,

                Stephen


signature.asc (204 bytes) Download Attachment

Re: pg_dump additional options for performance

by Simon Riggs :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:

> Perhaps this is up for some debate, but I find the documentation added
> for these options to be lacking the definitions I was looking for, and
> the explanation of why they are what they are.  I'm also not sure I
> agree with the "Pre-Schema" and "Post-Schema" nomenclature as it doesn't
> really fit with the option names or what they do.  Would you consider:

Will reword.

> Even this doesn't cover everything though- it's too focused on tables
> and data loading.  Where do functions go?  What about types?

Yes, it is focused on tables and data loading. What about
functions/types? No relevance here.

>   - The command-line help hasn't been updated.  Clearly, that also needs
> to be done to consider the documentation aspect complete.
>
>   - There appears to be a bit of mistakenly included additions.  The
>     patch to pg_restore.sgml attempts to add in documentation for
> --superuser.  I'm guessing that was unintentional, and looks like
> just a mistaken extra copy&paste.

Thanks, will do.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list (pgsql-patches@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: pg_dump additional options for performance

by Stephen Frost :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

* Simon Riggs (simon@...) wrote:
> On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:
> > Even this doesn't cover everything though- it's too focused on tables
> > and data loading.  Where do functions go?  What about types?
>
> Yes, it is focused on tables and data loading. What about
> functions/types? No relevance here.

I don't see how they're not relevant, it's not like they're being
excluded and in fact they show up in the pre-load output.  Heck, even if
they *were* excluded, that should be made clear in the documentation
(either be an explicit include list, or saying they're excluded).

Part of what's driving this is making sure we have a plan for future
objects and where they'll go.  Perhaps it would be enough to just say
"pre-load is everything in the schema, except things which are faster
done in bulk (eg: indexes, keys)".  I don't think it's right to say
pre-load is "only object definitions required to load data" when it
includes functions and ACLs though.

Hopefully my suggestion and these comments will get us to a happy
middle-ground.

        Thanks,

                Stephen


signature.asc (204 bytes) Download Attachment

Re: pg_dump additional options for performance

by DaveG-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sun, Jul 20, 2008 at 09:18:29PM -0400, Stephen Frost wrote:

> * Simon Riggs (simon@...) wrote:
> > On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:
> > > Even this doesn't cover everything though- it's too focused on tables
> > > and data loading.  Where do functions go?  What about types?
> >
> > Yes, it is focused on tables and data loading. What about
> > functions/types? No relevance here.
>
> I don't see how they're not relevant, it's not like they're being
> excluded and in fact they show up in the pre-load output.  Heck, even if
> they *were* excluded, that should be made clear in the documentation
> (either be an explicit include list, or saying they're excluded).
>
> Part of what's driving this is making sure we have a plan for future
> objects and where they'll go.  Perhaps it would be enough to just say
> "pre-load is everything in the schema, except things which are faster
> done in bulk (eg: indexes, keys)".  I don't think it's right to say
> pre-load is "only object definitions required to load data" when it
> includes functions and ACLs though.
>
> Hopefully my suggestion and these comments will get us to a happy
> middle-ground.

One observation, indexes should be built right after the table data
is loaded for each table, this way, the index build gets a hot cache
for the table data instead of having to re-read it later as we do now.

-dg
 


--
David Gould       daveg@...      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.

--
Sent via pgsql-patches mailing list (pgsql-patches@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: pg_dump additional options for performance

by Stephen Frost :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

* daveg (daveg@...) wrote:
> One observation, indexes should be built right after the table data
> is loaded for each table, this way, the index build gets a hot cache
> for the table data instead of having to re-read it later as we do now.

That's not how pg_dump has traditionally worked, and the point of this
patch is to add options to easily segregate the main pieces of the
existing pg_dump output (main schema definition, data dump, key/index
building).  You suggestion brings up an interesting point that should
pg_dump's traditional output structure change the "--schema-post-load"
set of objects wouldn't be as clear to newcomers since the load and the
indexes would be interleaved in the regular output.

I'd be curious about the performance impact this has on an actual load
too.  It would probably be more valuable on smaller loads where it would
have less of an impact anyway than on loads larger than the cache size.
Still, not an issue for this patch, imv.

        Thanks,

                Stephen


signature.asc (204 bytes) Download Attachment

Re: pg_dump additional options for performance

by Tom Lane-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Stephen Frost <sfrost@...> writes:
> * daveg (daveg@...) wrote:
>> One observation, indexes should be built right after the table data
>> is loaded for each table, this way, the index build gets a hot cache
>> for the table data instead of having to re-read it later as we do now.

> That's not how pg_dump has traditionally worked, and the point of this
> patch is to add options to easily segregate the main pieces of the
> existing pg_dump output (main schema definition, data dump, key/index
> building).  You suggestion brings up an interesting point that should
> pg_dump's traditional output structure change the "--schema-post-load"
> set of objects wouldn't be as clear to newcomers since the load and the
> indexes would be interleaved in the regular output.

Yeah.  Also, that is pushing into an entirely different line of
development, which is to enable multithreaded pg_restore.  The patch
at hand is necessarily incompatible with that type of operation, and
wouldn't be used together with it.

As far as the documentation/definition aspect goes, I think it should
just say the parts are
        * stuff needed before you can load the data
        * the data
        * stuff needed after loading the data
and not try to be any more specific than that.  There are corner cases
that will turn any simple breakdown into a lie, and I doubt that it's
worth trying to explain them all.  (Take a close look at the dependency
loop breaking logic in pg_dump if you doubt this.)

I hadn't realized that Simon was using "pre-schema" and "post-schema"
to name the first and third parts.  I'd agree that this is confusing
nomenclature: it looks like it's trying to say that the data is the
schema, and the schema is not!  How about "pre-data and "post-data"?

                        regards, tom lane

--
Sent via pgsql-patches mailing list (pgsql-patches@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: pg_dump additional options for performance

by Simon Riggs :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Sun, 2008-07-20 at 21:18 -0400, Stephen Frost wrote:

> * Simon Riggs (simon@...) wrote:
> > On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:
> > > Even this doesn't cover everything though- it's too focused on tables
> > > and data loading.  Where do functions go?  What about types?
> >
> > Yes, it is focused on tables and data loading. What about
> > functions/types? No relevance here.
>
> I don't see how they're not relevant, it's not like they're being
> excluded and in fact they show up in the pre-load output.  Heck, even if
> they *were* excluded, that should be made clear in the documentation
> (either be an explicit include list, or saying they're excluded).
>
> Part of what's driving this is making sure we have a plan for future
> objects and where they'll go.  Perhaps it would be enough to just say
> "pre-load is everything in the schema, except things which are faster
> done in bulk (eg: indexes, keys)".  I don't think it's right to say
> pre-load is "only object definitions required to load data" when it
> includes functions and ACLs though.
>
> Hopefully my suggestion and these comments will get us to a happy
> middle-ground.

I don't really understand what you're saying.

The options split the dump into 3 parts that's all: before the load, the
load and after the load.

--schema-pre-load says
"Dumps exactly what <option>--schema-only</> would dump, but only those
statements before the data load."

What is it you are suggesting? I'm unclear.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list (pgsql-patches@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: pg_dump additional options for performance

by Simon Riggs :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Sun, 2008-07-20 at 23:34 -0400, Tom Lane wrote:

> Stephen Frost <sfrost@...> writes:
> > * daveg (daveg@...) wrote:
> >> One observation, indexes should be built right after the table data
> >> is loaded for each table, this way, the index build gets a hot cache
> >> for the table data instead of having to re-read it later as we do now.
>
> > That's not how pg_dump has traditionally worked, and the point of this
> > patch is to add options to easily segregate the main pieces of the
> > existing pg_dump output (main schema definition, data dump, key/index
> > building).  You suggestion brings up an interesting point that should
> > pg_dump's traditional output structure change the "--schema-post-load"
> > set of objects wouldn't be as clear to newcomers since the load and the
> > indexes would be interleaved in the regular output.

Stephen: Agreed.

> Yeah.  Also, that is pushing into an entirely different line of
> development, which is to enable multithreaded pg_restore.  The patch
> at hand is necessarily incompatible with that type of operation, and
> wouldn't be used together with it.
>
> As far as the documentation/definition aspect goes, I think it should
> just say the parts are
> * stuff needed before you can load the data
> * the data
> * stuff needed after loading the data
> and not try to be any more specific than that.  There are corner cases
> that will turn any simple breakdown into a lie, and I doubt that it's
> worth trying to explain them all.  (Take a close look at the dependency
> loop breaking logic in pg_dump if you doubt this.)

Tom: Agreed.

> I hadn't realized that Simon was using "pre-schema" and "post-schema"
> to name the first and third parts.  I'd agree that this is confusing
> nomenclature: it looks like it's trying to say that the data is the
> schema, and the schema is not!  How about "pre-data and "post-data"?

OK by me. Any other takers?



I also suggested having three options
--want-pre-schema
--want-data
--want-post-schema
so we could ask for any or all parts in the one dump. --data-only and
--schema-only are negative options so don't allow this.
(I don't like those names either, just thinking about capabilities)

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list (pgsql-patches@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: pg_dump additional options for performance

by Tom Lane-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Simon Riggs <simon@...> writes:
> I also suggested having three options
> --want-pre-schema
> --want-data
> --want-post-schema
> so we could ask for any or all parts in the one dump. --data-only and
> --schema-only are negative options so don't allow this.
> (I don't like those names either, just thinking about capabilities)

Maybe invert the logic?

        --omit-pre-data
        --omit-data
        --omit-post-data

Not wedded to these either, just tossing out an idea...

                        regards, tom lane

--
Sent via pgsql-patches mailing list (pgsql-patches@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: pg_dump additional options for performance

by Stephen Frost :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

* Simon Riggs (simon@...) wrote:
> The options split the dump into 3 parts that's all: before the load, the
> load and after the load.
>
> --schema-pre-load says
> "Dumps exactly what <option>--schema-only</> would dump, but only those
> statements before the data load."
>
> What is it you are suggesting? I'm unclear.

That part is fine, the problem is that elsewhere in the documentation
(patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you
change it to be "objects required before data loading", which isn't the
same.

        Thanks,

                Stephen


signature.asc (204 bytes) Download Attachment

Re: pg_dump additional options for performance

by Stephen Frost :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Tom,

* Tom Lane (tgl@...) wrote:
> As far as the documentation/definition aspect goes, I think it should
> just say the parts are
> * stuff needed before you can load the data
> * the data
> * stuff needed after loading the data
> and not try to be any more specific than that.  There are corner cases
> that will turn any simple breakdown into a lie, and I doubt that it's
> worth trying to explain them all.  (Take a close look at the dependency
> loop breaking logic in pg_dump if you doubt this.)

Even that is a lie though, which I guess is what my problem is.  It's
really "everything for the schema, except stuff that is better done in
bulk", I believe.  Also, I'm a bit concerned about people who would
argue that you need PKs and FKs before you can load the data.  Probably
couldn't be avoided tho.

> I hadn't realized that Simon was using "pre-schema" and "post-schema"
> to name the first and third parts.  I'd agree that this is confusing
> nomenclature: it looks like it's trying to say that the data is the
> schema, and the schema is not!  How about "pre-data and "post-data"?

Argh.  The command-line options follow the 'data'/'load' line
(--schema-pre-load and --schema-post-load), and so I think those are
fine.  The problem was that in the documentation he switched to saying
they were "Pre-Schema" and "Post-Schema", which could lead to confusion.

        Thanks,

                Stephen


signature.asc (204 bytes) Download Attachment

Re: pg_dump additional options for performance

by Stephen Frost :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Simon,

* Simon Riggs (simon@...) wrote:
> > I hadn't realized that Simon was using "pre-schema" and "post-schema"
> > to name the first and third parts.  I'd agree that this is confusing
> > nomenclature: it looks like it's trying to say that the data is the
> > schema, and the schema is not!  How about "pre-data and "post-data"?
>
> OK by me. Any other takers?

Having the command-line options be "--schema-pre-data" and
"--schema-post-data" is fine with me.  Leaving them the way they are is
also fine by me.  It's the documentation (back to pg_dump.sgml,
~774/~797) that starts talking about Pre-Schema and Post-Schema.

        Thanks,

                Stephen


signature.asc (204 bytes) Download Attachment

Re: pg_dump additional options for performance

by Andrew Dunstan :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



Tom Lane wrote:

> Simon Riggs <simon@...> writes:
>  
>> I also suggested having three options
>> --want-pre-schema
>> --want-data
>> --want-post-schema
>> so we could ask for any or all parts in the one dump. --data-only and
>> --schema-only are negative options so don't allow this.
>> (I don't like those names either, just thinking about capabilities)
>>    
>
> Maybe invert the logic?
>
> --omit-pre-data
> --omit-data
> --omit-post-data
>
> Not wedded to these either, just tossing out an idea...
>
>
>  

Please, no. Negative logic seems likely to cause endless confusion.

I'd even be happier with --schema-part-1 and --schema-part-2 if we can't
find some more expressive way of designating them.

cheers

andrew

--
Sent via pgsql-patches mailing list (pgsql-patches@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: pg_dump additional options for performance

by Tom Lane-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Andrew Dunstan <andrew@...> writes:
> Tom Lane wrote:
>> Maybe invert the logic?
>> --omit-pre-data
>> --omit-data
>> --omit-post-data

> Please, no. Negative logic seems likely to cause endless confusion.

I think it might actually be less confusing, because with this approach,
each switch has an identifiable default (no) and setting it doesn't
cause side-effects on settings of other switches.  The interactions of
the switches as Simon presents 'em seem less than obvious.

                        regards, tom lane

--
Sent via pgsql-patches mailing list (pgsql-patches@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: pg_dump additional options for performance

by Simon Riggs :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Mon, 2008-07-21 at 07:46 -0400, Stephen Frost wrote:

> * Simon Riggs (simon@...) wrote:
> > The options split the dump into 3 parts that's all: before the load, the
> > load and after the load.
> >
> > --schema-pre-load says
> > "Dumps exactly what <option>--schema-only</> would dump, but only those
> > statements before the data load."
> >
> > What is it you are suggesting? I'm unclear.
>
> That part is fine, the problem is that elsewhere in the documentation
> (patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you
> change it to be "objects required before data loading", which isn't the
> same.

OK, gotcha now - will change that. I thought you might mean something
about changing the output itself.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list (pgsql-patches@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: pg_dump additional options for performance

by Tom Lane-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Stephen Frost <sfrost@...> writes:
> * Tom Lane (tgl@...) wrote:
>> As far as the documentation/definition aspect goes, I think it should
>> just say the parts are
>> * stuff needed before you can load the data
>> * the data
>> * stuff needed after loading the data

> Even that is a lie though, which I guess is what my problem is.

True; the stuff done after is done that way at least in part for
performance reasons rather than because it has to be done that way.
(I think it's not only performance issues, though --- for circular
FKs you pretty much have to load the data first.)

>> I hadn't realized that Simon was using "pre-schema" and "post-schema"
>> to name the first and third parts.  I'd agree that this is confusing
>> nomenclature: it looks like it's trying to say that the data is the
>> schema, and the schema is not!  How about "pre-data and "post-data"?

> Argh.  The command-line options follow the 'data'/'load' line
> (--schema-pre-load and --schema-post-load), and so I think those are
> fine.  The problem was that in the documentation he switched to saying
> they were "Pre-Schema" and "Post-Schema", which could lead to confusion.

Ah, I see.  No objection to those switch names, at least assuming we
want to stick to positive-logic switches.  What did you think of the
negative-logic suggestion (--omit-xxx)?

                        regards, tom lane

--
Sent via pgsql-patches mailing list (pgsql-patches@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: pg_dump additional options for performance

by Stephen Frost :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Tom, et al,

* Tom Lane (tgl@...) wrote:
> Ah, I see.  No objection to those switch names, at least assuming we
> want to stick to positive-logic switches.  What did you think of the
> negative-logic suggestion (--omit-xxx)?

My preference is for positive-logic switches in general.  The place
where I would use this patch would lend itself to being more options if
--omit-xxxx were used.  I expect that would hold true for most people.
It would be:

  --omit-data --omit-post-load
  --omit-pre-load --omit-post-load
  --omit-pre-load --omit-data

vs.

  --schema-pre-load
  --data-only
  --schema-post-load

Point being that I'd be dumping these into seperate files where I could
more easily manipulate the pre-load or post-load files.  I'd still want
pre/post load to be seperate though since this would be used in cases
where there's alot of data (hence the reason for the split) and putting
pre and post together and running them before data would slow things
down quite a bit.

Are there use cases for just --omit-post-load or --omit-pre-load?
Probably, but I just don't see any situation where I'd use them like
that.

        Thanks,

                Stephen


signature.asc (204 bytes) Download Attachment
< Prev | 1 - 2 - 3 | Next >