Infrastructure changes for recovery (v8)

by Simon Riggs

Patch now includes all previously agreed changes, plus I've found what
looks to be a workable method of removing the shutdown checkpoint
without loss of robustness.

Patch summary

Tuning
* Bgwriter performs dirty block cleaning during recovery
* Bgwriter performs restartpoints, offloading this task from Startup
process to allow it to continue with recovery actions
* Shutdown checkpoint removed at end of recovery. Bgwriter performs an
immediate checkpoint instead, so we have the same protection, but
connections and transactions can be started earlier than previously.
* PreallocXlogFiles() is no longer performed by the startup process, so we
do not delay startup while we write zeroes to the next WAL file. Bgwriter
does that now.
* XLogCtl structure padding for enhanced scalability
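
The padding item uses a simple technique: follow each independently locked
group of fields with a char array that rounds the group up to a fixed
spacing. A minimal sketch, assuming two hypothetical field groups
(DemoInsertState, DemoWriteState) and the same 128-byte spacing the patch
defines as XLOGCTL_BUFFER_SPACING:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SPACING 128    /* same figure as XLOGCTL_BUFFER_SPACING */

/* Hypothetical field groups, each protected by a different lock. */
typedef struct { uint64_t curr_pos;  uint64_t prev_pos;  } DemoInsertState;
typedef struct { uint64_t write_pos; uint64_t flush_pos; } DemoWriteState;

/* The pad arrays would have zero or negative size if a group grew to
 * DEMO_SPACING bytes or more, so check that at compile time. */
static_assert(sizeof(DemoInsertState) < DEMO_SPACING, "insert group too large");
static_assert(sizeof(DemoWriteState) < DEMO_SPACING, "write group too large");

typedef struct DemoCtl
{
    DemoInsertState insert;
    char insert_pad[DEMO_SPACING - sizeof(DemoInsertState)];

    DemoWriteState write;
    char write_pad[DEMO_SPACING - sizeof(DemoWriteState)];
} DemoCtl;

/* Offset of the second group: one full spacing unit from the start. */
size_t
demo_write_offset(void)
{
    return offsetof(DemoCtl, write);
}
```

Each group starts on its own multiple of the spacing, so a CPU updating one
group does not invalidate the cache line holding the other, provided the
structure itself is allocated on a suitably aligned address.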

Recovery State Changes
* If archive recovery proceeds past a safe stopping point we signal the
postmaster that the database is now in a consistent state, PM_RECOVERY. This
state change is also linked to startup of the bgwriter and stats
processes (and will in future be the point at which read-only backends
may also connect)
* optional recovery_safe_start_location parameter now provided in
recovery.conf, to allow a consistency point to be manually defined if a
base backup was not taken using standard pg_start/stop backup functions
* New minSafeStartPoint added to the control file to allow us to determine
consistency if archive recovery crashes/restarts. Value is updated each
time we access a new WAL file.
* stats file removed earlier in recovery, so we may accumulate new stats
during recovery
* End of recovery is now marked by a clear global state change. Change
is global, atomic and fast - tested for using IsRecoveryProcessingMode()
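
The consistency test behind these items can be sketched on its own: parse a
location in the "%X/%X" form used by recovery_safe_start_location, then
declare the database consistent only once replay has passed both the
minimum recovery point and the minimum safe start point. DemoRecPtr and the
demo_* functions are hypothetical stand-ins for XLogRecPtr and XLByteLE in
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the two-part WAL location used by the patch. */
typedef struct DemoRecPtr
{
    unsigned int xlogid;   /* log file id */
    unsigned int xrecoff;  /* byte offset within that log */
} DemoRecPtr;

/* Parse "hex/hex", e.g. "0/D4445B8". Returns false if either part is missing.
 * Like the sscanf() call in the patch, trailing characters are not rejected. */
bool
demo_parse_location(const char *text, DemoRecPtr *out)
{
    unsigned int id;
    unsigned int off;

    assert(text != NULL && out != NULL);
    if (sscanf(text, "%X/%X", &id, &off) != 2)
        return false;
    out->xlogid = id;
    out->xrecoff = off;
    return true;
}

/* a <= b, comparing log id first and then offset. */
bool
demo_ptr_le(DemoRecPtr a, DemoRecPtr b)
{
    return a.xlogid < b.xlogid ||
        (a.xlogid == b.xlogid && a.xrecoff <= b.xrecoff);
}

/* Consistent only when replay has reached both required points. */
bool
demo_is_consistent(DemoRecPtr min_recovery, DemoRecPtr min_safe_start,
                   DemoRecPtr replayed_to)
{
    return demo_ptr_le(min_recovery, replayed_to) &&
        demo_ptr_le(min_safe_start, replayed_to);
}
```

Requiring both points means that a recovery which crashed part-way must
replay at least as far as it could previously have reached before the
consistent state is signalled again.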

Additional Safeguards
* Locks are placed around all ControlFile operations
* XLogInsert() and AssignTransactionId() now have specific checks to
prevent their use during recovery
* Makes StartupMultiXact() atomic. Adds comments to show that
StartupCLOG() is already atomic, though StartupSUBTRANS() is not (this
will be addressed in a later patch, so not touched here)
* recovery.conf is not removed until slightly later now, to protect
against crash at the end of startup
* New WAL record XLOG_RECOVERY_END is now the only place where the
timeline ID may change

Other Changes
* log_restartpoints removed, use log_checkpoints in postgresql.conf
* pg_controldata and pg_resetxlog changed to show safe start point
* designed to work in EXEC_BACKEND mode for Windows
* additional function signature for pg_start_backup('label', true |
false) to allow definition of immediate checkpoint/not
* doc changes for recovery.conf parameters
* fixes a bug discovered during other testing: if pg_stop_backup() is run
just after an xlog switch has occurred, we do not switch log files, yet
we return the current filename even though it contains nothing of value.
If archive_timeout is not enabled we would wait forever for
pg_stop_backup() to return.
* Substantial comments throughout

Patch is now v8.

 doc/src/sgml/backup.sgml                 |   30 !
 doc/src/sgml/func.sgml                   |   12
 src/backend/access/transam/clog.c        |    3
 src/backend/access/transam/multixact.c   |   14
 src/backend/access/transam/subtrans.c    |    3
 src/backend/access/transam/xact.c        |    3
 src/backend/access/transam/xlog.c        |  783 ++++++++++++++-!!!!!!!!!!!!!!!
 src/backend/postmaster/bgwriter.c        |  418 +++--!!!!!!!!!
 src/backend/postmaster/postmaster.c      |   62 +!
 src/backend/storage/buffer/README        |    9
 src/bin/pg_controldata/pg_controldata.c  |    3
 src/bin/pg_resetxlog/pg_resetxlog.c      |    2
 src/include/access/xlog.h                |   14
 src/include/access/xlog_internal.h       |    4
 src/include/catalog/pg_control.h         |    3
 src/include/postmaster/bgwriter.h        |    6
 src/include/storage/pmsignal.h           |    1
 src/test/regress/expected/opr_sanity.out |    7
 18 files changed, 579 insertions(+), 79 deletions(-), 719 modifications(!)

Please review everybody. Many thanks.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

[recovery_infrastruc.v8.patch]

Index: doc/src/sgml/backup.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/backup.sgml,v
retrieving revision 2.120
diff -c -r2.120 backup.sgml
*** doc/src/sgml/backup.sgml 18 Jul 2008 17:33:17 -0000 2.120
--- doc/src/sgml/backup.sgml 30 Sep 2008 17:15:15 -0000
***************
*** 1200,1205 ****
--- 1200,1229 ----
        </listitem>
       </varlistentry>
 
+      <varlistentry id="recovery-safe-start-location"
+                    xreflabel="recovery_safe_start_location">
+       <term><varname>recovery_safe_start_location</varname>
+         (<type>string</type>)
+       </term>
+       <listitem>
+        <para>
+         Allows user to optionally specify a safe start location for a base
+ backup that was not made online using <function>pg_start_backup()</>
+ and <function>pg_stop_backup()</>.  If those functions were used,
+ this parameter need not be set because the server sets this for you
+ automatically to avoid error.  You cannot use this parameter to move
+ the safe stopping point to an earlier transaction log location. The
+ format for this parameter is identical to the output of
+ <function>pg_current_xlog_insert_location()</>, example:
+ <programlisting>
+ recovery_safe_start_location = '0/D4445B8'
+ </programlisting>
+ The location always has a forward slash, even on Windows, since it
+ is not a file path.
+        </para>
+       </listitem>
+      </varlistentry>
+
       <varlistentry id="log-restartpoints"
                     xreflabel="log_restartpoints">
        <term><varname>log_restartpoints</varname>
***************
*** 1207,1215 ****
        </term>
        <listitem>
         <para>
!         Specifies whether to log each restart point as it occurs. This
!         can be helpful to track the progress of a long recovery.
!         Default is <literal>false</>.
         </para>
        </listitem>
       </varlistentry>
--- 1231,1239 ----
        </term>
        <listitem>
         <para>
!         This parameter has now been deprecated. Instead, please set
! <varname>log_checkpoints</varname> in <filename>postgresql.conf</>
! if you want similar log entries during recovery.
         </para>
        </listitem>
       </varlistentry>
Index: doc/src/sgml/func.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.447
diff -c -r1.447 func.sgml
*** doc/src/sgml/func.sgml 11 Sep 2008 17:32:33 -0000 1.447
--- doc/src/sgml/func.sgml 30 Sep 2008 17:15:15 -0000
***************
*** 12262,12267 ****
--- 12262,12275 ----
        </row>
        <row>
         <entry>
+         <literal><function>pg_start_backup</function>(<parameter>label</> <type>text</>)</literal>
+         </entry>
+        <entry><type>text</type>, <type>boolean</type></entry>
+        <entry>Set up for performing on-line backup, specifying if
+ we want an immediate checkpoint or not.</entry>
+       </row>
+       <row>
+        <entry>
          <literal><function>pg_stop_backup</function>()</literal>
          </entry>
         <entry><type>text</type></entry>
***************
*** 12333,12338 ****
--- 12341,12350 ----
      interest).  After noting the ending location, the current transaction log insertion
      point is automatically advanced to the next transaction log file, so that the
      ending transaction log file can be archived immediately to complete the backup.
+ <function>pg_start_backup</> issues a checkpoint while we wait.
+ <function>pg_start_backup</> can also be specified with two parameters,
+ the second parameter defining whether the checkpoint is an immediate
+ checkpoint or whether we write out buffers smoothly over a short period.
     </para>
 
     <para>
Index: src/backend/access/transam/clog.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/clog.c,v
retrieving revision 1.47
diff -c -r1.47 clog.c
*** src/backend/access/transam/clog.c 1 Aug 2008 13:16:08 -0000 1.47
--- src/backend/access/transam/clog.c 30 Sep 2008 17:15:15 -0000
***************
*** 260,265 ****
--- 260,268 ----
  /*
   * This must be called ONCE during postmaster or standalone-backend startup,
   * after StartupXLOG has initialized ShmemVariableCache->nextXid.
+  *
+  * We access just a single clog page, so this action is atomic and safe
+  * for use if other processes are active during recovery.
   */
  void
  StartupCLOG(void)
Index: src/backend/access/transam/multixact.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/multixact.c,v
retrieving revision 1.28
diff -c -r1.28 multixact.c
*** src/backend/access/transam/multixact.c 1 Aug 2008 13:16:08 -0000 1.28
--- src/backend/access/transam/multixact.c 30 Sep 2008 17:15:15 -0000
***************
*** 1413,1420 ****
   * MultiXactSetNextMXact and/or MultiXactAdvanceNextMXact. Note that we
   * may already have replayed WAL data into the SLRU files.
   *
!  * We don't need any locks here, really; the SLRU locks are taken
!  * only because slru.c expects to be called with locks held.
   */
  void
  StartupMultiXact(void)
--- 1413,1423 ----
   * MultiXactSetNextMXact and/or MultiXactAdvanceNextMXact. Note that we
   * may already have replayed WAL data into the SLRU files.
   *
!  * We want this operation to be atomic to ensure that other processes can
!  * use MultiXact while we complete recovery. We access one page only from the
!  * offset and members buffers, so once locks are acquired they will not be
!  * dropped and re-acquired by SLRU code. So we take both locks at start, then
!  * hold them all the way to the end.
   */
  void
  StartupMultiXact(void)
***************
*** 1426,1431 ****
--- 1429,1435 ----
 
  /* Clean up offsets state */
  LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE);
+ LWLockAcquire(MultiXactMemberControlLock, LW_EXCLUSIVE);
 
  /*
  * Initialize our idea of the latest page number.
***************
*** 1452,1461 ****
  MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
  }
 
- LWLockRelease(MultiXactOffsetControlLock);
-
  /* And the same for members */
- LWLockAcquire(MultiXactMemberControlLock, LW_EXCLUSIVE);
 
  /*
  * Initialize our idea of the latest page number.
--- 1456,1462 ----
***************
*** 1483,1488 ****
--- 1484,1490 ----
  }
 
  LWLockRelease(MultiXactMemberControlLock);
+ LWLockRelease(MultiXactOffsetControlLock);
 
  /*
  * Initialize lastTruncationPoint to invalid, ensuring that the first
***************
*** 1543,1549 ****
  * SimpleLruTruncate would get confused.  It seems best not to risk
  * removing any data during recovery anyway, so don't truncate.
  */
! if (!InRecovery)
  TruncateMultiXact();
 
  TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);
--- 1545,1551 ----
  * SimpleLruTruncate would get confused.  It seems best not to risk
  * removing any data during recovery anyway, so don't truncate.
  */
! if (!IsRecoveryProcessingMode())
  TruncateMultiXact();
 
  TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);
Index: src/backend/access/transam/subtrans.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/subtrans.c,v
retrieving revision 1.23
diff -c -r1.23 subtrans.c
*** src/backend/access/transam/subtrans.c 1 Aug 2008 13:16:08 -0000 1.23
--- src/backend/access/transam/subtrans.c 30 Sep 2008 17:15:15 -0000
***************
*** 226,231 ****
--- 226,234 ----
   *
   * oldestActiveXID is the oldest XID of any prepared transaction, or nextXid
   * if there are none.
+  *
+  * Note that this is not atomic and is not yet safe to perform while other
+  * processes might access subtrans.
   */
  void
  StartupSUBTRANS(TransactionId oldestActiveXID)
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.265
diff -c -r1.265 xact.c
*** src/backend/access/transam/xact.c 11 Aug 2008 11:05:10 -0000 1.265
--- src/backend/access/transam/xact.c 30 Sep 2008 17:15:15 -0000
***************
*** 393,398 ****
--- 393,401 ----
  bool isSubXact = (s->parent != NULL);
  ResourceOwner currentOwner;
 
+ if (IsRecoveryProcessingMode())
+ elog(FATAL, "cannot assign TransactionIds during recovery");
+
  /* Assert that caller didn't screw up */
  Assert(!TransactionIdIsValid(s->transactionId));
  Assert(s->state == TRANS_INPROGRESS);
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.319
diff -c -r1.319 xlog.c
*** src/backend/access/transam/xlog.c 23 Sep 2008 09:20:35 -0000 1.319
--- src/backend/access/transam/xlog.c 30 Sep 2008 22:32:49 -0000
***************
*** 113,119 ****
 
  /*
   * ThisTimeLineID will be same in all backends --- it identifies current
!  * WAL timeline for the database system.
   */
  TimeLineID ThisTimeLineID = 0;
 
--- 113,120 ----
 
  /*
   * ThisTimeLineID will be same in all backends --- it identifies current
!  * WAL timeline for the database system. Zero is always a bug, so we
!  * start with that to allow us to spot any errors.
   */
  TimeLineID ThisTimeLineID = 0;
 
***************
*** 123,128 ****
--- 124,133 ----
  /* Are we recovering using offline XLOG archives? */
  static bool InArchiveRecovery = false;
 
+ /* Local copy of shared RecoveryProcessingMode state */
+ static bool LocalRecoveryProcessingMode = true;
+ static bool knownProcessingMode = false;
+
  /* Was the last xlog file restored from archive, or local? */
  static bool restoredFromArchive = false;
 
***************
*** 131,137 ****
  static bool recoveryTarget = false;
  static bool recoveryTargetExact = false;
  static bool recoveryTargetInclusive = true;
- static bool recoveryLogRestartpoints = false;
  static TransactionId recoveryTargetXid;
  static TimestampTz recoveryTargetTime;
  static TimestampTz recoveryLastXTime = 0;
--- 136,141 ----
***************
*** 141,146 ****
--- 145,153 ----
  static TimestampTz recoveryStopTime;
  static bool recoveryStopAfter;
 
+ /* is the database proven consistent yet? */
+ bool reachedSafeStartPoint = false;
+
  /*
   * During normal operation, the only timeline we care about is ThisTimeLineID.
   * During recovery, however, things are more complicated.  To simplify life
***************
*** 240,248 ****
   * ControlFileLock: must be held to read/update control file or create
   * new log file.
   *
!  * CheckpointLock: must be held to do a checkpoint (ensures only one
!  * checkpointer at a time; currently, with all checkpoints done by the
!  * bgwriter, this is just pro forma).
   *
   *----------
   */
--- 247,256 ----
   * ControlFileLock: must be held to read/update control file or create
   * new log file.
   *
!  * CheckpointLock: must be held to do a checkpoint or restartpoint, ensuring
!  * we get just one of those at any time. In 8.4+ recovery, both startup and
!  * bgwriter processes may take restartpoints, so this locking must be strict
!  * to ensure there are no mistakes.
   *
   *----------
   */
***************
*** 285,295 ****
--- 293,310 ----
 
  /*
   * Total shared-memory state for XLOG.
+  *
+  * This small structure is accessed by many backends, so we take care to
+  * pad out the parts of the structure so they can be accessed by separate
+  * CPUs without causing false sharing cache flushes. Padding is generous
+  * to allow for a wide variety of CPU architectures.
   */
+ #define XLOGCTL_BUFFER_SPACING 128
  typedef struct XLogCtlData
  {
  /* Protected by WALInsertLock: */
  XLogCtlInsert Insert;
+ char InsertPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogCtlInsert)];
 
  /* Protected by info_lck: */
  XLogwrtRqst LogwrtRqst;
***************
*** 297,305 ****
--- 312,327 ----
  uint32 ckptXidEpoch; /* nextXID & epoch of latest checkpoint */
  TransactionId ckptXid;
  XLogRecPtr asyncCommitLSN; /* LSN of newest async commit */
+ /* add data structure padding for above info_lck declarations */
+ char InfoPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogwrtRqst)
+ - sizeof(XLogwrtResult)
+ - sizeof(uint32)
+ - sizeof(TransactionId)
+ - sizeof(XLogRecPtr)];
 
  /* Protected by WALWriteLock: */
  XLogCtlWrite Write;
+ char WritePadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogCtlWrite)];
 
  /*
  * These values do not change after startup, although the pointed-to pages
***************
*** 311,316 ****
--- 333,356 ----
  int XLogCacheBlck; /* highest allocated xlog buffer index */
  TimeLineID ThisTimeLineID;
 
+ /*
+ * IsRecoveryProcessingMode shows whether the postmaster is in a
+ * postmaster state earlier than PM_RUN, or not. This is a globally
+ * accessible state to allow EXEC_BACKEND case.
+ *
+ * We also retain a local state variable InRecovery. InRecovery=true
+ * means the code is being executed by Startup process and therefore
+ * always during Recovery Processing Mode. This allows us to identify
+ * code executed *during* Recovery Processing Mode but not necessarily
+ * by Startup process itself.
+ *
+ * Protected by mode_lck
+ */
+ bool SharedRecoveryProcessingMode;
+ slock_t mode_lck;
+
+ char InfoLockPadding[XLOGCTL_BUFFER_SPACING];
+
  slock_t info_lck; /* locks shared variables shown above */
  } XLogCtlData;
 
***************
*** 397,404 ****
--- 437,446 ----
  static void readRecoveryCommandFile(void);
  static void exitArchiveRecovery(TimeLineID endTLI,
  uint32 endLogId, uint32 endLogSeg);
+ static void exitRecovery(void);
  static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
  static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
+ static XLogRecPtr GetRedoLocationForCheckpoint(void);
 
  static bool XLogCheckBuffer(XLogRecData *rdata, bool doPageWrites,
  XLogRecPtr *lsn, BkpBlock *bkpb);
***************
*** 480,485 ****
--- 522,532 ----
  bool updrqst;
  bool doPageWrites;
  bool isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH);
+ bool isRecoveryEnd = (rmid == RM_XLOG_ID && info == XLOG_RECOVERY_END);
+
+ /* cross-check on whether we should be here or not */
+ if (IsRecoveryProcessingMode() && !isRecoveryEnd)
+ elog(FATAL, "cannot make new WAL entries during recovery");
 
  /* info's high bits are reserved for use by me */
  if (info & XLR_INFO_MASK)
***************
*** 1720,1727 ****
  XLogRecPtr WriteRqstPtr;
  XLogwrtRqst WriteRqst;
 
! /* Disabled during REDO */
! if (InRedo)
  return;
 
  /* Quick exit if already known flushed */
--- 1767,1773 ----
  XLogRecPtr WriteRqstPtr;
  XLogwrtRqst WriteRqst;
 
! if (IsRecoveryProcessingMode())
  return;
 
  /* Quick exit if already known flushed */
***************
*** 1809,1817 ****
  * the bad page is encountered again during recovery then we would be
  * unable to restart the database at all!  (This scenario has actually
  * happened in the field several times with 7.1 releases. Note that we
! * cannot get here while InRedo is true, but if the bad page is brought in
! * and marked dirty during recovery then CreateCheckPoint will try to
! * flush it at the end of recovery.)
  *
  * The current approach is to ERROR under normal conditions, but only
  * WARNING during recovery, so that the system can be brought up even if
--- 1855,1863 ----
  * the bad page is encountered again during recovery then we would be
  * unable to restart the database at all!  (This scenario has actually
  * happened in the field several times with 7.1 releases. Note that we
! * cannot get here while IsRecoveryProcessingMode(), but if the bad page is
! * brought in and marked dirty during recovery then if a checkpoint were
! * performed at the end of recovery it will try to flush it.
  *
  * The current approach is to ERROR under normal conditions, but only
  * WARNING during recovery, so that the system can be brought up even if
***************
*** 1821,1827 ****
  * and so we will not force a restart for a bad LSN on a data page.
  */
  if (XLByteLT(LogwrtResult.Flush, record))
! elog(InRecovery ? WARNING : ERROR,
  "xlog flush request %X/%X is not satisfied --- flushed only to %X/%X",
  record.xlogid, record.xrecoff,
  LogwrtResult.Flush.xlogid, LogwrtResult.Flush.xrecoff);
--- 1867,1873 ----
  * and so we will not force a restart for a bad LSN on a data page.
  */
  if (XLByteLT(LogwrtResult.Flush, record))
! elog(ERROR,
  "xlog flush request %X/%X is not satisfied --- flushed only to %X/%X",
  record.xlogid, record.xrecoff,
  LogwrtResult.Flush.xlogid, LogwrtResult.Flush.xrecoff);
***************
*** 2094,2100 ****
  unlink(tmppath);
  }
 
! elog(DEBUG2, "done creating and filling new WAL file");
 
  /* Set flag to tell caller there was no existent file */
  *use_existent = false;
--- 2140,2147 ----
  unlink(tmppath);
  }
 
! XLogFileName(tmppath, ThisTimeLineID, log, seg);
! elog(DEBUG2, "done creating and filling new WAL file %s", tmppath);
 
  /* Set flag to tell caller there was no existent file */
  *use_existent = false;
***************
*** 2400,2405 ****
--- 2447,2474 ----
  xlogfname);
  set_ps_display(activitymsg, false);
 
+ /*
+ * Calculate and write out a new safeStartPoint. This defines
+ * the latest LSN that might appear on-disk while we apply
+ * the WAL records in this file. If we crash during recovery
+ * we must reach this point again before we can prove
+ * database consistency. Not a restartpoint! Restart points
+ * define where we should start recovery from, if we crash.
+ */
+ if (InArchiveRecovery)
+ {
+ uint32 nextLog = log;
+ uint32 nextSeg = seg;
+
+ NextLogSeg(nextLog, nextSeg);
+
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+ ControlFile->minSafeStartPoint.xlogid = nextLog;
+ ControlFile->minSafeStartPoint.xrecoff = nextSeg * XLogSegSize;
+ UpdateControlFile();
+ LWLockRelease(ControlFileLock);
+ }
+
  return fd;
  }
  if (errno != ENOENT) /* unexpected failure? */
***************
*** 4228,4233 ****
--- 4297,4303 ----
  XLogCtl->XLogCacheBlck = XLOGbuffers - 1;
  XLogCtl->Insert.currpage = (XLogPageHeader) (XLogCtl->pages);
  SpinLockInit(&XLogCtl->info_lck);
+ SpinLockInit(&XLogCtl->mode_lck);
 
  /*
  * If we are not in bootstrap mode, pg_control should already exist. Read
***************
*** 4532,4548 ****
  ereport(LOG,
  (errmsg("recovery_target_inclusive = %s", tok2)));
  }
  else if (strcmp(tok1, "log_restartpoints") == 0)
  {
- /*
- * does nothing if a recovery_target is not also set
- */
- if (!parse_bool(tok2, &recoveryLogRestartpoints))
-  ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-  errmsg("parameter \"log_restartpoints\" requires a Boolean value")));
  ereport(LOG,
! (errmsg("log_restartpoints = %s", tok2)));
  }
  else
  ereport(FATAL,
--- 4602,4642 ----
  ereport(LOG,
  (errmsg("recovery_target_inclusive = %s", tok2)));
  }
+ else if (strcmp(tok1, "recovery_safe_start_location") == 0)
+ {
+ unsigned int uxlogid;
+ unsigned int uxrecoff;
+ XLogRecPtr NewSafeStartPtr;
+
+ if (sscanf(tok2, "%X/%X", &uxlogid, &uxrecoff) != 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not parse transaction log location \"%s\"",
+ tok2)));
+
+ NewSafeStartPtr.xlogid = uxlogid;
+ NewSafeStartPtr.xrecoff = uxrecoff;
+ if (XLByteLE(ControlFile->minSafeStartPoint, NewSafeStartPtr))
+ {
+ ControlFile->minSafeStartPoint.xlogid = uxlogid;
+ ControlFile->minSafeStartPoint.xrecoff = uxrecoff;
+
+ ereport(LOG,
+ (errmsg("recovery_safe_start_location = '%s'", tok2)));
+ }
+ else if (ControlFile->state != DB_IN_ARCHIVE_RECOVERY)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("recovery_safe_start_location = '%s' is earlier than control file %X/%X",
+ tok2,
+ ControlFile->minSafeStartPoint.xlogid,
+ ControlFile->minSafeStartPoint.xrecoff)));
+ }
  else if (strcmp(tok1, "log_restartpoints") == 0)
  {
  ereport(LOG,
! (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!  errmsg("parameter \"log_restartpoints\" has been deprecated")));
  }
  else
  ereport(FATAL,
***************
*** 4678,4692 ****
  unlink(recoveryPath); /* ignore any error */
 
  /*
! * Rename the config file out of the way, so that we don't accidentally
! * re-enter archive recovery mode in a subsequent crash.
  */
- unlink(RECOVERY_COMMAND_DONE);
- if (rename(RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE) != 0)
- ereport(FATAL,
- (errcode_for_file_access(),
- errmsg("could not rename file \"%s\" to \"%s\": %m",
- RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE)));
 
  ereport(LOG,
  (errmsg("archive recovery complete")));
--- 4772,4784 ----
  unlink(recoveryPath); /* ignore any error */
 
  /*
! * As of 8.4 we no longer rename the recovery.conf file out of the
! * way until after we have performed a full checkpoint. This ensures
! * that any crash between now and the end of the checkpoint does not
! * attempt to restart from a WAL file that is no longer available to us.
! * As soon as we remove recovery.conf we lose our recovery_command and
! * cannot reaccess WAL files from the archive.
  */
 
  ereport(LOG,
  (errmsg("archive recovery complete")));
***************
*** 4813,4818 ****
--- 4905,4911 ----
  CheckPoint checkPoint;
  bool wasShutdown;
  bool reachedStopPoint = false;
+ bool performedRecovery = false;
  bool haveBackupLabel = false;
  XLogRecPtr RecPtr,
  LastRec,
***************
*** 4825,4830 ****
--- 4918,4925 ----
  uint32 freespace;
  TransactionId oldestActiveXID;
 
+ XLogCtl->SharedRecoveryProcessingMode = true;
+
  /*
  * Read control file and check XLOG status looks valid.
  *
***************
*** 5038,5046 ****
--- 5133,5147 ----
  if (minRecoveryLoc.xlogid != 0 || minRecoveryLoc.xrecoff != 0)
  ControlFile->minRecoveryPoint = minRecoveryLoc;
  ControlFile->time = (pg_time_t) time(NULL);
+ /* No need to hold ControlFileLock yet, we aren't up far enough */
  UpdateControlFile();
 
  /*
+ * Reset pgstat data, because it may be invalid after recovery.
+ */
+ pgstat_reset_all();
+
+ /*
  * If there was a backup label file, it's done its job and the info
  * has now been propagated into pg_control.  We must get rid of the
  * label file so that if we crash during recovery, we'll pick up at
***************
*** 5150,5155 ****
--- 5251,5282 ----
 
  LastRec = ReadRecPtr;
 
+ /*
+ * Have we reached our safe starting point? If so, we can
+ * signal Postmaster to enter consistent recovery mode.
+ *
+ * There are two points in the log we must pass. The first is
+ * the minRecoveryPoint, which is the LSN at the time the
+ * base backup was taken that we are about to roll forward from.
+ * If recovery has ever crashed or was stopped there is
+ * another point also: minSafeStartPoint, which is the
+ * latest LSN that recovery could have reached prior to the crash.
+ */
+ if (!reachedSafeStartPoint &&
+ XLByteLE(ControlFile->minSafeStartPoint, EndRecPtr) &&
+ XLByteLE(ControlFile->minRecoveryPoint, EndRecPtr))
+ {
+ reachedSafeStartPoint = true;
+ if (InArchiveRecovery)
+ {
+ ereport(LOG,
+ (errmsg("consistent recovery state reached at %X/%X",
+ EndRecPtr.xlogid, EndRecPtr.xrecoff)));
+ if (IsUnderPostmaster)
+ SendPostmasterSignal(PMSIGNAL_RECOVERY_START);
+ }
+ }
+
  record = ReadRecord(NULL, LOG);
  } while (record != NULL && recoveryContinue);
 
***************
*** 5171,5176 ****
--- 5298,5304 ----
  /* there are no WAL records following the checkpoint */
  ereport(LOG,
  (errmsg("redo is not required")));
+ reachedSafeStartPoint = true;
  }
  }
 
***************
*** 5184,5192 ****
 
  /*
  * Complain if we did not roll forward far enough to render the backup
! * dump consistent.
  */
! if (XLByteLT(EndOfLog, ControlFile->minRecoveryPoint))
  {
  if (reachedStopPoint) /* stopped because of stop request */
  ereport(FATAL,
--- 5312,5320 ----
 
  /*
  * Complain if we did not roll forward far enough to render the backup
! * dump consistent and start safely.
  */
! if (InRecovery && !reachedSafeStartPoint)
  {
  if (reachedStopPoint) /* stopped because of stop request */
  ereport(FATAL,
***************
*** 5308,5346 ****
  XLogCheckInvalidPages();
 
  /*
! * Reset pgstat data, because it may be invalid after recovery.
  */
! pgstat_reset_all();
 
! /*
! * Perform a checkpoint to update all our recovery activity to disk.
! *
! * Note that we write a shutdown checkpoint rather than an on-line
! * one. This is not particularly critical, but since we may be
! * assigning a new TLI, using a shutdown checkpoint allows us to have
! * the rule that TLI only changes in shutdown checkpoints, which
! * allows some extra error checking in xlog_redo.
! */
! CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
  }
 
- /*
- * Preallocate additional log files, if wanted.
- */
- PreallocXlogFiles(EndOfLog);
-
- /*
- * Okay, we're officially UP.
- */
- InRecovery = false;
-
- ControlFile->state = DB_IN_PRODUCTION;
- ControlFile->time = (pg_time_t) time(NULL);
- UpdateControlFile();
-
- /* start the archive_timeout timer running */
- XLogCtl->Write.lastSegSwitchTime = ControlFile->time;
-
  /* initialize shared-memory copy of latest checkpoint XID/epoch */
  XLogCtl->ckptXidEpoch = ControlFile->checkPointCopy.nextXidEpoch;
  XLogCtl->ckptXid = ControlFile->checkPointCopy.nextXid;
--- 5436,5449 ----
  XLogCheckInvalidPages();
 
  /*
! * Finally exit recovery and mark that in WAL. Pre-8.4 we wrote
! * a shutdown checkpoint here, but we ask bgwriter to do that now.
  */
! exitRecovery();
 
! performedRecovery = true;
  }
 
  /* initialize shared-memory copy of latest checkpoint XID/epoch */
  XLogCtl->ckptXidEpoch = ControlFile->checkPointCopy.nextXidEpoch;
  XLogCtl->ckptXid = ControlFile->checkPointCopy.nextXid;
***************
*** 5374,5379 ****
--- 5477,5565 ----
  readRecordBuf = NULL;
  readRecordBufSize = 0;
  }
+
+ /*
+ * Prior to 8.4 we wrote a Shutdown Checkpoint at the end of recovery.
+ * This could add minutes to the startup time, so we want bgwriter
+ * to perform it. This then frees the Startup process to complete so we can
+ * allow transactions and WAL inserts. We still write a checkpoint, but
+ * it will be an online checkpoint. Online checkpoints have a redo
+ * location that can be prior to the actual checkpoint record. So we want
+ * to derive that redo location *before* we let anybody else write WAL,
+ * otherwise we might miss some WAL records if we crash.
+ */
+ if (performedRecovery)
+ {
+ XLogRecPtr redo;
+
+ /*
+ * We must grab the pointer before anybody writes WAL
+ */
+ redo = GetRedoLocationForCheckpoint();
+
+ /*
+ * Tell the bgwriter
+ */
+ SetRedoLocationForArchiveCheckpoint(redo);
+
+ /*
+ * Okay, we can come up now. Allow others to write WAL.
+ */
+ XLogCtl->SharedRecoveryProcessingMode = false;
+
+ /*
+ * Now request checkpoint
+ */
+ RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_IMMEDIATE);
+ }
+ else
+ {
+ /*
+ * No recovery, so lets just get on with it.
+ */
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+ ControlFile->state = DB_IN_PRODUCTION;
+ ControlFile->time = (pg_time_t) time(NULL);
+ UpdateControlFile();
+ LWLockRelease(ControlFileLock);
+
+ /*
+ * Okay, we're officially UP.
+ */
+ XLogCtl->SharedRecoveryProcessingMode = false;
+ }
+
+ /* start the archive_timeout timer running */
+ XLogCtl->Write.lastSegSwitchTime = (pg_time_t) time(NULL);
+
+ }
+
+ /*
+  * IsRecoveryProcessingMode()
+  *
+  * Fast test for whether we're still in recovery or not. We test the shared
+  * state each time only until we leave recovery mode. After that we never
+  * look again, relying upon the settings of our local state variables. This
+  * is designed to avoid the need for a separate initialisation step.
+  */
+ bool
+ IsRecoveryProcessingMode(void)
+ {
+ if (knownProcessingMode && !LocalRecoveryProcessingMode)
+ return false;
+
+ {
+ /* use volatile pointer to prevent code rearrangement */
+ volatile XLogCtlData *xlogctl = XLogCtl;
+
+ SpinLockAcquire(&xlogctl->mode_lck);
+ LocalRecoveryProcessingMode = xlogctl->SharedRecoveryProcessingMode;
+ SpinLockRelease(&xlogctl->mode_lck);
+ }
+
+ knownProcessingMode = true;
+
+ return LocalRecoveryProcessingMode;
  }
 
  /*
***************
*** 5631,5650 ****
  static void
  LogCheckpointStart(int flags)
  {
! elog(LOG, "checkpoint starting:%s%s%s%s%s%s",
! (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
! (flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "",
! (flags & CHECKPOINT_FORCE) ? " force" : "",
! (flags & CHECKPOINT_WAIT) ? " wait" : "",
! (flags & CHECKPOINT_CAUSE_XLOG) ? " xlog" : "",
! (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "");
  }
 
  /*
   * Log end of a checkpoint.
   */
  static void
! LogCheckpointEnd(void)
  {
  long write_secs,
  sync_secs,
--- 5817,5840 ----
  static void
  LogCheckpointStart(int flags)
  {
! if (flags & CHECKPOINT_RESTARTPOINT)
! elog(LOG, "restartpoint starting:%s",
! (flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "");
! else
! elog(LOG, "checkpoint starting:%s%s%s%s%s%s",
! (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
! (flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "",
! (flags & CHECKPOINT_FORCE) ? " force" : "",
! (flags & CHECKPOINT_WAIT) ? " wait" : "",
! (flags & CHECKPOINT_CAUSE_XLOG) ? " xlog" : "",
! (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "");
  }
 
  /*
   * Log end of a checkpoint.
   */
  static void
! LogCheckpointEnd(int flags)
  {
  long write_secs,
  sync_secs,
***************
*** 5667,5683 ****
  CheckpointStats.ckpt_sync_end_t,
  &sync_secs, &sync_usecs);
 
! elog(LOG, "checkpoint complete: wrote %d buffers (%.1f%%); "
! "%d transaction log file(s) added, %d removed, %d recycled; "
! "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
! CheckpointStats.ckpt_bufs_written,
! (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
! CheckpointStats.ckpt_segs_added,
! CheckpointStats.ckpt_segs_removed,
! CheckpointStats.ckpt_segs_recycled,
! write_secs, write_usecs / 1000,
! sync_secs, sync_usecs / 1000,
! total_secs, total_usecs / 1000);
  }
 
  /*
--- 5857,5882 ----
  CheckpointStats.ckpt_sync_end_t,
  &sync_secs, &sync_usecs);
 
! if (flags & CHECKPOINT_RESTARTPOINT)
! elog(LOG, "restartpoint complete: wrote %d buffers (%.1f%%); "
! "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
! CheckpointStats.ckpt_bufs_written,
! (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
! write_secs, write_usecs / 1000,
! sync_secs, sync_usecs / 1000,
! total_secs, total_usecs / 1000);
! else
! elog(LOG, "checkpoint complete: wrote %d buffers (%.1f%%); "
! "%d transaction log file(s) added, %d removed, %d recycled; "
! "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
! CheckpointStats.ckpt_bufs_written,
! (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
! CheckpointStats.ckpt_segs_added,
! CheckpointStats.ckpt_segs_removed,
! CheckpointStats.ckpt_segs_recycled,
! write_secs, write_usecs / 1000,
! sync_secs, sync_usecs / 1000,
! total_secs, total_usecs / 1000);
  }
 
  /*
***************
*** 5702,5718 ****
  XLogRecPtr recptr;
  XLogCtlInsert *Insert = &XLogCtl->Insert;
  XLogRecData rdata;
- uint32 freespace;
  uint32 _logId;
  uint32 _logSeg;
  TransactionId *inCommitXids;
  int nInCommit;
 
  /*
  * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
! * (This is just pro forma, since in the present system structure there is
! * only one process that is allowed to issue checkpoints at any given
! * time.)
  */
  LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
 
--- 5901,5916 ----
  XLogRecPtr recptr;
  XLogCtlInsert *Insert = &XLogCtl->Insert;
  XLogRecData rdata;
  uint32 _logId;
  uint32 _logSeg;
  TransactionId *inCommitXids;
  int nInCommit;
+ bool leavingArchiveRecovery = false;
 
  /*
  * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
! * That shouldn't be happening, but checkpoints are an important aspect
! * of our resilience, so we take no chances.
  */
  LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
 
***************
*** 5727,5741 ****
--- 5925,5948 ----
  CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
 
  /*
+ * Find out if this is the first checkpoint after archive recovery.
+ */
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+ leavingArchiveRecovery = (ControlFile->state == DB_IN_ARCHIVE_RECOVERY);
+ LWLockRelease(ControlFileLock);
+
+ /*
  * Use a critical section to force system panic if we have trouble.
  */
  START_CRIT_SECTION();
 
  if (shutdown)
  {
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
  ControlFile->state = DB_SHUTDOWNING;
  ControlFile->time = (pg_time_t) time(NULL);
  UpdateControlFile();
+ LWLockRelease(ControlFileLock);
  }
 
  /*
***************
*** 5750,5840 ****
  checkPoint.ThisTimeLineID = ThisTimeLineID;
  checkPoint.time = (pg_time_t) time(NULL);
 
! /*
! * We must hold WALInsertLock while examining insert state to determine
! * the checkpoint REDO pointer.
! */
! LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
!
! /*
! * If this isn't a shutdown or forced checkpoint, and we have not inserted
! * any XLOG records since the start of the last checkpoint, skip the
! * checkpoint. The idea here is to avoid inserting duplicate checkpoints
! * when the system is idle. That wastes log space, and more importantly it
! * exposes us to possible loss of both current and previous checkpoint
! * records if the machine crashes just as we're writing the update.
! * (Perhaps it'd make even more sense to checkpoint only when the previous
! * checkpoint record is in a different xlog page?)
! *
! * We have to make two tests to determine that nothing has happened since
! * the start of the last checkpoint: current insertion point must match
! * the end of the last checkpoint record, and its redo pointer must point
! * to itself.
! */
! if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FORCE)) == 0)
  {
! XLogRecPtr curInsert;
 
! INSERT_RECPTR(curInsert, Insert, Insert->curridx);
! if (curInsert.xlogid == ControlFile->checkPoint.xlogid &&
! curInsert.xrecoff == ControlFile->checkPoint.xrecoff +
! MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) &&
! ControlFile->checkPoint.xlogid ==
! ControlFile->checkPointCopy.redo.xlogid &&
! ControlFile->checkPoint.xrecoff ==
! ControlFile->checkPointCopy.redo.xrecoff)
  {
! LWLockRelease(WALInsertLock);
! LWLockRelease(CheckpointLock);
! END_CRIT_SECTION();
! return;
! }
! }
 
! /*
! * Compute new REDO record ptr = location of next XLOG record.
! *
! * NB: this is NOT necessarily where the checkpoint record itself will be,
! * since other backends may insert more XLOG records while we're off doing
! * the buffer flush work.  Those XLOG records are logically after the
! * checkpoint, even though physically before it.  Got that?
! */
! freespace = INSERT_FREESPACE(Insert);
! if (freespace < SizeOfXLogRecord)
! {
! (void) AdvanceXLInsertBuffer(false);
! /* OK to ignore update return flag, since we will do flush anyway */
! freespace = INSERT_FREESPACE(Insert);
! }
! INSERT_RECPTR(checkPoint.redo, Insert, Insert->curridx);
 
! /*
! * Here we update the shared RedoRecPtr for future XLogInsert calls; this
! * must be done while holding the insert lock AND the info_lck.
! *
! * Note: if we fail to complete the checkpoint, RedoRecPtr will be left
! * pointing past where it really needs to point.  This is okay; the only
! * consequence is that XLogInsert might back up whole buffers that it
! * didn't really need to.  We can't postpone advancing RedoRecPtr because
! * XLogInserts that happen while we are dumping buffers must assume that
! * their buffer changes are not included in the checkpoint.
! */
! {
! /* use volatile pointer to prevent code rearrangement */
! volatile XLogCtlData *xlogctl = XLogCtl;
 
! SpinLockAcquire(&xlogctl->info_lck);
! RedoRecPtr = xlogctl->Insert.RedoRecPtr = checkPoint.redo;
! SpinLockRelease(&xlogctl->info_lck);
  }
 
  /*
- * Now we can release WAL insert lock, allowing other xacts to proceed
- * while we are flushing disk buffers.
- */
- LWLockRelease(WALInsertLock);
-
- /*
  * If enabled, log checkpoint start.  We postpone this until now so as not
  * to log anything if we decided to skip the checkpoint.
  */
--- 5957,6025 ----
  checkPoint.ThisTimeLineID = ThisTimeLineID;
  checkPoint.time = (pg_time_t) time(NULL);
 
! if (leavingArchiveRecovery)
! checkPoint.redo = GetRedoLocationForArchiveCheckpoint();
! else
  {
! /*
! * We must hold WALInsertLock while examining insert state to determine
! * the checkpoint REDO pointer.
! */
! LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
 
! /*
! * If this isn't a shutdown or forced checkpoint, and we have not inserted
! * any XLOG records since the start of the last checkpoint, skip the
! * checkpoint. The idea here is to avoid inserting duplicate checkpoints
! * when the system is idle. That wastes log space, and more importantly it
! * exposes us to possible loss of both current and previous checkpoint
! * records if the machine crashes just as we're writing the update.
! * (Perhaps it'd make even more sense to checkpoint only when the previous
! * checkpoint record is in a different xlog page?)
! *
! * We have to make two tests to determine that nothing has happened since
! * the start of the last checkpoint: current insertion point must match
! * the end of the last checkpoint record, and its redo pointer must point
! * to itself.
! */
! if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FORCE)) == 0)
  {
! XLogRecPtr curInsert;
 
! INSERT_RECPTR(curInsert, Insert, Insert->curridx);
! if (curInsert.xlogid == ControlFile->checkPoint.xlogid &&
! curInsert.xrecoff == ControlFile->checkPoint.xrecoff +
! MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) &&
! ControlFile->checkPoint.xlogid ==
! ControlFile->checkPointCopy.redo.xlogid &&
! ControlFile->checkPoint.xrecoff ==
! ControlFile->checkPointCopy.redo.xrecoff)
! {
! LWLockRelease(WALInsertLock);
! LWLockRelease(CheckpointLock);
! END_CRIT_SECTION();
! return;
! }
! }
 
! /*
! * Compute new REDO record ptr = location of next XLOG record.
! *
! * NB: this is NOT necessarily where the checkpoint record itself will be,
! * since other backends may insert more XLOG records while we're off doing
! * the buffer flush work.  Those XLOG records are logically after the
! * checkpoint, even though physically before it.  Got that?
! */
! checkPoint.redo = GetRedoLocationForCheckpoint();
 
! /*
! * Now we can release WAL insert lock, allowing other xacts to proceed
! * while we are flushing disk buffers.
! */
! LWLockRelease(WALInsertLock);
  }
 
  /*
  * If enabled, log checkpoint start.  We postpone this until now so as not
  * to log anything if we decided to skip the checkpoint.
  */
***************
*** 5941,5958 ****
  XLByteToSeg(ControlFile->checkPointCopy.redo, _logId, _logSeg);
 
  /*
! * Update the control file.
  */
  LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
  if (shutdown)
  ControlFile->state = DB_SHUTDOWNED;
  ControlFile->prevCheckPoint = ControlFile->checkPoint;
  ControlFile->checkPoint = ProcLastRecPtr;
  ControlFile->checkPointCopy = checkPoint;
  ControlFile->time = (pg_time_t) time(NULL);
  UpdateControlFile();
  LWLockRelease(ControlFileLock);
 
  /* Update shared-memory copy of checkpoint XID/epoch */
  {
  /* use volatile pointer to prevent code rearrangement */
--- 6126,6168 ----
  XLByteToSeg(ControlFile->checkPointCopy.redo, _logId, _logSeg);
 
  /*
! * Update the control file. In 8.4, this routine becomes the primary
! * point for recording changes of state in the control file at the
! * end of recovery. Postmaster state already shows us being in
! * normal running mode, but it is only after this point that we
! * are completely free of reperforming a recovery if we crash.  Note
! * that this is executed by bgwriter after the death of Startup process.
  */
  LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
  if (shutdown)
  ControlFile->state = DB_SHUTDOWNED;
+ else
+ ControlFile->state = DB_IN_PRODUCTION;
+
  ControlFile->prevCheckPoint = ControlFile->checkPoint;
  ControlFile->checkPoint = ProcLastRecPtr;
  ControlFile->checkPointCopy = checkPoint;
  ControlFile->time = (pg_time_t) time(NULL);
  UpdateControlFile();
+
  LWLockRelease(ControlFileLock);
 
+ if (leavingArchiveRecovery)
+ {
+ /*
+ * Rename the config file out of the way, so that we don't accidentally
+ * re-enter archive recovery mode in a subsequent crash. Prior to
+ * 8.4 this step was performed at end of exitArchiveRecovery().
+ */
+ unlink(RECOVERY_COMMAND_DONE);
+ if (rename(RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not rename file \"%s\" to \"%s\": %m",
+ RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE)));
+ }
+
  /* Update shared-memory copy of checkpoint XID/epoch */
  {
  /* use volatile pointer to prevent code rearrangement */
***************
*** 5999,6014 ****
  * in subtrans.c). During recovery, though, we mustn't do this because
  * StartupSUBTRANS hasn't been called yet.
  */
! if (!InRecovery)
! TruncateSUBTRANS(GetOldestXmin(true, false));
 
  /* All real work is done, but log before releasing lock. */
  if (log_checkpoints)
! LogCheckpointEnd();
 
  LWLockRelease(CheckpointLock);
  }
 
  /*
   * Flush all data in shared memory to disk, and fsync
   *
--- 6209,6268 ----
  * in subtrans.c). During recovery, though, we mustn't do this because
  * StartupSUBTRANS hasn't been called yet.
  */
! TruncateSUBTRANS(GetOldestXmin(true, false));
 
  /* All real work is done, but log before releasing lock. */
  if (log_checkpoints)
! LogCheckpointEnd(flags);
 
  LWLockRelease(CheckpointLock);
  }
 
+ /*
+  * GetRedoLocationForCheckpoint()
+  *
+  * When !IsRecoveryProcessingMode() this must be called while holding
+  * WALInsertLock.
+  */
+ static XLogRecPtr
+ GetRedoLocationForCheckpoint(void)
+ {
+ XLogCtlInsert  *Insert = &XLogCtl->Insert;
+ uint32 freespace;
+ XLogRecPtr redo;
+
+ freespace = INSERT_FREESPACE(Insert);
+ if (freespace < SizeOfXLogRecord)
+ {
+ (void) AdvanceXLInsertBuffer(false);
+ /* OK to ignore update return flag, since we will do flush anyway */
+ freespace = INSERT_FREESPACE(Insert);
+ }
+ INSERT_RECPTR(redo, Insert, Insert->curridx);
+
+ /*
+ * Here we update the shared RedoRecPtr for future XLogInsert calls; this
+ * must be done while holding the insert lock AND the info_lck.
+ *
+ * Note: if we fail to complete the checkpoint, RedoRecPtr will be left
+ * pointing past where it really needs to point.  This is okay; the only
+ * consequence is that XLogInsert might back up whole buffers that it
+ * didn't really need to.  We can't postpone advancing RedoRecPtr because
+ * XLogInserts that happen while we are dumping buffers must assume that
+ * their buffer changes are not included in the checkpoint.
+ */
+ {
+ /* use volatile pointer to prevent code rearrangement */
+ volatile XLogCtlData *xlogctl = XLogCtl;
+
+ SpinLockAcquire(&xlogctl->info_lck);
+ RedoRecPtr = xlogctl->Insert.RedoRecPtr = redo;
+ SpinLockRelease(&xlogctl->info_lck);
+ }
+
+ return redo;
+ }
+
  /*
   * Flush all data in shared memory to disk, and fsync
   *
***************
*** 6073,6101 ****
  }
  }
 
  /*
! * OK, force data out to disk
  */
! CheckPointGuts(checkPoint->redo, CHECKPOINT_IMMEDIATE);
 
  /*
! * Update pg_control so that any subsequent crash will restart from this
! * checkpoint. Note: ReadRecPtr gives the XLOG address of the checkpoint
! * record itself.
  */
  ControlFile->prevCheckPoint = ControlFile->checkPoint;
! ControlFile->checkPoint = ReadRecPtr;
! ControlFile->checkPointCopy = *checkPoint;
  ControlFile->time = (pg_time_t) time(NULL);
  UpdateControlFile();
 
! ereport((recoveryLogRestartpoints ? LOG : DEBUG2),
  (errmsg("recovery restart point at %X/%X",
! checkPoint->redo.xlogid, checkPoint->redo.xrecoff)));
  if (recoveryLastXTime)
! ereport((recoveryLogRestartpoints ? LOG : DEBUG2),
! (errmsg("last completed transaction was at log time %s",
! timestamptz_to_str(recoveryLastXTime))));
  }
 
  /*
--- 6327,6395 ----
  }
  }
 
+ RequestRestartPoint(ReadRecPtr, checkPoint, reachedSafeStartPoint);
+ }
+
+ /*
+  * As of 8.4, RestartPoints are always created by the bgwriter
+  * once we have reachedSafeStartPoint. We use bgwriter's shared memory
+  * area wherever we call it from, to keep better code structure.
+  */
+ void
+ CreateRestartPoint(const XLogRecPtr ReadPtr, const CheckPoint *restartPoint, int flags)
+ {
+ if (log_checkpoints)
+ {
+ /*
+ * Prepare to accumulate statistics.
+ */
+
+ MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
+ CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
+
+ LogCheckpointStart(CHECKPOINT_RESTARTPOINT | flags);
+ }
+
  /*
! * Acquire CheckpointLock to ensure only one restartpoint happens at a time.
! * We rely on this lock to ensure that the startup process doesn't exit
! * Recovery while we are half way through a restartpoint.
  */
! LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
!
! CheckPointGuts(restartPoint->redo, CHECKPOINT_RESTARTPOINT | flags);
 
  /*
! * Update pg_control, using current time
  */
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
  ControlFile->prevCheckPoint = ControlFile->checkPoint;
! ControlFile->checkPoint = ReadPtr;
! ControlFile->checkPointCopy = *restartPoint;
  ControlFile->time = (pg_time_t) time(NULL);
  UpdateControlFile();
+ LWLockRelease(ControlFileLock);
 
! /*
! * Currently, there is no need to truncate pg_subtrans during recovery.
! * If we did do that, we will need to have called StartupSUBTRANS()
! * already and then TruncateSUBTRANS() would go here.
! */
!
! /* All real work is done, but log before releasing lock. */
! if (log_checkpoints)
! LogCheckpointEnd(CHECKPOINT_RESTARTPOINT);
!
! ereport((log_checkpoints ? LOG : DEBUG2),
  (errmsg("recovery restart point at %X/%X",
! restartPoint->redo.xlogid, restartPoint->redo.xrecoff)));
!
  if (recoveryLastXTime)
! ereport((log_checkpoints ? LOG : DEBUG2),
! (errmsg("last completed transaction was at log time %s",
! timestamptz_to_str(recoveryLastXTime))));
!
! LWLockRelease(CheckpointLock);
  }
 
  /*
***************
*** 6160,6166 ****
  }
 
  /*
!  * XLOG resource manager's routines
   */
  void
  xlog_redo(XLogRecPtr lsn, XLogRecord *record)
--- 6454,6516 ----
  }
 
  /*
!  * exitRecovery()
!  *
!  * Exit recovery state and write an XLOG_RECOVERY_END record. This is the
!  * only record type that can record a change of timelineID. We assume
!  * caller has already set ThisTimeLineID, if appropriate.
!  */
! static void
! exitRecovery(void)
! {
! XLogRecData rdata;
!
! rdata.buffer = InvalidBuffer;
! rdata.data = (char *) (&ThisTimeLineID);
! rdata.len = sizeof(TimeLineID);
! rdata.next = NULL;
!
! /*
! * If a restartpoint is in progress, we will not be able to acquire
! * CheckpointLock immediately. If the restartpoint is still running, send
! * a second signal to nudge bgwriter to go faster so we can avoid delay.
! * Then wait for lock, so we know the restartpoint has completed. We do
! * this because we don't want to interrupt the restartpoint half way
! * through, which might leave us in a mess and we want to be robust. We're
! * going to checkpoint soon anyway, so it's not wasted effort.
! */
! if (LWLockConditionalAcquire(CheckpointLock, LW_EXCLUSIVE))
! LWLockRelease(CheckpointLock);
! else
! {
! RequestRestartPointCompletion();
! ereport(LOG,
! (errmsg("startup process waiting for restartpoint to complete")));
! LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
! LWLockRelease(CheckpointLock);
! }
!
! /*
! * This is the only type of WAL message that can be inserted during
! * recovery. This ensures that we don't allow others to get access
! * until after we have changed state.
! */
! (void) XLogInsert(RM_XLOG_ID, XLOG_RECOVERY_END, &rdata);
!
! /*
! * We don't XLogFlush() here otherwise we'll end up zeroing the WAL
! * file ourselves. So just let bgwriter's forthcoming checkpoint do
! * that for us.
! */
!
! InRecovery = false;
! }
!
! /*
!  * XLOG resource manager's routines.
!  *
!  * Definitions of message info are in include/catalog/pg_control.h,
!  * though not all messages relate to control file processing.
   */
  void
  xlog_redo(XLogRecPtr lsn, XLogRecord *record)
***************
*** 6195,6215 ****
  ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 
  /*
! * TLI may change in a shutdown checkpoint, but it shouldn't decrease
  */
! if (checkPoint.ThisTimeLineID != ThisTimeLineID)
  {
! if (checkPoint.ThisTimeLineID < ThisTimeLineID ||
  !list_member_int(expectedTLIs,
! (int) checkPoint.ThisTimeLineID))
  ereport(PANIC,
! (errmsg("unexpected timeline ID %u (after %u) in checkpoint record",
! checkPoint.ThisTimeLineID, ThisTimeLineID)));
  /* Following WAL records should be run with new TLI */
! ThisTimeLineID = checkPoint.ThisTimeLineID;
  }
-
- RecoveryRestartPoint(&checkPoint);
  }
  else if (info == XLOG_CHECKPOINT_ONLINE)
  {
--- 6545,6582 ----
  ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 
  /*
! * TLI no longer changes at shutdown checkpoint, since as of 8.4,
! * shutdown checkpoints only occur at shutdown. Much less confusing.
  */
!
! RecoveryRestartPoint(&checkPoint);
! }
! else if (info == XLOG_RECOVERY_END)
! {
! TimeLineID tli;
!
! memcpy(&tli, XLogRecGetData(record), sizeof(TimeLineID));
!
! /*
! * TLI may change when recovery ends, but it shouldn't decrease.
! *
! * This is the only WAL record that can tell us to change timelineID
! * while we process WAL records.
! *
! * We can *choose* to stop recovery at any point, generating a
! * new timelineID which is recorded using this record type.
! */
! if (tli != ThisTimeLineID)
  {
! if (tli < ThisTimeLineID ||
  !list_member_int(expectedTLIs,
! (int) tli))
  ereport(PANIC,
! (errmsg("unexpected timeline ID %u (after %u) at recovery end record",
! tli, ThisTimeLineID)));
  /* Following WAL records should be run with new TLI */
! ThisTimeLineID = tli;
  }
  }
  else if (info == XLOG_CHECKPOINT_ONLINE)
  {
***************
*** 6232,6238 ****
  ControlFile->checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch;
  ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 
! /* TLI should not change in an on-line checkpoint */
  if (checkPoint.ThisTimeLineID != ThisTimeLineID)
  ereport(PANIC,
  (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
--- 6599,6605 ----
  ControlFile->checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch;
  ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 
! /* TLI must not change at a checkpoint */
  if (checkPoint.ThisTimeLineID != ThisTimeLineID)
  ereport(PANIC,
  (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
***************
*** 6290,6296 ****
  }
 
  #ifdef WAL_DEBUG
-
  static void
  xlog_outrec(StringInfo buf, XLogRecord *record)
  {
--- 6657,6662 ----
***************
*** 6310,6316 ****
  }
  #endif   /* WAL_DEBUG */
 
-
  /*
   * Return the (possible) sync flag used for opening a file, depending on the
   * value of the GUC wal_sync_method.
--- 6676,6681 ----
***************
*** 6449,6454 ****
--- 6814,6820 ----
  uint32 _logSeg;
  struct stat stat_buf;
  FILE   *fp;
+ bool immediate_checkpoint = false;
 
  if (!superuser())
  ereport(ERROR,
***************
*** 6502,6516 ****
  /* Ensure we release forcePageWrites if fail below */
  PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) 0);
  {
  /*
  * Force a CHECKPOINT. Aside from being necessary to prevent torn
  * page problems, this guarantees that two successive backup runs will
  * have different checkpoint positions and hence different history
  * file names, even if nothing happened in between.
- *
- * We don't use CHECKPOINT_IMMEDIATE, hence this can take awhile.
  */
! RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT);
 
  /*
  * Now we need to fetch the checkpoint record location, and also its
--- 6868,6905 ----
  /* Ensure we release forcePageWrites if fail below */
  PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) 0);
  {
+ int flags = CHECKPOINT_FORCE | CHECKPOINT_WAIT;
+
+ /*
+ * We support both variants of the pg_start_backup() SQL function
+ * with a single C function. If we requested two parameter variant,
+ * then get the value for the second parameter.
+ */
+ if (PG_NARGS() == 2)
+ {
+ immediate_checkpoint = PG_GETARG_BOOL(1);
+
+ /* By default, this can take some time */
+ if (immediate_checkpoint)
+ {
+ flags |= CHECKPOINT_IMMEDIATE;
+ ereport(NOTICE,
+ (errmsg("pg_start_backup() signalling for immediate checkpoint")));
+ }
+ else
+ ereport(NOTICE,
+ (errmsg("pg_start_backup() signalling for smooth checkpoint"
+ ", may last up to %d s",
+ (int) (CheckPointTimeout * CheckPointCompletionTarget))));
+ }
+
  /*
  * Force a CHECKPOINT. Aside from being necessary to prevent torn
  * page problems, this guarantees that two successive backup runs will
  * have different checkpoint positions and hence different history
  * file names, even if nothing happened in between.
  */
! RequestCheckpoint(flags);
 
  /*
  * Now we need to fetch the checkpoint record location, and also its
***************
*** 6639,6651 ****
  LWLockRelease(WALInsertLock);
 
  /*
! * Force a switch to a new xlog segment file, so that the backup is valid
  * as soon as archiver moves out the current segment file. We'll report
  * the end address of the XLOG SWITCH record as the backup stopping point.
  */
  stoppoint = RequestXLogSwitch();
 
  XLByteToSeg(stoppoint, _logId, _logSeg);
  XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg);
 
  /* Use the log timezone here, not the session timezone */
--- 7028,7049 ----
  LWLockRelease(WALInsertLock);
 
  /*
! * Request switch to a new xlog segment file, so that the backup is valid
  * as soon as archiver moves out the current segment file. We'll report
  * the end address of the XLOG SWITCH record as the backup stopping point.
  */
  stoppoint = RequestXLogSwitch();
 
  XLByteToSeg(stoppoint, _logId, _logSeg);
+
+ /*
+ * If we didn't actually switch xlog files then there is nothing in
+ * this file for us to wait for, so set stopxlogfilename to be the
+ * previous file instead. We still report the same ending location.
+ */
+ if ((stoppoint.xrecoff % XLogSegSize) == 0)
+ PrevLogSeg(_logId, _logSeg);
+
  XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg);
 
  /* Use the log timezone here, not the session timezone */
***************
*** 6741,6747 ****
  BackupHistoryFileName(histfilepath, ThisTimeLineID, _logId, _logSeg,
   startpoint.xrecoff % XLogSegSize);
 
! seconds_before_warning = 60;
  waits = 0;
 
  while (XLogArchiveIsBusy(stopxlogfilename) ||
--- 7139,7145 ----
  BackupHistoryFileName(histfilepath, ThisTimeLineID, _logId, _logSeg,
   startpoint.xrecoff % XLogSegSize);
 
! seconds_before_warning = 10;
  waits = 0;
 
  while (XLogArchiveIsBusy(stopxlogfilename) ||
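
Reviewer's note on IsRecoveryProcessingMode(), added in the xlog.c changes above: the function caches the shared flag in process-local variables and stops taking the spinlock once it has seen recovery end, since the state only ever changes from "in recovery" to "not in recovery". The following is a minimal standalone sketch of that one-way cached flag, not code from the patch. It assumes a pthread mutex in place of the mode_lck spinlock; the names is_recovery_mode(), leave_recovery_mode() and shared_reads are hypothetical and exist only for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for XLogCtl->SharedRecoveryProcessingMode and mode_lck */
static pthread_mutex_t mode_lck = PTHREAD_MUTEX_INITIALIZER;
static bool shared_recovery_mode = true;

/* Per-process cached state, mirroring the patch's local variables */
static bool known_mode = false;
static bool local_recovery_mode = true;

/* Illustration only: counts how often the shared flag was actually read */
static int shared_reads = 0;

/*
 * Returns true while recovery is in progress. Once this process has
 * observed the end of recovery it never touches the lock again.
 */
bool
is_recovery_mode(void)
{
    if (known_mode && !local_recovery_mode)
        return false;           /* fast path: recovery known to be over */

    pthread_mutex_lock(&mode_lck);
    local_recovery_mode = shared_recovery_mode;
    pthread_mutex_unlock(&mode_lck);
    shared_reads++;

    known_mode = true;
    return local_recovery_mode;
}

/* Hypothetical counterpart of the startup process clearing the shared flag */
void
leave_recovery_mode(void)
{
    pthread_mutex_lock(&mode_lck);
    shared_recovery_mode = false;
    pthread_mutex_unlock(&mode_lck);
}
```

Because the transition is one-way, a stale local "true" only costs one more locked read, while a local "false" can be trusted forever. That is why no separate initialisation step is needed in each process.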
Index: src/backend/postmaster/bgwriter.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/postmaster/bgwriter.c,v
retrieving revision 1.51
diff -c -r1.51 bgwriter.c
*** src/backend/postmaster/bgwriter.c 11 Aug 2008 11:05:11 -0000 1.51
--- src/backend/postmaster/bgwriter.c 30 Sep 2008 18:33:55 -0000
***************
*** 49,54 ****
--- 49,55 ----
  #include <unistd.h>
 
  #include "access/xlog_internal.h"
+ #include "catalog/pg_control.h"
  #include "libpq/pqsignal.h"
  #include "miscadmin.h"
  #include "pgstat.h"
***************
*** 130,135 ****
--- 131,143 ----
 
  int ckpt_flags; /* checkpoint flags, as defined in xlog.h */
 
+ /*
+ * When the Startup process wants bgwriter to perform a restartpoint, it
+ * sets these fields so that we can update the control file afterwards.
+ */
+ XLogRecPtr ReadPtr; /* Requested log pointer */
+ CheckPoint  restartPoint; /* restartPoint data for ControlFile */
+
  uint32 num_backend_writes; /* counts non-bgwriter buffer writes */
 
  int num_requests; /* current # of requests */
***************
*** 166,172 ****
 
  /* these values are valid when ckpt_active is true: */
  static pg_time_t ckpt_start_time;
! static XLogRecPtr ckpt_start_recptr;
  static double ckpt_cached_elapsed;
 
  static pg_time_t last_checkpoint_time;
--- 174,180 ----
 
  /* these values are valid when ckpt_active is true: */
  static pg_time_t ckpt_start_time;
! static XLogRecPtr ckpt_start_recptr; /* not used if IsRecoveryProcessingMode */
  static double ckpt_cached_elapsed;
 
  static pg_time_t last_checkpoint_time;
***************
*** 198,203 ****
--- 206,212 ----
  {
  sigjmp_buf local_sigjmp_buf;
  MemoryContext bgwriter_context;
+ bool BgWriterRecoveryMode;
 
  BgWriterShmem->bgwriter_pid = MyProcPid;
  am_bg_writer = true;
***************
*** 356,371 ****
  */
  PG_SETMASK(&UnBlockSig);
 
  /*
  * Loop forever
  */
  for (;;)
  {
- bool do_checkpoint = false;
- int flags = 0;
- pg_time_t now;
- int elapsed_secs;
-
  /*
  * Emergency bailout if postmaster has died.  This is to avoid the
  * necessity for manual cleanup of all postmaster children.
--- 365,381 ----
  */
  PG_SETMASK(&UnBlockSig);
 
+ BgWriterRecoveryMode = IsRecoveryProcessingMode();
+
+ if (BgWriterRecoveryMode)
+ elog(DEBUG1, "bgwriter starting during recovery, pid = %d",
+ BgWriterShmem->bgwriter_pid);
+
  /*
  * Loop forever
  */
  for (;;)
  {
  /*
  * Emergency bailout if postmaster has died.  This is to avoid the
  * necessity for manual cleanup of all postmaster children.
***************
*** 383,501 ****
  got_SIGHUP = false;
  ProcessConfigFile(PGC_SIGHUP);
  }
- if (checkpoint_requested)
- {
- checkpoint_requested = false;
- do_checkpoint = true;
- BgWriterStats.m_requested_checkpoints++;
- }
- if (shutdown_requested)
- {
- /*
- * From here on, elog(ERROR) should end with exit(1), not send
- * control back to the sigsetjmp block above
- */
- ExitOnAnyError = true;
- /* Close down the database */
- ShutdownXLOG(0, 0);
- DumpFreeSpaceMap(0, 0);
- /* Normal exit from the bgwriter is here */
- proc_exit(0); /* done */
- }
 
! /*
! * Force a checkpoint if too much time has elapsed since the last one.
! * Note that we count a timed checkpoint in stats only when this
! * occurs without an external request, but we set the CAUSE_TIME flag
! * bit even if there is also an external request.
! */
! now = (pg_time_t) time(NULL);
! elapsed_secs = now - last_checkpoint_time;
! if (elapsed_secs >= CheckPointTimeout)
  {
! if (!do_checkpoint)
! BgWriterStats.m_timed_checkpoints++;
! do_checkpoint = true;
! flags |= CHECKPOINT_CAUSE_TIME;
  }
!
! /*
! * Do a checkpoint if requested, otherwise do one cycle of
! * dirty-buffer writing.
! */
! if (do_checkpoint)
  {
! /* use volatile pointer to prevent code rearrangement */
! volatile BgWriterShmemStruct *bgs = BgWriterShmem;
 
  /*
! * Atomically fetch the request flags to figure out what kind of a
! * checkpoint we should perform, and increase the started-counter
! * to acknowledge that we've started a new checkpoint.
  */
! SpinLockAcquire(&bgs->ckpt_lck);
! flags |= bgs->ckpt_flags;
! bgs->ckpt_flags = 0;
! bgs->ckpt_started++;
! SpinLockRelease(&bgs->ckpt_lck);
 
  /*
! * We will warn if (a) too soon since last checkpoint (whatever
! * caused it) and (b) somebody set the CHECKPOINT_CAUSE_XLOG flag
! * since the last checkpoint start.  Note in particular that this
! * implementation will not generate warnings caused by
! * CheckPointTimeout < CheckPointWarning.
  */
! if ((flags & CHECKPOINT_CAUSE_XLOG) &&
! elapsed_secs < CheckPointWarning)
! ereport(LOG,
! (errmsg("checkpoints are occurring too frequently (%d seconds apart)",
! elapsed_secs),
! errhint("Consider increasing the configuration parameter \"checkpoint_segments\".")));
 
! /*
! * Initialize bgwriter-private variables used during checkpoint.
! */
! ckpt_active = true;
! ckpt_start_recptr = GetInsertRecPtr();
! ckpt_start_time = now;
! ckpt_cached_elapsed = 0;
 
! /*
! * Do the checkpoint.
! */
! CreateCheckPoint(flags);
!
! /*
! * After any checkpoint, close all smgr files. This is so we
! * won't hang onto smgr references to deleted files indefinitely.
! */
! smgrcloseall();
!
! /*
! * Indicate checkpoint completion to any waiting backends.
! */
! SpinLockAcquire(&bgs->ckpt_lck);
! bgs->ckpt_done = bgs->ckpt_started;
! SpinLockRelease(&bgs->ckpt_lck);
!
! ckpt_active = false;
!
! /*
! * Note we record the checkpoint start time not end time as
! * last_checkpoint_time.  This is so that time-driven checkpoints
! * happen at a predictable spacing.
! */
! last_checkpoint_time = now;
  }
- else
- BgBufferSync();
-
- /* Check for archive_timeout and switch xlog files if necessary. */
- CheckArchiveTimeout();
-
- /* Nap for the configured time. */
- BgWriterNap();
  }
  }
 
--- 393,599 ----
  got_SIGHUP = false;
  ProcessConfigFile(PGC_SIGHUP);
  }
 
! if (BgWriterRecoveryMode)
  {
! if (shutdown_requested)
! {
! /*
! * From here on, elog(ERROR) should end with exit(1), not send
! * control back to the sigsetjmp block above
! */
! ExitOnAnyError = true;
! /* Normal exit from the bgwriter is here */
! proc_exit(0); /* done */
! }
!
! if (!IsRecoveryProcessingMode())
! {
! elog(DEBUG2, "bgwriter changing from recovery to normal mode");
!
! InitXLOGAccess();
! BgWriterRecoveryMode = false;
!
! /*
! * Start time-driven events from now
! */
! last_checkpoint_time = last_xlog_switch_time = (pg_time_t) time(NULL);
!
! /*
! * Notice that we do *not* act on a checkpoint_requested
! * state at this point. We have changed mode, so we wish to
! * perform a checkpoint not a restartpoint.
! */
! continue;
! }
!
! if (checkpoint_requested)
! {
! XLogRecPtr ReadPtr;
! CheckPoint restartPoint;
!
! checkpoint_requested = false;
!
! /*
! * Initialize bgwriter-private variables used during checkpoint.
! */
! ckpt_active = true;
! ckpt_start_time = (pg_time_t) time(NULL);
! ckpt_cached_elapsed = 0;
!
! /*
! * Get the requested values from shared memory that the
! * Startup process has put there for us.
! */
! SpinLockAcquire(&BgWriterShmem->ckpt_lck);
! ReadPtr = BgWriterShmem->ReadPtr;
! memcpy(&restartPoint, &BgWriterShmem->restartPoint, sizeof(CheckPoint));
! SpinLockRelease(&BgWriterShmem->ckpt_lck);
!
! /* Use smoothed writes, until interrupted if ever */
! CreateRestartPoint(ReadPtr, &restartPoint, 0);
!
! /*
! * After any checkpoint, close all smgr files. This is so we
! * won't hang onto smgr references to deleted files indefinitely.
! */
! smgrcloseall();
!
! ckpt_active = false;
! checkpoint_requested = false;
! }
! else
! {
! /* Clean buffers dirtied by recovery */
! BgBufferSync();
!
! /* Nap for the configured time. */
! BgWriterNap();
! }
  }
! else /* Normal processing */
  {
! bool do_checkpoint = false;
! int flags = 0;
! pg_time_t now;
! int elapsed_secs;
!
! Assert(!IsRecoveryProcessingMode());
!
! if (checkpoint_requested)
! {
! checkpoint_requested = false;
! do_checkpoint = true;
! BgWriterStats.m_requested_checkpoints++;
! }
! if (shutdown_requested)
! {
! /*
! * From here on, elog(ERROR) should end with exit(1), not send
! * control back to the sigsetjmp block above
! */
! ExitOnAnyError = true;
! /* Close down the database */
! ShutdownXLOG(0, 0);
! DumpFreeSpaceMap(0, 0);
! /* Normal exit from the bgwriter is here */
! proc_exit(0); /* done */
! }
 
  /*
! * Force a checkpoint if too much time has elapsed since the last one.
! * Note that we count a timed checkpoint in stats only when this
! * occurs without an external request, but we set the CAUSE_TIME flag
! * bit even if there is also an external request.
  */
! now = (pg_time_t) time(NULL);
! elapsed_secs = now - last_checkpoint_time;
! if (elapsed_secs >= CheckPointTimeout)
! {
! if (!do_checkpoint)
! BgWriterStats.m_timed_checkpoints++;
! do_checkpoint = true;
! flags |= CHECKPOINT_CAUSE_TIME;
! }
 
  /*
! * Do a checkpoint if requested, otherwise do one cycle of
! * dirty-buffer writing.
  */
! if (do_checkpoint)
! {
! /* use volatile pointer to prevent code rearrangement */
! volatile BgWriterShmemStruct *bgs = BgWriterShmem;
!
! /*
! * Atomically fetch the request flags to figure out what kind of a
! * checkpoint we should perform, and increase the started-counter
! * to acknowledge that we've started a new checkpoint.
! */
! SpinLockAcquire(&bgs->ckpt_lck);
! flags |= bgs->ckpt_flags;
! bgs->ckpt_flags = 0;
! bgs->ckpt_started++;
! SpinLockRelease(&bgs->ckpt_lck);
!
! /*
! * We will warn if (a) too soon since last checkpoint (whatever
! * caused it) and (b) somebody set the CHECKPOINT_CAUSE_XLOG flag
! * since the last checkpoint start.  Note in particular that this
! * implementation will not generate warnings caused by
! * CheckPointTimeout < CheckPointWarning.
! */
! if ((flags & CHECKPOINT_CAUSE_XLOG) &&
! elapsed_secs < CheckPointWarning)
! ereport(LOG,
! (errmsg("checkpoints are occurring too frequently (%d seconds apart)",
! elapsed_secs),
! errhint("Consider increasing the configuration parameter \"checkpoint_segments\".")));
!
! /*
! * Initialize bgwriter-private variables used during checkpoint.
! */
! ckpt_active = true;
! ckpt_start_recptr = GetInsertRecPtr();
! ckpt_start_time = now;
! ckpt_cached_elapsed = 0;
!
! /*
! * Do the checkpoint.
! */
! CreateCheckPoint(flags);
!
! /*
! * After any checkpoint, close all smgr files. This is so we
! * won't hang onto smgr references to deleted files indefinitely.
! */
! smgrcloseall();
!
! /*
! * Indicate checkpoint completion to any waiting backends.
! */
! SpinLockAcquire(&bgs->ckpt_lck);
! bgs->ckpt_done = bgs->ckpt_started;
! SpinLockRelease(&bgs->ckpt_lck);
!
! ckpt_active = false;
!
! /*
! * Note we record the checkpoint start time not end time as
! * last_checkpoint_time.  This is so that time-driven checkpoints
! * happen at a predictable spacing.
! */
! last_checkpoint_time = now;
! }
! else
! BgBufferSync();
 
! /* Check for archive_timeout and switch xlog files if necessary. */
! CheckArchiveTimeout();
 
! /* Nap for the configured time. */
! BgWriterNap();
  }
  }
  }
 
***************
*** 588,594 ****
  (ckpt_active ? ImmediateCheckpointRequested() : checkpoint_requested))
  break;
  pg_usleep(1000000L);
! AbsorbFsyncRequests();
  udelay -= 1000000L;
  }
 
--- 686,693 ----
  (ckpt_active ? ImmediateCheckpointRequested() : checkpoint_requested))
  break;
  pg_usleep(1000000L);
! if (!IsRecoveryProcessingMode())
! AbsorbFsyncRequests();
  udelay -= 1000000L;
  }
 
***************
*** 642,647 ****
--- 741,759 ----
  if (!am_bg_writer)
  return;
 
+ /* Perform minimal duties during recovery and skip wait if requested */
+ if (IsRecoveryProcessingMode())
+ {
+ BgBufferSync();
+
+ if (!shutdown_requested &&
+ !checkpoint_requested &&
+ IsCheckpointOnSchedule(progress))
+ BgWriterNap();
+
+ return;
+ }
+
  /*
  * Perform the usual bgwriter duties and take a nap, unless we're behind
  * schedule, in which case we just try to catch up as quickly as possible.
***************
*** 716,731 ****
  * However, it's good enough for our purposes, we're only calculating an
  * estimate anyway.
  */
! recptr = GetInsertRecPtr();
! elapsed_xlogs =
! (((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
! ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
! CheckPointSegments;
!
! if (progress < elapsed_xlogs)
  {
! ckpt_cached_elapsed = elapsed_xlogs;
! return false;
  }
 
  /*
--- 828,846 ----
  * However, it's good enough for our purposes, we're only calculating an
  * estimate anyway.
  */
! if (!IsRecoveryProcessingMode())
  {
! recptr = GetInsertRecPtr();
! elapsed_xlogs =
! (((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
! ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
! CheckPointSegments;
!
! if (progress < elapsed_xlogs)
! {
! ckpt_cached_elapsed = elapsed_xlogs;
! return false;
! }
  }
 
  /*
***************
*** 967,972 ****
--- 1082,1158 ----
  }
 
  /*
+  * Always runs in Startup process (see xlog.c)
+  */
+ void
+ RequestRestartPoint(const XLogRecPtr ReadPtr, const CheckPoint *restartPoint, bool sendToBGWriter)
+ {
+ /*
+ * Should we just do it ourselves?
+ */
+ if (!IsPostmasterEnvironment || !sendToBGWriter)
+ {
+ CreateRestartPoint(ReadPtr, restartPoint, CHECKPOINT_IMMEDIATE);
+ return;
+ }
+
+ /*
+ * Push requested values into shared memory, then signal to request restartpoint.
+ */
+ if (BgWriterShmem->bgwriter_pid == 0)
+ elog(LOG, "could not request restartpoint because bgwriter not running");
+
+ #ifdef NOT_USED
+ elog(LOG, "tli = %u nextXidEpoch = %u nextXid = %u nextOid = %u",
+ restartPoint->ThisTimeLineID,
+ restartPoint->nextXidEpoch,
+ restartPoint->nextXid,
+ restartPoint->nextOid);
+ #endif
+
+ SpinLockAcquire(&BgWriterShmem->ckpt_lck);
+ BgWriterShmem->ReadPtr = ReadPtr;
+ memcpy(&BgWriterShmem->restartPoint, restartPoint, sizeof(CheckPoint));
+ SpinLockRelease(&BgWriterShmem->ckpt_lck);
+
+ if (BgWriterShmem->bgwriter_pid != 0 && kill(BgWriterShmem->bgwriter_pid, SIGINT) != 0)
+ elog(LOG, "could not signal for restartpoint: %m");
+ }
+
+ /*
+  * Sends another checkpoint request signal to bgwriter, which causes it
+  * to avoid smoothed writes and continue processing as if it had been
+  * called with CHECKPOINT_IMMEDIATE. This is used at the end of recovery.
+  */
+ void
+ RequestRestartPointCompletion(void)
+ {
+ if (BgWriterShmem->bgwriter_pid != 0 &&
+ kill(BgWriterShmem->bgwriter_pid, SIGINT) != 0)
+ elog(LOG, "could not signal for restartpoint immediate: %m");
+ }
+
+ XLogRecPtr
+ GetRedoLocationForArchiveCheckpoint(void)
+ {
+ XLogRecPtr redo;
+
+ SpinLockAcquire(&BgWriterShmem->ckpt_lck);
+ redo = BgWriterShmem->ReadPtr;
+ SpinLockRelease(&BgWriterShmem->ckpt_lck);
+
+ return redo;
+ }
+
+ void
+ SetRedoLocationForArchiveCheckpoint(XLogRecPtr redo)
+ {
+ SpinLockAcquire(&BgWriterShmem->ckpt_lck);
+ BgWriterShmem->ReadPtr = redo;
+ SpinLockRelease(&BgWriterShmem->ckpt_lck);
+ }
+
+ /*
   * ForwardFsyncRequest
   * Forward a file-fsync request from a backend to the bgwriter
   *
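The restartpoint handoff added to bgwriter.c above follows a
publish-then-signal pattern: the Startup process copies the request into
shared memory under ckpt_lck and then sends SIGINT, and the bgwriter copies
the values back out under the same lock. Below is a minimal single-process
sketch of that pattern, not the patch's code. It assumes a C11 atomic_flag in
place of the PostgreSQL spinlock, raise() in place of kill(bgwriter_pid, ...),
and hypothetical stand-in names (RecPtr, RestartInfo, mailbox_*,
request_restartpoint, consume_restartpoint) that do not exist in the tree.

```c
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for XLogRecPtr and CheckPoint. */
typedef struct { unsigned int xlogid; unsigned int xrecoff; } RecPtr;
typedef struct { RecPtr redo; unsigned int nextXid; } RestartInfo;

/* Hypothetical stand-in for the shared area guarded by ckpt_lck. */
static atomic_flag mailbox_lck = ATOMIC_FLAG_INIT;
static RecPtr      mailbox_ReadPtr;
static RestartInfo mailbox_restartPoint;

/* Only this flag is touched from signal context. */
static volatile sig_atomic_t checkpoint_requested = 0;

static void
handle_sigint(int signo)
{
    (void) signo;
    checkpoint_requested = 1;
}

static void
install_request_handler(void)
{
    signal(SIGINT, handle_sigint);
}

static void
spin_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&mailbox_lck, memory_order_acquire))
        ;                       /* busy-wait until the holder releases */
}

static void
spin_release(void)
{
    atomic_flag_clear_explicit(&mailbox_lck, memory_order_release);
}

/* Requester side: publish the values first, then signal. */
static void
request_restartpoint(RecPtr ReadPtr, const RestartInfo *rp)
{
    assert(rp != NULL);

    spin_acquire();
    mailbox_ReadPtr = ReadPtr;
    memcpy(&mailbox_restartPoint, rp, sizeof(RestartInfo));
    spin_release();

    raise(SIGINT);              /* assumption: we signal ourselves here */
}

/* Consumer side: returns 1 and fills the outputs if a request was pending. */
static int
consume_restartpoint(RecPtr *ReadPtr, RestartInfo *rp)
{
    assert(ReadPtr != NULL && rp != NULL);

    if (!checkpoint_requested)
        return 0;
    checkpoint_requested = 0;

    spin_acquire();
    *ReadPtr = mailbox_ReadPtr;
    memcpy(rp, &mailbox_restartPoint, sizeof(RestartInfo));
    spin_release();
    return 1;
}
```

A caller would run install_request_handler() once, then pair each
request_restartpoint() with a later consume_restartpoint(). Publishing before
signalling matters: the consumer must never see the flag set while the shared
values are still stale.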
Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/postmaster/postmaster.c,v
retrieving revision 1.565
diff -c -r1.565 postmaster.c
*** src/backend/postmaster/postmaster.c 23 Sep 2008 20:35:38 -0000 1.565
--- src/backend/postmaster/postmaster.c 30 Sep 2008 17:15:15 -0000
***************
*** 254,259 ****
--- 254,264 ----
  {
  PM_INIT, /* postmaster starting */
  PM_STARTUP, /* waiting for startup subprocess */
+ PM_RECOVERY, /* consistent recovery mode; state only
+ * entered for archive and streaming recovery,
+ * and only after the point where
+ * all data is in a consistent state.
+ */
  PM_RUN, /* normal "database is alive" state */
  PM_WAIT_BACKUP, /* waiting for online backup mode to end */
  PM_WAIT_BACKENDS, /* waiting for live backends to exit */
***************
*** 1302,1308 ****
  * state that prevents it, start one.  It doesn't matter if this
  * fails, we'll just try again later.
  */
! if (BgWriterPID == 0 && pmState == PM_RUN)
  BgWriterPID = StartBackgroundWriter();
 
  /*
--- 1307,1313 ----
  * state that prevents it, start one.  It doesn't matter if this
  * fails, we'll just try again later.
  */
! if (BgWriterPID == 0 && (pmState == PM_RUN || pmState == PM_RECOVERY))
  BgWriterPID = StartBackgroundWriter();
 
  /*
***************
*** 2116,2122 ****
  if (pid == StartupPID)
  {
  StartupPID = 0;
! Assert(pmState == PM_STARTUP);
 
  /* FATAL exit of startup is treated as catastrophic */
  if (!EXIT_STATUS_0(exitstatus))
--- 2121,2127 ----
  if (pid == StartupPID)
  {
  StartupPID = 0;
! Assert(pmState == PM_STARTUP || pmState == PM_RECOVERY);
 
  /* FATAL exit of startup is treated as catastrophic */
  if (!EXIT_STATUS_0(exitstatus))
***************
*** 2157,2167 ****
  load_role();
 
  /*
! * Crank up the background writer. It doesn't matter if this
! * fails, we'll just try again later.
  */
! Assert(BgWriterPID == 0);
! BgWriterPID = StartBackgroundWriter();
 
  /*
  * Likewise, start other special children as needed.  In a restart
--- 2162,2172 ----
  load_role();
 
  /*
! * Check whether we need to start background writer, if not
! * already running.
  */
! if (BgWriterPID == 0)
! BgWriterPID = StartBackgroundWriter();
 
  /*
  * Likewise, start other special children as needed.  In a restart
***************
*** 3845,3850 ****
--- 3850,3900 ----
 
  PG_SETMASK(&BlockSig);
 
+ if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_START))
+ {
+ Assert(pmState == PM_STARTUP);
+
+ /*
+ * Go to shutdown mode if a shutdown request was pending.
+ */
+ if (Shutdown > NoShutdown)
+ {
+ pmState = PM_WAIT_BACKENDS;
+ /* PostmasterStateMachine logic does the rest */
+ }
+ else
+ {
+ /*
+ * Startup process has entered recovery
+ */
+ pmState = PM_RECOVERY;
+
+ /*
+ * Load the flat authorization file into postmaster's cache. The
+ * startup process won't have recomputed this from the database yet,
+ * so we it may change following recovery.
+ */
+ load_role();
+
+ /*
+ * Crank up the background writer. It doesn't matter if this
+ * fails, we'll just try again later.
+ */
+ Assert(BgWriterPID == 0);
+ BgWriterPID = StartBackgroundWriter();
+
+ /*
+ * Likewise, start other special children as needed.
+ */
+ Assert(PgStatPID == 0);
+ PgStatPID = pgstat_start();
+
+ /* XXX at this point we could accept read-only connections */
+ ereport(DEBUG1,
+ (errmsg("database system is in consistent recovery mode")));
+ }
+ }
+
  if (CheckPostmasterSignal(PMSIGNAL_PASSWORD_CHANGE))
  {
  /*
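The postmaster.c changes above add one state and two decisions: what to do
when PMSIGNAL_RECOVERY_START arrives, and when the bgwriter may be launched.
The sketch below restates both as pure functions. It is an illustration
only: the enum is a subset of the real PMState list, shutdown_pending stands
in for the "Shutdown > NoShutdown" test, and the function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the postmaster states named in the patch. */
typedef enum
{
    PM_INIT,
    PM_STARTUP,
    PM_RECOVERY,
    PM_RUN,
    PM_WAIT_BACKENDS
} PMState;

/* State reached when the Startup process signals PMSIGNAL_RECOVERY_START. */
static PMState
on_recovery_start_signal(PMState cur, bool shutdown_pending)
{
    /* The patch asserts that the signal can only arrive in PM_STARTUP. */
    assert(cur == PM_STARTUP);

    /* A pending shutdown wins; otherwise enter consistent recovery. */
    return shutdown_pending ? PM_WAIT_BACKENDS : PM_RECOVERY;
}

/* Mirrors the widened ServerLoop test: start bgwriter in PM_RUN or PM_RECOVERY. */
static bool
should_start_bgwriter(PMState state, int bgwriter_pid)
{
    return bgwriter_pid == 0 &&
        (state == PM_RUN || state == PM_RECOVERY);
}
```

For example, should_start_bgwriter(PM_RECOVERY, 0) is true, while the same
call with PM_STARTUP is false, which is why the bgwriter only appears once
the consistent point has been signalled.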
Index: src/backend/storage/buffer/README
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/storage/buffer/README,v
retrieving revision 1.14
diff -c -r1.14 README
*** src/backend/storage/buffer/README 21 Mar 2008 13:23:28 -0000 1.14
--- src/backend/storage/buffer/README 30 Sep 2008 17:15:15 -0000
***************
*** 264,266 ****
--- 264,275 ----
  This ensures that the page image transferred to disk is reasonably consistent.
  We might miss a hint-bit update or two but that isn't a problem, for the same
  reasons mentioned under buffer access rules.
+
+ As of 8.4, the background writer starts during recovery mode when there is
+ some form of potentially extended recovery to perform. It performs the
+ same service as in normal processing, except that the checkpoints it
+ writes are technically restartpoints. Flushing outstanding WAL for dirty
+ buffers is also skipped, though there shouldn't ever be new WAL entries
+ at that time in any case. We could choose to start the background writer
+ immediately, but we hold off until we can prove the database is in a
+ consistent state so that the postmaster has a single, clean state change.
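The README paragraph above, read together with the reworked main loop in
bgwriter.c, amounts to a small decision table per loop iteration. The sketch
below models that table as a pure function. The enum values and the function
name are hypothetical, and the real loop also handles SIGHUP, statistics and
napping, which are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for what one loop iteration ends up doing. */
typedef enum
{
    ACT_EXIT,
    ACT_SWITCH_TO_NORMAL,
    ACT_RESTARTPOINT,
    ACT_CHECKPOINT,
    ACT_CLEAN_BUFFERS
} BgAction;

static BgAction
bgwriter_next_action(bool recovery_mode, bool still_recovering,
                     bool shutdown_requested, bool checkpoint_requested,
                     bool timeout_elapsed)
{
    if (recovery_mode)
    {
        if (shutdown_requested)
            return ACT_EXIT;            /* no shutdown checkpoint here */
        if (!still_recovering)
            return ACT_SWITCH_TO_NORMAL;    /* pending request is not acted on */
        if (checkpoint_requested)
            return ACT_RESTARTPOINT;
        return ACT_CLEAN_BUFFERS;       /* timeouts are ignored in recovery */
    }

    /* Normal processing; the patch asserts recovery is over at this point. */
    assert(!still_recovering);

    if (shutdown_requested)
        return ACT_EXIT;                /* after the shutdown checkpoint */
    if (checkpoint_requested || timeout_elapsed)
        return ACT_CHECKPOINT;
    return ACT_CLEAN_BUFFERS;
}
```

Note the ordering in the recovery branch: the mode switch is tested before
checkpoint_requested, so a request that arrives at the end of recovery is
served as a checkpoint on the next iteration, not as a restartpoint.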
Index: src/bin/pg_controldata/pg_controldata.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_controldata/pg_controldata.c,v
retrieving revision 1.41
diff -c -r1.41 pg_controldata.c
*** src/bin/pg_controldata/pg_controldata.c 24 Sep 2008 08:59:42 -0000 1.41
--- src/bin/pg_controldata/pg_controldata.c 30 Sep 2008 17:15:15 -0000
***************
*** 197,202 ****
--- 197,205 ----
  printf(_("Minimum recovery ending location:     %X/%X\n"),
    ControlFile.minRecoveryPoint.xlogid,
    ControlFile.minRecoveryPoint.xrecoff);
+ printf(_("Minimum safe starting location:       %X/%X\n"),
+   ControlFile.minSafeStartPoint.xlogid,
+   ControlFile.minSafeStartPoint.xrecoff);
  printf(_("Maximum data alignment:               %u\n"),
    ControlFile.maxAlign);
  /* we don't print floatFormat since can't say much useful about it */
Index: src/bin/pg_resetxlog/pg_resetxlog.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_resetxlog/pg_resetxlog.c,v
retrieving revision 1.68
diff -c -r1.68 pg_resetxlog.c
*** src/bin/pg_resetxlog/pg_resetxlog.c 24 Sep 2008 09:00:44 -0000 1.68
--- src/bin/pg_resetxlog/pg_resetxlog.c 30 Sep 2008 17:15:15 -0000
***************
*** 595,600 ****
--- 595,602 ----
  ControlFile.prevCheckPoint.xrecoff = 0;
  ControlFile.minRecoveryPoint.xlogid = 0;
  ControlFile.minRecoveryPoint.xrecoff = 0;
+ ControlFile.minSafeStartPoint.xlogid = 0;
+ ControlFile.minSafeStartPoint.xrecoff = 0;
 
  /* Now we can force the recorded xlog seg size to the right thing. */
  ControlFile.xlog_seg_size = XLogSegSize;
Index: src/include/access/xlog.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/xlog.h,v
retrieving revision 1.88
diff -c -r1.88 xlog.h
*** src/include/access/xlog.h 12 May 2008 08:35:05 -0000 1.88
--- src/include/access/xlog.h 30 Sep 2008 17:15:15 -0000
***************
*** 133,139 ****
  } XLogRecData;
 
  extern TimeLineID ThisTimeLineID; /* current TLI */
! extern bool InRecovery;
  extern XLogRecPtr XactLastRecEnd;
 
  /* these variables are GUC parameters related to XLOG */
--- 133,148 ----
  } XLogRecData;
 
  extern TimeLineID ThisTimeLineID; /* current TLI */
!
! /*
!  * Prior to 8.4, all activity during recovery was carried out by the Startup
!  * process. This local variable continues to be used in many parts of the
!  * code to indicate actions taken by RecoveryManagers. Other processes that
!  * potentially perform work during recovery should check
!  * IsRecoveryProcessingMode(); see XLogCtl notes in xlog.c.
!  */
! extern bool InRecovery;
!
  extern XLogRecPtr XactLastRecEnd;
 
  /* these variables are GUC parameters related to XLOG */
***************
*** 166,171 ****
--- 175,181 ----
  /* These indicate the cause of a checkpoint request */
  #define CHECKPOINT_CAUSE_XLOG 0x0010 /* XLOG consumption */
  #define CHECKPOINT_CAUSE_TIME 0x0020 /* Elapsed time */
+ #define CHECKPOINT_RESTARTPOINT 0x0040 /* Restartpoint during recovery */
 
  /* Checkpoint statistics */
  typedef struct CheckpointStatsData
***************
*** 197,202 ****
--- 207,214 ----
  extern void xlog_redo(XLogRecPtr lsn, XLogRecord *record);
  extern void xlog_desc(StringInfo buf, uint8 xl_info, char *rec);
 
+ extern bool IsRecoveryProcessingMode(void);
+
  extern void UpdateControlFile(void);
  extern Size XLOGShmemSize(void);
  extern void XLOGShmemInit(void);
Index: src/include/access/xlog_internal.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/xlog_internal.h,v
retrieving revision 1.24
diff -c -r1.24 xlog_internal.h
*** src/include/access/xlog_internal.h 11 Aug 2008 11:05:11 -0000 1.24
--- src/include/access/xlog_internal.h 30 Sep 2008 17:15:15 -0000
***************
*** 17,22 ****
--- 17,23 ----
  #define XLOG_INTERNAL_H
 
  #include "access/xlog.h"
+ #include "catalog/pg_control.h"
  #include "fmgr.h"
  #include "pgtime.h"
  #include "storage/block.h"
***************
*** 245,250 ****
--- 246,254 ----
  extern pg_time_t GetLastSegSwitchTime(void);
  extern XLogRecPtr RequestXLogSwitch(void);
 
+ extern void CreateRestartPoint(const XLogRecPtr ReadPtr,
+ const CheckPoint *restartPoint, int flags);
+
  /*
   * These aren't in xlog.h because I'd rather not include fmgr.h there.
   */
Index: src/include/catalog/pg_control.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/catalog/pg_control.h,v
retrieving revision 1.42
diff -c -r1.42 pg_control.h
*** src/include/catalog/pg_control.h 23 Sep 2008 09:20:39 -0000 1.42
--- src/include/catalog/pg_control.h 30 Sep 2008 17:15:15 -0000
***************
*** 46,52 ****
  #define XLOG_NOOP 0x20
  #define XLOG_NEXTOID 0x30
  #define XLOG_SWITCH 0x40
!
 
  /* System status indicator */
  typedef enum DBState
--- 46,52 ----
  #define XLOG_NOOP 0x20
  #define XLOG_NEXTOID 0x30
  #define XLOG_SWITCH 0x40
! #define XLOG_RECOVERY_END 0x50
 
  /* System status indicator */
  typedef enum DBState
***************
*** 102,107 ****
--- 102,108 ----
  CheckPoint checkPointCopy; /* copy of last check point record */
 
  XLogRecPtr minRecoveryPoint; /* must replay xlog to here */
+ XLogRecPtr minSafeStartPoint; /* safe point after recovery crashes */
 
  /*
  * This data is used to check for hardware-architecture compatibility of
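The new minSafeStartPoint field is a WAL location, so deciding whether
recovery has passed it comes down to an ordered comparison of two
(xlogid, xrecoff) pairs. The code that makes that decision lives in xlog.c
and is not part of this section, so the following is only a guess at the
shape of such a check. RecPtr, recptr_le and reached_safe_start are
hypothetical names, not functions from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for XLogRecPtr (log file id plus byte offset). */
typedef struct
{
    uint32_t xlogid;
    uint32_t xrecoff;
} RecPtr;

/* True if a is at or before b in WAL order. */
static bool
recptr_le(RecPtr a, RecPtr b)
{
    return a.xlogid < b.xlogid ||
        (a.xlogid == b.xlogid && a.xrecoff <= b.xrecoff);
}

/*
 * Assumed rule: replay has reached a safe starting point once it is at or
 * past the stored location. A stored 0/0 (the value pg_resetxlog writes)
 * is then trivially satisfied.
 */
static bool
reached_safe_start(RecPtr replayed, RecPtr minSafeStart)
{
    return recptr_le(minSafeStart, replayed);
}
```

Comparing xlogid first and xrecoff second matches how the two fields are
printed as %X/%X in pg_controldata: the first number is the more
significant one.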
Index: src/include/postmaster/bgwriter.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/postmaster/bgwriter.h,v
retrieving revision 1.12
diff -c -r1.12 bgwriter.h
*** src/include/postmaster/bgwriter.h 11 Aug 2008 11:05:11 -0000 1.12
--- src/include/postmaster/bgwriter.h 30 Sep 2008 17:15:15 -0000
***************
*** 12,17 ****
--- 12,18 ----
  #ifndef _BGWRITER_H
  #define _BGWRITER_H
 
+ #include "catalog/pg_control.h"
  #include "storage/block.h"
  #include "storage/relfilenode.h"
 
***************
*** 25,30 ****
--- 26,36 ----
  extern void BackgroundWriterMain(void);
 
  extern void RequestCheckpoint(int flags);
+ extern void RequestRestartPoint(const XLogRecPtr ReadPtr, const CheckPoint *restartPoint, bool sendToBGWriter);
+ extern void RequestRestartPointCompletion(void);
+ extern XLogRecPtr GetRedoLocationForArchiveCheckpoint(void);
+ extern void SetRedoLocationForArchiveCheckpoint(XLogRecPtr redo);
+
  extern void CheckpointWriteDelay(int flags, double progress);
 
  extern bool ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum,
Index: src/include/storage/pmsignal.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/storage/pmsignal.h,v
retrieving revision 1.20
diff -c -r1.20 pmsignal.h
*** src/include/storage/pmsignal.h 19 Jun 2008 21:32:56 -0000 1.20
--- src/include/storage/pmsignal.h 30 Sep 2008 17:15:15 -0000
***************
*** 22,27 ****
--- 22,28 ----
   */
  typedef enum
  {
+ PMSIGNAL_RECOVERY_START, /* move to PM_RECOVERY state */
  PMSIGNAL_PASSWORD_CHANGE, /* pg_auth file has changed */
  PMSIGNAL_WAKEN_ARCHIVER, /* send a NOTIFY signal to xlog archiver */
  PMSIGNAL_ROTATE_LOGFILE, /* send SIGUSR1 to syslogger to rotate logfile */
Index: src/test/regress/expected/opr_sanity.out
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/test/regress/expected/opr_sanity.out,v
retrieving revision 1.84
diff -c -r1.84 opr_sanity.out
*** src/test/regress/expected/opr_sanity.out 16 Aug 2008 00:01:38 -0000 1.84
--- src/test/regress/expected/opr_sanity.out 30 Sep 2008 17:15:15 -0000
***************
*** 109,117 ****
       p1.proretset != p2.proretset OR
       p1.provolatile != p2.provolatile OR
       p1.pronargs != p2.pronargs);
!  oid | proname | oid | proname
! -----+---------+-----+---------
! (0 rows)
 
  -- Look for uses of different type OIDs in the argument/result type fields
  -- for different aliases of the same built-in function.
--- 109,118 ----
       p1.proretset != p2.proretset OR
       p1.provolatile != p2.provolatile OR
       p1.pronargs != p2.pronargs);
!  oid  |     proname     | oid  |     proname
! ------+-----------------+------+-----------------
!  2172 | pg_start_backup | 2176 | pg_start_backup
! (1 row)
 
  -- Look for uses of different type OIDs in the argument/result type fields
  -- for different aliases of the same built-in function.


