<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<id>tag:old.nabble.com,2006:forum-782</id>
	<title>Nabble - PostgreSQL - patches</title>
	<updated>2008-10-01T17:29:35Z</updated>
	<link rel="self" type="application/atom+xml" href="http://old.nabble.com/PostgreSQL---patches-f782.xml" />
	<link rel="alternate" type="text/html" href="http://old.nabble.com/PostgreSQL---patches-f782.html" />
	<subtitle type="html"></subtitle>
	
<entry>
	<id>tag:old.nabble.com,2006:post-19771595</id>
	<title>Re: still alive?</title>
	<published>2008-10-01T17:29:35Z</published>
	<updated>2008-10-01T17:29:35Z</updated>
	<author>
		<name>Alvaro Herrera-7</name>
	</author>
	<content type="html">Bruce Momjian wrote:
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Marc, care to do the honors?
&lt;br&gt;&amp;gt; 
&lt;br&gt;&lt;br&gt;Note:
&lt;br&gt;&lt;br&gt;1. There are several lists to kill, not just pgsql-patches. &amp;nbsp;The
&lt;br&gt;database says:
&lt;br&gt;&lt;br&gt;&amp;nbsp;pgsql-chat
&lt;br&gt;&amp;nbsp;pgsql-benchmarks
&lt;br&gt;&amp;nbsp;pgsql-hackers-win32
&lt;br&gt;&amp;nbsp;pgsql-hackers-pitr
&lt;br&gt;&amp;nbsp;pgsql-cygwin
&lt;br&gt;&amp;nbsp;pgsql-ports
&lt;br&gt;&lt;br&gt;2. The archives must, obviously, survive the kill, and still be
&lt;br&gt;fetchable via rsync to the archives server.
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;Alvaro Herrera &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://www.CommandPrompt.com/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.CommandPrompt.com/&lt;/a&gt;&lt;br&gt;The PostgreSQL Company - Command Prompt, Inc.
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;Sent via pgsql-patches mailing list (&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19771595&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;pgsql-patches@...&lt;/a&gt;)
&lt;br&gt;To make changes to your subscription:
&lt;br&gt;&lt;a href=&quot;http://www.postgresql.org/mailpref/pgsql-patches&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.postgresql.org/mailpref/pgsql-patches&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/still-alive--tp19301616p19771595.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19771404</id>
	<title>Re: still alive?</title>
	<published>2008-10-01T17:08:00Z</published>
	<updated>2008-10-01T17:08:00Z</updated>
	<author>
		<name>Joshua D. Drake</name>
	</author>
	<content type="html">On Wed, 1 Oct 2008 20:05:00 -0400 (EDT)
&lt;br&gt;Bruce Momjian &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19771404&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;bruce@...&lt;/a&gt;&amp;gt; wrote:
&lt;br&gt;&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Marc, care to do the honors?
&lt;br&gt;&lt;br&gt;KILL IT!!!!!! :P
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; ---------------------------------------------------------------------------
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Peter Eisentraut wrote:
&lt;br&gt;&amp;gt; &amp;gt; Simon Riggs wrote:
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt; On Thu, 2008-09-11 at 15:39 +0300, Peter Eisentraut wrote:
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt;&amp;gt; Bruce Momjian wrote:
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt;&amp;gt;&amp;gt; Abhijit Menon-Sen wrote:
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt;&amp;gt;&amp;gt;&amp;gt; I thought -patches was supposed to die. What happened?
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt;&amp;gt;&amp;gt; I was wondering the same thing. &amp;nbsp;Peter?
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt;&amp;gt; Hmm, let's try this:
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt;&amp;gt; Anyone who thinks the patches list should remain as separate
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt;&amp;gt; from hackers, shout now (with rationale)!
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt; 
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt; Kill it now, long enough before the next patchfest for it to
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt; stick.
&lt;br&gt;&amp;gt; &amp;gt; 
&lt;br&gt;&amp;gt; &amp;gt; I think what we need now, for patches, ports, and the others, is
&lt;br&gt;&amp;gt; &amp;gt; someone to actually kill it. &amp;nbsp;All the talk has been talked,
&lt;br&gt;&amp;gt; &amp;gt; everything has been decided, now someone with the right permission
&lt;br&gt;&amp;gt; &amp;gt; bits just turn it off.
&lt;br&gt;&amp;gt; 
&lt;/div&gt;&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;The PostgreSQL Company since 1997: &lt;a href=&quot;http://www.commandprompt.com/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.commandprompt.com/&lt;/a&gt;&amp;nbsp;
&lt;br&gt;PostgreSQL Community Conference: &lt;a href=&quot;http://www.postgresqlconference.org/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.postgresqlconference.org/&lt;/a&gt;&lt;br&gt;United States PostgreSQL Association: &lt;a href=&quot;http://www.postgresql.us/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.postgresql.us/&lt;/a&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/still-alive--tp19301616p19771404.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19771373</id>
	<title>Re: still alive?</title>
	<published>2008-10-01T17:05:00Z</published>
	<updated>2008-10-01T17:05:00Z</updated>
	<author>
		<name>Bruce Momjian-5</name>
	</author>
	<content type="html">&lt;br&gt;Marc, care to do the honors?
&lt;br&gt;&lt;br&gt;---------------------------------------------------------------------------
&lt;br&gt;&lt;br&gt;Peter Eisentraut wrote:
&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Simon Riggs wrote:
&lt;br&gt;&amp;gt; &amp;gt; On Thu, 2008-09-11 at 15:39 +0300, Peter Eisentraut wrote:
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; Bruce Momjian wrote:
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; Abhijit Menon-Sen wrote:
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt;&amp;gt; I thought -patches was supposed to die. What happened?
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; I was wondering the same thing. &amp;nbsp;Peter?
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; Hmm, let's try this:
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; Anyone who thinks the patches list should remain as separate from 
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; hackers, shout now (with rationale)!
&lt;br&gt;&amp;gt; &amp;gt; 
&lt;br&gt;&amp;gt; &amp;gt; Kill it now, long enough before the next patchfest for it to stick.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; I think what we need now, for patches, ports, and the others, is someone 
&lt;br&gt;&amp;gt; to actually kill it. &amp;nbsp;All the talk has been talked, everything has been 
&lt;br&gt;&amp;gt; decided, now someone with the right permission bits just turn it off.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp; Bruce Momjian &amp;nbsp;&amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19771373&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;bruce@...&lt;/a&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://momjian.us&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://momjian.us&lt;/a&gt;&lt;br&gt;&amp;nbsp; EnterpriseDB &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://enterprisedb.com&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://enterprisedb.com&lt;/a&gt;&lt;br&gt;&lt;br&gt;&amp;nbsp; + If your life is a hard drive, Christ can be your backup. +
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/still-alive--tp19301616p19771373.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19771370</id>
	<title>Re: still alive?</title>
	<published>2008-10-01T17:04:40Z</published>
	<updated>2008-10-01T17:04:40Z</updated>
	<author>
		<name>Bruce Momjian-5</name>
	</author>
	<content type="html">Peter Eisentraut wrote:
&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Simon Riggs wrote:
&lt;br&gt;&amp;gt; &amp;gt; On Thu, 2008-09-11 at 15:39 +0300, Peter Eisentraut wrote:
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; Bruce Momjian wrote:
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; Abhijit Menon-Sen wrote:
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt;&amp;gt; I thought -patches was supposed to die. What happened?
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; I was wondering the same thing. &amp;nbsp;Peter?
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; Hmm, let's try this:
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; Anyone who thinks the patches list should remain as separate from 
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; hackers, shout now (with rationale)!
&lt;br&gt;&amp;gt; &amp;gt; 
&lt;br&gt;&amp;gt; &amp;gt; Kill it now, long enough before the next patchfest for it to stick.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; I think what we need now, for patches, ports, and the others, is someone 
&lt;br&gt;&amp;gt; to actually kill it. &amp;nbsp;All the talk has been talked, everything has been 
&lt;br&gt;&amp;gt; decided, now someone with the right permission bits just turn it off.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp; Bruce Momjian &amp;nbsp;&amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19771370&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;bruce@...&lt;/a&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://momjian.us&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://momjian.us&lt;/a&gt;&lt;br&gt;&amp;nbsp; EnterpriseDB &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://enterprisedb.com&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://enterprisedb.com&lt;/a&gt;&lt;br&gt;&lt;br&gt;&amp;nbsp; + If your life is a hard drive, Christ can be your backup. +
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/still-alive--tp19301616p19771370.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19771265</id>
	<title>Re: still alive?</title>
	<published>2008-10-01T16:53:10Z</published>
	<updated>2008-10-01T16:53:10Z</updated>
	<author>
		<name>Peter Eisentraut-2</name>
	</author>
	<content type="html">Simon Riggs wrote:
&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; On Thu, 2008-09-11 at 15:39 +0300, Peter Eisentraut wrote:
&lt;br&gt;&amp;gt;&amp;gt; Bruce Momjian wrote:
&lt;br&gt;&amp;gt;&amp;gt;&amp;gt; Abhijit Menon-Sen wrote:
&lt;br&gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt; I thought -patches was supposed to die. What happened?
&lt;br&gt;&amp;gt;&amp;gt;&amp;gt; I was wondering the same thing. &amp;nbsp;Peter?
&lt;br&gt;&amp;gt;&amp;gt; Hmm, let's try this:
&lt;br&gt;&amp;gt;&amp;gt;
&lt;br&gt;&amp;gt;&amp;gt; Anyone who thinks the patches list should remain as separate from 
&lt;br&gt;&amp;gt;&amp;gt; hackers, shout now (with rationale)!
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Kill it now, long enough before the next patchfest for it to stick.
&lt;/div&gt;&lt;br&gt;I think what we need now, for patches, ports, and the others, is someone 
&lt;br&gt;to actually kill it. &amp;nbsp;All the talk has been talked, everything has been 
&lt;br&gt;decided; now someone with the right permission bits just needs to turn it off.
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/still-alive--tp19301616p19771265.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19763094</id>
	<title>Re: libpq not linked against libgssapi</title>
	<published>2008-10-01T08:35:50Z</published>
	<updated>2008-10-01T08:35:50Z</updated>
	<author>
		<name>Magnus Hagander-2</name>
	</author>
	<content type="html">Markus Schaaf wrote:
&lt;br&gt;&amp;gt; src/interfaces/libpq/Makefile is missing a reference to libgssapi.
&lt;br&gt;&amp;gt; The problem occurred while compiling 8.3.3 on NetBSD --with-gssapi,
&lt;br&gt;&amp;gt; and is still present in HEAD. A patch is attached.
&lt;br&gt;&lt;br&gt;Applied and backpatched to 8.3.
&lt;br&gt;&lt;br&gt;Thanks!
&lt;br&gt;&lt;br&gt;//Magnus
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/libpq-not-linked-against-libgssapi-tp19762208p19763094.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19762208</id>
	<title>libpq not linked against libgssapi</title>
	<published>2008-10-01T07:13:24Z</published>
	<updated>2008-10-01T07:13:24Z</updated>
	<author>
		<name>Markus Schaaf</name>
	</author>
	<content type="html">src/interfaces/libpq/Makefile is missing a reference to libgssapi.
&lt;br&gt;The problem occurred while compiling 8.3.3 on NetBSD --with-gssapi,
&lt;br&gt;and is still present in HEAD. A patch is attached.
&lt;br&gt;&lt;br /&gt;Index: src/interfaces/libpq/Makefile
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/Makefile,v
&lt;br&gt;retrieving revision 1.167
&lt;br&gt;diff -b -u -r1.167 Makefile
&lt;br&gt;--- src/interfaces/libpq/Makefile	17 Sep 2008 04:31:08 -0000	1.167
&lt;br&gt;+++ src/interfaces/libpq/Makefile	1 Oct 2008 13:57:49 -0000
&lt;br&gt;@@ -56,7 +56,7 @@
&lt;br&gt;&amp;nbsp;# shared library link. &amp;nbsp;(The order in which you list them here doesn't
&lt;br&gt;&amp;nbsp;# matter.)
&lt;br&gt;&amp;nbsp;ifneq ($(PORTNAME), win32)
&lt;br&gt;-SHLIB_LINK += $(filter -lcrypt -ldes -lcom_err -lcrypto -lk5crypto -lkrb5 -lgssapi_krb5 -lgss -lssl -lsocket -lnsl -lresolv -lintl, $(LIBS)) $(LDAP_LIBS_FE) $(PTHREAD_LIBS)
&lt;br&gt;+SHLIB_LINK += $(filter -lcrypt -ldes -lcom_err -lcrypto -lk5crypto -lkrb5 -lgssapi_krb5 -lgss -lgssapi -lssl -lsocket -lnsl -lresolv -lintl, $(LIBS)) $(LDAP_LIBS_FE) $(PTHREAD_LIBS)
&lt;br&gt;&amp;nbsp;else
&lt;br&gt;&amp;nbsp;SHLIB_LINK += $(filter -lcrypt -ldes -lcom_err -lcrypto -lk5crypto -lkrb5 -lgssapi32 -lssl -lsocket -lnsl -lresolv -lintl $(PTHREAD_LIBS), $(LIBS)) $(LDAP_LIBS_FE)
&lt;br&gt;&amp;nbsp;endif
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/libpq-not-linked-against-libgssapi-tp19762208p19762208.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19751854</id>
	<title>Infrastructure changes for recovery (v8)</title>
	<published>2008-09-30T15:52:31Z</published>
	<updated>2008-09-30T15:52:31Z</updated>
	<author>
		<name>Simon Riggs</name>
	</author>
	<content type="html">Patch now includes all previous agreed changes, plus I've found what
&lt;br&gt;looks to be a workable method of removing the shutdown checkpoint
&lt;br&gt;without loss of robustness. 
&lt;br&gt;&lt;br&gt;Patch summary
&lt;br&gt;&lt;br&gt;Tuning 
&lt;br&gt;* Bgwriter performs dirty block cleaning during recovery
&lt;br&gt;* Bgwriter performs restartpoints, offloading this task from Startup
&lt;br&gt;process to allow it to continue with recovery actions 
&lt;br&gt;* Shutdown checkpoint removed at end of recovery. Bgwriter performs an
&lt;br&gt;immediate checkpoint instead, so we have the same protection, but
&lt;br&gt;connections and transactions can be started earlier than previously.
&lt;br&gt;* PreAllocXLogs() is not performed by the startup process, so we do not delay
&lt;br&gt;startup while we write zeroes to the next WAL file. Bgwriter does that now.
&lt;br&gt;* XLogCtl structure padding for enhanced scalability
&lt;br&gt;&lt;br&gt;Recovery State Changes
&lt;br&gt;* If archive recovery proceeds past a safe stopping point we signal the
&lt;br&gt;postmaster that the database is now in a consistent state, PM_RECOVERY. This
&lt;br&gt;state change is also linked to startup of the bgwriter and stats
&lt;br&gt;processes (and will in the future also be the place where read-only
&lt;br&gt;backends may connect)
&lt;br&gt;* Optional recovery_safe_start_location parameter now provided in
&lt;br&gt;recovery.conf, to allow a consistency point to be manually defined if a
&lt;br&gt;base backup was not taken using the standard pg_start/stop backup functions
&lt;br&gt;* New minSafeStopPoint added to controlfile to allow us to determine
&lt;br&gt;consistency if archive recovery crashes/restarts. Value is updated each
&lt;br&gt;time we access a new WAL file.
&lt;br&gt;* stats file removed earlier in recovery, so we may accumulate new stats
&lt;br&gt;during recovery
&lt;br&gt;* End of recovery is now marked by a clear global state change. Change
&lt;br&gt;is global, atomic and fast - tested for using IsRecoveryProcessingMode()
&lt;br&gt;&lt;br&gt;Additional Safeguards
&lt;br&gt;* Locks are placed around all ControlFile operations
&lt;br&gt;* XLogInsert() and AssignTransactionId() now have specific checks to
&lt;br&gt;prevent their use during recovery
&lt;br&gt;* Makes StartupMultiXact() atomic. Adds comments to show that
&lt;br&gt;StartupCLOG() is already atomic, though StartupSUBTRANS() is not (this
&lt;br&gt;will be addressed in a later patch, so not touched here)
&lt;br&gt;* recovery.conf is not removed until slightly later now, to protect
&lt;br&gt;against a crash at the end of startup
&lt;br&gt;* New WAL record XLOG_RECOVERY_END is now the only place where timelineid
&lt;br&gt;may change
&lt;br&gt;&lt;br&gt;Other Changes
&lt;br&gt;* log_restartpoints removed, use log_checkpoints in postgresql.conf
&lt;br&gt;* pg_controldata and pg_resetxlog changed to show safe start point
&lt;br&gt;* designed to work in EXEC_BACKEND mode for Windows
&lt;br&gt;* additional function signature for pg_start_backup('label', true |
&lt;br&gt;false) to allow definition of immediate checkpoint/not
&lt;br&gt;* doc changes for recovery.conf parameters
&lt;br&gt;* Fixes a bug discovered during other testing: if pg_stop_backup() is run
&lt;br&gt;when an xlogswitch has just occurred then we do not switch log files, yet
&lt;br&gt;we return the current filename even though it has nothing of value in it. If
&lt;br&gt;archive_timeout is not enabled we would wait forever for pg_stop_backup()
&lt;br&gt;to return. 
&lt;br&gt;* Substantial comments throughout
&lt;br&gt;&lt;br&gt;Patch is now v8.
&lt;br&gt;&lt;br&gt;&amp;nbsp;doc/src/sgml/backup.sgml &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; | &amp;nbsp; 30 !
&lt;br&gt;&amp;nbsp;doc/src/sgml/func.sgml &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; | &amp;nbsp; 12 
&lt;br&gt;&amp;nbsp;src/backend/access/transam/clog.c &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;3 
&lt;br&gt;&amp;nbsp;src/backend/access/transam/multixact.c &amp;nbsp; | &amp;nbsp; 14 
&lt;br&gt;&amp;nbsp;src/backend/access/transam/subtrans.c &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;3 
&lt;br&gt;&amp;nbsp;src/backend/access/transam/xact.c &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;3 
&lt;br&gt;&amp;nbsp;src/backend/access/transam/xlog.c &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;| &amp;nbsp;783 ++++++++++++++-!!!!!!!!!!!!!!!
&lt;br&gt;&amp;nbsp;src/backend/postmaster/bgwriter.c &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;| &amp;nbsp;418 +++--!!!!!!!!!
&lt;br&gt;&amp;nbsp;src/backend/postmaster/postmaster.c &amp;nbsp; &amp;nbsp; &amp;nbsp;| &amp;nbsp; 62 +!
&lt;br&gt;&amp;nbsp;src/backend/storage/buffer/README &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;9 
&lt;br&gt;&amp;nbsp;src/bin/pg_controldata/pg_controldata.c &amp;nbsp;| &amp;nbsp; &amp;nbsp;3 
&lt;br&gt;&amp;nbsp;src/bin/pg_resetxlog/pg_resetxlog.c &amp;nbsp; &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;2 
&lt;br&gt;&amp;nbsp;src/include/access/xlog.h &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;| &amp;nbsp; 14 
&lt;br&gt;&amp;nbsp;src/include/access/xlog_internal.h &amp;nbsp; &amp;nbsp; &amp;nbsp; | &amp;nbsp; &amp;nbsp;4 
&lt;br&gt;&amp;nbsp;src/include/catalog/pg_control.h &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; | &amp;nbsp; &amp;nbsp;3 
&lt;br&gt;&amp;nbsp;src/include/postmaster/bgwriter.h &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;| &amp;nbsp; &amp;nbsp;6 
&lt;br&gt;&amp;nbsp;src/include/storage/pmsignal.h &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; | &amp;nbsp; &amp;nbsp;1 
&lt;br&gt;&amp;nbsp;src/test/regress/expected/opr_sanity.out | &amp;nbsp; &amp;nbsp;7 
&lt;br&gt;&amp;nbsp;18 files changed, 579 insertions(+), 79 deletions(-), 719 modifications(!)
&lt;br&gt;&lt;br&gt;Please review, everybody. Many thanks.
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp;Simon Riggs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; www.2ndQuadrant.com
&lt;br&gt;&amp;nbsp;PostgreSQL Training, Services and Support
&lt;br&gt;&lt;br /&gt;&lt;tt&gt;[recovery_infrastruc.v8.patch]&lt;/tt&gt;&lt;br /&gt;&lt;hr align=&quot;left&quot; width=&quot;300&quot; /&gt;&lt;tt&gt;Index: doc/src/sgml/backup.sgml
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/backup.sgml,v
&lt;br&gt;retrieving revision 2.120
&lt;br&gt;diff -c -r2.120 backup.sgml
&lt;br&gt;*** doc/src/sgml/backup.sgml	18 Jul 2008 17:33:17 -0000	2.120
&lt;br&gt;--- doc/src/sgml/backup.sgml	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 1200,1205 ****
&lt;br&gt;--- 1200,1229 ----
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;/listitem&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;/varlistentry&amp;gt;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;varlistentry id=&amp;quot;recovery-safe-start-location&amp;quot;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;xreflabel=&amp;quot;recovery_safe_start_location&amp;quot;&amp;gt;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;term&amp;gt;&amp;lt;varname&amp;gt;recovery_safe_start_location&amp;lt;/varname&amp;gt;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (&amp;lt;type&amp;gt;string&amp;lt;/type&amp;gt;)
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;/term&amp;gt;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;listitem&amp;gt;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;para&amp;gt;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Allows user to optionally specify a safe start location for a base
&lt;br&gt;+ 		backup that was not made online using &amp;lt;function&amp;gt;pg_start_backup()&amp;lt;/&amp;gt; 
&lt;br&gt;+ 		and &amp;lt;function&amp;gt;pg_stop_backup()&amp;lt;/&amp;gt;. &amp;nbsp;If those functions were used, 
&lt;br&gt;+ 		this parameter need not be set because the server sets this for you
&lt;br&gt;+ 		automatically to avoid error. &amp;nbsp;You cannot use this parameter to move
&lt;br&gt;+ 		the safe stopping point to an earlier transaction log location. The
&lt;br&gt;+ 		format for this parameter is identical to the output of 
&lt;br&gt;+ 		&amp;lt;function&amp;gt;pg_current_xlog_insert_location()&amp;lt;/&amp;gt;, example: 
&lt;br&gt;+ &amp;lt;programlisting&amp;gt;
&lt;br&gt;+ recovery_safe_start_location = '0/D4445B8'
&lt;br&gt;+ &amp;lt;/programlisting&amp;gt;
&lt;br&gt;+ 		The location always has a forward slash, even on Windows, since it
&lt;br&gt;+ 		is not a file path.
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;/para&amp;gt;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;/listitem&amp;gt;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;/varlistentry&amp;gt;
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;varlistentry id=&amp;quot;log-restartpoints&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;xreflabel=&amp;quot;log_restartpoints&amp;quot;&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;term&amp;gt;&amp;lt;varname&amp;gt;log_restartpoints&amp;lt;/varname&amp;gt;
&lt;br&gt;***************
&lt;br&gt;*** 1207,1215 ****
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;/term&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;listitem&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;para&amp;gt;
&lt;br&gt;! &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Specifies whether to log each restart point as it occurs. This
&lt;br&gt;! &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; can be helpful to track the progress of a long recovery.
&lt;br&gt;! &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Default is &amp;lt;literal&amp;gt;false&amp;lt;/&amp;gt;.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;/para&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;/listitem&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;/varlistentry&amp;gt;
&lt;br&gt;--- 1231,1239 ----
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;/term&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;listitem&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;para&amp;gt;
&lt;br&gt;! &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; This parameter has now been deprecated. Instead, please set
&lt;br&gt;! 		&amp;lt;varname&amp;gt;log_checkpoints&amp;lt;/varname&amp;gt; in &amp;lt;filename&amp;gt;postgresql.conf&amp;lt;/&amp;gt;
&lt;br&gt;! 		if you want similar log entries during recovery.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;/para&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;/listitem&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;/varlistentry&amp;gt;
&lt;br&gt;Index: doc/src/sgml/func.sgml
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/func.sgml,v
&lt;br&gt;retrieving revision 1.447
&lt;br&gt;diff -c -r1.447 func.sgml
&lt;br&gt;*** doc/src/sgml/func.sgml	11 Sep 2008 17:32:33 -0000	1.447
&lt;br&gt;--- doc/src/sgml/func.sgml	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 12262,12267 ****
&lt;br&gt;--- 12262,12275 ----
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;/row&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;row&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;entry&amp;gt;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;literal&amp;gt;&amp;lt;function&amp;gt;pg_start_backup&amp;lt;/function&amp;gt;(&amp;lt;parameter&amp;gt;label&amp;lt;/&amp;gt; &amp;lt;type&amp;gt;text&amp;lt;/&amp;gt;)&amp;lt;/literal&amp;gt;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;/entry&amp;gt;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;entry&amp;gt;&amp;lt;type&amp;gt;text&amp;lt;/type&amp;gt;, &amp;lt;type&amp;gt;boolean&amp;lt;/type&amp;gt;&amp;lt;/entry&amp;gt;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;entry&amp;gt;Set up for performing on-line backup, specifying if
&lt;br&gt;+ 		we want an immediate checkpoint or not.&amp;lt;/entry&amp;gt;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;/row&amp;gt;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;row&amp;gt;
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;entry&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;literal&amp;gt;&amp;lt;function&amp;gt;pg_stop_backup&amp;lt;/function&amp;gt;()&amp;lt;/literal&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;/entry&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;entry&amp;gt;&amp;lt;type&amp;gt;text&amp;lt;/type&amp;gt;&amp;lt;/entry&amp;gt;
&lt;br&gt;***************
&lt;br&gt;*** 12333,12338 ****
&lt;br&gt;--- 12341,12350 ----
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; interest). &amp;nbsp;After noting the ending location, the current transaction log insertion
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; point is automatically advanced to the next transaction log file, so that the
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; ending transaction log file can be archived immediately to complete the backup.
&lt;br&gt;+ 	&amp;lt;function&amp;gt;pg_start_backup&amp;lt;/&amp;gt; issues a checkpoint and waits for it to complete.
&lt;br&gt;+ 	&amp;lt;function&amp;gt;pg_start_backup&amp;lt;/&amp;gt; can also be called with two parameters,
&lt;br&gt;+ 	the second parameter specifying whether the checkpoint is an immediate
&lt;br&gt;+ 	checkpoint or whether buffers are written out smoothly over a short period.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;/para&amp;gt;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;lt;para&amp;gt;
&lt;br&gt;Index: src/backend/access/transam/clog.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/clog.c,v
&lt;br&gt;retrieving revision 1.47
&lt;br&gt;diff -c -r1.47 clog.c
&lt;br&gt;*** src/backend/access/transam/clog.c	1 Aug 2008 13:16:08 -0000	1.47
&lt;br&gt;--- src/backend/access/transam/clog.c	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 260,265 ****
&lt;br&gt;--- 260,268 ----
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* This must be called ONCE during postmaster or standalone-backend startup,
&lt;br&gt;&amp;nbsp; &amp;nbsp;* after StartupXLOG has initialized ShmemVariableCache-&amp;gt;nextXid.
&lt;br&gt;+ &amp;nbsp;*
&lt;br&gt;+ &amp;nbsp;* We access just a single clog page, so this action is atomic and safe
&lt;br&gt;+ &amp;nbsp;* for use if other processes are active during recovery.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; void
&lt;br&gt;&amp;nbsp; StartupCLOG(void)
&lt;br&gt;Index: src/backend/access/transam/multixact.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/multixact.c,v
&lt;br&gt;retrieving revision 1.28
&lt;br&gt;diff -c -r1.28 multixact.c
&lt;br&gt;*** src/backend/access/transam/multixact.c	1 Aug 2008 13:16:08 -0000	1.28
&lt;br&gt;--- src/backend/access/transam/multixact.c	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 1413,1420 ****
&lt;br&gt;&amp;nbsp; &amp;nbsp;* MultiXactSetNextMXact and/or MultiXactAdvanceNextMXact.	Note that we
&lt;br&gt;&amp;nbsp; &amp;nbsp;* may already have replayed WAL data into the SLRU files.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;* We don't need any locks here, really; the SLRU locks are taken
&lt;br&gt;! &amp;nbsp;* only because slru.c expects to be called with locks held.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; void
&lt;br&gt;&amp;nbsp; StartupMultiXact(void)
&lt;br&gt;--- 1413,1423 ----
&lt;br&gt;&amp;nbsp; &amp;nbsp;* MultiXactSetNextMXact and/or MultiXactAdvanceNextMXact.	Note that we
&lt;br&gt;&amp;nbsp; &amp;nbsp;* may already have replayed WAL data into the SLRU files.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;* We want this operation to be atomic to ensure that other processes can 
&lt;br&gt;! &amp;nbsp;* use MultiXact while we complete recovery. We access one page only from the
&lt;br&gt;! &amp;nbsp;* offset and members buffers, so once locks are acquired they will not be
&lt;br&gt;! &amp;nbsp;* dropped and re-acquired by SLRU code. So we take both locks at start, then
&lt;br&gt;! &amp;nbsp;* hold them all the way to the end.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; void
&lt;br&gt;&amp;nbsp; StartupMultiXact(void)
&lt;br&gt;***************
&lt;br&gt;*** 1426,1431 ****
&lt;br&gt;--- 1429,1435 ----
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Clean up offsets state */
&lt;br&gt;&amp;nbsp; 	LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE);
&lt;br&gt;+ 	LWLockAcquire(MultiXactMemberControlLock, LW_EXCLUSIVE);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * Initialize our idea of the latest page number.
&lt;br&gt;***************
&lt;br&gt;*** 1452,1461 ****
&lt;br&gt;&amp;nbsp; 		MultiXactOffsetCtl-&amp;gt;shared-&amp;gt;page_dirty[slotno] = true;
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;- 	LWLockRelease(MultiXactOffsetControlLock);
&lt;br&gt;- 
&lt;br&gt;&amp;nbsp; 	/* And the same for members */
&lt;br&gt;- 	LWLockAcquire(MultiXactMemberControlLock, LW_EXCLUSIVE);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * Initialize our idea of the latest page number.
&lt;br&gt;--- 1456,1462 ----
&lt;br&gt;***************
&lt;br&gt;*** 1483,1488 ****
&lt;br&gt;--- 1484,1490 ----
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	LWLockRelease(MultiXactMemberControlLock);
&lt;br&gt;+ 	LWLockRelease(MultiXactOffsetControlLock);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * Initialize lastTruncationPoint to invalid, ensuring that the first
&lt;br&gt;***************
&lt;br&gt;*** 1543,1549 ****
&lt;br&gt;&amp;nbsp; 	 * SimpleLruTruncate would get confused. &amp;nbsp;It seems best not to risk
&lt;br&gt;&amp;nbsp; 	 * removing any data during recovery anyway, so don't truncate.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;! 	if (!InRecovery)
&lt;br&gt;&amp;nbsp; 		TruncateMultiXact();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);
&lt;br&gt;--- 1545,1551 ----
&lt;br&gt;&amp;nbsp; 	 * SimpleLruTruncate would get confused. &amp;nbsp;It seems best not to risk
&lt;br&gt;&amp;nbsp; 	 * removing any data during recovery anyway, so don't truncate.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;! 	if (!IsRecoveryProcessingMode())
&lt;br&gt;&amp;nbsp; 		TruncateMultiXact();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);
&lt;br&gt;Index: src/backend/access/transam/subtrans.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/subtrans.c,v
&lt;br&gt;retrieving revision 1.23
&lt;br&gt;diff -c -r1.23 subtrans.c
&lt;br&gt;*** src/backend/access/transam/subtrans.c	1 Aug 2008 13:16:08 -0000	1.23
&lt;br&gt;--- src/backend/access/transam/subtrans.c	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 226,231 ****
&lt;br&gt;--- 226,234 ----
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* oldestActiveXID is the oldest XID of any prepared transaction, or nextXid
&lt;br&gt;&amp;nbsp; &amp;nbsp;* if there are none.
&lt;br&gt;+ &amp;nbsp;*
&lt;br&gt;+ &amp;nbsp;* Note that this is not atomic and is not yet safe to perform while other
&lt;br&gt;+ &amp;nbsp;* processes might access subtrans.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; void
&lt;br&gt;&amp;nbsp; StartupSUBTRANS(TransactionId oldestActiveXID)
&lt;br&gt;Index: src/backend/access/transam/xact.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xact.c,v
&lt;br&gt;retrieving revision 1.265
&lt;br&gt;diff -c -r1.265 xact.c
&lt;br&gt;*** src/backend/access/transam/xact.c	11 Aug 2008 11:05:10 -0000	1.265
&lt;br&gt;--- src/backend/access/transam/xact.c	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 393,398 ****
&lt;br&gt;--- 393,401 ----
&lt;br&gt;&amp;nbsp; 	bool		isSubXact = (s-&amp;gt;parent != NULL);
&lt;br&gt;&amp;nbsp; 	ResourceOwner currentOwner;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ 	if (IsRecoveryProcessingMode())
&lt;br&gt;+ 		elog(FATAL, &amp;quot;cannot assign TransactionIds during recovery&amp;quot;);
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 	/* Assert that caller didn't screw up */
&lt;br&gt;&amp;nbsp; 	Assert(!TransactionIdIsValid(s-&amp;gt;transactionId));
&lt;br&gt;&amp;nbsp; 	Assert(s-&amp;gt;state == TRANS_INPROGRESS);
&lt;br&gt;Index: src/backend/access/transam/xlog.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xlog.c,v
&lt;br&gt;retrieving revision 1.319
&lt;br&gt;diff -c -r1.319 xlog.c
&lt;br&gt;*** src/backend/access/transam/xlog.c	23 Sep 2008 09:20:35 -0000	1.319
&lt;br&gt;--- src/backend/access/transam/xlog.c	30 Sep 2008 22:32:49 -0000
&lt;br&gt;***************
&lt;br&gt;*** 113,119 ****
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* ThisTimeLineID will be same in all backends --- it identifies current
&lt;br&gt;! &amp;nbsp;* WAL timeline for the database system.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; TimeLineID	ThisTimeLineID = 0;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;--- 113,120 ----
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* ThisTimeLineID will be same in all backends --- it identifies current
&lt;br&gt;! &amp;nbsp;* WAL timeline for the database system. Zero is always a bug, so we 
&lt;br&gt;! &amp;nbsp;* start with that to allow us to spot any errors.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; TimeLineID	ThisTimeLineID = 0;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;***************
&lt;br&gt;*** 123,128 ****
&lt;br&gt;--- 124,133 ----
&lt;br&gt;&amp;nbsp; /* Are we recovering using offline XLOG archives? */
&lt;br&gt;&amp;nbsp; static bool InArchiveRecovery = false;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ /* Local copy of shared RecoveryProcessingMode state */
&lt;br&gt;+ static bool LocalRecoveryProcessingMode = true;
&lt;br&gt;+ static bool knownProcessingMode = false;
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; /* Was the last xlog file restored from archive, or local? */
&lt;br&gt;&amp;nbsp; static bool restoredFromArchive = false;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;***************
&lt;br&gt;*** 131,137 ****
&lt;br&gt;&amp;nbsp; static bool recoveryTarget = false;
&lt;br&gt;&amp;nbsp; static bool recoveryTargetExact = false;
&lt;br&gt;&amp;nbsp; static bool recoveryTargetInclusive = true;
&lt;br&gt;- static bool recoveryLogRestartpoints = false;
&lt;br&gt;&amp;nbsp; static TransactionId recoveryTargetXid;
&lt;br&gt;&amp;nbsp; static TimestampTz recoveryTargetTime;
&lt;br&gt;&amp;nbsp; static TimestampTz recoveryLastXTime = 0;
&lt;br&gt;--- 136,141 ----
&lt;br&gt;***************
&lt;br&gt;*** 141,146 ****
&lt;br&gt;--- 145,153 ----
&lt;br&gt;&amp;nbsp; static TimestampTz recoveryStopTime;
&lt;br&gt;&amp;nbsp; static bool recoveryStopAfter;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ /* is the database proven consistent yet? */
&lt;br&gt;+ bool	reachedSafeStartPoint = false;
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* During normal operation, the only timeline we care about is ThisTimeLineID.
&lt;br&gt;&amp;nbsp; &amp;nbsp;* During recovery, however, things are more complicated. &amp;nbsp;To simplify life
&lt;br&gt;***************
&lt;br&gt;*** 240,248 ****
&lt;br&gt;&amp;nbsp; &amp;nbsp;* ControlFileLock: must be held to read/update control file or create
&lt;br&gt;&amp;nbsp; &amp;nbsp;* new log file.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;* CheckpointLock: must be held to do a checkpoint (ensures only one
&lt;br&gt;! &amp;nbsp;* checkpointer at a time; currently, with all checkpoints done by the
&lt;br&gt;! &amp;nbsp;* bgwriter, this is just pro forma).
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;&amp;nbsp; &amp;nbsp;*----------
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;--- 247,256 ----
&lt;br&gt;&amp;nbsp; &amp;nbsp;* ControlFileLock: must be held to read/update control file or create
&lt;br&gt;&amp;nbsp; &amp;nbsp;* new log file.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;* CheckpointLock: must be held to do a checkpoint or restartpoint, ensuring
&lt;br&gt;! &amp;nbsp;* we get just one of those at any time. In 8.4+ recovery, both startup and
&lt;br&gt;! &amp;nbsp;* bgwriter processes may take restartpoints, so this locking must be strict 
&lt;br&gt;! &amp;nbsp;* to ensure there are no mistakes.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;&amp;nbsp; &amp;nbsp;*----------
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;***************
&lt;br&gt;*** 285,295 ****
&lt;br&gt;--- 293,310 ----
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* Total shared-memory state for XLOG.
&lt;br&gt;+ &amp;nbsp;*
&lt;br&gt;+ &amp;nbsp;* This small structure is accessed by many backends, so we take care to
&lt;br&gt;+ &amp;nbsp;* pad out the parts of the structure so they can be accessed by separate
&lt;br&gt;+ &amp;nbsp;* CPUs without causing false sharing cache flushes. Padding is generous
&lt;br&gt;+ &amp;nbsp;* to allow for a wide variety of CPU architectures.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;+ #define	XLOGCTL_BUFFER_SPACING	128
&lt;br&gt;&amp;nbsp; typedef struct XLogCtlData
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;&amp;nbsp; 	/* Protected by WALInsertLock: */
&lt;br&gt;&amp;nbsp; 	XLogCtlInsert Insert;
&lt;br&gt;+ 	char	InsertPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogCtlInsert)];
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Protected by info_lck: */
&lt;br&gt;&amp;nbsp; 	XLogwrtRqst LogwrtRqst;
&lt;br&gt;***************
&lt;br&gt;*** 297,305 ****
&lt;br&gt;--- 312,327 ----
&lt;br&gt;&amp;nbsp; 	uint32		ckptXidEpoch;	/* nextXID &amp; epoch of latest checkpoint */
&lt;br&gt;&amp;nbsp; 	TransactionId ckptXid;
&lt;br&gt;&amp;nbsp; 	XLogRecPtr	asyncCommitLSN; /* LSN of newest async commit */
&lt;br&gt;+ 	/* add data structure padding for above info_lck declarations */
&lt;br&gt;+ 	char	InfoPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogwrtRqst) 
&lt;br&gt;+ 											- sizeof(XLogwrtResult)
&lt;br&gt;+ 											- sizeof(uint32)
&lt;br&gt;+ 											- sizeof(TransactionId)
&lt;br&gt;+ 											- sizeof(XLogRecPtr)];
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Protected by WALWriteLock: */
&lt;br&gt;&amp;nbsp; 	XLogCtlWrite Write;
&lt;br&gt;+ 	char	WritePadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogCtlWrite)];
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * These values do not change after startup, although the pointed-to pages
&lt;br&gt;***************
&lt;br&gt;*** 311,316 ****
&lt;br&gt;--- 333,356 ----
&lt;br&gt;&amp;nbsp; 	int			XLogCacheBlck;	/* highest allocated xlog buffer index */
&lt;br&gt;&amp;nbsp; 	TimeLineID	ThisTimeLineID;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ 	/*
&lt;br&gt;+ 	 * SharedRecoveryProcessingMode shows whether the postmaster is in a
&lt;br&gt;+ 	 * state earlier than PM_RUN. It is kept in shared memory so that it
&lt;br&gt;+ 	 * is also accessible in the EXEC_BACKEND case.
&lt;br&gt;+ 	 *
&lt;br&gt;+ 	 * We also retain a local state variable InRecovery. InRecovery=true
&lt;br&gt;+ 	 * means the code is being executed by Startup process and therefore
&lt;br&gt;+ 	 * always during Recovery Processing Mode. This allows us to identify
&lt;br&gt;+ 	 * code executed *during* Recovery Processing Mode but not necessarily
&lt;br&gt;+ 	 * by Startup process itself.
&lt;br&gt;+ 	 *
&lt;br&gt;+ 	 * Protected by mode_lck
&lt;br&gt;+ 	 */
&lt;br&gt;+ 	bool		SharedRecoveryProcessingMode;
&lt;br&gt;+ 	slock_t		mode_lck;
&lt;br&gt;+ 
&lt;br&gt;+ 	char		InfoLockPadding[XLOGCTL_BUFFER_SPACING];
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 	slock_t		info_lck;		/* locks shared variables shown above */
&lt;br&gt;&amp;nbsp; } XLogCtlData;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;***************
&lt;br&gt;*** 397,404 ****
&lt;br&gt;--- 437,446 ----
&lt;br&gt;&amp;nbsp; static void readRecoveryCommandFile(void);
&lt;br&gt;&amp;nbsp; static void exitArchiveRecovery(TimeLineID endTLI,
&lt;br&gt;&amp;nbsp; 					uint32 endLogId, uint32 endLogSeg);
&lt;br&gt;+ static void exitRecovery(void);
&lt;br&gt;&amp;nbsp; static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
&lt;br&gt;&amp;nbsp; static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
&lt;br&gt;+ static XLogRecPtr GetRedoLocationForCheckpoint(void);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; static bool XLogCheckBuffer(XLogRecData *rdata, bool doPageWrites,
&lt;br&gt;&amp;nbsp; 				XLogRecPtr *lsn, BkpBlock *bkpb);
&lt;br&gt;***************
&lt;br&gt;*** 480,485 ****
&lt;br&gt;--- 522,532 ----
&lt;br&gt;&amp;nbsp; 	bool		updrqst;
&lt;br&gt;&amp;nbsp; 	bool		doPageWrites;
&lt;br&gt;&amp;nbsp; 	bool		isLogSwitch = (rmid == RM_XLOG_ID &amp;&amp; info == XLOG_SWITCH);
&lt;br&gt;+ 	bool		isRecoveryEnd = (rmid == RM_XLOG_ID &amp;&amp; info == XLOG_RECOVERY_END);
&lt;br&gt;+ 
&lt;br&gt;+ 	/* cross-check on whether we should be here or not */
&lt;br&gt;+ 	if (IsRecoveryProcessingMode() &amp;&amp; !isRecoveryEnd)
&lt;br&gt;+ 		elog(FATAL, &amp;quot;cannot make new WAL entries during recovery&amp;quot;);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* info's high bits are reserved for use by me */
&lt;br&gt;&amp;nbsp; 	if (info &amp; XLR_INFO_MASK)
&lt;br&gt;***************
&lt;br&gt;*** 1720,1727 ****
&lt;br&gt;&amp;nbsp; 	XLogRecPtr	WriteRqstPtr;
&lt;br&gt;&amp;nbsp; 	XLogwrtRqst WriteRqst;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	/* Disabled during REDO */
&lt;br&gt;! 	if (InRedo)
&lt;br&gt;&amp;nbsp; 		return;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Quick exit if already known flushed */
&lt;br&gt;--- 1767,1773 ----
&lt;br&gt;&amp;nbsp; 	XLogRecPtr	WriteRqstPtr;
&lt;br&gt;&amp;nbsp; 	XLogwrtRqst WriteRqst;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	if (IsRecoveryProcessingMode())
&lt;br&gt;&amp;nbsp; 		return;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Quick exit if already known flushed */
&lt;br&gt;***************
&lt;br&gt;*** 1809,1817 ****
&lt;br&gt;&amp;nbsp; 	 * the bad page is encountered again during recovery then we would be
&lt;br&gt;&amp;nbsp; 	 * unable to restart the database at all! &amp;nbsp;(This scenario has actually
&lt;br&gt;&amp;nbsp; 	 * happened in the field several times with 7.1 releases. Note that we
&lt;br&gt;! 	 * cannot get here while InRedo is true, but if the bad page is brought in
&lt;br&gt;! 	 * and marked dirty during recovery then CreateCheckPoint will try to
&lt;br&gt;! 	 * flush it at the end of recovery.)
&lt;br&gt;&amp;nbsp; 	 *
&lt;br&gt;&amp;nbsp; 	 * The current approach is to ERROR under normal conditions, but only
&lt;br&gt;&amp;nbsp; 	 * WARNING during recovery, so that the system can be brought up even if
&lt;br&gt;--- 1855,1863 ----
&lt;br&gt;&amp;nbsp; 	 * the bad page is encountered again during recovery then we would be
&lt;br&gt;&amp;nbsp; 	 * unable to restart the database at all! &amp;nbsp;(This scenario has actually
&lt;br&gt;&amp;nbsp; 	 * happened in the field several times with 7.1 releases. Note that we
&lt;br&gt;! 	 * cannot get here while IsRecoveryProcessingMode() is true, but if the bad
&lt;br&gt;! 	 * page is brought in and marked dirty during recovery, then any checkpoint
&lt;br&gt;! 	 * performed at the end of recovery will try to flush it.)
&lt;br&gt;&amp;nbsp; 	 *
&lt;br&gt;&amp;nbsp; 	 * The current approach is to ERROR under normal conditions, but only
&lt;br&gt;&amp;nbsp; 	 * WARNING during recovery, so that the system can be brought up even if
&lt;br&gt;***************
&lt;br&gt;*** 1821,1827 ****
&lt;br&gt;&amp;nbsp; 	 * and so we will not force a restart for a bad LSN on a data page.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	if (XLByteLT(LogwrtResult.Flush, record))
&lt;br&gt;! 		elog(InRecovery ? WARNING : ERROR,
&lt;br&gt;&amp;nbsp; 		&amp;quot;xlog flush request %X/%X is not satisfied --- flushed only to %X/%X&amp;quot;,
&lt;br&gt;&amp;nbsp; 			 record.xlogid, record.xrecoff,
&lt;br&gt;&amp;nbsp; 			 LogwrtResult.Flush.xlogid, LogwrtResult.Flush.xrecoff);
&lt;br&gt;--- 1867,1873 ----
&lt;br&gt;&amp;nbsp; 	 * and so we will not force a restart for a bad LSN on a data page.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	if (XLByteLT(LogwrtResult.Flush, record))
&lt;br&gt;! 		elog(ERROR,
&lt;br&gt;&amp;nbsp; 		&amp;quot;xlog flush request %X/%X is not satisfied --- flushed only to %X/%X&amp;quot;,
&lt;br&gt;&amp;nbsp; 			 record.xlogid, record.xrecoff,
&lt;br&gt;&amp;nbsp; 			 LogwrtResult.Flush.xlogid, LogwrtResult.Flush.xrecoff);
&lt;br&gt;***************
&lt;br&gt;*** 2094,2100 ****
&lt;br&gt;&amp;nbsp; 		unlink(tmppath);
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	elog(DEBUG2, &amp;quot;done creating and filling new WAL file&amp;quot;);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Set flag to tell caller there was no existent file */
&lt;br&gt;&amp;nbsp; 	*use_existent = false;
&lt;br&gt;--- 2140,2147 ----
&lt;br&gt;&amp;nbsp; 		unlink(tmppath);
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	XLogFileName(tmppath, ThisTimeLineID, log, seg);
&lt;br&gt;! 	elog(DEBUG2, &amp;quot;done creating and filling new WAL file %s&amp;quot;, tmppath);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Set flag to tell caller there was no existent file */
&lt;br&gt;&amp;nbsp; 	*use_existent = false;
&lt;br&gt;***************
&lt;br&gt;*** 2400,2405 ****
&lt;br&gt;--- 2447,2474 ----
&lt;br&gt;&amp;nbsp; 					 xlogfname);
&lt;br&gt;&amp;nbsp; 			set_ps_display(activitymsg, false);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ 			/* 
&lt;br&gt;+ 			 * Calculate and write out a new safeStartPoint. This defines
&lt;br&gt;+ 			 * the latest LSN that might appear on-disk while we apply
&lt;br&gt;+ 			 * the WAL records in this file. If we crash during recovery
&lt;br&gt;+ 			 * we must reach this point again before we can prove
&lt;br&gt;+ 			 * database consistency. Not a restartpoint! Restart points
&lt;br&gt;+ 			 * define where we should start recovery from, if we crash.
&lt;br&gt;+ 			 */
&lt;br&gt;+ 			if (InArchiveRecovery)
&lt;br&gt;+ 			{
&lt;br&gt;+ 				uint32 nextLog = log;
&lt;br&gt;+ 				uint32 nextSeg = seg;
&lt;br&gt;+ 
&lt;br&gt;+ 				NextLogSeg(nextLog, nextSeg);
&lt;br&gt;+ 
&lt;br&gt;+ 				LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
&lt;br&gt;+ 				ControlFile-&amp;gt;minSafeStartPoint.xlogid = nextLog;
&lt;br&gt;+ 				ControlFile-&amp;gt;minSafeStartPoint.xrecoff = nextSeg * XLogSegSize;
&lt;br&gt;+ 				UpdateControlFile();
&lt;br&gt;+ 				LWLockRelease(ControlFileLock);
&lt;br&gt;+ 			}
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 			return fd;
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;&amp;nbsp; 		if (errno != ENOENT)	/* unexpected failure? */
&lt;br&gt;***************
&lt;br&gt;*** 4228,4233 ****
&lt;br&gt;--- 4297,4303 ----
&lt;br&gt;&amp;nbsp; 	XLogCtl-&amp;gt;XLogCacheBlck = XLOGbuffers - 1;
&lt;br&gt;&amp;nbsp; 	XLogCtl-&amp;gt;Insert.currpage = (XLogPageHeader) (XLogCtl-&amp;gt;pages);
&lt;br&gt;&amp;nbsp; 	SpinLockInit(&amp;XLogCtl-&amp;gt;info_lck);
&lt;br&gt;+ 	SpinLockInit(&amp;XLogCtl-&amp;gt;mode_lck);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * If we are not in bootstrap mode, pg_control should already exist. Read
&lt;br&gt;***************
&lt;br&gt;*** 4532,4548 ****
&lt;br&gt;&amp;nbsp; 			ereport(LOG,
&lt;br&gt;&amp;nbsp; 					(errmsg(&amp;quot;recovery_target_inclusive = %s&amp;quot;, tok2)));
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;&amp;nbsp; 		else if (strcmp(tok1, &amp;quot;log_restartpoints&amp;quot;) == 0)
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;- 			/*
&lt;br&gt;- 			 * does nothing if a recovery_target is not also set
&lt;br&gt;- 			 */
&lt;br&gt;- 			if (!parse_bool(tok2, &amp;recoveryLogRestartpoints))
&lt;br&gt;- 				 &amp;nbsp;ereport(ERROR,
&lt;br&gt;- 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
&lt;br&gt;- 					 &amp;nbsp;errmsg(&amp;quot;parameter \&amp;quot;log_restartpoints\&amp;quot; requires a Boolean value&amp;quot;)));
&lt;br&gt;&amp;nbsp; 			ereport(LOG,
&lt;br&gt;! 					(errmsg(&amp;quot;log_restartpoints = %s&amp;quot;, tok2)));
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;&amp;nbsp; 		else
&lt;br&gt;&amp;nbsp; 			ereport(FATAL,
&lt;br&gt;--- 4602,4642 ----
&lt;br&gt;&amp;nbsp; 			ereport(LOG,
&lt;br&gt;&amp;nbsp; 					(errmsg(&amp;quot;recovery_target_inclusive = %s&amp;quot;, tok2)));
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;+ 		else if (strcmp(tok1, &amp;quot;recovery_safe_start_location&amp;quot;) == 0)
&lt;br&gt;+ 		{
&lt;br&gt;+ 			unsigned int uxlogid;
&lt;br&gt;+ 			unsigned int uxrecoff;
&lt;br&gt;+ 			XLogRecPtr	NewSafeStartPtr;
&lt;br&gt;+ 
&lt;br&gt;+ 			if (sscanf(tok2, &amp;quot;%X/%X&amp;quot;, &amp;uxlogid, &amp;uxrecoff) != 2)
&lt;br&gt;+ 				ereport(ERROR,
&lt;br&gt;+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
&lt;br&gt;+ 						 errmsg(&amp;quot;could not parse transaction log location \&amp;quot;%s\&amp;quot;&amp;quot;,
&lt;br&gt;+ 								tok2)));
&lt;br&gt;+ 
&lt;br&gt;+ 			NewSafeStartPtr.xlogid = uxlogid;
&lt;br&gt;+ 			NewSafeStartPtr.xrecoff = uxrecoff;
&lt;br&gt;+ 			if (XLByteLE(ControlFile-&amp;gt;minSafeStartPoint, NewSafeStartPtr))
&lt;br&gt;+ 			{
&lt;br&gt;+ 				ControlFile-&amp;gt;minSafeStartPoint.xlogid = uxlogid;
&lt;br&gt;+ 				ControlFile-&amp;gt;minSafeStartPoint.xrecoff = uxrecoff;
&lt;br&gt;+ 
&lt;br&gt;+ 				ereport(LOG,
&lt;br&gt;+ 					(errmsg(&amp;quot;recovery_safe_start_location = '%s'&amp;quot;, tok2)));
&lt;br&gt;+ 			}
&lt;br&gt;+ 			else if (ControlFile-&amp;gt;state != DB_IN_ARCHIVE_RECOVERY)
&lt;br&gt;+ 				ereport(ERROR,
&lt;br&gt;+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
&lt;br&gt;+ 						 errmsg(&amp;quot;recovery_safe_start_location = '%s' is earlier than control file %X/%X&amp;quot;,
&lt;br&gt;+ 								tok2,
&lt;br&gt;+ 								ControlFile-&amp;gt;minSafeStartPoint.xlogid,
&lt;br&gt;+ 								ControlFile-&amp;gt;minSafeStartPoint.xrecoff)));
&lt;br&gt;+ 		}
&lt;br&gt;&amp;nbsp; 		else if (strcmp(tok1, &amp;quot;log_restartpoints&amp;quot;) == 0)
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;&amp;nbsp; 			ereport(LOG,
&lt;br&gt;! 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
&lt;br&gt;! 					 &amp;nbsp;errmsg(&amp;quot;parameter \&amp;quot;log_restartpoints\&amp;quot; has been deprecated&amp;quot;)));
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;&amp;nbsp; 		else
&lt;br&gt;&amp;nbsp; 			ereport(FATAL,
&lt;br&gt;***************
&lt;br&gt;*** 4678,4692 ****
&lt;br&gt;&amp;nbsp; 	unlink(recoveryPath);		/* ignore any error */
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;! 	 * Rename the config file out of the way, so that we don't accidentally
&lt;br&gt;! 	 * re-enter archive recovery mode in a subsequent crash.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;- 	unlink(RECOVERY_COMMAND_DONE);
&lt;br&gt;- 	if (rename(RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE) != 0)
&lt;br&gt;- 		ereport(FATAL,
&lt;br&gt;- 				(errcode_for_file_access(),
&lt;br&gt;- 				 errmsg(&amp;quot;could not rename file \&amp;quot;%s\&amp;quot; to \&amp;quot;%s\&amp;quot;: %m&amp;quot;,
&lt;br&gt;- 						RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE)));
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	ereport(LOG,
&lt;br&gt;&amp;nbsp; 			(errmsg(&amp;quot;archive recovery complete&amp;quot;)));
&lt;br&gt;--- 4772,4784 ----
&lt;br&gt;&amp;nbsp; 	unlink(recoveryPath);		/* ignore any error */
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;! 	 * As of 8.4 we no longer rename the recovery.conf file out of the
&lt;br&gt;! 	 * way until after we have performed a full checkpoint. This ensures
&lt;br&gt;! 	 * that any crash between now and the end of the checkpoint does not
&lt;br&gt;! 	 * attempt to restart from a WAL file that is no longer available to us.
&lt;br&gt;! 	 * As soon as we remove recovery.conf we lose our restore_command and
&lt;br&gt;! 	 * cannot re-access WAL files from the archive.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	ereport(LOG,
&lt;br&gt;&amp;nbsp; 			(errmsg(&amp;quot;archive recovery complete&amp;quot;)));
&lt;br&gt;***************
&lt;br&gt;*** 4813,4818 ****
&lt;br&gt;--- 4905,4911 ----
&lt;br&gt;&amp;nbsp; 	CheckPoint	checkPoint;
&lt;br&gt;&amp;nbsp; 	bool		wasShutdown;
&lt;br&gt;&amp;nbsp; 	bool		reachedStopPoint = false;
&lt;br&gt;+ 	bool		performedRecovery = false;
&lt;br&gt;&amp;nbsp; 	bool		haveBackupLabel = false;
&lt;br&gt;&amp;nbsp; 	XLogRecPtr	RecPtr,
&lt;br&gt;&amp;nbsp; 				LastRec,
&lt;br&gt;***************
&lt;br&gt;*** 4825,4830 ****
&lt;br&gt;--- 4918,4925 ----
&lt;br&gt;&amp;nbsp; 	uint32		freespace;
&lt;br&gt;&amp;nbsp; 	TransactionId oldestActiveXID;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ 	XLogCtl-&amp;gt;SharedRecoveryProcessingMode = true;
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * Read control file and check XLOG status looks valid.
&lt;br&gt;&amp;nbsp; 	 *
&lt;br&gt;***************
&lt;br&gt;*** 5038,5046 ****
&lt;br&gt;--- 5133,5147 ----
&lt;br&gt;&amp;nbsp; 		if (minRecoveryLoc.xlogid != 0 || minRecoveryLoc.xrecoff != 0)
&lt;br&gt;&amp;nbsp; 			ControlFile-&amp;gt;minRecoveryPoint = minRecoveryLoc;
&lt;br&gt;&amp;nbsp; 		ControlFile-&amp;gt;time = (pg_time_t) time(NULL);
&lt;br&gt;+ 		/* No need to hold ControlFileLock yet, we aren't up far enough */
&lt;br&gt;&amp;nbsp; 		UpdateControlFile();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 		/*
&lt;br&gt;+ 		 * Reset pgstat data, because it may be invalid after recovery.
&lt;br&gt;+ 		 */
&lt;br&gt;+ 		pgstat_reset_all();
&lt;br&gt;+ 
&lt;br&gt;+ 		/*
&lt;br&gt;&amp;nbsp; 		 * If there was a backup label file, it's done its job and the info
&lt;br&gt;&amp;nbsp; 		 * has now been propagated into pg_control. &amp;nbsp;We must get rid of the
&lt;br&gt;&amp;nbsp; 		 * label file so that if we crash during recovery, we'll pick up at
&lt;br&gt;***************
&lt;br&gt;*** 5150,5155 ****
&lt;br&gt;--- 5251,5282 ----
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 				LastRec = ReadRecPtr;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ 				/*
&lt;br&gt;+ 				 * Have we reached our safe starting point? If so, we can
&lt;br&gt;+ 				 * signal Postmaster to enter consistent recovery mode.
&lt;br&gt;+ 				 *
&lt;br&gt;+ 				 * There are two points in the log we must pass. The first is
&lt;br&gt;+ 				 * the minRecoveryPoint, which is the LSN at the time the
&lt;br&gt;+ 				 * base backup was taken that we are about to roll forward from.
&lt;br&gt;+ 				 * If recovery has ever crashed or was stopped there is
&lt;br&gt;+ 				 * another point also: minSafeStartPoint, which is the
&lt;br&gt;+ 				 * latest LSN that recovery could have reached prior to the crash.
&lt;br&gt;+ 				 */
&lt;br&gt;+ 				if (!reachedSafeStartPoint &amp;&amp; 
&lt;br&gt;+ 					 XLByteLE(ControlFile-&amp;gt;minSafeStartPoint, EndRecPtr) &amp;&amp; 
&lt;br&gt;+ 					 XLByteLE(ControlFile-&amp;gt;minRecoveryPoint, EndRecPtr))
&lt;br&gt;+ 				{
&lt;br&gt;+ 					reachedSafeStartPoint = true;
&lt;br&gt;+ 					if (InArchiveRecovery)
&lt;br&gt;+ 					{
&lt;br&gt;+ 						ereport(LOG,
&lt;br&gt;+ 							(errmsg(&amp;quot;consistent recovery state reached at %X/%X&amp;quot;,
&lt;br&gt;+ 								EndRecPtr.xlogid, EndRecPtr.xrecoff)));
&lt;br&gt;+ 						if (IsUnderPostmaster)
&lt;br&gt;+ 							SendPostmasterSignal(PMSIGNAL_RECOVERY_START);
&lt;br&gt;+ 					}
&lt;br&gt;+ 				}
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 				record = ReadRecord(NULL, LOG);
&lt;br&gt;&amp;nbsp; 			} while (record != NULL &amp;&amp; recoveryContinue);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;***************
&lt;br&gt;*** 5171,5176 ****
&lt;br&gt;--- 5298,5304 ----
&lt;br&gt;&amp;nbsp; 			/* there are no WAL records following the checkpoint */
&lt;br&gt;&amp;nbsp; 			ereport(LOG,
&lt;br&gt;&amp;nbsp; 					(errmsg(&amp;quot;redo is not required&amp;quot;)));
&lt;br&gt;+ 			reachedSafeStartPoint = true;
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;***************
&lt;br&gt;*** 5184,5192 ****
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * Complain if we did not roll forward far enough to render the backup
&lt;br&gt;! 	 * dump consistent.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;! 	if (XLByteLT(EndOfLog, ControlFile-&amp;gt;minRecoveryPoint))
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;&amp;nbsp; 		if (reachedStopPoint)	/* stopped because of stop request */
&lt;br&gt;&amp;nbsp; 			ereport(FATAL,
&lt;br&gt;--- 5312,5320 ----
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * Complain if we did not roll forward far enough to render the backup
&lt;br&gt;! 	 * dump consistent and start safely.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;! 	if (InRecovery &amp;&amp; !reachedSafeStartPoint)
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;&amp;nbsp; 		if (reachedStopPoint)	/* stopped because of stop request */
&lt;br&gt;&amp;nbsp; 			ereport(FATAL,
&lt;br&gt;***************
&lt;br&gt;*** 5308,5346 ****
&lt;br&gt;&amp;nbsp; 		XLogCheckInvalidPages();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 		/*
&lt;br&gt;! 		 * Reset pgstat data, because it may be invalid after recovery.
&lt;br&gt;&amp;nbsp; 		 */
&lt;br&gt;! 		pgstat_reset_all();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 		/*
&lt;br&gt;! 		 * Perform a checkpoint to update all our recovery activity to disk.
&lt;br&gt;! 		 *
&lt;br&gt;! 		 * Note that we write a shutdown checkpoint rather than an on-line
&lt;br&gt;! 		 * one. This is not particularly critical, but since we may be
&lt;br&gt;! 		 * assigning a new TLI, using a shutdown checkpoint allows us to have
&lt;br&gt;! 		 * the rule that TLI only changes in shutdown checkpoints, which
&lt;br&gt;! 		 * allows some extra error checking in xlog_redo.
&lt;br&gt;! 		 */
&lt;br&gt;! 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;- 	/*
&lt;br&gt;- 	 * Preallocate additional log files, if wanted.
&lt;br&gt;- 	 */
&lt;br&gt;- 	PreallocXlogFiles(EndOfLog);
&lt;br&gt;- 
&lt;br&gt;- 	/*
&lt;br&gt;- 	 * Okay, we're officially UP.
&lt;br&gt;- 	 */
&lt;br&gt;- 	InRecovery = false;
&lt;br&gt;- 
&lt;br&gt;- 	ControlFile-&amp;gt;state = DB_IN_PRODUCTION;
&lt;br&gt;- 	ControlFile-&amp;gt;time = (pg_time_t) time(NULL);
&lt;br&gt;- 	UpdateControlFile();
&lt;br&gt;- 
&lt;br&gt;- 	/* start the archive_timeout timer running */
&lt;br&gt;- 	XLogCtl-&amp;gt;Write.lastSegSwitchTime = ControlFile-&amp;gt;time;
&lt;br&gt;- 
&lt;br&gt;&amp;nbsp; 	/* initialize shared-memory copy of latest checkpoint XID/epoch */
&lt;br&gt;&amp;nbsp; 	XLogCtl-&amp;gt;ckptXidEpoch = ControlFile-&amp;gt;checkPointCopy.nextXidEpoch;
&lt;br&gt;&amp;nbsp; 	XLogCtl-&amp;gt;ckptXid = ControlFile-&amp;gt;checkPointCopy.nextXid;
&lt;br&gt;--- 5436,5449 ----
&lt;br&gt;&amp;nbsp; 		XLogCheckInvalidPages();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 		/*
&lt;br&gt;! 		 * Finally exit recovery and mark that in WAL. Pre-8.4 we wrote a
&lt;br&gt;! 		 * shutdown checkpoint here; now bgwriter writes an online one later.
&lt;br&gt;&amp;nbsp; 		 */
&lt;br&gt;! 		exitRecovery();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 		performedRecovery = true;
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* initialize shared-memory copy of latest checkpoint XID/epoch */
&lt;br&gt;&amp;nbsp; 	XLogCtl-&amp;gt;ckptXidEpoch = ControlFile-&amp;gt;checkPointCopy.nextXidEpoch;
&lt;br&gt;&amp;nbsp; 	XLogCtl-&amp;gt;ckptXid = ControlFile-&amp;gt;checkPointCopy.nextXid;
&lt;br&gt;***************
&lt;br&gt;*** 5374,5379 ****
&lt;br&gt;--- 5477,5565 ----
&lt;br&gt;&amp;nbsp; 		readRecordBuf = NULL;
&lt;br&gt;&amp;nbsp; 		readRecordBufSize = 0;
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;+ 
&lt;br&gt;+ 	/*
&lt;br&gt;+ 	 * Prior to 8.4 we wrote a Shutdown Checkpoint at the end of recovery.
&lt;br&gt;+ 	 * This could add minutes to the startup time, so we want bgwriter
&lt;br&gt;+ 	 * to perform it. This then frees the Startup process to complete so we can
&lt;br&gt;+ 	 * allow transactions and WAL inserts. We still write a checkpoint, but
&lt;br&gt;+ 	 * it will be an online checkpoint. Online checkpoints have a redo
&lt;br&gt;+ 	 * location that can be prior to the actual checkpoint record. So we want
&lt;br&gt;+ 	 * to derive that redo location *before* we let anybody else write WAL,
&lt;br&gt;+ 	 * otherwise we might miss some WAL records if we crash.
&lt;br&gt;+ 	 */
&lt;br&gt;+ 	if (performedRecovery)
&lt;br&gt;+ 	{
&lt;br&gt;+ 		XLogRecPtr	redo;
&lt;br&gt;+ 
&lt;br&gt;+ 		/* 
&lt;br&gt;+ 		 * We must grab the pointer before anybody writes WAL 
&lt;br&gt;+ 		 */
&lt;br&gt;+ 		redo = GetRedoLocationForCheckpoint();
&lt;br&gt;+ 
&lt;br&gt;+ 		/*
&lt;br&gt;+ 		 * Pass the redo location to the bgwriter
&lt;br&gt;+ 		 */
&lt;br&gt;+ 		SetRedoLocationForArchiveCheckpoint(redo);
&lt;br&gt;+ 
&lt;br&gt;+ 		/*
&lt;br&gt;+ 		 * Okay, we can come up now. Allow others to write WAL.
&lt;br&gt;+ 		 */
&lt;br&gt;+ 		XLogCtl-&amp;gt;SharedRecoveryProcessingMode = false;
&lt;br&gt;+ 
&lt;br&gt;+ 		/*
&lt;br&gt;+ 		 * Now request checkpoint
&lt;br&gt;+ 		 */
&lt;br&gt;+ 		RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_IMMEDIATE);
&lt;br&gt;+ 	}
&lt;br&gt;+ 	else
&lt;br&gt;+ 	{
&lt;br&gt;+ 		/*
&lt;br&gt;+ 		 * No recovery, so let's just get on with it.
&lt;br&gt;+ 		 */
&lt;br&gt;+ 		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
&lt;br&gt;+ 		ControlFile-&amp;gt;state = DB_IN_PRODUCTION;
&lt;br&gt;+ 		ControlFile-&amp;gt;time = (pg_time_t) time(NULL);
&lt;br&gt;+ 		UpdateControlFile();
&lt;br&gt;+ 		LWLockRelease(ControlFileLock);
&lt;br&gt;+ 
&lt;br&gt;+ 		/*
&lt;br&gt;+ 		 * Okay, we're officially UP.
&lt;br&gt;+ 		 */
&lt;br&gt;+ 		XLogCtl-&amp;gt;SharedRecoveryProcessingMode = false;
&lt;br&gt;+ 	}
&lt;br&gt;+ 
&lt;br&gt;+ 	/* start the archive_timeout timer running */
&lt;br&gt;+ 	XLogCtl-&amp;gt;Write.lastSegSwitchTime = (pg_time_t) time(NULL);
&lt;br&gt;+ 
&lt;br&gt;+ }
&lt;br&gt;+ 
&lt;br&gt;+ /*
&lt;br&gt;+ &amp;nbsp;* IsRecoveryProcessingMode()
&lt;br&gt;+ &amp;nbsp;*
&lt;br&gt;+ &amp;nbsp;* Fast test for whether we're still in recovery or not. We test the shared
&lt;br&gt;+ &amp;nbsp;* state each time only until we leave recovery mode. After that we never
&lt;br&gt;+ &amp;nbsp;* look again, relying upon the settings of our local state variables. This
&lt;br&gt;+ &amp;nbsp;* is designed to avoid the need for a separate initialisation step.
&lt;br&gt;+ &amp;nbsp;*/
&lt;br&gt;+ bool
&lt;br&gt;+ IsRecoveryProcessingMode(void)
&lt;br&gt;+ {
&lt;br&gt;+ 	if (knownProcessingMode &amp;&amp; !LocalRecoveryProcessingMode)
&lt;br&gt;+ 		return false;
&lt;br&gt;+ 
&lt;br&gt;+ 	{
&lt;br&gt;+ 		/* use volatile pointer to prevent code rearrangement */
&lt;br&gt;+ 		volatile XLogCtlData *xlogctl = XLogCtl;
&lt;br&gt;+ 
&lt;br&gt;+ 		SpinLockAcquire(&amp;xlogctl-&amp;gt;mode_lck);
&lt;br&gt;+ 		LocalRecoveryProcessingMode = xlogctl-&amp;gt;SharedRecoveryProcessingMode;
&lt;br&gt;+ 		SpinLockRelease(&amp;xlogctl-&amp;gt;mode_lck);
&lt;br&gt;+ 	}
&lt;br&gt;+ 
&lt;br&gt;+ 	knownProcessingMode = true;
&lt;br&gt;+ 
&lt;br&gt;+ 	return LocalRecoveryProcessingMode;
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;***************
&lt;br&gt;*** 5631,5650 ****
&lt;br&gt;&amp;nbsp; static void
&lt;br&gt;&amp;nbsp; LogCheckpointStart(int flags)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;! 	elog(LOG, &amp;quot;checkpoint starting:%s%s%s%s%s%s&amp;quot;,
&lt;br&gt;! 		 (flags &amp; CHECKPOINT_IS_SHUTDOWN) ? &amp;quot; shutdown&amp;quot; : &amp;quot;&amp;quot;,
&lt;br&gt;! 		 (flags &amp; CHECKPOINT_IMMEDIATE) ? &amp;quot; immediate&amp;quot; : &amp;quot;&amp;quot;,
&lt;br&gt;! 		 (flags &amp; CHECKPOINT_FORCE) ? &amp;quot; force&amp;quot; : &amp;quot;&amp;quot;,
&lt;br&gt;! 		 (flags &amp; CHECKPOINT_WAIT) ? &amp;quot; wait&amp;quot; : &amp;quot;&amp;quot;,
&lt;br&gt;! 		 (flags &amp; CHECKPOINT_CAUSE_XLOG) ? &amp;quot; xlog&amp;quot; : &amp;quot;&amp;quot;,
&lt;br&gt;! 		 (flags &amp; CHECKPOINT_CAUSE_TIME) ? &amp;quot; time&amp;quot; : &amp;quot;&amp;quot;);
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* Log end of a checkpoint.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; static void
&lt;br&gt;! LogCheckpointEnd(void)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;&amp;nbsp; 	long		write_secs,
&lt;br&gt;&amp;nbsp; 				sync_secs,
&lt;br&gt;--- 5817,5840 ----
&lt;br&gt;&amp;nbsp; static void
&lt;br&gt;&amp;nbsp; LogCheckpointStart(int flags)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;! 	if (flags &amp; CHECKPOINT_RESTARTPOINT)
&lt;br&gt;! 		elog(LOG, &amp;quot;restartpoint starting:%s&amp;quot;,
&lt;br&gt;! 			 (flags &amp; CHECKPOINT_IMMEDIATE) ? &amp;quot; immediate&amp;quot; : &amp;quot;&amp;quot;);
&lt;br&gt;! 	else
&lt;br&gt;! 		elog(LOG, &amp;quot;checkpoint starting:%s%s%s%s%s%s&amp;quot;,
&lt;br&gt;! 			 (flags &amp; CHECKPOINT_IS_SHUTDOWN) ? &amp;quot; shutdown&amp;quot; : &amp;quot;&amp;quot;,
&lt;br&gt;! 			 (flags &amp; CHECKPOINT_IMMEDIATE) ? &amp;quot; immediate&amp;quot; : &amp;quot;&amp;quot;,
&lt;br&gt;! 			 (flags &amp; CHECKPOINT_FORCE) ? &amp;quot; force&amp;quot; : &amp;quot;&amp;quot;,
&lt;br&gt;! 			 (flags &amp; CHECKPOINT_WAIT) ? &amp;quot; wait&amp;quot; : &amp;quot;&amp;quot;,
&lt;br&gt;! 			 (flags &amp; CHECKPOINT_CAUSE_XLOG) ? &amp;quot; xlog&amp;quot; : &amp;quot;&amp;quot;,
&lt;br&gt;! 			 (flags &amp; CHECKPOINT_CAUSE_TIME) ? &amp;quot; time&amp;quot; : &amp;quot;&amp;quot;);
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* Log end of a checkpoint.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; static void
&lt;br&gt;! LogCheckpointEnd(int flags)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;&amp;nbsp; 	long		write_secs,
&lt;br&gt;&amp;nbsp; 				sync_secs,
&lt;br&gt;***************
&lt;br&gt;*** 5667,5683 ****
&lt;br&gt;&amp;nbsp; 						CheckpointStats.ckpt_sync_end_t,
&lt;br&gt;&amp;nbsp; 						&amp;sync_secs, &amp;sync_usecs);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	elog(LOG, &amp;quot;checkpoint complete: wrote %d buffers (%.1f%%); &amp;quot;
&lt;br&gt;! 		 &amp;quot;%d transaction log file(s) added, %d removed, %d recycled; &amp;quot;
&lt;br&gt;! 		 &amp;quot;write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s&amp;quot;,
&lt;br&gt;! 		 CheckpointStats.ckpt_bufs_written,
&lt;br&gt;! 		 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
&lt;br&gt;! 		 CheckpointStats.ckpt_segs_added,
&lt;br&gt;! 		 CheckpointStats.ckpt_segs_removed,
&lt;br&gt;! 		 CheckpointStats.ckpt_segs_recycled,
&lt;br&gt;! 		 write_secs, write_usecs / 1000,
&lt;br&gt;! 		 sync_secs, sync_usecs / 1000,
&lt;br&gt;! 		 total_secs, total_usecs / 1000);
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;--- 5857,5882 ----
&lt;br&gt;&amp;nbsp; 						CheckpointStats.ckpt_sync_end_t,
&lt;br&gt;&amp;nbsp; 						&amp;sync_secs, &amp;sync_usecs);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	if (flags &amp; CHECKPOINT_RESTARTPOINT)
&lt;br&gt;! 		elog(LOG, &amp;quot;restartpoint complete: wrote %d buffers (%.1f%%); &amp;quot;
&lt;br&gt;! 			 &amp;quot;write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s&amp;quot;,
&lt;br&gt;! 			 CheckpointStats.ckpt_bufs_written,
&lt;br&gt;! 			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
&lt;br&gt;! 			 write_secs, write_usecs / 1000,
&lt;br&gt;! 			 sync_secs, sync_usecs / 1000,
&lt;br&gt;! 			 total_secs, total_usecs / 1000);
&lt;br&gt;! 	else
&lt;br&gt;! 		elog(LOG, &amp;quot;checkpoint complete: wrote %d buffers (%.1f%%); &amp;quot;
&lt;br&gt;! 			 &amp;quot;%d transaction log file(s) added, %d removed, %d recycled; &amp;quot;
&lt;br&gt;! 			 &amp;quot;write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s&amp;quot;,
&lt;br&gt;! 			 CheckpointStats.ckpt_bufs_written,
&lt;br&gt;! 			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
&lt;br&gt;! 			 CheckpointStats.ckpt_segs_added,
&lt;br&gt;! 			 CheckpointStats.ckpt_segs_removed,
&lt;br&gt;! 			 CheckpointStats.ckpt_segs_recycled,
&lt;br&gt;! 			 write_secs, write_usecs / 1000,
&lt;br&gt;! 			 sync_secs, sync_usecs / 1000,
&lt;br&gt;! 			 total_secs, total_usecs / 1000);
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;***************
&lt;br&gt;*** 5702,5718 ****
&lt;br&gt;&amp;nbsp; 	XLogRecPtr	recptr;
&lt;br&gt;&amp;nbsp; 	XLogCtlInsert *Insert = &amp;XLogCtl-&amp;gt;Insert;
&lt;br&gt;&amp;nbsp; 	XLogRecData rdata;
&lt;br&gt;- 	uint32		freespace;
&lt;br&gt;&amp;nbsp; 	uint32		_logId;
&lt;br&gt;&amp;nbsp; 	uint32		_logSeg;
&lt;br&gt;&amp;nbsp; 	TransactionId *inCommitXids;
&lt;br&gt;&amp;nbsp; 	int			nInCommit;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
&lt;br&gt;! 	 * (This is just pro forma, since in the present system structure there is
&lt;br&gt;! 	 * only one process that is allowed to issue checkpoints at any given
&lt;br&gt;! 	 * time.)
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;--- 5901,5916 ----
&lt;br&gt;&amp;nbsp; 	XLogRecPtr	recptr;
&lt;br&gt;&amp;nbsp; 	XLogCtlInsert *Insert = &amp;XLogCtl-&amp;gt;Insert;
&lt;br&gt;&amp;nbsp; 	XLogRecData rdata;
&lt;br&gt;&amp;nbsp; 	uint32		_logId;
&lt;br&gt;&amp;nbsp; 	uint32		_logSeg;
&lt;br&gt;&amp;nbsp; 	TransactionId *inCommitXids;
&lt;br&gt;&amp;nbsp; 	int			nInCommit;
&lt;br&gt;+ 	bool		leavingArchiveRecovery = false;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
&lt;br&gt;! 	 * That shouldn't be happening, but checkpoints are an important aspect
&lt;br&gt;! 	 * of our resilience, so we take no chances.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;***************
&lt;br&gt;*** 5727,5741 ****
&lt;br&gt;--- 5925,5948 ----
&lt;br&gt;&amp;nbsp; 	CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;+ 	 * Find out if this is the first checkpoint after archive recovery.
&lt;br&gt;+ 	 */
&lt;br&gt;+ 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
&lt;br&gt;+ 	leavingArchiveRecovery = (ControlFile-&amp;gt;state == DB_IN_ARCHIVE_RECOVERY);
&lt;br&gt;+ 	LWLockRelease(ControlFileLock);
&lt;br&gt;+ 
&lt;br&gt;+ 	/*
&lt;br&gt;&amp;nbsp; 	 * Use a critical section to force system panic if we have trouble.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	START_CRIT_SECTION();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	if (shutdown)
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;+ 		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
&lt;br&gt;&amp;nbsp; 		ControlFile-&amp;gt;state = DB_SHUTDOWNING;
&lt;br&gt;&amp;nbsp; 		ControlFile-&amp;gt;time = (pg_time_t) time(NULL);
&lt;br&gt;&amp;nbsp; 		UpdateControlFile();
&lt;br&gt;+ 		LWLockRelease(ControlFileLock);
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;***************
&lt;br&gt;*** 5750,5840 ****
&lt;br&gt;&amp;nbsp; 	checkPoint.ThisTimeLineID = ThisTimeLineID;
&lt;br&gt;&amp;nbsp; 	checkPoint.time = (pg_time_t) time(NULL);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	/*
&lt;br&gt;! 	 * We must hold WALInsertLock while examining insert state to determine
&lt;br&gt;! 	 * the checkpoint REDO pointer.
&lt;br&gt;! 	 */
&lt;br&gt;! 	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
&lt;br&gt;! 
&lt;br&gt;! 	/*
&lt;br&gt;! 	 * If this isn't a shutdown or forced checkpoint, and we have not inserted
&lt;br&gt;! 	 * any XLOG records since the start of the last checkpoint, skip the
&lt;br&gt;! 	 * checkpoint.	The idea here is to avoid inserting duplicate checkpoints
&lt;br&gt;! 	 * when the system is idle. That wastes log space, and more importantly it
&lt;br&gt;! 	 * exposes us to possible loss of both current and previous checkpoint
&lt;br&gt;! 	 * records if the machine crashes just as we're writing the update.
&lt;br&gt;! 	 * (Perhaps it'd make even more sense to checkpoint only when the previous
&lt;br&gt;! 	 * checkpoint record is in a different xlog page?)
&lt;br&gt;! 	 *
&lt;br&gt;! 	 * We have to make two tests to determine that nothing has happened since
&lt;br&gt;! 	 * the start of the last checkpoint: current insertion point must match
&lt;br&gt;! 	 * the end of the last checkpoint record, and its redo pointer must point
&lt;br&gt;! 	 * to itself.
&lt;br&gt;! 	 */
&lt;br&gt;! 	if ((flags &amp; (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FORCE)) == 0)
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;! 		XLogRecPtr	curInsert;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 		INSERT_RECPTR(curInsert, Insert, Insert-&amp;gt;curridx);
&lt;br&gt;! 		if (curInsert.xlogid == ControlFile-&amp;gt;checkPoint.xlogid &amp;&amp;
&lt;br&gt;! 			curInsert.xrecoff == ControlFile-&amp;gt;checkPoint.xrecoff +
&lt;br&gt;! 			MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) &amp;&amp;
&lt;br&gt;! 			ControlFile-&amp;gt;checkPoint.xlogid ==
&lt;br&gt;! 			ControlFile-&amp;gt;checkPointCopy.redo.xlogid &amp;&amp;
&lt;br&gt;! 			ControlFile-&amp;gt;checkPoint.xrecoff ==
&lt;br&gt;! 			ControlFile-&amp;gt;checkPointCopy.redo.xrecoff)
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;! 			LWLockRelease(WALInsertLock);
&lt;br&gt;! 			LWLockRelease(CheckpointLock);
&lt;br&gt;! 			END_CRIT_SECTION();
&lt;br&gt;! 			return;
&lt;br&gt;! 		}
&lt;br&gt;! 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	/*
&lt;br&gt;! 	 * Compute new REDO record ptr = location of next XLOG record.
&lt;br&gt;! 	 *
&lt;br&gt;! 	 * NB: this is NOT necessarily where the checkpoint record itself will be,
&lt;br&gt;! 	 * since other backends may insert more XLOG records while we're off doing
&lt;br&gt;! 	 * the buffer flush work. &amp;nbsp;Those XLOG records are logically after the
&lt;br&gt;! 	 * checkpoint, even though physically before it. &amp;nbsp;Got that?
&lt;br&gt;! 	 */
&lt;br&gt;! 	freespace = INSERT_FREESPACE(Insert);
&lt;br&gt;! 	if (freespace &amp;lt; SizeOfXLogRecord)
&lt;br&gt;! 	{
&lt;br&gt;! 		(void) AdvanceXLInsertBuffer(false);
&lt;br&gt;! 		/* OK to ignore update return flag, since we will do flush anyway */
&lt;br&gt;! 		freespace = INSERT_FREESPACE(Insert);
&lt;br&gt;! 	}
&lt;br&gt;! 	INSERT_RECPTR(checkPoint.redo, Insert, Insert-&amp;gt;curridx);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	/*
&lt;br&gt;! 	 * Here we update the shared RedoRecPtr for future XLogInsert calls; this
&lt;br&gt;! 	 * must be done while holding the insert lock AND the info_lck.
&lt;br&gt;! 	 *
&lt;br&gt;! 	 * Note: if we fail to complete the checkpoint, RedoRecPtr will be left
&lt;br&gt;! 	 * pointing past where it really needs to point. &amp;nbsp;This is okay; the only
&lt;br&gt;! 	 * consequence is that XLogInsert might back up whole buffers that it
&lt;br&gt;! 	 * didn't really need to. &amp;nbsp;We can't postpone advancing RedoRecPtr because
&lt;br&gt;! 	 * XLogInserts that happen while we are dumping buffers must assume that
&lt;br&gt;! 	 * their buffer changes are not included in the checkpoint.
&lt;br&gt;! 	 */
&lt;br&gt;! 	{
&lt;br&gt;! 		/* use volatile pointer to prevent code rearrangement */
&lt;br&gt;! 		volatile XLogCtlData *xlogctl = XLogCtl;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 		SpinLockAcquire(&amp;xlogctl-&amp;gt;info_lck);
&lt;br&gt;! 		RedoRecPtr = xlogctl-&amp;gt;Insert.RedoRecPtr = checkPoint.redo;
&lt;br&gt;! 		SpinLockRelease(&amp;xlogctl-&amp;gt;info_lck);
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;- 	 * Now we can release WAL insert lock, allowing other xacts to proceed
&lt;br&gt;- 	 * while we are flushing disk buffers.
&lt;br&gt;- 	 */
&lt;br&gt;- 	LWLockRelease(WALInsertLock);
&lt;br&gt;- 
&lt;br&gt;- 	/*
&lt;br&gt;&amp;nbsp; 	 * If enabled, log checkpoint start. &amp;nbsp;We postpone this until now so as not
&lt;br&gt;&amp;nbsp; 	 * to log anything if we decided to skip the checkpoint.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;--- 5957,6025 ----
&lt;br&gt;&amp;nbsp; 	checkPoint.ThisTimeLineID = ThisTimeLineID;
&lt;br&gt;&amp;nbsp; 	checkPoint.time = (pg_time_t) time(NULL);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	if (leavingArchiveRecovery)
&lt;br&gt;! 		checkPoint.redo = GetRedoLocationForArchiveCheckpoint();
&lt;br&gt;! 	else
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;! 		/*
&lt;br&gt;! 		 * We must hold WALInsertLock while examining insert state to determine
&lt;br&gt;! 		 * the checkpoint REDO pointer.
&lt;br&gt;! 		 */
&lt;br&gt;! 		LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 		/*
&lt;br&gt;! 		 * If this isn't a shutdown or forced checkpoint, and we have not inserted
&lt;br&gt;! 		 * any XLOG records since the start of the last checkpoint, skip the
&lt;br&gt;! 		 * checkpoint.	The idea here is to avoid inserting duplicate checkpoints
&lt;br&gt;! 		 * when the system is idle. That wastes log space, and more importantly it
&lt;br&gt;! 		 * exposes us to possible loss of both current and previous checkpoint
&lt;br&gt;! 		 * records if the machine crashes just as we're writing the update.
&lt;br&gt;! 		 * (Perhaps it'd make even more sense to checkpoint only when the previous
&lt;br&gt;! 		 * checkpoint record is in a different xlog page?)
&lt;br&gt;! 		 *
&lt;br&gt;! 		 * We have to make two tests to determine that nothing has happened since
&lt;br&gt;! 		 * the start of the last checkpoint: current insertion point must match
&lt;br&gt;! 		 * the end of the last checkpoint record, and its redo pointer must point
&lt;br&gt;! 		 * to itself.
&lt;br&gt;! 		 */
&lt;br&gt;! 		if ((flags &amp; (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FORCE)) == 0)
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;! 			XLogRecPtr	curInsert;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 			INSERT_RECPTR(curInsert, Insert, Insert-&amp;gt;curridx);
&lt;br&gt;! 			if (curInsert.xlogid == ControlFile-&amp;gt;checkPoint.xlogid &amp;&amp;
&lt;br&gt;! 				curInsert.xrecoff == ControlFile-&amp;gt;checkPoint.xrecoff +
&lt;br&gt;! 				MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) &amp;&amp;
&lt;br&gt;! 				ControlFile-&amp;gt;checkPoint.xlogid ==
&lt;br&gt;! 				ControlFile-&amp;gt;checkPointCopy.redo.xlogid &amp;&amp;
&lt;br&gt;! 				ControlFile-&amp;gt;checkPoint.xrecoff ==
&lt;br&gt;! 				ControlFile-&amp;gt;checkPointCopy.redo.xrecoff)
&lt;br&gt;! 			{
&lt;br&gt;! 				LWLockRelease(WALInsertLock);
&lt;br&gt;! 				LWLockRelease(CheckpointLock);
&lt;br&gt;! 				END_CRIT_SECTION();
&lt;br&gt;! 				return;
&lt;br&gt;! 			}
&lt;br&gt;! 		}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 		/*
&lt;br&gt;! 		 * Compute new REDO record ptr = location of next XLOG record.
&lt;br&gt;! 		 *
&lt;br&gt;! 		 * NB: this is NOT necessarily where the checkpoint record itself will be,
&lt;br&gt;! 		 * since other backends may insert more XLOG records while we're off doing
&lt;br&gt;! 		 * the buffer flush work. &amp;nbsp;Those XLOG records are logically after the
&lt;br&gt;! 		 * checkpoint, even though physically before it. &amp;nbsp;Got that?
&lt;br&gt;! 		 */
&lt;br&gt;! 		checkPoint.redo = GetRedoLocationForCheckpoint();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 		/*
&lt;br&gt;! 		 * Now we can release WAL insert lock, allowing other xacts to proceed
&lt;br&gt;! 		 * while we are flushing disk buffers.
&lt;br&gt;! 		 */
&lt;br&gt;! 		LWLockRelease(WALInsertLock);
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * If enabled, log checkpoint start. &amp;nbsp;We postpone this until now so as not
&lt;br&gt;&amp;nbsp; 	 * to log anything if we decided to skip the checkpoint.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;***************
&lt;br&gt;*** 5941,5958 ****
&lt;br&gt;&amp;nbsp; 	XLByteToSeg(ControlFile-&amp;gt;checkPointCopy.redo, _logId, _logSeg);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;! 	 * Update the control file.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
&lt;br&gt;&amp;nbsp; 	if (shutdown)
&lt;br&gt;&amp;nbsp; 		ControlFile-&amp;gt;state = DB_SHUTDOWNED;
&lt;br&gt;&amp;nbsp; 	ControlFile-&amp;gt;prevCheckPoint = ControlFile-&amp;gt;checkPoint;
&lt;br&gt;&amp;nbsp; 	ControlFile-&amp;gt;checkPoint = ProcLastRecPtr;
&lt;br&gt;&amp;nbsp; 	ControlFile-&amp;gt;checkPointCopy = checkPoint;
&lt;br&gt;&amp;nbsp; 	ControlFile-&amp;gt;time = (pg_time_t) time(NULL);
&lt;br&gt;&amp;nbsp; 	UpdateControlFile();
&lt;br&gt;&amp;nbsp; 	LWLockRelease(ControlFileLock);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Update shared-memory copy of checkpoint XID/epoch */
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;&amp;nbsp; 		/* use volatile pointer to prevent code rearrangement */
&lt;br&gt;--- 6126,6168 ----
&lt;br&gt;&amp;nbsp; 	XLByteToSeg(ControlFile-&amp;gt;checkPointCopy.redo, _logId, _logSeg);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;! 	 * Update the control file. In 8.4, this routine becomes the primary
&lt;br&gt;! 	 * point for recording changes of state in the control file at the
&lt;br&gt;! 	 * end of recovery. Postmaster state already shows us being in
&lt;br&gt;! 	 * normal running mode, but it is only after this point that we
&lt;br&gt;! 	 * no longer need to repeat recovery if we crash. &amp;nbsp;Note that this
&lt;br&gt;! 	 * is executed by bgwriter after the Startup process has exited.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 	if (shutdown)
&lt;br&gt;&amp;nbsp; 		ControlFile-&amp;gt;state = DB_SHUTDOWNED;
&lt;br&gt;+ 	else
&lt;br&gt;+ 		ControlFile-&amp;gt;state = DB_IN_PRODUCTION;
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 	ControlFile-&amp;gt;prevCheckPoint = ControlFile-&amp;gt;checkPoint;
&lt;br&gt;&amp;nbsp; 	ControlFile-&amp;gt;checkPoint = ProcLastRecPtr;
&lt;br&gt;&amp;nbsp; 	ControlFile-&amp;gt;checkPointCopy = checkPoint;
&lt;br&gt;&amp;nbsp; 	ControlFile-&amp;gt;time = (pg_time_t) time(NULL);
&lt;br&gt;&amp;nbsp; 	UpdateControlFile();
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 	LWLockRelease(ControlFileLock);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ 	if (leavingArchiveRecovery)
&lt;br&gt;+ 	{
&lt;br&gt;+ 		/*
&lt;br&gt;+ 		 * Rename the config file out of the way, so that we don't accidentally
&lt;br&gt;+ 		 * re-enter archive recovery mode in a subsequent crash. Prior to
&lt;br&gt;+ 		 * 8.4 this step was performed at end of exitArchiveRecovery().
&lt;br&gt;+ 		 */
&lt;br&gt;+ 		unlink(RECOVERY_COMMAND_DONE);
&lt;br&gt;+ 		if (rename(RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE) != 0)
&lt;br&gt;+ 			ereport(ERROR,
&lt;br&gt;+ 					(errcode_for_file_access(),
&lt;br&gt;+ 					 errmsg(&amp;quot;could not rename file \&amp;quot;%s\&amp;quot; to \&amp;quot;%s\&amp;quot;: %m&amp;quot;,
&lt;br&gt;+ 							RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE)));
&lt;br&gt;+ 	}
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 	/* Update shared-memory copy of checkpoint XID/epoch */
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;&amp;nbsp; 		/* use volatile pointer to prevent code rearrangement */
&lt;br&gt;***************
&lt;br&gt;*** 5999,6014 ****
&lt;br&gt;&amp;nbsp; 	 * in subtrans.c).	During recovery, though, we mustn't do this because
&lt;br&gt;&amp;nbsp; 	 * StartupSUBTRANS hasn't been called yet.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;! 	if (!InRecovery)
&lt;br&gt;! 		TruncateSUBTRANS(GetOldestXmin(true, false));
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* All real work is done, but log before releasing lock. */
&lt;br&gt;&amp;nbsp; 	if (log_checkpoints)
&lt;br&gt;! 		LogCheckpointEnd();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	LWLockRelease(CheckpointLock);
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* Flush all data in shared memory to disk, and fsync
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;--- 6209,6268 ----
&lt;br&gt;&amp;nbsp; 	 * in subtrans.c).	During recovery, though, we mustn't do this because
&lt;br&gt;&amp;nbsp; 	 * StartupSUBTRANS hasn't been called yet.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;! 	TruncateSUBTRANS(GetOldestXmin(true, false));
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* All real work is done, but log before releasing lock. */
&lt;br&gt;&amp;nbsp; 	if (log_checkpoints)
&lt;br&gt;! 		LogCheckpointEnd(flags);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	LWLockRelease(CheckpointLock);
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ /* 
&lt;br&gt;+ &amp;nbsp;* GetRedoLocationForCheckpoint()
&lt;br&gt;+ &amp;nbsp;*
&lt;br&gt;+ &amp;nbsp;* When !IsRecoveryProcessingMode() this must be called while holding
&lt;br&gt;+ &amp;nbsp;* WALInsertLock.
&lt;br&gt;+ &amp;nbsp;*/
&lt;br&gt;+ static XLogRecPtr
&lt;br&gt;+ GetRedoLocationForCheckpoint(void)
&lt;br&gt;+ {
&lt;br&gt;+ 	XLogCtlInsert &amp;nbsp;*Insert = &amp;XLogCtl-&amp;gt;Insert;
&lt;br&gt;+ 	uint32			freespace;
&lt;br&gt;+ 	XLogRecPtr		redo;
&lt;br&gt;+ 
&lt;br&gt;+ 	freespace = INSERT_FREESPACE(Insert);
&lt;br&gt;+ 	if (freespace &amp;lt; SizeOfXLogRecord)
&lt;br&gt;+ 	{
&lt;br&gt;+ 		(void) AdvanceXLInsertBuffer(false);
&lt;br&gt;+ 		/* OK to ignore update return flag, since we will do flush anyway */
&lt;br&gt;+ 		freespace = INSERT_FREESPACE(Insert);
&lt;br&gt;+ 	}
&lt;br&gt;+ 	INSERT_RECPTR(redo, Insert, Insert-&amp;gt;curridx);
&lt;br&gt;+ 
&lt;br&gt;+ 	/*
&lt;br&gt;+ 	 * Here we update the shared RedoRecPtr for future XLogInsert calls; this
&lt;br&gt;+ 	 * must be done while holding the insert lock AND the info_lck.
&lt;br&gt;+ 	 *
&lt;br&gt;+ 	 * Note: if we fail to complete the checkpoint, RedoRecPtr will be left
&lt;br&gt;+ 	 * pointing past where it really needs to point. &amp;nbsp;This is okay; the only
&lt;br&gt;+ 	 * consequence is that XLogInsert might back up whole buffers that it
&lt;br&gt;+ 	 * didn't really need to. &amp;nbsp;We can't postpone advancing RedoRecPtr because
&lt;br&gt;+ 	 * XLogInserts that happen while we are dumping buffers must assume that
&lt;br&gt;+ 	 * their buffer changes are not included in the checkpoint.
&lt;br&gt;+ 	 */
&lt;br&gt;+ 	{
&lt;br&gt;+ 		/* use volatile pointer to prevent code rearrangement */
&lt;br&gt;+ 		volatile XLogCtlData *xlogctl = XLogCtl;
&lt;br&gt;+ 
&lt;br&gt;+ 		SpinLockAcquire(&amp;xlogctl-&amp;gt;info_lck);
&lt;br&gt;+ 		RedoRecPtr = xlogctl-&amp;gt;Insert.RedoRecPtr = redo;
&lt;br&gt;+ 		SpinLockRelease(&amp;xlogctl-&amp;gt;info_lck);
&lt;br&gt;+ 	}
&lt;br&gt;+ 
&lt;br&gt;+ 	return redo;
&lt;br&gt;+ }
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* Flush all data in shared memory to disk, and fsync
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;***************
&lt;br&gt;*** 6073,6101 ****
&lt;br&gt;&amp;nbsp; 			}
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;! 	 * OK, force data out to disk
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;! 	CheckPointGuts(checkPoint-&amp;gt;redo, CHECKPOINT_IMMEDIATE);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;! 	 * Update pg_control so that any subsequent crash will restart from this
&lt;br&gt;! 	 * checkpoint.	Note: ReadRecPtr gives the XLOG address of the checkpoint
&lt;br&gt;! 	 * record itself.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	ControlFile-&amp;gt;prevCheckPoint = ControlFile-&amp;gt;checkPoint;
&lt;br&gt;! 	ControlFile-&amp;gt;checkPoint = ReadRecPtr;
&lt;br&gt;! 	ControlFile-&amp;gt;checkPointCopy = *checkPoint;
&lt;br&gt;&amp;nbsp; 	ControlFile-&amp;gt;time = (pg_time_t) time(NULL);
&lt;br&gt;&amp;nbsp; 	UpdateControlFile();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	ereport((recoveryLogRestartpoints ? LOG : DEBUG2),
&lt;br&gt;&amp;nbsp; 			(errmsg(&amp;quot;recovery restart point at %X/%X&amp;quot;,
&lt;br&gt;! 					checkPoint-&amp;gt;redo.xlogid, checkPoint-&amp;gt;redo.xrecoff)));
&lt;br&gt;&amp;nbsp; 	if (recoveryLastXTime)
&lt;br&gt;! 		ereport((recoveryLogRestartpoints ? LOG : DEBUG2),
&lt;br&gt;! 				(errmsg(&amp;quot;last completed transaction was at log time %s&amp;quot;,
&lt;br&gt;! 						timestamptz_to_str(recoveryLastXTime))));
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;--- 6327,6395 ----
&lt;br&gt;&amp;nbsp; 			}
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ 	RequestRestartPoint(ReadRecPtr, checkPoint, reachedSafeStartPoint);
&lt;br&gt;+ }
&lt;br&gt;+ 
&lt;br&gt;+ /*
&lt;br&gt;+ &amp;nbsp;* As of 8.4, RestartPoints are always created by the bgwriter
&lt;br&gt;+ &amp;nbsp;* once we have reachedSafeStartPoint. We use bgwriter's shared memory
&lt;br&gt;+ &amp;nbsp;* area wherever we call it from, to keep better code structure.
&lt;br&gt;+ &amp;nbsp;*/
&lt;br&gt;+ void
&lt;br&gt;+ CreateRestartPoint(const XLogRecPtr ReadPtr, const CheckPoint *restartPoint, int flags)
&lt;br&gt;+ {
&lt;br&gt;+ 	if (log_checkpoints)
&lt;br&gt;+ 	{
&lt;br&gt;+ 		/*
&lt;br&gt;+ 		 * Prepare to accumulate statistics.
&lt;br&gt;+ 		 */
&lt;br&gt;+ 
&lt;br&gt;+ 		MemSet(&amp;CheckpointStats, 0, sizeof(CheckpointStats));
&lt;br&gt;+ 		CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
&lt;br&gt;+ 
&lt;br&gt;+ 		LogCheckpointStart(CHECKPOINT_RESTARTPOINT | flags);
&lt;br&gt;+ 	}
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;! 	 * Acquire CheckpointLock to ensure only one restartpoint happens at a time.
&lt;br&gt;! 	 * We rely on this lock to ensure that the startup process doesn't exit
&lt;br&gt;! 	 * Recovery while we are half way through a restartpoint.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;! 	LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
&lt;br&gt;! 
&lt;br&gt;! 	CheckPointGuts(restartPoint-&amp;gt;redo, CHECKPOINT_RESTARTPOINT | flags);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;! 	 * Update pg_control, using current time
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;+ 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
&lt;br&gt;&amp;nbsp; 	ControlFile-&amp;gt;prevCheckPoint = ControlFile-&amp;gt;checkPoint;
&lt;br&gt;! 	ControlFile-&amp;gt;checkPoint = ReadPtr;
&lt;br&gt;! 	ControlFile-&amp;gt;checkPointCopy = *restartPoint;
&lt;br&gt;&amp;nbsp; 	ControlFile-&amp;gt;time = (pg_time_t) time(NULL);
&lt;br&gt;&amp;nbsp; 	UpdateControlFile();
&lt;br&gt;+ 	LWLockRelease(ControlFileLock);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	/*
&lt;br&gt;! 	 * Currently, there is no need to truncate pg_subtrans during recovery.
&lt;br&gt;! 	 * If we did that, we would need to have called StartupSUBTRANS()
&lt;br&gt;! 	 * already and then TruncateSUBTRANS() would go here.
&lt;br&gt;! 	 */
&lt;br&gt;! 
&lt;br&gt;! 	/* All real work is done, but log before releasing lock. */
&lt;br&gt;! 	if (log_checkpoints)
&lt;br&gt;! 		LogCheckpointEnd(CHECKPOINT_RESTARTPOINT);
&lt;br&gt;! 
&lt;br&gt;! 	ereport((log_checkpoints ? LOG : DEBUG2),
&lt;br&gt;&amp;nbsp; 			(errmsg(&amp;quot;recovery restart point at %X/%X&amp;quot;,
&lt;br&gt;! 					restartPoint-&amp;gt;redo.xlogid, restartPoint-&amp;gt;redo.xrecoff)));
&lt;br&gt;! 
&lt;br&gt;&amp;nbsp; 	if (recoveryLastXTime)
&lt;br&gt;! 		ereport((log_checkpoints ? LOG : DEBUG2),
&lt;br&gt;! 			(errmsg(&amp;quot;last completed transaction was at log time %s&amp;quot;,
&lt;br&gt;! 					timestamptz_to_str(recoveryLastXTime))));
&lt;br&gt;! 
&lt;br&gt;! 	LWLockRelease(CheckpointLock);
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;***************
&lt;br&gt;*** 6160,6166 ****
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;! &amp;nbsp;* XLOG resource manager's routines
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; void
&lt;br&gt;&amp;nbsp; xlog_redo(XLogRecPtr lsn, XLogRecord *record)
&lt;br&gt;--- 6454,6516 ----
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;! &amp;nbsp;* exitRecovery()
&lt;br&gt;! &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;* Exit recovery state and write an XLOG_RECOVERY_END record. This is the
&lt;br&gt;! &amp;nbsp;* only record type that can record a change of timelineID. We assume
&lt;br&gt;! &amp;nbsp;* caller has already set ThisTimeLineID, if appropriate.
&lt;br&gt;! &amp;nbsp;*/
&lt;br&gt;! static void
&lt;br&gt;! exitRecovery(void)
&lt;br&gt;! {
&lt;br&gt;! 	XLogRecData rdata;
&lt;br&gt;! 
&lt;br&gt;! 	rdata.buffer = InvalidBuffer;
&lt;br&gt;! 	rdata.data = (char *) (&amp;ThisTimeLineID);
&lt;br&gt;! 	rdata.len = sizeof(TimeLineID);
&lt;br&gt;! 	rdata.next = NULL;
&lt;br&gt;! 
&lt;br&gt;! 	/*
&lt;br&gt;! 	 * If a restartpoint is in progress, we will not be able to successfully
&lt;br&gt;! 	 * acquire CheckpointLock. If bgwriter is still in progress then send
&lt;br&gt;! 	 * a second signal to nudge bgwriter to go faster so we can avoid delay.
&lt;br&gt;! 	 * Then wait for lock, so we know the restartpoint has completed. We do
&lt;br&gt;! 	 * this because we don't want to interrupt the restartpoint half way
&lt;br&gt;! 	 * through, which might leave us in a mess and we want to be robust. We're
&lt;br&gt;! 	 * going to checkpoint soon anyway, so it's not wasted effort.
&lt;br&gt;! 	 */
&lt;br&gt;! 	if (LWLockConditionalAcquire(CheckpointLock, LW_EXCLUSIVE))
&lt;br&gt;! 		LWLockRelease(CheckpointLock);
&lt;br&gt;! 	else
&lt;br&gt;! 	{
&lt;br&gt;! 		RequestRestartPointCompletion();
&lt;br&gt;! 		ereport(LOG,
&lt;br&gt;! 			(errmsg(&amp;quot;startup process waiting for restartpoint to complete&amp;quot;)));
&lt;br&gt;! 		LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
&lt;br&gt;! 		LWLockRelease(CheckpointLock);
&lt;br&gt;! 	}	
&lt;br&gt;! 
&lt;br&gt;! 	/*
&lt;br&gt;! 	 * This is the only type of WAL message that can be inserted during
&lt;br&gt;! 	 * recovery. This ensures that we don't allow others to get access
&lt;br&gt;! 	 * until after we have changed state.
&lt;br&gt;! 	 */
&lt;br&gt;! 	(void) XLogInsert(RM_XLOG_ID, XLOG_RECOVERY_END, &amp;rdata);
&lt;br&gt;! 
&lt;br&gt;! 	/*
&lt;br&gt;! 	 * We don't XLogFlush() here otherwise we'll end up zeroing the WAL
&lt;br&gt;! 	 * file ourselves. So just let bgwriter's forthcoming checkpoint do
&lt;br&gt;! 	 * that for us.
&lt;br&gt;! 	 */
&lt;br&gt;! 
&lt;br&gt;! 	InRecovery = false;
&lt;br&gt;! }
&lt;br&gt;! 
&lt;br&gt;! /*
&lt;br&gt;! &amp;nbsp;* XLOG resource manager's routines.
&lt;br&gt;! &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;* Definitions of message info are in include/catalog/pg_control.h,
&lt;br&gt;! &amp;nbsp;* though not all messages relate to control file processing.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; void
&lt;br&gt;&amp;nbsp; xlog_redo(XLogRecPtr lsn, XLogRecord *record)
&lt;br&gt;***************
&lt;br&gt;*** 6195,6215 ****
&lt;br&gt;&amp;nbsp; 		ControlFile-&amp;gt;checkPointCopy.nextXid = checkPoint.nextXid;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 		/*
&lt;br&gt;! 		 * TLI may change in a shutdown checkpoint, but it shouldn't decrease
&lt;br&gt;&amp;nbsp; 		 */
&lt;br&gt;! 		if (checkPoint.ThisTimeLineID != ThisTimeLineID)
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;! 			if (checkPoint.ThisTimeLineID &amp;lt; ThisTimeLineID ||
&lt;br&gt;&amp;nbsp; 				!list_member_int(expectedTLIs,
&lt;br&gt;! 								 (int) checkPoint.ThisTimeLineID))
&lt;br&gt;&amp;nbsp; 				ereport(PANIC,
&lt;br&gt;! 						(errmsg(&amp;quot;unexpected timeline ID %u (after %u) in checkpoint record&amp;quot;,
&lt;br&gt;! 								checkPoint.ThisTimeLineID, ThisTimeLineID)));
&lt;br&gt;&amp;nbsp; 			/* Following WAL records should be run with new TLI */
&lt;br&gt;! 			ThisTimeLineID = checkPoint.ThisTimeLineID;
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;- 
&lt;br&gt;- 		RecoveryRestartPoint(&amp;checkPoint);
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 	else if (info == XLOG_CHECKPOINT_ONLINE)
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;--- 6545,6582 ----
&lt;br&gt;&amp;nbsp; 		ControlFile-&amp;gt;checkPointCopy.nextXid = checkPoint.nextXid;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 		/*
&lt;br&gt;! 		 * TLI no longer changes at shutdown checkpoint, since as of 8.4,
&lt;br&gt;! 		 * shutdown checkpoints only occur at shutdown. Much less confusing.
&lt;br&gt;&amp;nbsp; 		 */
&lt;br&gt;! 
&lt;br&gt;! 		RecoveryRestartPoint(&amp;checkPoint);
&lt;br&gt;! 	}
&lt;br&gt;! 	else if (info == XLOG_RECOVERY_END)
&lt;br&gt;! 	{
&lt;br&gt;! 		TimeLineID	tli;
&lt;br&gt;! 
&lt;br&gt;! 		memcpy(&amp;tli, XLogRecGetData(record), sizeof(TimeLineID));
&lt;br&gt;! 
&lt;br&gt;! 		/*
&lt;br&gt;! 		 * TLI may change when recovery ends, but it shouldn't decrease.
&lt;br&gt;! 		 *
&lt;br&gt;! 		 * This is the only WAL record that can tell us to change timelineID
&lt;br&gt;! 		 * while we process WAL records. 
&lt;br&gt;! 		 *
&lt;br&gt;! 		 * We can *choose* to stop recovery at any point, generating a
&lt;br&gt;! 		 * new timelineID which is recorded using this record type.
&lt;br&gt;! 		 */
&lt;br&gt;! 		if (tli != ThisTimeLineID)
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;! 			if (tli &amp;lt; ThisTimeLineID ||
&lt;br&gt;&amp;nbsp; 				!list_member_int(expectedTLIs,
&lt;br&gt;! 								 (int) tli))
&lt;br&gt;&amp;nbsp; 				ereport(PANIC,
&lt;br&gt;! 						(errmsg(&amp;quot;unexpected timeline ID %u (after %u) at recovery end record&amp;quot;,
&lt;br&gt;! 								tli, ThisTimeLineID)));
&lt;br&gt;&amp;nbsp; 			/* Following WAL records should be run with new TLI */
&lt;br&gt;! 			ThisTimeLineID = tli;
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 	else if (info == XLOG_CHECKPOINT_ONLINE)
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;***************
&lt;br&gt;*** 6232,6238 ****
&lt;br&gt;&amp;nbsp; 		ControlFile-&amp;gt;checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch;
&lt;br&gt;&amp;nbsp; 		ControlFile-&amp;gt;checkPointCopy.nextXid = checkPoint.nextXid;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 		/* TLI should not change in an on-line checkpoint */
&lt;br&gt;&amp;nbsp; 		if (checkPoint.ThisTimeLineID != ThisTimeLineID)
&lt;br&gt;&amp;nbsp; 			ereport(PANIC,
&lt;br&gt;&amp;nbsp; 					(errmsg(&amp;quot;unexpected timeline ID %u (should be %u) in checkpoint record&amp;quot;,
&lt;br&gt;--- 6599,6605 ----
&lt;br&gt;&amp;nbsp; 		ControlFile-&amp;gt;checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch;
&lt;br&gt;&amp;nbsp; 		ControlFile-&amp;gt;checkPointCopy.nextXid = checkPoint.nextXid;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 		/* TLI must not change at a checkpoint */
&lt;br&gt;&amp;nbsp; 		if (checkPoint.ThisTimeLineID != ThisTimeLineID)
&lt;br&gt;&amp;nbsp; 			ereport(PANIC,
&lt;br&gt;&amp;nbsp; 					(errmsg(&amp;quot;unexpected timeline ID %u (should be %u) in checkpoint record&amp;quot;,
&lt;br&gt;***************
&lt;br&gt;*** 6290,6296 ****
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; #ifdef WAL_DEBUG
&lt;br&gt;- 
&lt;br&gt;&amp;nbsp; static void
&lt;br&gt;&amp;nbsp; xlog_outrec(StringInfo buf, XLogRecord *record)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;--- 6657,6662 ----
&lt;br&gt;***************
&lt;br&gt;*** 6310,6316 ****
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; #endif &amp;nbsp; /* WAL_DEBUG */
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;- 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* Return the (possible) sync flag used for opening a file, depending on the
&lt;br&gt;&amp;nbsp; &amp;nbsp;* value of the GUC wal_sync_method.
&lt;br&gt;--- 6676,6681 ----
&lt;br&gt;***************
&lt;br&gt;*** 6449,6454 ****
&lt;br&gt;--- 6814,6820 ----
&lt;br&gt;&amp;nbsp; 	uint32		_logSeg;
&lt;br&gt;&amp;nbsp; 	struct stat stat_buf;
&lt;br&gt;&amp;nbsp; 	FILE	 &amp;nbsp; *fp;
&lt;br&gt;+ 	bool		immediate_checkpoint = false;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	if (!superuser())
&lt;br&gt;&amp;nbsp; 		ereport(ERROR,
&lt;br&gt;***************
&lt;br&gt;*** 6502,6516 ****
&lt;br&gt;&amp;nbsp; 	/* Ensure we release forcePageWrites if fail below */
&lt;br&gt;&amp;nbsp; 	PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) 0);
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;&amp;nbsp; 		/*
&lt;br&gt;&amp;nbsp; 		 * Force a CHECKPOINT.	Aside from being necessary to prevent torn
&lt;br&gt;&amp;nbsp; 		 * page problems, this guarantees that two successive backup runs will
&lt;br&gt;&amp;nbsp; 		 * have different checkpoint positions and hence different history
&lt;br&gt;&amp;nbsp; 		 * file names, even if nothing happened in between.
&lt;br&gt;- 		 *
&lt;br&gt;- 		 * We don't use CHECKPOINT_IMMEDIATE, hence this can take awhile.
&lt;br&gt;&amp;nbsp; 		 */
&lt;br&gt;! 		RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 		/*
&lt;br&gt;&amp;nbsp; 		 * Now we need to fetch the checkpoint record location, and also its
&lt;br&gt;--- 6868,6905 ----
&lt;br&gt;&amp;nbsp; 	/* Ensure we release forcePageWrites if fail below */
&lt;br&gt;&amp;nbsp; 	PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) 0);
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;+ 		int flags = CHECKPOINT_FORCE | CHECKPOINT_WAIT;
&lt;br&gt;+ 
&lt;br&gt;+ 		/* 
&lt;br&gt;+ 		 * We support both variants of the pg_start_backup() SQL function
&lt;br&gt;+ 		 * with a single C function. If the two-parameter variant was called,
&lt;br&gt;+ 		 * then get the value for the second parameter.
&lt;br&gt;+ 		 */
&lt;br&gt;+ 		if (PG_NARGS() == 2)
&lt;br&gt;+ 		{
&lt;br&gt;+ 			immediate_checkpoint = PG_GETARG_BOOL(1);
&lt;br&gt;+ 
&lt;br&gt;+ 			/* By default, this can take some time */
&lt;br&gt;+ 			if (immediate_checkpoint)
&lt;br&gt;+ 			{
&lt;br&gt;+ 				flags |= CHECKPOINT_IMMEDIATE;
&lt;br&gt;+ 				ereport(NOTICE,
&lt;br&gt;+ 					(errmsg(&amp;quot;pg_start_backup() signalling for immediate checkpoint&amp;quot;)));
&lt;br&gt;+ 			}
&lt;br&gt;+ 			else
&lt;br&gt;+ 				ereport(NOTICE,
&lt;br&gt;+ 					(errmsg(&amp;quot;pg_start_backup() signalling for smooth checkpoint&amp;quot;
&lt;br&gt;+ 							&amp;quot;, may last up to %d s&amp;quot;,
&lt;br&gt;+ 							(int) (CheckPointTimeout * CheckPointCompletionTarget))));
&lt;br&gt;+ 		}
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 		/*
&lt;br&gt;&amp;nbsp; 		 * Force a CHECKPOINT.	Aside from being necessary to prevent torn
&lt;br&gt;&amp;nbsp; 		 * page problems, this guarantees that two successive backup runs will
&lt;br&gt;&amp;nbsp; 		 * have different checkpoint positions and hence different history
&lt;br&gt;&amp;nbsp; 		 * file names, even if nothing happened in between.
&lt;br&gt;&amp;nbsp; 		 */
&lt;br&gt;! 		RequestCheckpoint(flags);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 		/*
&lt;br&gt;&amp;nbsp; 		 * Now we need to fetch the checkpoint record location, and also its
&lt;br&gt;***************
&lt;br&gt;*** 6639,6651 ****
&lt;br&gt;&amp;nbsp; 	LWLockRelease(WALInsertLock);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;! 	 * Force a switch to a new xlog segment file, so that the backup is valid
&lt;br&gt;&amp;nbsp; 	 * as soon as archiver moves out the current segment file. We'll report
&lt;br&gt;&amp;nbsp; 	 * the end address of the XLOG SWITCH record as the backup stopping point.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	stoppoint = RequestXLogSwitch();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	XLByteToSeg(stoppoint, _logId, _logSeg);
&lt;br&gt;&amp;nbsp; 	XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Use the log timezone here, not the session timezone */
&lt;br&gt;--- 7028,7049 ----
&lt;br&gt;&amp;nbsp; 	LWLockRelease(WALInsertLock);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;! 	 * Request switch to a new xlog segment file, so that the backup is valid
&lt;br&gt;&amp;nbsp; 	 * as soon as archiver moves out the current segment file. We'll report
&lt;br&gt;&amp;nbsp; 	 * the end address of the XLOG SWITCH record as the backup stopping point.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	stoppoint = RequestXLogSwitch();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	XLByteToSeg(stoppoint, _logId, _logSeg);
&lt;br&gt;+ 
&lt;br&gt;+ 	/*
&lt;br&gt;+ 	 * If we didn't actually switch xlog files then there is nothing in
&lt;br&gt;+ 	 * this file for us to wait for, so set stopxlogfilename to be the
&lt;br&gt;+ 	 * previous file instead. We still report the same ending location.
&lt;br&gt;+ 	 */
&lt;br&gt;+ 	if ((stoppoint.xrecoff % XLogSegSize) == 0)
&lt;br&gt;+ 		PrevLogSeg(_logId, _logSeg);
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 	XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Use the log timezone here, not the session timezone */
&lt;br&gt;***************
&lt;br&gt;*** 6741,6747 ****
&lt;br&gt;&amp;nbsp; 	BackupHistoryFileName(histfilepath, ThisTimeLineID, _logId, _logSeg,
&lt;br&gt;&amp;nbsp; 						 &amp;nbsp;startpoint.xrecoff % XLogSegSize);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	seconds_before_warning = 60;
&lt;br&gt;&amp;nbsp; 	waits = 0;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	while (XLogArchiveIsBusy(stopxlogfilename) ||
&lt;br&gt;--- 7139,7145 ----
&lt;br&gt;&amp;nbsp; 	BackupHistoryFileName(histfilepath, ThisTimeLineID, _logId, _logSeg,
&lt;br&gt;&amp;nbsp; 						 &amp;nbsp;startpoint.xrecoff % XLogSegSize);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	seconds_before_warning = 10;
&lt;br&gt;&amp;nbsp; 	waits = 0;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	while (XLogArchiveIsBusy(stopxlogfilename) ||
&lt;br&gt;Index: src/backend/postmaster/bgwriter.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/postmaster/bgwriter.c,v
&lt;br&gt;retrieving revision 1.51
&lt;br&gt;diff -c -r1.51 bgwriter.c
&lt;br&gt;*** src/backend/postmaster/bgwriter.c	11 Aug 2008 11:05:11 -0000	1.51
&lt;br&gt;--- src/backend/postmaster/bgwriter.c	30 Sep 2008 18:33:55 -0000
&lt;br&gt;***************
&lt;br&gt;*** 49,54 ****
&lt;br&gt;--- 49,55 ----
&lt;br&gt;&amp;nbsp; #include &amp;lt;unistd.h&amp;gt;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; #include &amp;quot;access/xlog_internal.h&amp;quot;
&lt;br&gt;+ #include &amp;quot;catalog/pg_control.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;libpq/pqsignal.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;miscadmin.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;pgstat.h&amp;quot;
&lt;br&gt;***************
&lt;br&gt;*** 130,135 ****
&lt;br&gt;--- 131,143 ----
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	int			ckpt_flags;		/* checkpoint flags, as defined in xlog.h */
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ 	/* 
&lt;br&gt;+ 	 * When the Startup process wants bgwriter to perform a restartpoint, it 
&lt;br&gt;+ 	 * sets these fields so that we can update the control file afterwards.
&lt;br&gt;+ 	 */
&lt;br&gt;+ 	XLogRecPtr	ReadPtr;		/* Requested log pointer */
&lt;br&gt;+ 	CheckPoint &amp;nbsp;restartPoint;	/* restartPoint data for ControlFile */
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 	uint32		num_backend_writes;		/* counts non-bgwriter buffer writes */
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	int			num_requests;	/* current # of requests */
&lt;br&gt;***************
&lt;br&gt;*** 166,172 ****
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /* these values are valid when ckpt_active is true: */
&lt;br&gt;&amp;nbsp; static pg_time_t ckpt_start_time;
&lt;br&gt;! static XLogRecPtr ckpt_start_recptr;
&lt;br&gt;&amp;nbsp; static double ckpt_cached_elapsed;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; static pg_time_t last_checkpoint_time;
&lt;br&gt;--- 174,180 ----
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /* these values are valid when ckpt_active is true: */
&lt;br&gt;&amp;nbsp; static pg_time_t ckpt_start_time;
&lt;br&gt;! static XLogRecPtr ckpt_start_recptr;	/* not used if IsRecoveryProcessingMode */
&lt;br&gt;&amp;nbsp; static double ckpt_cached_elapsed;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; static pg_time_t last_checkpoint_time;
&lt;br&gt;***************
&lt;br&gt;*** 198,203 ****
&lt;br&gt;--- 206,212 ----
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;&amp;nbsp; 	sigjmp_buf	local_sigjmp_buf;
&lt;br&gt;&amp;nbsp; 	MemoryContext bgwriter_context;
&lt;br&gt;+ 	bool		BgWriterRecoveryMode;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	BgWriterShmem-&amp;gt;bgwriter_pid = MyProcPid;
&lt;br&gt;&amp;nbsp; 	am_bg_writer = true;
&lt;br&gt;***************
&lt;br&gt;*** 356,371 ****
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	PG_SETMASK(&amp;UnBlockSig);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * Loop forever
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	for (;;)
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;- 		bool		do_checkpoint = false;
&lt;br&gt;- 		int			flags = 0;
&lt;br&gt;- 		pg_time_t	now;
&lt;br&gt;- 		int			elapsed_secs;
&lt;br&gt;- 
&lt;br&gt;&amp;nbsp; 		/*
&lt;br&gt;&amp;nbsp; 		 * Emergency bailout if postmaster has died. &amp;nbsp;This is to avoid the
&lt;br&gt;&amp;nbsp; 		 * necessity for manual cleanup of all postmaster children.
&lt;br&gt;--- 365,381 ----
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	PG_SETMASK(&amp;UnBlockSig);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ 	BgWriterRecoveryMode = IsRecoveryProcessingMode();
&lt;br&gt;+ 
&lt;br&gt;+ 	if (BgWriterRecoveryMode)
&lt;br&gt;+ 		elog(DEBUG1, &amp;quot;bgwriter starting during recovery, pid = %d&amp;quot;,
&lt;br&gt;+ 			(int) BgWriterShmem-&amp;gt;bgwriter_pid);
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * Loop forever
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	for (;;)
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;&amp;nbsp; 		/*
&lt;br&gt;&amp;nbsp; 		 * Emergency bailout if postmaster has died. &amp;nbsp;This is to avoid the
&lt;br&gt;&amp;nbsp; 		 * necessity for manual cleanup of all postmaster children.
&lt;br&gt;***************
&lt;br&gt;*** 383,501 ****
&lt;br&gt;&amp;nbsp; 			got_SIGHUP = false;
&lt;br&gt;&amp;nbsp; 			ProcessConfigFile(PGC_SIGHUP);
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;- 		if (checkpoint_requested)
&lt;br&gt;- 		{
&lt;br&gt;- 			checkpoint_requested = false;
&lt;br&gt;- 			do_checkpoint = true;
&lt;br&gt;- 			BgWriterStats.m_requested_checkpoints++;
&lt;br&gt;- 		}
&lt;br&gt;- 		if (shutdown_requested)
&lt;br&gt;- 		{
&lt;br&gt;- 			/*
&lt;br&gt;- 			 * From here on, elog(ERROR) should end with exit(1), not send
&lt;br&gt;- 			 * control back to the sigsetjmp block above
&lt;br&gt;- 			 */
&lt;br&gt;- 			ExitOnAnyError = true;
&lt;br&gt;- 			/* Close down the database */
&lt;br&gt;- 			ShutdownXLOG(0, 0);
&lt;br&gt;- 			DumpFreeSpaceMap(0, 0);
&lt;br&gt;- 			/* Normal exit from the bgwriter is here */
&lt;br&gt;- 			proc_exit(0);		/* done */
&lt;br&gt;- 		}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 		/*
&lt;br&gt;! 		 * Force a checkpoint if too much time has elapsed since the last one.
&lt;br&gt;! 		 * Note that we count a timed checkpoint in stats only when this
&lt;br&gt;! 		 * occurs without an external request, but we set the CAUSE_TIME flag
&lt;br&gt;! 		 * bit even if there is also an external request.
&lt;br&gt;! 		 */
&lt;br&gt;! 		now = (pg_time_t) time(NULL);
&lt;br&gt;! 		elapsed_secs = now - last_checkpoint_time;
&lt;br&gt;! 		if (elapsed_secs &amp;gt;= CheckPointTimeout)
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;! 			if (!do_checkpoint)
&lt;br&gt;! 				BgWriterStats.m_timed_checkpoints++;
&lt;br&gt;! 			do_checkpoint = true;
&lt;br&gt;! 			flags |= CHECKPOINT_CAUSE_TIME;
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;! 
&lt;br&gt;! 		/*
&lt;br&gt;! 		 * Do a checkpoint if requested, otherwise do one cycle of
&lt;br&gt;! 		 * dirty-buffer writing.
&lt;br&gt;! 		 */
&lt;br&gt;! 		if (do_checkpoint)
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;! 			/* use volatile pointer to prevent code rearrangement */
&lt;br&gt;! 			volatile BgWriterShmemStruct *bgs = BgWriterShmem;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 			/*
&lt;br&gt;! 			 * Atomically fetch the request flags to figure out what kind of a
&lt;br&gt;! 			 * checkpoint we should perform, and increase the started-counter
&lt;br&gt;! 			 * to acknowledge that we've started a new checkpoint.
&lt;br&gt;&amp;nbsp; 			 */
&lt;br&gt;! 			SpinLockAcquire(&amp;bgs-&amp;gt;ckpt_lck);
&lt;br&gt;! 			flags |= bgs-&amp;gt;ckpt_flags;
&lt;br&gt;! 			bgs-&amp;gt;ckpt_flags = 0;
&lt;br&gt;! 			bgs-&amp;gt;ckpt_started++;
&lt;br&gt;! 			SpinLockRelease(&amp;bgs-&amp;gt;ckpt_lck);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 			/*
&lt;br&gt;! 			 * We will warn if (a) too soon since last checkpoint (whatever
&lt;br&gt;! 			 * caused it) and (b) somebody set the CHECKPOINT_CAUSE_XLOG flag
&lt;br&gt;! 			 * since the last checkpoint start. &amp;nbsp;Note in particular that this
&lt;br&gt;! 			 * implementation will not generate warnings caused by
&lt;br&gt;! 			 * CheckPointTimeout &amp;lt; CheckPointWarning.
&lt;br&gt;&amp;nbsp; 			 */
&lt;br&gt;! 			if ((flags &amp; CHECKPOINT_CAUSE_XLOG) &amp;&amp;
&lt;br&gt;! 				elapsed_secs &amp;lt; CheckPointWarning)
&lt;br&gt;! 				ereport(LOG,
&lt;br&gt;! 						(errmsg(&amp;quot;checkpoints are occurring too frequently (%d seconds apart)&amp;quot;,
&lt;br&gt;! 								elapsed_secs),
&lt;br&gt;! 						 errhint(&amp;quot;Consider increasing the configuration parameter \&amp;quot;checkpoint_segments\&amp;quot;.&amp;quot;)));
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 			/*
&lt;br&gt;! 			 * Initialize bgwriter-private variables used during checkpoint.
&lt;br&gt;! 			 */
&lt;br&gt;! 			ckpt_active = true;
&lt;br&gt;! 			ckpt_start_recptr = GetInsertRecPtr();
&lt;br&gt;! 			ckpt_start_time = now;
&lt;br&gt;! 			ckpt_cached_elapsed = 0;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 			/*
&lt;br&gt;! 			 * Do the checkpoint.
&lt;br&gt;! 			 */
&lt;br&gt;! 			CreateCheckPoint(flags);
&lt;br&gt;! 
&lt;br&gt;! 			/*
&lt;br&gt;! 			 * After any checkpoint, close all smgr files.	This is so we
&lt;br&gt;! 			 * won't hang onto smgr references to deleted files indefinitely.
&lt;br&gt;! 			 */
&lt;br&gt;! 			smgrcloseall();
&lt;br&gt;! 
&lt;br&gt;! 			/*
&lt;br&gt;! 			 * Indicate checkpoint completion to any waiting backends.
&lt;br&gt;! 			 */
&lt;br&gt;! 			SpinLockAcquire(&amp;bgs-&amp;gt;ckpt_lck);
&lt;br&gt;! 			bgs-&amp;gt;ckpt_done = bgs-&amp;gt;ckpt_started;
&lt;br&gt;! 			SpinLockRelease(&amp;bgs-&amp;gt;ckpt_lck);
&lt;br&gt;! 
&lt;br&gt;! 			ckpt_active = false;
&lt;br&gt;! 
&lt;br&gt;! 			/*
&lt;br&gt;! 			 * Note we record the checkpoint start time not end time as
&lt;br&gt;! 			 * last_checkpoint_time. &amp;nbsp;This is so that time-driven checkpoints
&lt;br&gt;! 			 * happen at a predictable spacing.
&lt;br&gt;! 			 */
&lt;br&gt;! 			last_checkpoint_time = now;
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;- 		else
&lt;br&gt;- 			BgBufferSync();
&lt;br&gt;- 
&lt;br&gt;- 		/* Check for archive_timeout and switch xlog files if necessary. */
&lt;br&gt;- 		CheckArchiveTimeout();
&lt;br&gt;- 
&lt;br&gt;- 		/* Nap for the configured time. */
&lt;br&gt;- 		BgWriterNap();
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;--- 393,599 ----
&lt;br&gt;&amp;nbsp; 			got_SIGHUP = false;
&lt;br&gt;&amp;nbsp; 			ProcessConfigFile(PGC_SIGHUP);
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 		if (BgWriterRecoveryMode)
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;! 			if (shutdown_requested)
&lt;br&gt;! 			{
&lt;br&gt;! 				/*
&lt;br&gt;! 				 * From here on, elog(ERROR) should end with exit(1), not send
&lt;br&gt;! 				 * control back to the sigsetjmp block above
&lt;br&gt;! 				 */
&lt;br&gt;! 				ExitOnAnyError = true;
&lt;br&gt;! 				/* Normal exit from the bgwriter is here */
&lt;br&gt;! 				proc_exit(0);		/* done */
&lt;br&gt;! 			}
&lt;br&gt;! 
&lt;br&gt;! 			if (!IsRecoveryProcessingMode())
&lt;br&gt;! 			{
&lt;br&gt;! 				elog(DEBUG2, &amp;quot;bgwriter changing from recovery to normal mode&amp;quot;);
&lt;br&gt;! 
&lt;br&gt;! 				InitXLOGAccess();
&lt;br&gt;! 				BgWriterRecoveryMode = false;
&lt;br&gt;! 
&lt;br&gt;! 				/*
&lt;br&gt;! 				 * Start time-driven events from now
&lt;br&gt;! 				 */
&lt;br&gt;! 				last_checkpoint_time = last_xlog_switch_time = (pg_time_t) time(NULL);
&lt;br&gt;! 
&lt;br&gt;! 				/* 
&lt;br&gt;! 				 * Notice that we do *not* act on a checkpoint_requested
&lt;br&gt;! 				 * state at this point. We have changed mode, so we wish to
&lt;br&gt;! 				 * perform a checkpoint not a restartpoint.
&lt;br&gt;! 				 */
&lt;br&gt;! 				continue;
&lt;br&gt;! 			}
&lt;br&gt;! 
&lt;br&gt;! 			if (checkpoint_requested) 
&lt;br&gt;! 			{
&lt;br&gt;! 				XLogRecPtr		ReadPtr;
&lt;br&gt;! 				CheckPoint		restartPoint;
&lt;br&gt;! 
&lt;br&gt;! 				checkpoint_requested = false;
&lt;br&gt;! 
&lt;br&gt;! 				/*
&lt;br&gt;! 				 * Initialize bgwriter-private variables used during checkpoint.
&lt;br&gt;! 				 */
&lt;br&gt;! 				ckpt_active = true;
&lt;br&gt;! 				ckpt_start_time = (pg_time_t) time(NULL);
&lt;br&gt;! 				ckpt_cached_elapsed = 0;
&lt;br&gt;! 
&lt;br&gt;! 				/*
&lt;br&gt;! 				 * Get the requested values from shared memory that the 
&lt;br&gt;! 				 * Startup process has put there for us.
&lt;br&gt;! 				 */
&lt;br&gt;! 				SpinLockAcquire(&amp;BgWriterShmem-&amp;gt;ckpt_lck);
&lt;br&gt;! 				ReadPtr = BgWriterShmem-&amp;gt;ReadPtr;
&lt;br&gt;! 				memcpy(&amp;restartPoint, &amp;BgWriterShmem-&amp;gt;restartPoint, sizeof(CheckPoint));
&lt;br&gt;! 				SpinLockRelease(&amp;BgWriterShmem-&amp;gt;ckpt_lck);
&lt;br&gt;! 
&lt;br&gt;! 				/* Use smoothed writes, until interrupted if ever */
&lt;br&gt;! 				CreateRestartPoint(ReadPtr, &amp;restartPoint, 0);
&lt;br&gt;! 
&lt;br&gt;! 				/*
&lt;br&gt;! 				 * After any checkpoint, close all smgr files.	This is so we
&lt;br&gt;! 				 * won't hang onto smgr references to deleted files indefinitely.
&lt;br&gt;! 				 */
&lt;br&gt;! 				smgrcloseall();
&lt;br&gt;! 
&lt;br&gt;! 				ckpt_active = false;
&lt;br&gt;! 				checkpoint_requested = false;
&lt;br&gt;! 			}
&lt;br&gt;! 			else
&lt;br&gt;! 			{
&lt;br&gt;! 				/* Clean buffers dirtied by recovery */
&lt;br&gt;! 				BgBufferSync();
&lt;br&gt;! 
&lt;br&gt;! 				/* Nap for the configured time. */
&lt;br&gt;! 				BgWriterNap();
&lt;br&gt;! 			}
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;! 		else	/* Normal processing */
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;! 			bool		do_checkpoint = false;
&lt;br&gt;! 			int			flags = 0;
&lt;br&gt;! 			pg_time_t	now;
&lt;br&gt;! 			int			elapsed_secs;
&lt;br&gt;! 
&lt;br&gt;! 			Assert(!IsRecoveryProcessingMode());
&lt;br&gt;! 
&lt;br&gt;! 			if (checkpoint_requested) 
&lt;br&gt;! 			{
&lt;br&gt;! 				checkpoint_requested = false;
&lt;br&gt;! 				do_checkpoint = true;
&lt;br&gt;! 				BgWriterStats.m_requested_checkpoints++;
&lt;br&gt;! 			}
&lt;br&gt;! 			if (shutdown_requested)
&lt;br&gt;! 			{
&lt;br&gt;! 				/*
&lt;br&gt;! 				 * From here on, elog(ERROR) should end with exit(1), not send
&lt;br&gt;! 				 * control back to the sigsetjmp block above
&lt;br&gt;! 				 */
&lt;br&gt;! 				ExitOnAnyError = true;
&lt;br&gt;! 				/* Close down the database */
&lt;br&gt;! 				ShutdownXLOG(0, 0);
&lt;br&gt;! 				DumpFreeSpaceMap(0, 0);
&lt;br&gt;! 				/* Normal exit from the bgwriter is here */
&lt;br&gt;! 				proc_exit(0);		/* done */
&lt;br&gt;! 			}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 			/*
&lt;br&gt;! 			 * Force a checkpoint if too much time has elapsed since the last one.
&lt;br&gt;! 			 * Note that we count a timed checkpoint in stats only when this
&lt;br&gt;! 			 * occurs without an external request, but we set the CAUSE_TIME flag
&lt;br&gt;! 			 * bit even if there is also an external request.
&lt;br&gt;&amp;nbsp; 			 */
&lt;br&gt;! 			now = (pg_time_t) time(NULL);
&lt;br&gt;! 			elapsed_secs = now - last_checkpoint_time;
&lt;br&gt;! 			if (elapsed_secs &amp;gt;= CheckPointTimeout)
&lt;br&gt;! 			{
&lt;br&gt;! 				if (!do_checkpoint)
&lt;br&gt;! 					BgWriterStats.m_timed_checkpoints++;
&lt;br&gt;! 				do_checkpoint = true;
&lt;br&gt;! 				flags |= CHECKPOINT_CAUSE_TIME;
&lt;br&gt;! 			}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 			/*
&lt;br&gt;! 			 * Do a checkpoint if requested, otherwise do one cycle of
&lt;br&gt;! 			 * dirty-buffer writing.
&lt;br&gt;&amp;nbsp; 			 */
&lt;br&gt;! 			if (do_checkpoint)
&lt;br&gt;! 			{
&lt;br&gt;! 				/* use volatile pointer to prevent code rearrangement */
&lt;br&gt;! 				volatile BgWriterShmemStruct *bgs = BgWriterShmem;
&lt;br&gt;! 
&lt;br&gt;! 				/*
&lt;br&gt;! 				 * Atomically fetch the request flags to figure out what kind of a
&lt;br&gt;! 				 * checkpoint we should perform, and increase the started-counter
&lt;br&gt;! 				 * to acknowledge that we've started a new checkpoint.
&lt;br&gt;! 				 */
&lt;br&gt;! 				SpinLockAcquire(&amp;bgs-&amp;gt;ckpt_lck);
&lt;br&gt;! 				flags |= bgs-&amp;gt;ckpt_flags;
&lt;br&gt;! 				bgs-&amp;gt;ckpt_flags = 0;
&lt;br&gt;! 				bgs-&amp;gt;ckpt_started++;
&lt;br&gt;! 				SpinLockRelease(&amp;bgs-&amp;gt;ckpt_lck);
&lt;br&gt;! 
&lt;br&gt;! 				/*
&lt;br&gt;! 				 * We will warn if (a) too soon since last checkpoint (whatever
&lt;br&gt;! 				 * caused it) and (b) somebody set the CHECKPOINT_CAUSE_XLOG flag
&lt;br&gt;! 				 * since the last checkpoint start. &amp;nbsp;Note in particular that this
&lt;br&gt;! 				 * implementation will not generate warnings caused by
&lt;br&gt;! 				 * CheckPointTimeout &amp;lt; CheckPointWarning.
&lt;br&gt;! 				 */
&lt;br&gt;! 				if ((flags &amp; CHECKPOINT_CAUSE_XLOG) &amp;&amp;
&lt;br&gt;! 					elapsed_secs &amp;lt; CheckPointWarning)
&lt;br&gt;! 					ereport(LOG,
&lt;br&gt;! 							(errmsg(&amp;quot;checkpoints are occurring too frequently (%d seconds apart)&amp;quot;,
&lt;br&gt;! 									elapsed_secs),
&lt;br&gt;! 							 errhint(&amp;quot;Consider increasing the configuration parameter \&amp;quot;checkpoint_segments\&amp;quot;.&amp;quot;)));
&lt;br&gt;! 
&lt;br&gt;! 				/*
&lt;br&gt;! 				 * Initialize bgwriter-private variables used during checkpoint.
&lt;br&gt;! 				 */
&lt;br&gt;! 				ckpt_active = true;
&lt;br&gt;! 				ckpt_start_recptr = GetInsertRecPtr();
&lt;br&gt;! 				ckpt_start_time = now;
&lt;br&gt;! 				ckpt_cached_elapsed = 0;
&lt;br&gt;! 
&lt;br&gt;! 				/*
&lt;br&gt;! 				 * Do the checkpoint.
&lt;br&gt;! 				 */
&lt;br&gt;! 				CreateCheckPoint(flags);
&lt;br&gt;! 
&lt;br&gt;! 				/*
&lt;br&gt;! 				 * After any checkpoint, close all smgr files.	This is so we
&lt;br&gt;! 				 * won't hang onto smgr references to deleted files indefinitely.
&lt;br&gt;! 				 */
&lt;br&gt;! 				smgrcloseall();
&lt;br&gt;! 
&lt;br&gt;! 				/*
&lt;br&gt;! 				 * Indicate checkpoint completion to any waiting backends.
&lt;br&gt;! 				 */
&lt;br&gt;! 				SpinLockAcquire(&amp;bgs-&amp;gt;ckpt_lck);
&lt;br&gt;! 				bgs-&amp;gt;ckpt_done = bgs-&amp;gt;ckpt_started;
&lt;br&gt;! 				SpinLockRelease(&amp;bgs-&amp;gt;ckpt_lck);
&lt;br&gt;! 
&lt;br&gt;! 				ckpt_active = false;
&lt;br&gt;! 
&lt;br&gt;! 				/*
&lt;br&gt;! 				 * Note we record the checkpoint start time not end time as
&lt;br&gt;! 				 * last_checkpoint_time. &amp;nbsp;This is so that time-driven checkpoints
&lt;br&gt;! 				 * happen at a predictable spacing.
&lt;br&gt;! 				 */
&lt;br&gt;! 				last_checkpoint_time = now;
&lt;br&gt;! 			}
&lt;br&gt;! 			else
&lt;br&gt;! 				BgBufferSync();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 			/* Check for archive_timeout and switch xlog files if necessary. */
&lt;br&gt;! 			CheckArchiveTimeout();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 			/* Nap for the configured time. */
&lt;br&gt;! 			BgWriterNap();
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;***************
&lt;br&gt;*** 588,594 ****
&lt;br&gt;&amp;nbsp; 		(ckpt_active ? ImmediateCheckpointRequested() : checkpoint_requested))
&lt;br&gt;&amp;nbsp; 			break;
&lt;br&gt;&amp;nbsp; 		pg_usleep(1000000L);
&lt;br&gt;! 		AbsorbFsyncRequests();
&lt;br&gt;&amp;nbsp; 		udelay -= 1000000L;
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;--- 686,693 ----
&lt;br&gt;&amp;nbsp; 		(ckpt_active ? ImmediateCheckpointRequested() : checkpoint_requested))
&lt;br&gt;&amp;nbsp; 			break;
&lt;br&gt;&amp;nbsp; 		pg_usleep(1000000L);
&lt;br&gt;! 		if (!IsRecoveryProcessingMode())
&lt;br&gt;! 			AbsorbFsyncRequests();
&lt;br&gt;&amp;nbsp; 		udelay -= 1000000L;
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;***************
&lt;br&gt;*** 642,647 ****
&lt;br&gt;--- 741,759 ----
&lt;br&gt;&amp;nbsp; 	if (!am_bg_writer)
&lt;br&gt;&amp;nbsp; 		return;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ 	/* Perform minimal duties during recovery and skip wait if requested */
&lt;br&gt;+ 	if (IsRecoveryProcessingMode())
&lt;br&gt;+ 	{
&lt;br&gt;+ 		BgBufferSync();
&lt;br&gt;+ 
&lt;br&gt;+ 		if (!shutdown_requested &amp;&amp;
&lt;br&gt;+ 			!checkpoint_requested &amp;&amp;
&lt;br&gt;+ 			IsCheckpointOnSchedule(progress))
&lt;br&gt;+ 			BgWriterNap();
&lt;br&gt;+ 
&lt;br&gt;+ 		return;
&lt;br&gt;+ 	}
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * Perform the usual bgwriter duties and take a nap, unless we're behind
&lt;br&gt;&amp;nbsp; 	 * schedule, in which case we just try to catch up as quickly as possible.
&lt;br&gt;***************
&lt;br&gt;*** 716,731 ****
&lt;br&gt;&amp;nbsp; 	 * However, it's good enough for our purposes, we're only calculating an
&lt;br&gt;&amp;nbsp; 	 * estimate anyway.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;! 	recptr = GetInsertRecPtr();
&lt;br&gt;! 	elapsed_xlogs =
&lt;br&gt;! 		(((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
&lt;br&gt;! 		 ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
&lt;br&gt;! 		CheckPointSegments;
&lt;br&gt;! 
&lt;br&gt;! 	if (progress &amp;lt; elapsed_xlogs)
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;! 		ckpt_cached_elapsed = elapsed_xlogs;
&lt;br&gt;! 		return false;
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;--- 828,846 ----
&lt;br&gt;&amp;nbsp; 	 * However, it's good enough for our purposes, we're only calculating an
&lt;br&gt;&amp;nbsp; 	 * estimate anyway.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;! 	if (!IsRecoveryProcessingMode())
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;! 		recptr = GetInsertRecPtr();
&lt;br&gt;! 		elapsed_xlogs =
&lt;br&gt;! 			(((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
&lt;br&gt;! 			 ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
&lt;br&gt;! 			CheckPointSegments;
&lt;br&gt;! 
&lt;br&gt;! 		if (progress &amp;lt; elapsed_xlogs)
&lt;br&gt;! 		{
&lt;br&gt;! 			ckpt_cached_elapsed = elapsed_xlogs;
&lt;br&gt;! 			return false;
&lt;br&gt;! 		}
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;***************
&lt;br&gt;*** 967,972 ****
&lt;br&gt;--- 1082,1158 ----
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;+ &amp;nbsp;* Always runs in Startup process (see xlog.c)
&lt;br&gt;+ &amp;nbsp;*/
&lt;br&gt;+ void
&lt;br&gt;+ RequestRestartPoint(const XLogRecPtr ReadPtr, const CheckPoint *restartPoint, bool sendToBGWriter)
&lt;br&gt;+ {
&lt;br&gt;+ 	/*
&lt;br&gt;+ 	 * Should we just do it ourselves?
&lt;br&gt;+ 	 */
&lt;br&gt;+ 	if (!IsPostmasterEnvironment || !sendToBGWriter)
&lt;br&gt;+ 	{
&lt;br&gt;+ 		CreateRestartPoint(ReadPtr, restartPoint, CHECKPOINT_IMMEDIATE);
&lt;br&gt;+ 		return;
&lt;br&gt;+ 	}
&lt;br&gt;+ 
&lt;br&gt;+ 	/*
&lt;br&gt;+ 	 * Push requested values into shared memory, then signal to request restartpoint.
&lt;br&gt;+ 	 */
&lt;br&gt;+ 	if (BgWriterShmem-&amp;gt;bgwriter_pid == 0)
&lt;br&gt;+ 		elog(LOG, &amp;quot;could not request restartpoint because bgwriter not running&amp;quot;);
&lt;br&gt;+ 
&lt;br&gt;+ #ifdef NOT_USED
&lt;br&gt;+ 	elog(LOG, &amp;quot;tli = %u nextXidEpoch = %u nextXid = %u nextOid = %u&amp;quot;,
&lt;br&gt;+ 		restartPoint-&amp;gt;ThisTimeLineID,
&lt;br&gt;+ 		restartPoint-&amp;gt;nextXidEpoch,
&lt;br&gt;+ 		restartPoint-&amp;gt;nextXid,
&lt;br&gt;+ 		restartPoint-&amp;gt;nextOid);
&lt;br&gt;+ #endif
&lt;br&gt;+ 
&lt;br&gt;+ 	SpinLockAcquire(&amp;BgWriterShmem-&amp;gt;ckpt_lck);
&lt;br&gt;+ 	BgWriterShmem-&amp;gt;ReadPtr = ReadPtr;
&lt;br&gt;+ 	memcpy(&amp;BgWriterShmem-&amp;gt;restartPoint, restartPoint, sizeof(CheckPoint));
&lt;br&gt;+ 	SpinLockRelease(&amp;BgWriterShmem-&amp;gt;ckpt_lck);
&lt;br&gt;+ 
&lt;br&gt;+ 	if (kill(BgWriterShmem-&amp;gt;bgwriter_pid, SIGINT) != 0)
&lt;br&gt;+ 		elog(LOG, &amp;quot;could not signal for restartpoint: %m&amp;quot;);	
&lt;br&gt;+ }
&lt;br&gt;+ 
&lt;br&gt;+ /* 
&lt;br&gt;+ &amp;nbsp;* Sends another checkpoint request signal to bgwriter, which causes it
&lt;br&gt;+ &amp;nbsp;* to avoid smoothed writes and continue processing as if it had been
&lt;br&gt;+ &amp;nbsp;* called with CHECKPOINT_IMMEDIATE. This is used at the end of recovery.
&lt;br&gt;+ &amp;nbsp;*/
&lt;br&gt;+ void
&lt;br&gt;+ RequestRestartPointCompletion(void)
&lt;br&gt;+ {
&lt;br&gt;+ 	if (BgWriterShmem-&amp;gt;bgwriter_pid != 0 &amp;&amp;
&lt;br&gt;+ 		kill(BgWriterShmem-&amp;gt;bgwriter_pid, SIGINT) != 0)
&lt;br&gt;+ 		elog(LOG, &amp;quot;could not signal for restartpoint immediate: %m&amp;quot;);
&lt;br&gt;+ }
&lt;br&gt;+ 
&lt;br&gt;+ XLogRecPtr
&lt;br&gt;+ GetRedoLocationForArchiveCheckpoint(void)
&lt;br&gt;+ {
&lt;br&gt;+ 	XLogRecPtr	redo;
&lt;br&gt;+ 
&lt;br&gt;+ 	SpinLockAcquire(&amp;BgWriterShmem-&amp;gt;ckpt_lck);
&lt;br&gt;+ 	redo = BgWriterShmem-&amp;gt;ReadPtr;
&lt;br&gt;+ 	SpinLockRelease(&amp;BgWriterShmem-&amp;gt;ckpt_lck);
&lt;br&gt;+ 
&lt;br&gt;+ 	return redo;
&lt;br&gt;+ }
&lt;br&gt;+ 
&lt;br&gt;+ void
&lt;br&gt;+ SetRedoLocationForArchiveCheckpoint(XLogRecPtr redo)
&lt;br&gt;+ {
&lt;br&gt;+ 	SpinLockAcquire(&amp;BgWriterShmem-&amp;gt;ckpt_lck);
&lt;br&gt;+ 	BgWriterShmem-&amp;gt;ReadPtr = redo;
&lt;br&gt;+ 	SpinLockRelease(&amp;BgWriterShmem-&amp;gt;ckpt_lck);
&lt;br&gt;+ }
&lt;br&gt;+ 
&lt;br&gt;+ /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* ForwardFsyncRequest
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		Forward a file-fsync request from a backend to the bgwriter
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;Index: src/backend/postmaster/postmaster.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/postmaster/postmaster.c,v
&lt;br&gt;retrieving revision 1.565
&lt;br&gt;diff -c -r1.565 postmaster.c
&lt;br&gt;*** src/backend/postmaster/postmaster.c	23 Sep 2008 20:35:38 -0000	1.565
&lt;br&gt;--- src/backend/postmaster/postmaster.c	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 254,259 ****
&lt;br&gt;--- 254,264 ----
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;&amp;nbsp; 	PM_INIT,					/* postmaster starting */
&lt;br&gt;&amp;nbsp; 	PM_STARTUP,					/* waiting for startup subprocess */
&lt;br&gt;+ 	PM_RECOVERY,				/* consistent recovery mode; state only
&lt;br&gt;+ 								 * entered for archive and streaming recovery,
&lt;br&gt;+ 								 * and only after the point where
&lt;br&gt;+ 								 * all data is in a consistent state.
&lt;br&gt;+ 								 */
&lt;br&gt;&amp;nbsp; 	PM_RUN,						/* normal &amp;quot;database is alive&amp;quot; state */
&lt;br&gt;&amp;nbsp; 	PM_WAIT_BACKUP,				/* waiting for online backup mode to end */
&lt;br&gt;&amp;nbsp; 	PM_WAIT_BACKENDS,			/* waiting for live backends to exit */
&lt;br&gt;***************
&lt;br&gt;*** 1302,1308 ****
&lt;br&gt;&amp;nbsp; 		 * state that prevents it, start one. &amp;nbsp;It doesn't matter if this
&lt;br&gt;&amp;nbsp; 		 * fails, we'll just try again later.
&lt;br&gt;&amp;nbsp; 		 */
&lt;br&gt;! 		if (BgWriterPID == 0 &amp;&amp; pmState == PM_RUN)
&lt;br&gt;&amp;nbsp; 			BgWriterPID = StartBackgroundWriter();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 		/*
&lt;br&gt;--- 1307,1313 ----
&lt;br&gt;&amp;nbsp; 		 * state that prevents it, start one. &amp;nbsp;It doesn't matter if this
&lt;br&gt;&amp;nbsp; 		 * fails, we'll just try again later.
&lt;br&gt;&amp;nbsp; 		 */
&lt;br&gt;! 		if (BgWriterPID == 0 &amp;&amp; (pmState == PM_RUN || pmState == PM_RECOVERY))
&lt;br&gt;&amp;nbsp; 			BgWriterPID = StartBackgroundWriter();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 		/*
&lt;br&gt;***************
&lt;br&gt;*** 2116,2122 ****
&lt;br&gt;&amp;nbsp; 		if (pid == StartupPID)
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;&amp;nbsp; 			StartupPID = 0;
&lt;br&gt;! 			Assert(pmState == PM_STARTUP);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 			/* FATAL exit of startup is treated as catastrophic */
&lt;br&gt;&amp;nbsp; 			if (!EXIT_STATUS_0(exitstatus))
&lt;br&gt;--- 2121,2127 ----
&lt;br&gt;&amp;nbsp; 		if (pid == StartupPID)
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;&amp;nbsp; 			StartupPID = 0;
&lt;br&gt;! 			Assert(pmState == PM_STARTUP || pmState == PM_RECOVERY);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 			/* FATAL exit of startup is treated as catastrophic */
&lt;br&gt;&amp;nbsp; 			if (!EXIT_STATUS_0(exitstatus))
&lt;br&gt;***************
&lt;br&gt;*** 2157,2167 ****
&lt;br&gt;&amp;nbsp; 			load_role();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 			/*
&lt;br&gt;! 			 * Crank up the background writer.	It doesn't matter if this
&lt;br&gt;! 			 * fails, we'll just try again later.
&lt;br&gt;&amp;nbsp; 			 */
&lt;br&gt;! 			Assert(BgWriterPID == 0);
&lt;br&gt;! 			BgWriterPID = StartBackgroundWriter();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 			/*
&lt;br&gt;&amp;nbsp; 			 * Likewise, start other special children as needed. &amp;nbsp;In a restart
&lt;br&gt;--- 2162,2172 ----
&lt;br&gt;&amp;nbsp; 			load_role();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 			/*
&lt;br&gt;! 			 * Check whether we need to start background writer, if not
&lt;br&gt;! 			 * already running.
&lt;br&gt;&amp;nbsp; 			 */
&lt;br&gt;! 			if (BgWriterPID == 0)
&lt;br&gt;! 				BgWriterPID = StartBackgroundWriter();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 			/*
&lt;br&gt;&amp;nbsp; 			 * Likewise, start other special children as needed. &amp;nbsp;In a restart
&lt;br&gt;***************
&lt;br&gt;*** 3845,3850 ****
&lt;br&gt;--- 3850,3900 ----
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	PG_SETMASK(&amp;BlockSig);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ 	if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_START))
&lt;br&gt;+ 	{
&lt;br&gt;+ 		Assert(pmState == PM_STARTUP);
&lt;br&gt;+ 
&lt;br&gt;+ 		/*
&lt;br&gt;+ 		 * Go to shutdown mode if a shutdown request was pending.
&lt;br&gt;+ 		 */
&lt;br&gt;+ 		if (Shutdown &amp;gt; NoShutdown)
&lt;br&gt;+ 		{
&lt;br&gt;+ 			pmState = PM_WAIT_BACKENDS;
&lt;br&gt;+ 			/* PostmasterStateMachine logic does the rest */
&lt;br&gt;+ 		}
&lt;br&gt;+ 		else
&lt;br&gt;+ 		{
&lt;br&gt;+ 			/*
&lt;br&gt;+ 			 * Startup process has entered recovery
&lt;br&gt;+ 			 */
&lt;br&gt;+ 			pmState = PM_RECOVERY;
&lt;br&gt;+ 
&lt;br&gt;+ 			/*
&lt;br&gt;+ 			 * Load the flat authorization file into postmaster's cache. The
&lt;br&gt;+ 			 * startup process won't have recomputed this from the database yet,
&lt;br&gt;+ 			 * so it may change following recovery.
&lt;br&gt;+ 			 */
&lt;br&gt;+ 			load_role();
&lt;br&gt;+ 
&lt;br&gt;+ 			/*
&lt;br&gt;+ 			 * Crank up the background writer.	It doesn't matter if this
&lt;br&gt;+ 			 * fails, we'll just try again later.
&lt;br&gt;+ 			 */
&lt;br&gt;+ 			Assert(BgWriterPID == 0);
&lt;br&gt;+ 			BgWriterPID = StartBackgroundWriter();
&lt;br&gt;+ 
&lt;br&gt;+ 			/*
&lt;br&gt;+ 			 * Likewise, start other special children as needed.
&lt;br&gt;+ 			 */
&lt;br&gt;+ 			Assert(PgStatPID == 0);
&lt;br&gt;+ 			PgStatPID = pgstat_start();
&lt;br&gt;+ 
&lt;br&gt;+ 			/* XXX at this point we could accept read-only connections */
&lt;br&gt;+ 			ereport(DEBUG1,
&lt;br&gt;+ 				 (errmsg(&amp;quot;database system is in consistent recovery mode&amp;quot;)));
&lt;br&gt;+ 		}
&lt;br&gt;+ 	}
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 	if (CheckPostmasterSignal(PMSIGNAL_PASSWORD_CHANGE))
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;&amp;nbsp; 		/*
&lt;br&gt;Index: src/backend/storage/buffer/README
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/storage/buffer/README,v
&lt;br&gt;retrieving revision 1.14
&lt;br&gt;diff -c -r1.14 README
&lt;br&gt;*** src/backend/storage/buffer/README	21 Mar 2008 13:23:28 -0000	1.14
&lt;br&gt;--- src/backend/storage/buffer/README	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 264,266 ****
&lt;br&gt;--- 264,275 ----
&lt;br&gt;&amp;nbsp; This ensures that the page image transferred to disk is reasonably consistent.
&lt;br&gt;&amp;nbsp; We might miss a hint-bit update or two but that isn't a problem, for the same
&lt;br&gt;&amp;nbsp; reasons mentioned under buffer access rules.
&lt;br&gt;+ 
&lt;br&gt;+ As of 8.4, the background writer starts during recovery mode when there is
&lt;br&gt;+ some form of potentially extended recovery to perform. It performs an
&lt;br&gt;+ identical service to normal processing, except that checkpoints it
&lt;br&gt;+ writes are technically restartpoints. Flushing outstanding WAL for dirty
&lt;br&gt;+ buffers is also skipped, though there shouldn't ever be new WAL entries
&lt;br&gt;+ at that time in any case. We could choose to start the background writer
&lt;br&gt;+ immediately but we hold off until we can prove the database is in a
&lt;br&gt;+ consistent state so that postmaster has a single, clean state change.
&lt;br&gt;Index: src/bin/pg_controldata/pg_controldata.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_controldata/pg_controldata.c,v
&lt;br&gt;retrieving revision 1.41
&lt;br&gt;diff -c -r1.41 pg_controldata.c
&lt;br&gt;*** src/bin/pg_controldata/pg_controldata.c	24 Sep 2008 08:59:42 -0000	1.41
&lt;br&gt;--- src/bin/pg_controldata/pg_controldata.c	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 197,202 ****
&lt;br&gt;--- 197,205 ----
&lt;br&gt;&amp;nbsp; 	printf(_(&amp;quot;Minimum recovery ending location: &amp;nbsp; &amp;nbsp; %X/%X\n&amp;quot;),
&lt;br&gt;&amp;nbsp; 		 &amp;nbsp; ControlFile.minRecoveryPoint.xlogid,
&lt;br&gt;&amp;nbsp; 		 &amp;nbsp; ControlFile.minRecoveryPoint.xrecoff);
&lt;br&gt;+ 	printf(_(&amp;quot;Minimum safe starting location: &amp;nbsp; &amp;nbsp; &amp;nbsp; %X/%X\n&amp;quot;),
&lt;br&gt;+ 		 &amp;nbsp; ControlFile.minSafeStartPoint.xlogid,
&lt;br&gt;+ 		 &amp;nbsp; ControlFile.minSafeStartPoint.xrecoff);
&lt;br&gt;&amp;nbsp; 	printf(_(&amp;quot;Maximum data alignment: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; %u\n&amp;quot;),
&lt;br&gt;&amp;nbsp; 		 &amp;nbsp; ControlFile.maxAlign);
&lt;br&gt;&amp;nbsp; 	/* we don't print floatFormat since can't say much useful about it */
&lt;br&gt;Index: src/bin/pg_resetxlog/pg_resetxlog.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_resetxlog/pg_resetxlog.c,v
&lt;br&gt;retrieving revision 1.68
&lt;br&gt;diff -c -r1.68 pg_resetxlog.c
&lt;br&gt;*** src/bin/pg_resetxlog/pg_resetxlog.c	24 Sep 2008 09:00:44 -0000	1.68
&lt;br&gt;--- src/bin/pg_resetxlog/pg_resetxlog.c	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 595,600 ****
&lt;br&gt;--- 595,602 ----
&lt;br&gt;&amp;nbsp; 	ControlFile.prevCheckPoint.xrecoff = 0;
&lt;br&gt;&amp;nbsp; 	ControlFile.minRecoveryPoint.xlogid = 0;
&lt;br&gt;&amp;nbsp; 	ControlFile.minRecoveryPoint.xrecoff = 0;
&lt;br&gt;+ 	ControlFile.minSafeStartPoint.xlogid = 0;
&lt;br&gt;+ 	ControlFile.minSafeStartPoint.xrecoff = 0;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Now we can force the recorded xlog seg size to the right thing. */
&lt;br&gt;&amp;nbsp; 	ControlFile.xlog_seg_size = XLogSegSize;
&lt;br&gt;Index: src/include/access/xlog.h
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/xlog.h,v
&lt;br&gt;retrieving revision 1.88
&lt;br&gt;diff -c -r1.88 xlog.h
&lt;br&gt;*** src/include/access/xlog.h	12 May 2008 08:35:05 -0000	1.88
&lt;br&gt;--- src/include/access/xlog.h	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 133,139 ****
&lt;br&gt;&amp;nbsp; } XLogRecData;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; extern TimeLineID ThisTimeLineID;		/* current TLI */
&lt;br&gt;! extern bool InRecovery;
&lt;br&gt;&amp;nbsp; extern XLogRecPtr XactLastRecEnd;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /* these variables are GUC parameters related to XLOG */
&lt;br&gt;--- 133,148 ----
&lt;br&gt;&amp;nbsp; } XLogRecData;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; extern TimeLineID ThisTimeLineID;		/* current TLI */
&lt;br&gt;! 
&lt;br&gt;! /* 
&lt;br&gt;! &amp;nbsp;* Prior to 8.4, all activity during recovery was carried out by Startup
&lt;br&gt;! &amp;nbsp;* process. This local variable continues to be used in many parts of the
&lt;br&gt;! &amp;nbsp;* code to indicate actions taken by RecoveryManagers. Other processes that
&lt;br&gt;! &amp;nbsp;* potentially perform work during recovery should check
&lt;br&gt;! &amp;nbsp;* IsRecoveryProcessingMode(), see XLogCtl notes in xlog.c
&lt;br&gt;! &amp;nbsp;*/
&lt;br&gt;! extern bool InRecovery;	
&lt;br&gt;! 										
&lt;br&gt;&amp;nbsp; extern XLogRecPtr XactLastRecEnd;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /* these variables are GUC parameters related to XLOG */
&lt;br&gt;***************
&lt;br&gt;*** 166,171 ****
&lt;br&gt;--- 175,181 ----
&lt;br&gt;&amp;nbsp; /* These indicate the cause of a checkpoint request */
&lt;br&gt;&amp;nbsp; #define CHECKPOINT_CAUSE_XLOG	0x0010	/* XLOG consumption */
&lt;br&gt;&amp;nbsp; #define CHECKPOINT_CAUSE_TIME	0x0020	/* Elapsed time */
&lt;br&gt;+ #define CHECKPOINT_RESTARTPOINT	0x0040	/* Restartpoint during recovery */
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /* Checkpoint statistics */
&lt;br&gt;&amp;nbsp; typedef struct CheckpointStatsData
&lt;br&gt;***************
&lt;br&gt;*** 197,202 ****
&lt;br&gt;--- 207,214 ----
&lt;br&gt;&amp;nbsp; extern void xlog_redo(XLogRecPtr lsn, XLogRecord *record);
&lt;br&gt;&amp;nbsp; extern void xlog_desc(StringInfo buf, uint8 xl_info, char *rec);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ extern bool IsRecoveryProcessingMode(void);
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; extern void UpdateControlFile(void);
&lt;br&gt;&amp;nbsp; extern Size XLOGShmemSize(void);
&lt;br&gt;&amp;nbsp; extern void XLOGShmemInit(void);
&lt;br&gt;Index: src/include/access/xlog_internal.h
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/xlog_internal.h,v
&lt;br&gt;retrieving revision 1.24
&lt;br&gt;diff -c -r1.24 xlog_internal.h
&lt;br&gt;*** src/include/access/xlog_internal.h	11 Aug 2008 11:05:11 -0000	1.24
&lt;br&gt;--- src/include/access/xlog_internal.h	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 17,22 ****
&lt;br&gt;--- 17,23 ----
&lt;br&gt;&amp;nbsp; #define XLOG_INTERNAL_H
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; #include &amp;quot;access/xlog.h&amp;quot;
&lt;br&gt;+ #include &amp;quot;catalog/pg_control.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;fmgr.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;pgtime.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;storage/block.h&amp;quot;
&lt;br&gt;***************
&lt;br&gt;*** 245,250 ****
&lt;br&gt;--- 246,254 ----
&lt;br&gt;&amp;nbsp; extern pg_time_t GetLastSegSwitchTime(void);
&lt;br&gt;&amp;nbsp; extern XLogRecPtr RequestXLogSwitch(void);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ extern void CreateRestartPoint(const XLogRecPtr ReadPtr, 
&lt;br&gt;+ 				const CheckPoint *restartPoint, int flags);
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* These aren't in xlog.h because I'd rather not include fmgr.h there.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;Index: src/include/catalog/pg_control.h
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/catalog/pg_control.h,v
&lt;br&gt;retrieving revision 1.42
&lt;br&gt;diff -c -r1.42 pg_control.h
&lt;br&gt;*** src/include/catalog/pg_control.h	23 Sep 2008 09:20:39 -0000	1.42
&lt;br&gt;--- src/include/catalog/pg_control.h	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 46,52 ****
&lt;br&gt;&amp;nbsp; #define XLOG_NOOP						0x20
&lt;br&gt;&amp;nbsp; #define XLOG_NEXTOID					0x30
&lt;br&gt;&amp;nbsp; #define XLOG_SWITCH						0x40
&lt;br&gt;! 
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /* System status indicator */
&lt;br&gt;&amp;nbsp; typedef enum DBState
&lt;br&gt;--- 46,52 ----
&lt;br&gt;&amp;nbsp; #define XLOG_NOOP						0x20
&lt;br&gt;&amp;nbsp; #define XLOG_NEXTOID					0x30
&lt;br&gt;&amp;nbsp; #define XLOG_SWITCH						0x40
&lt;br&gt;! #define XLOG_RECOVERY_END			0x50
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /* System status indicator */
&lt;br&gt;&amp;nbsp; typedef enum DBState
&lt;br&gt;***************
&lt;br&gt;*** 102,107 ****
&lt;br&gt;--- 102,108 ----
&lt;br&gt;&amp;nbsp; 	CheckPoint	checkPointCopy; /* copy of last check point record */
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	XLogRecPtr	minRecoveryPoint;		/* must replay xlog to here */
&lt;br&gt;+ 	XLogRecPtr	minSafeStartPoint;		/* safe point after recovery crashes */
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * This data is used to check for hardware-architecture compatibility of
&lt;br&gt;Index: src/include/postmaster/bgwriter.h
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/postmaster/bgwriter.h,v
&lt;br&gt;retrieving revision 1.12
&lt;br&gt;diff -c -r1.12 bgwriter.h
&lt;br&gt;*** src/include/postmaster/bgwriter.h	11 Aug 2008 11:05:11 -0000	1.12
&lt;br&gt;--- src/include/postmaster/bgwriter.h	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 12,17 ****
&lt;br&gt;--- 12,18 ----
&lt;br&gt;&amp;nbsp; #ifndef _BGWRITER_H
&lt;br&gt;&amp;nbsp; #define _BGWRITER_H
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ #include &amp;quot;catalog/pg_control.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;storage/block.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;storage/relfilenode.h&amp;quot;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;***************
&lt;br&gt;*** 25,30 ****
&lt;br&gt;--- 26,36 ----
&lt;br&gt;&amp;nbsp; extern void BackgroundWriterMain(void);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; extern void RequestCheckpoint(int flags);
&lt;br&gt;+ extern void RequestRestartPoint(const XLogRecPtr ReadPtr, const CheckPoint *restartPoint, bool sendToBGWriter);
&lt;br&gt;+ extern void RequestRestartPointCompletion(void);
&lt;br&gt;+ extern XLogRecPtr GetRedoLocationForArchiveCheckpoint(void);
&lt;br&gt;+ extern void SetRedoLocationForArchiveCheckpoint(XLogRecPtr redo);
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; extern void CheckpointWriteDelay(int flags, double progress);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; extern bool ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum,
&lt;br&gt;Index: src/include/storage/pmsignal.h
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/storage/pmsignal.h,v
&lt;br&gt;retrieving revision 1.20
&lt;br&gt;diff -c -r1.20 pmsignal.h
&lt;br&gt;*** src/include/storage/pmsignal.h	19 Jun 2008 21:32:56 -0000	1.20
&lt;br&gt;--- src/include/storage/pmsignal.h	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 22,27 ****
&lt;br&gt;--- 22,28 ----
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; typedef enum
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;+ 	PMSIGNAL_RECOVERY_START,	/* move to PM_RECOVERY state */
&lt;br&gt;&amp;nbsp; 	PMSIGNAL_PASSWORD_CHANGE,	/* pg_auth file has changed */
&lt;br&gt;&amp;nbsp; 	PMSIGNAL_WAKEN_ARCHIVER,	/* send a NOTIFY signal to xlog archiver */
&lt;br&gt;&amp;nbsp; 	PMSIGNAL_ROTATE_LOGFILE,	/* send SIGUSR1 to syslogger to rotate logfile */
&lt;br&gt;Index: src/test/regress/expected/opr_sanity.out
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/test/regress/expected/opr_sanity.out,v
&lt;br&gt;retrieving revision 1.84
&lt;br&gt;diff -c -r1.84 opr_sanity.out
&lt;br&gt;*** src/test/regress/expected/opr_sanity.out	16 Aug 2008 00:01:38 -0000	1.84
&lt;br&gt;--- src/test/regress/expected/opr_sanity.out	30 Sep 2008 17:15:15 -0000
&lt;br&gt;***************
&lt;br&gt;*** 109,117 ****
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;p1.proretset != p2.proretset OR
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;p1.provolatile != p2.provolatile OR
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;p1.pronargs != p2.pronargs);
&lt;br&gt;! &amp;nbsp;oid | proname | oid | proname 
&lt;br&gt;! -----+---------+-----+---------
&lt;br&gt;! (0 rows)
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; -- Look for uses of different type OIDs in the argument/result type fields
&lt;br&gt;&amp;nbsp; -- for different aliases of the same built-in function.
&lt;br&gt;--- 109,118 ----
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;p1.proretset != p2.proretset OR
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;p1.provolatile != p2.provolatile OR
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;p1.pronargs != p2.pronargs);
&lt;br&gt;! &amp;nbsp;oid &amp;nbsp;| &amp;nbsp; &amp;nbsp; proname &amp;nbsp; &amp;nbsp; | oid &amp;nbsp;| &amp;nbsp; &amp;nbsp; proname
&lt;br&gt;! ------+-----------------+------+-----------------
&lt;br&gt;! &amp;nbsp;2172 | pg_start_backup | 2176 | pg_start_backup
&lt;br&gt;! (1 row)
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; -- Look for uses of different type OIDs in the argument/result type fields
&lt;br&gt;&amp;nbsp; -- for different aliases of the same built-in function.
&lt;br&gt;&lt;/tt&gt;&lt;hr align=&quot;left&quot; width=&quot;300&quot; /&gt;&lt;br /&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Infrastructure-changes-for-recovery-%28v8%29-tp19751854p19751854.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19737136</id>
	<title>Re: still alive?</title>
	<published>2008-09-30T00:02:35Z</published>
	<updated>2008-09-30T00:02:35Z</updated>
	<author>
		<name>Simon Riggs</name>
	</author>
	<content type="html">&lt;br&gt;On Thu, 2008-09-11 at 15:39 +0300, Peter Eisentraut wrote:
&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Bruce Momjian wrote:
&lt;br&gt;&amp;gt; &amp;gt; Abhijit Menon-Sen wrote:
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; I thought -patches was supposed to die. What happened?
&lt;br&gt;&amp;gt; &amp;gt; 
&lt;br&gt;&amp;gt; &amp;gt; I was wondering the same thing. &amp;nbsp;Peter?
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Hmm, let's try this:
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Anyone who thinks the patches list should remain as separate from 
&lt;br&gt;&amp;gt; hackers, shout now (with rationale)!
&lt;/div&gt;&lt;br&gt;Kill it now, long enough before the next patchfest for it to stick.
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp;Simon Riggs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; www.2ndQuadrant.com
&lt;br&gt;&amp;nbsp;PostgreSQL Training, Services and Support
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/still-alive--tp19301616p19737136.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19737125</id>
	<title>Re: Proposed patch to change TOAST compression strategy for 8.3.4</title>
	<published>2008-09-30T00:00:48Z</published>
	<updated>2008-09-30T00:00:48Z</updated>
	<author>
		<name>Simon Riggs</name>
	</author>
	<content type="html">&lt;br&gt;On Mon, 2008-09-29 at 12:32 +0200, Jérôme Jouanin wrote:
&lt;br&gt;&lt;br&gt;&amp;gt; Upgrade 8.3.4 is available. Before compiling, I have to apply the
&lt;br&gt;&amp;gt; optimized toast patch : bin7hetTGkMRL.bin.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; There are differences in 1 of the 3 files patched : tuptoaster.c
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; The patch runs successfully but before installing on production
&lt;br&gt;&amp;gt; servers, I have to ask : what about the compatibility of this patch in
&lt;br&gt;&amp;gt; 8.3.4 ?
&lt;br&gt;&lt;br&gt;We use patches to build the next releases of PostgreSQL. We don't make
&lt;br&gt;any attempt to maintain past patches as current so that people can apply
&lt;br&gt;them to previous branches.
&lt;br&gt;&lt;br&gt;If you want someone to do that you will probably need to hire a
&lt;br&gt;development and support company, or do it yourself.
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp;Simon Riggs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; www.2ndQuadrant.com
&lt;br&gt;&amp;nbsp;PostgreSQL Training, Services and Support
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Proposed-patch-to-change-TOAST-compression-strategy-for-8.3.4-tp19721534p19737125.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19726524</id>
	<title>Re: [HACKERS] Infrastructure changes for recovery</title>
	<published>2008-09-29T08:53:55Z</published>
	<updated>2008-09-29T08:53:55Z</updated>
	<author>
		<name>Simon Riggs</name>
	</author>
	<content type="html">&lt;br&gt;On Mon, 2008-09-29 at 11:24 -0400, Tom Lane wrote:
&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Simon Riggs &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19726524&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;simon@...&lt;/a&gt;&amp;gt; writes:
&lt;br&gt;&amp;gt; &amp;gt; On Mon, 2008-09-29 at 10:13 -0400, Tom Lane wrote:
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; ... If we crash and restart, we'll have to get to the end
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; of this file before we start letting backends in; which might be further
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; than we actually got before the crash, but not too much further because
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; we already know the whole WAL file is available.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; &amp;gt; Don't want to make it per file though. Big systems can whizz through WAL
&lt;br&gt;&amp;gt; &amp;gt; files very quickly, so we either make it a big number e.g. 255 files per
&lt;br&gt;&amp;gt; &amp;gt; xlogid, or we make it settable (and recorded in pg_control).
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; I think you are missing the point I made above. &amp;nbsp;If you set the
&lt;br&gt;&amp;gt; okay-to-resume point N files ahead, and then the master stops generating
&lt;br&gt;&amp;gt; files so quickly, you've got a problem --- it might be a long time until
&lt;br&gt;&amp;gt; the slave starts letting backends in after a crash/restart.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Fetching a new WAL segment from the archive is expensive enough that an
&lt;br&gt;&amp;gt; additional write/fsync per cycle doesn't seem that big a problem to me.
&lt;br&gt;&amp;gt; There's almost certainly a few fsync-equivalents going on in the
&lt;br&gt;&amp;gt; filesystem to create and delete the retrieved segment files.
&lt;/div&gt;&lt;br&gt;Didn't miss yer point, just didn't agree. :-)
&lt;br&gt;&lt;br&gt;I'll put it at one (1) and then wait for any negative perf reports. No
&lt;br&gt;need to worry about things like that until later.
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp;Simon Riggs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; www.2ndQuadrant.com
&lt;br&gt;&amp;nbsp;PostgreSQL Training, Services and Support
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Infrastructure-changes-for-recovery-tp19245508p19726524.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19726009</id>
	<title>Re: [HACKERS] Infrastructure changes for recovery</title>
	<published>2008-09-29T08:24:08Z</published>
	<updated>2008-09-29T08:24:08Z</updated>
	<author>
		<name>Tom Lane-2</name>
	</author>
	<content type="html">Simon Riggs &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19726009&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;simon@...&lt;/a&gt;&amp;gt; writes:
&lt;br&gt;&amp;gt; On Mon, 2008-09-29 at 10:13 -0400, Tom Lane wrote:
&lt;br&gt;&amp;gt;&amp;gt; ... If we crash and restart, we'll have to get to the end
&lt;br&gt;&amp;gt;&amp;gt; of this file before we start letting backends in; which might be further
&lt;br&gt;&amp;gt;&amp;gt; than we actually got before the crash, but not too much further because
&lt;br&gt;&amp;gt;&amp;gt; we already know the whole WAL file is available.
&lt;br&gt;&lt;br&gt;&amp;gt; Don't want to make it per file though. Big systems can whizz through WAL
&lt;br&gt;&amp;gt; files very quickly, so we either make it a big number e.g. 255 files per
&lt;br&gt;&amp;gt; xlogid, or we make it settable (and recorded in pg_control).
&lt;br&gt;&lt;br&gt;I think you are missing the point I made above. &amp;nbsp;If you set the
&lt;br&gt;okay-to-resume point N files ahead, and then the master stops generating
&lt;br&gt;files so quickly, you've got a problem --- it might be a long time until
&lt;br&gt;the slave starts letting backends in after a crash/restart.
&lt;br&gt;&lt;br&gt;Fetching a new WAL segment from the archive is expensive enough that an
&lt;br&gt;additional write/fsync per cycle doesn't seem that big a problem to me.
&lt;br&gt;There's almost certainly a few fsync-equivalents going on in the
&lt;br&gt;filesystem to create and delete the retrieved segment files.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; regards, tom lane
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Infrastructure-changes-for-recovery-tp19245508p19726009.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19725707</id>
	<title>Re: [HACKERS] Infrastructure changes for recovery</title>
	<published>2008-09-29T08:05:49Z</published>
	<updated>2008-09-29T08:05:49Z</updated>
	<author>
		<name>Simon Riggs</name>
	</author>
	<content type="html">&lt;br&gt;On Mon, 2008-09-29 at 10:13 -0400, Tom Lane wrote:
&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Simon Riggs &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19725707&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;simon@...&lt;/a&gt;&amp;gt; writes:
&lt;br&gt;&amp;gt; &amp;gt; I think we can get away with writing the LSN value to disk, as you
&lt;br&gt;&amp;gt; &amp;gt; suggested, but only every so often. No need to do it after every WAL
&lt;br&gt;&amp;gt; &amp;gt; record, just consistently every so often, so it gives us a point at
&lt;br&gt;&amp;gt; &amp;gt; which we know we are safe.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Huh? &amp;nbsp;How does that make you safe? &amp;nbsp;What you need to know is the max
&lt;br&gt;&amp;gt; LSN that could possibly be on disk.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Hmm, actually we could get away with tying this to fetching WAL files
&lt;br&gt;&amp;gt; from the archive. &amp;nbsp;When switching to a new WAL file, write out the
&lt;br&gt;&amp;gt; *ending* WAL address of that file to pg_control. &amp;nbsp;Then process the WAL
&lt;br&gt;&amp;gt; records in it. &amp;nbsp;Whether or not any of the affected pages get to disk,
&lt;br&gt;&amp;gt; we know that there is no LSN on disk exceeding what we already put in
&lt;br&gt;&amp;gt; pg_control. &amp;nbsp;If we crash and restart, we'll have to get to the end
&lt;br&gt;&amp;gt; of this file before we start letting backends in; which might be further
&lt;br&gt;&amp;gt; than we actually got before the crash, but not too much further because
&lt;br&gt;&amp;gt; we already know the whole WAL file is available.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Or is that the same thing you were saying? &amp;nbsp;The detail about using
&lt;br&gt;&amp;gt; the end address seems fairly critical, and you didn't mention it...
&lt;/div&gt;&lt;br&gt;Same! I just said the safe point was &amp;quot;LSN + 1&amp;quot;, since end = next start.
&lt;br&gt;&lt;br&gt;Looks like we've got a solution, no matter how it's described. (I actually
&lt;br&gt;have a more detailed proof of safety using snapshots/MVCC considerations
&lt;br&gt;so I wasn't overly worried, but what we've discussed is much easier to
&lt;br&gt;understand and agree on. Proof of safety is all we need, and this simpler
&lt;br&gt;proof is more secure.)
&lt;br&gt;&lt;br&gt;Don't want to make it per file though. Big systems can whizz through WAL
&lt;br&gt;files very quickly, so we either make it a big number e.g. 255 files per
&lt;br&gt;xlogid, or we make it settable (and recorded in pg_control).
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp;Simon Riggs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; www.2ndQuadrant.com
&lt;br&gt;&amp;nbsp;PostgreSQL Training, Services and Support
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Infrastructure-changes-for-recovery-tp19245508p19725707.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19724706</id>
	<title>Re: [HACKERS] Infrastructure changes for recovery</title>
	<published>2008-09-29T07:13:59Z</published>
	<updated>2008-09-29T07:13:59Z</updated>
	<author>
		<name>Tom Lane-2</name>
	</author>
	<content type="html">Simon Riggs &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19724706&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;simon@...&lt;/a&gt;&amp;gt; writes:
&lt;br&gt;&amp;gt; I think we can get away with writing the LSN value to disk, as you
&lt;br&gt;&amp;gt; suggested, but only every so often. No need to do it after every WAL
&lt;br&gt;&amp;gt; record, just consistently every so often, so it gives us a point at
&lt;br&gt;&amp;gt; which we know we are safe.
&lt;br&gt;&lt;br&gt;Huh? &amp;nbsp;How does that make you safe? &amp;nbsp;What you need to know is the max
&lt;br&gt;LSN that could possibly be on disk.
&lt;br&gt;&lt;br&gt;Hmm, actually we could get away with tying this to fetching WAL files
&lt;br&gt;from the archive. &amp;nbsp;When switching to a new WAL file, write out the
&lt;br&gt;*ending* WAL address of that file to pg_control. &amp;nbsp;Then process the WAL
&lt;br&gt;records in it. &amp;nbsp;Whether or not any of the affected pages get to disk,
&lt;br&gt;we know that there is no LSN on disk exceeding what we already put in
&lt;br&gt;pg_control. &amp;nbsp;If we crash and restart, we'll have to get to the end
&lt;br&gt;of this file before we start letting backends in; which might be further
&lt;br&gt;than we actually got before the crash, but not too much further because
&lt;br&gt;we already know the whole WAL file is available.
&lt;br&gt;&lt;br&gt;Or is that the same thing you were saying? &amp;nbsp;The detail about using
&lt;br&gt;the end address seems fairly critical, and you didn't mention it...
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; regards, tom lane
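&lt;br&gt;&lt;br&gt;A minimal sketch of the end-address scheme described above, as a toy Python model rather than PostgreSQL code. LSNs are plain integers, and SEGMENT_SIZE, ToyStandby and its methods are hypothetical names invented for illustration; safe_point merely stands in for the value the message proposes writing to pg_control.
&lt;pre&gt;

```python
from operator import ge  # ge(a, b) is true when a is at or beyond b

SEGMENT_SIZE = 100  # hypothetical number of LSN units per WAL file


def segment_end(lsn):
    """Return the first LSN past the WAL file that contains lsn."""
    return (lsn // SEGMENT_SIZE + 1) * SEGMENT_SIZE


class ToyStandby:
    """Toy recovery process; safe_point stands in for a pg_control field."""

    def __init__(self):
        self.safe_point = 0   # persisted, survives a crash
        self.replayed_to = 0  # in memory only

    def switch_to_segment(self, first_lsn):
        # Persist the *ending* address before replaying anything from the file.
        self.safe_point = max(self.safe_point, segment_end(first_lsn))

    def replay(self, lsn):
        self.replayed_to = lsn

    def crash_and_restart(self, restart_lsn):
        # Replay resumes from the last restartpoint; safe_point is kept.
        self.replayed_to = restart_lsn

    def may_admit_backends(self):
        return ge(self.replayed_to, self.safe_point)
```

&lt;/pre&gt;
&lt;br&gt;In this model a standby that crashes partway through a file refuses backends after restart until replay reaches the end of that file, which is the upper bound on any LSN that could already be on disk.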
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Infrastructure-changes-for-recovery-tp19245508p19724706.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19724298</id>
	<title>Re: [HACKERS] Infrastructure changes for recovery</title>
	<published>2008-09-29T06:52:28Z</published>
	<updated>2008-09-29T06:52:28Z</updated>
	<author>
		<name>Simon Riggs</name>
	</author>
	<content type="html">&lt;br&gt;On Mon, 2008-09-29 at 08:46 -0400, Tom Lane wrote:
&lt;br&gt;&amp;gt; Simon Riggs &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19724298&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;simon@...&lt;/a&gt;&amp;gt; writes:
&lt;br&gt;&amp;gt; &amp;gt; ... That kinda works, but the problem is that restartpoints are time based,
&lt;br&gt;&amp;gt; &amp;gt; not log based. We need them to be deterministic for us to rely upon them
&lt;br&gt;&amp;gt; &amp;gt; in the above way.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Right, but the performance disadvantages of making them strictly
&lt;br&gt;&amp;gt; log-distance-based are pretty daunting. &amp;nbsp;We don't really want slaves
&lt;br&gt;&amp;gt; doing that while they're in catchup mode.
&lt;br&gt;&lt;br&gt;I don't think we need to perform restartpoints actually, now I think
&lt;br&gt;about it. It's only the LSN that is important. 
&lt;br&gt;&lt;br&gt;I think we can get away with writing the LSN value to disk, as you
&lt;br&gt;suggested, but only every so often. No need to do it after every WAL
&lt;br&gt;record, just consistently every so often, so it gives us a point at
&lt;br&gt;which we know we are safe. We will need to have the Startup process block
&lt;br&gt;momentarily while the value is written.
&lt;br&gt;&lt;br&gt;Propose Startup process writes/flushes LSN to pg_control every time we
&lt;br&gt;change xlogid. That's independent of WAL file size and fairly clear. 
&lt;br&gt;&lt;br&gt;When we reach that LSN + 1 we will know that no LSNs higher than that
&lt;br&gt;value can have reached disk.
&lt;br&gt;&lt;br&gt;OK?
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp;Simon Riggs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; www.2ndQuadrant.com
&lt;br&gt;&amp;nbsp;PostgreSQL Training, Services and Support
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Infrastructure-changes-for-recovery-tp19245508p19724298.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19723250</id>
	<title>Re: [HACKERS] Infrastructure changes for recovery</title>
	<published>2008-09-29T05:46:46Z</published>
	<updated>2008-09-29T05:46:46Z</updated>
	<author>
		<name>Tom Lane-2</name>
	</author>
	<content type="html">Simon Riggs &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19723250&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;simon@...&lt;/a&gt;&amp;gt; writes:
&lt;br&gt;&amp;gt; ... That kinda works, but the problem is that restartpoints are time based,
&lt;br&gt;&amp;gt; not log based. We need them to be deterministic for us to rely upon them
&lt;br&gt;&amp;gt; in the above way.
&lt;br&gt;&lt;br&gt;Right, but the performance disadvantages of making them strictly
&lt;br&gt;log-distance-based are pretty daunting. &amp;nbsp;We don't really want slaves
&lt;br&gt;doing that while they're in catchup mode.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; regards, tom lane
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Infrastructure-changes-for-recovery-tp19245508p19723250.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19723032</id>
	<title>Re: [HACKERS] Infrastructure changes for recovery</title>
	<published>2008-09-29T05:32:17Z</published>
	<updated>2008-09-29T05:32:17Z</updated>
	<author>
		<name>Simon Riggs</name>
	</author>
	<content type="html">&lt;br&gt;On Sun, 2008-09-28 at 21:16 -0400, Tom Lane wrote:
&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Simon Riggs &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19723032&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;simon@...&lt;/a&gt;&amp;gt; writes:
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; It does nothing AFAICS for the
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; problem that when restarting archive recovery from a restartpoint,
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; it's not clear when it is safe to start letting in backends. &amp;nbsp;You need
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; to get past the highest LSN that has made it out to disk, and there is
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; no good way to know what that is.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; &amp;gt; AFAICS when we set minRecoveryLoc we *never* unset it. It's recorded in
&lt;br&gt;&amp;gt; &amp;gt; the controlfile, so whenever we restart we can see that it has been set
&lt;br&gt;&amp;gt; &amp;gt; previously and now we are beyond it.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Right ...
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; &amp;gt; So if we crash during recovery and
&lt;br&gt;&amp;gt; &amp;gt; then restart *after* we reached minRecoveryLoc then we resume in safe
&lt;br&gt;&amp;gt; &amp;gt; mode almost immediately.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Wrong.
&lt;/div&gt;&lt;br&gt;OK, see where you're coming from now. Solution is needed, I agree.
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; What minRecoveryLoc is is an upper bound for the LSNs that might be
&lt;br&gt;&amp;gt; on-disk in the filesystem backup that an archive recovery starts from.
&lt;br&gt;&amp;gt; (Defined as such, it never changes during a restartpoint crash/restart.)
&lt;br&gt;&amp;gt; Once you pass that, the on-disk state as modified by any dirty buffers
&lt;br&gt;&amp;gt; inside the recovery process represents a consistent database state.
&lt;br&gt;&amp;gt; However, the on-disk state alone is not guaranteed consistent. &amp;nbsp;As you
&lt;br&gt;&amp;gt; flush some (not all) of your shared buffers you enter other
&lt;br&gt;&amp;gt; not-certainly-consistent on-disk states. &amp;nbsp;If we crash in such a state,
&lt;br&gt;&amp;gt; we know how to use the last restartpoint plus WAL replay to recover to
&lt;br&gt;&amp;gt; another state in which disk + dirty buffers are consistent. &amp;nbsp;However,
&lt;br&gt;&amp;gt; we reach such a state only when we have read WAL to beyond the highest
&lt;br&gt;&amp;gt; LSN that has reached disk --- and in recovery mode there is no clean
&lt;br&gt;&amp;gt; way to determine what that was.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Perhaps a solution is to make XLogFlush not be a no-op in recovery mode,
&lt;br&gt;&amp;gt; but have it scribble a highest-LSN somewhere on stable storage (maybe
&lt;br&gt;&amp;gt; scribble on pg_control itself, or maybe better someplace else). &amp;nbsp;I'm
&lt;br&gt;&amp;gt; not totally sure about that. &amp;nbsp;But I am sure that doing nothing will
&lt;br&gt;&amp;gt; be unreliable.
&lt;/div&gt;&lt;br&gt;No need to write highest LSN to disk constantly...
&lt;br&gt;&lt;br&gt;If we restart from a restartpoint then initially the current apply LSN
&lt;br&gt;will be potentially/probably earlier than the latest on-disk LSN, as you
&lt;br&gt;say. But once we have completed the next restartpoint *after* the value
&lt;br&gt;pg_control says then we will be guaranteed that the two LSNs are the
&lt;br&gt;same, since otherwise we would have restarted at a later point.
&lt;br&gt;&lt;br&gt;That kinda works, but the problem is that restartpoints are time based,
&lt;br&gt;not log based. We need them to be deterministic for us to rely upon them
&lt;br&gt;in the above way. If we crash and then replay we can only be certain we
&lt;br&gt;are safe when we have found a restartpoint that the previous recovery
&lt;br&gt;will definitely have reached.
&lt;br&gt;&lt;br&gt;So we must have log-based restartpoints, using either a constant LSN
&lt;br&gt;offset, or a parameter like checkpoint_segments. But if it is changeable
&lt;br&gt;then it needs to be written into the control file, so we don't make a
&lt;br&gt;mistake about it. 
&lt;br&gt;&lt;br&gt;So we need to:
&lt;br&gt;* add an extra test to delay safe point if required
&lt;br&gt;* write restart_segments value to control file
&lt;br&gt;* force a restartpoint on first valid checkpoint WAL record after we
&lt;br&gt;have passed restart_segments worth of log
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp;Simon Riggs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; www.2ndQuadrant.com
&lt;br&gt;&amp;nbsp;PostgreSQL Training, Services and Support
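&lt;br&gt;&lt;br&gt;A rough sketch of the log-based restartpoint rule from the list above, as a toy Python model rather than PostgreSQL code. LSNs are plain integers, and SEGMENT_SIZE and restartpoint_lsns are hypothetical names; restart_segments plays the role of the value the message proposes storing in the control file.
&lt;pre&gt;

```python
from operator import ge  # ge(a, b) is true when a is at or beyond b

SEGMENT_SIZE = 100  # hypothetical number of LSN units per WAL segment


def restartpoint_lsns(checkpoint_lsns, start_lsn, restart_segments):
    """Return the checkpoint records at which a restartpoint is forced.

    A restartpoint is forced at the first checkpoint record found after
    restart_segments worth of log has passed since the previous one.
    """
    distance = restart_segments * SEGMENT_SIZE
    last = start_lsn
    forced = []
    for lsn in sorted(checkpoint_lsns):
        if ge(lsn - last, distance):
            forced.append(lsn)
            last = lsn
    return forced
```

&lt;/pre&gt;
&lt;br&gt;Because the choice depends only on the log contents and the stored parameter, a second replay after a crash picks the same restartpoints as the first, which is the determinism the argument relies on.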
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Infrastructure-changes-for-recovery-tp19245508p19723032.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19721534</id>
	<title>Proposed patch to change TOAST compression strategy for 8.3.4</title>
	<published>2008-09-29T03:32:48Z</published>
	<updated>2008-09-29T03:32:48Z</updated>
	<author>
		<name>Jérôme Jouanin</name>
	</author>
	<content type="html">&lt;div dir=&quot;ltr&quot;&gt;Hello,&lt;br&gt;&lt;br&gt;Upgrade 8.3.4 is available. Before compiling, I have to apply the optimized &lt;span class=&quot;nfakPe&quot;&gt;toast&lt;/span&gt; patch : &lt;strong&gt;&lt;a href=&quot;http://archives.postgresql.org/pgsql-patches/2008-02/bin7hetTGkMRL.bin&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;&lt;tt&gt;bin7hetTGkMRL.bin&lt;/tt&gt;&lt;/a&gt;&lt;/strong&gt;.&lt;br&gt;
&lt;br&gt;There are differences in 1 of the 3 files patched : tuptoaster.c&lt;br&gt;&lt;br&gt;The patch runs successfully but before installing on production servers, I have to ask : what about the compatibility of this patch in 8.3.4 ?&lt;br&gt;
&lt;br&gt;Thanks,&lt;br&gt;&lt;br&gt;Jérôme Jouanin&lt;br&gt;&lt;/div&gt;
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Proposed-patch-to-change-TOAST-compression-strategy-for-8.3.4-tp19721534p19721534.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19718426</id>
	<title>Re: [PgFoundry] Unsigned Data Types [1 of 2]</title>
	<published>2008-09-28T22:12:38Z</published>
	<updated>2008-09-28T22:12:38Z</updated>
	<author>
		<name>Ryan Bradetich-2</name>
	</author>
	<content type="html">Hello all,
&lt;br&gt;&lt;br&gt;Just wanted to let everyone know I have committed this patch to the
&lt;br&gt;PgFoundry uint project.
&lt;br&gt;I have also updated the commit-fest wiki with this status.
&lt;br&gt;&lt;br&gt;Thanks to everyone (especially Jaime) for the feedback and reviews.
&lt;br&gt;&lt;br&gt;- Ryan
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-PgFoundry--Unsigned-Data-Types--1-of-2--tp19245940p19718426.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19717262</id>
	<title>Re: [HACKERS] Infrastructure changes for recovery</title>
	<published>2008-09-28T18:16:01Z</published>
	<updated>2008-09-28T18:16:01Z</updated>
	<author>
		<name>Tom Lane-2</name>
	</author>
	<content type="html">Simon Riggs &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19717262&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;simon@...&lt;/a&gt;&amp;gt; writes:
&lt;br&gt;&amp;gt;&amp;gt; It does nothing AFAICS for the
&lt;br&gt;&amp;gt;&amp;gt; problem that when restarting archive recovery from a restartpoint,
&lt;br&gt;&amp;gt;&amp;gt; it's not clear when it is safe to start letting in backends. &amp;nbsp;You need
&lt;br&gt;&amp;gt;&amp;gt; to get past the highest LSN that has made it out to disk, and there is
&lt;br&gt;&amp;gt;&amp;gt; no good way to know what that is.
&lt;br&gt;&lt;br&gt;&amp;gt; AFAICS when we set minRecoveryLoc we *never* unset it. It's recorded in
&lt;br&gt;&amp;gt; the controlfile, so whenever we restart we can see that it has been set
&lt;br&gt;&amp;gt; previously and now we are beyond it.
&lt;br&gt;&lt;br&gt;Right ...
&lt;br&gt;&lt;br&gt;&amp;gt; So if we crash during recovery and
&lt;br&gt;&amp;gt; then restart *after* we reached minRecoveryLoc then we resume in safe
&lt;br&gt;&amp;gt; mode almost immediately.
&lt;br&gt;&lt;br&gt;Wrong.
&lt;br&gt;&lt;br&gt;What minRecoveryLoc is is an upper bound for the LSNs that might be
&lt;br&gt;on-disk in the filesystem backup that an archive recovery starts from.
&lt;br&gt;(Defined as such, it never changes during a restartpoint crash/restart.)
&lt;br&gt;Once you pass that, the on-disk state as modified by any dirty buffers
&lt;br&gt;inside the recovery process represents a consistent database state.
&lt;br&gt;However, the on-disk state alone is not guaranteed consistent. &amp;nbsp;As you
&lt;br&gt;flush some (not all) of your shared buffers you enter other
&lt;br&gt;not-certainly-consistent on-disk states. &amp;nbsp;If we crash in such a state,
&lt;br&gt;we know how to use the last restartpoint plus WAL replay to recover to
&lt;br&gt;another state in which disk + dirty buffers are consistent. &amp;nbsp;However,
&lt;br&gt;we reach such a state only when we have read WAL to beyond the highest
&lt;br&gt;LSN that has reached disk --- and in recovery mode there is no clean
&lt;br&gt;way to determine what that was.
&lt;br&gt;&lt;br&gt;Perhaps a solution is to make XLogFlush not be a no-op in recovery mode,
&lt;br&gt;but have it scribble a highest-LSN somewhere on stable storage (maybe
&lt;br&gt;scribble on pg_control itself, or maybe better someplace else). &amp;nbsp;I'm
&lt;br&gt;not totally sure about that. &amp;nbsp;But I am sure that doing nothing will
&lt;br&gt;be unreliable.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; regards, tom lane
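&lt;br&gt;&lt;br&gt;A minimal sketch of why passing minRecoveryLoc is not enough after a crash, and of the highest-LSN bookkeeping suggested above. This is a toy Python model, not PostgreSQL code: pages are dictionary entries tagged with the LSN last applied to them, replayed_to means every record up to and including that LSN has been applied, and all function names and the stable dictionary are hypothetical.
&lt;pre&gt;

```python
from operator import ge  # ge(a, b) is true when a is at or beyond b


def highest_on_disk_lsn(disk_pages):
    """Largest page LSN that has reached disk (0 if no pages were written)."""
    return max(disk_pages.values(), default=0)


def flush_page(disk_pages, stable, page_id, page_lsn):
    """Write one dirty page, first recording its LSN as a high-water mark."""
    stable["flush_high_water"] = max(stable.get("flush_high_water", 0), page_lsn)
    disk_pages[page_id] = page_lsn


def disk_is_consistent(disk_pages, replayed_to):
    """Disk plus replayed WAL is consistent only once replay covers every
    LSN already visible in some on-disk page."""
    return ge(replayed_to, highest_on_disk_lsn(disk_pages))


def safe_to_admit(replayed_to, min_recovery_loc, flush_high_water):
    """Gate on both the backup's upper bound and the recorded high-water mark."""
    return ge(replayed_to, max(min_recovery_loc, flush_high_water))
```

&lt;/pre&gt;
&lt;br&gt;With pages flushed at LSNs 120 and 480 and minRecoveryLoc at 100, a restart that has replayed only to 150 is past minRecoveryLoc yet still inconsistent; the recorded high-water mark is what tells recovery to wait until 480.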
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Infrastructure-changes-for-recovery-tp19245508p19717262.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19717111</id>
	<title>Re: [HACKERS] Infrastructure changes for recovery</title>
	<published>2008-09-28T17:53:19Z</published>
	<updated>2008-09-28T17:53:19Z</updated>
	<author>
		<name>Simon Riggs</name>
	</author>
	<content type="html">&lt;br&gt;On Sun, 2008-09-28 at 14:02 -0400, Tom Lane wrote:
&lt;br&gt;&lt;br&gt;&amp;gt; It does nothing AFAICS for the
&lt;br&gt;&amp;gt; problem that when restarting archive recovery from a restartpoint,
&lt;br&gt;&amp;gt; it's not clear when it is safe to start letting in backends. &amp;nbsp;You need
&lt;br&gt;&amp;gt; to get past the highest LSN that has made it out to disk, and there is
&lt;br&gt;&amp;gt; no good way to know what that is.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Unless we can get past this problem the whole thing seems a bit dead
&lt;br&gt;&amp;gt; in
&lt;br&gt;&amp;gt; the water :-(
&lt;br&gt;&lt;br&gt;I agree about the importance of the problem but don't fully understand the
&lt;br&gt;circumstances under which you see a problem arising.
&lt;br&gt;&lt;br&gt;AFAICS when we set minRecoveryLoc we *never* unset it. It's recorded in
&lt;br&gt;the controlfile, so whenever we restart we can see that it has been set
&lt;br&gt;previously and now we are beyond it. So if we crash during recovery and
&lt;br&gt;then restart *after* we reached minRecoveryLoc then we resume in safe
&lt;br&gt;mode almost immediately. If we crash during recovery before we reached
&lt;br&gt;minRecoveryLoc then we continue until we find it. 
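&lt;br&gt;&lt;br&gt;A minimal sketch of the rule as described in this message, assuming a
&lt;br&gt;simplified control-file struct (the struct and function names are
&lt;br&gt;hypothetical, and zero is assumed to mean that the location was never set):

&lt;pre&gt;
```c
/* Hypothetical sketch: a simplified stand-in for the control file. */
typedef unsigned long long SketchLsn;

typedef struct ControlSketch
{
    SketchLsn   min_recovery_loc;   /* 0 is assumed to mean "never set" */
} ControlSketch;

/* Set once and never unset, so the value survives any later restart. */
void
set_min_recovery_loc(ControlSketch *ctl, SketchLsn loc)
{
    if (ctl->min_recovery_loc == 0)
        ctl->min_recovery_loc = loc;
}

/* Backends may be admitted only after replay has reached the stored location. */
int
reached_min_recovery_loc(const ControlSketch *ctl, SketchLsn replayed_upto)
{
    if (ctl->min_recovery_loc == 0)
        return 0;               /* not known yet: keep replaying */
    return replayed_upto >= ctl->min_recovery_loc;
}
```
&lt;/pre&gt;
&lt;br&gt;This captures only the minRecoveryLoc comparison; it says nothing about
&lt;br&gt;pages flushed to disk after the restartpoint.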
&lt;br&gt;&lt;br&gt;There is a loophole, as described in a separate post, but that can be
&lt;br&gt;plugged by offering explicit setting of the minRecoveryLoc from
&lt;br&gt;recovery.conf. Most people use pg_start_backup() so do not experience
&lt;br&gt;the need for that.
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp;Simon Riggs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; www.2ndQuadrant.com
&lt;br&gt;&amp;nbsp;PostgreSQL Training, Services and Support
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Infrastructure-changes-for-recovery-tp19245508p19717111.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19714974</id>
	<title>Re: [HACKERS] get_relation_stats_hook()</title>
	<published>2008-09-28T12:57:49Z</published>
	<updated>2008-09-28T12:57:49Z</updated>
	<author>
		<name>Tom Lane-2</name>
	</author>
	<content type="html">Simon Riggs &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19714974&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;simon@...&lt;/a&gt;&amp;gt; writes:
&lt;br&gt;&amp;gt; New version of Postgres patch, v5. Implements suggested changes.
&lt;br&gt;&amp;gt; Ready for review and apply.
&lt;br&gt;&lt;br&gt;Applied with some revisions. &amp;nbsp;The method for passing back freefunc
&lt;br&gt;didn't work, so I made it pass the whole VariableStatData struct
&lt;br&gt;instead; this might allow some additional flexibility by changing other
&lt;br&gt;fields besides the intended statsTuple and freefunc. &amp;nbsp;Also, I was still
&lt;br&gt;unhappy about adding a hook in the midst of code that clearly needs
&lt;br&gt;improvement, without making it possible for the hook to override the
&lt;br&gt;adjacent broken code paths; so I refactored the API a bit for that too.
&lt;br&gt;&lt;br&gt;The plugin function would now be something like this:
&lt;br&gt;&lt;br&gt;static bool
&lt;br&gt;plugin_get_relation_stats(PlannerInfo *root,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; RangeTblEntry *rte,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; AttrNumber attnum,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; VariableStatData *vardata)
&lt;br&gt;{
&lt;br&gt;&amp;nbsp; &amp;nbsp; HeapTuple &amp;nbsp; &amp;nbsp;statstup = NULL;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; /* For now, we only cover the simple-relation case */
&lt;br&gt;&amp;nbsp; &amp;nbsp; if (rte-&amp;gt;rtekind != RTE_RELATION || rte-&amp;gt;inh)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return false;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; if (!get_tom_stats_tupletable(rte-&amp;gt;relid, attnum))
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return false;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;* Get stats if present. We asked for only one row, so no need for loops.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;*/ &amp;nbsp; &amp;nbsp;
&lt;br&gt;&amp;nbsp; &amp;nbsp; if (SPI_processed &amp;gt; 0)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; statstup = SPI_copytuple(SPI_tuptable-&amp;gt;vals[0]);
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; SPI_freetuptable(SPI_tuptable);
&lt;br&gt;&amp;nbsp; &amp;nbsp; SPI_finish();
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; if (!statstup)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return false; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/* should this happen? */
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; vardata-&amp;gt;statsTuple = statstup;
&lt;br&gt;&amp;nbsp; &amp;nbsp; /* define function to use when time to free the tuple */
&lt;br&gt;&amp;nbsp; &amp;nbsp; vardata-&amp;gt;freefunc = heap_freetuple;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; return true;
&lt;br&gt;}
&lt;br&gt;&lt;br&gt;and if you want to insert stats for expression indexes then there's a
&lt;br&gt;separate get_index_stats_hook for that.
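&lt;br&gt;&lt;br&gt;For readers new to planner hooks, the calling convention can be
&lt;br&gt;illustrated with a self-contained sketch. The types and names below are
&lt;br&gt;hypothetical stand-ins rather than the real PostgreSQL declarations, and
&lt;br&gt;the built-in lookup is stubbed out:

&lt;pre&gt;
```c
/* Hypothetical stand-ins; the real types live in the PostgreSQL headers. */
typedef struct StatsSketch
{
    int     have_stats;     /* 1 once some source has filled in the struct */
    int     from_plugin;    /* 1 if the hook supplied the data */
} StatsSketch;

typedef int (*stats_hook_type) (int relid, int attnum, StatsSketch *out);

/* Plugins set this at load time; 0 means no plugin is installed. */
stats_hook_type stats_hook = 0;

void
lookup_stats(int relid, int attnum, StatsSketch *out)
{
    out->have_stats = 0;
    out->from_plugin = 0;

    /* Give the plugin first chance; a nonzero return means it filled *out. */
    if (stats_hook)
    {
        if (stats_hook(relid, attnum, out))
            return;
    }

    /* Otherwise fall back to the built-in lookup (stubbed out here). */
    out->have_stats = 1;
}

/* Example plugin: it only knows about the made-up relation id 42. */
int
example_hook(int relid, int attnum, StatsSketch *out)
{
    (void) attnum;
    if (relid != 42)
        return 0;
    out->have_stats = 1;
    out->from_plugin = 1;
    return 1;
}
```
&lt;/pre&gt;
&lt;br&gt;Returning false from the hook leaves the default path untouched, which is
&lt;br&gt;why the plugin function above can simply bail out for cases it doesn't cover.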
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; regards, tom lane
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--get_relation_stats_hook%28%29-tp18203715p19714974.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19713828</id>
	<title>Re: [HACKERS] Infrastructure changes for recovery</title>
	<published>2008-09-28T11:02:25Z</published>
	<updated>2008-09-28T11:02:25Z</updated>
	<author>
		<name>Tom Lane-2</name>
	</author>
	<content type="html">Simon Riggs &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19713828&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;simon@...&lt;/a&gt;&amp;gt; writes:
&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; On Thu, 2008-09-25 at 18:28 -0400, Tom Lane wrote:
&lt;br&gt;&amp;gt;&amp;gt; After reading this for awhile, I realized that there is a rather
&lt;br&gt;&amp;gt;&amp;gt; fundamental problem with it: it switches into &amp;quot;consistent recovery&amp;quot;
&lt;br&gt;&amp;gt;&amp;gt; mode as soon as it's read WAL beyond ControlFile-&amp;gt;minRecoveryPoint.
&lt;br&gt;&amp;gt;&amp;gt; In a crash recovery situation that typically is before the last
&lt;br&gt;&amp;gt;&amp;gt; checkpoint (if indeed it's not still zero), and what that means is
&lt;br&gt;&amp;gt;&amp;gt; that this patch will activate the bgwriter and start letting in
&lt;br&gt;&amp;gt;&amp;gt; backends instantaneously after a crash, long before we can have any
&lt;br&gt;&amp;gt;&amp;gt; certainty that the DB state really is consistent.
&lt;br&gt;&amp;gt;&amp;gt; 
&lt;br&gt;&amp;gt;&amp;gt; In a normal crash recovery situation this would be easily fixed by
&lt;br&gt;&amp;gt;&amp;gt; simply not letting it go to &amp;quot;consistent recovery&amp;quot; state at all, but
&lt;br&gt;&amp;gt;&amp;gt; what about recovery from a restartpoint? &amp;nbsp;We don't want a slave that's
&lt;br&gt;&amp;gt;&amp;gt; crashed once to never let backends in again. &amp;nbsp;But I don't see how to
&lt;br&gt;&amp;gt;&amp;gt; determine that we're far enough past the restartpoint to be consistent
&lt;br&gt;&amp;gt;&amp;gt; again. &amp;nbsp;In crash recovery we assume (without proof ;-)) that we're
&lt;br&gt;&amp;gt;&amp;gt; consistent once we reach the end of valid-looking WAL, but that rule
&lt;br&gt;&amp;gt;&amp;gt; doesn't help for a slave that's following a continuing WAL sequence.
&lt;br&gt;&amp;gt;&amp;gt; 
&lt;br&gt;&amp;gt;&amp;gt; Perhaps something could be done based on noting when we have to pull in
&lt;br&gt;&amp;gt;&amp;gt; a WAL segment from the recovery_command, but it sounds like a pretty
&lt;br&gt;&amp;gt;&amp;gt; fragile assumption.
&lt;/div&gt;&lt;br&gt;&amp;gt; Seems like we just say we only signal the postmaster if
&lt;br&gt;&amp;gt; InArchiveRecovery. Archive recovery from a restartpoint is still archive
&lt;br&gt;&amp;gt; recovery, so this shouldn't be a problem in the way you mention. The
&lt;br&gt;&amp;gt; presence of recovery.conf overrides all other cases.
&lt;br&gt;&lt;br&gt;What that implements is my comment that we don't have to let anyone in
&lt;br&gt;at all during a plain crash recovery. &amp;nbsp;It does nothing AFAICS for the
&lt;br&gt;problem that when restarting archive recovery from a restartpoint,
&lt;br&gt;it's not clear when it is safe to start letting in backends. &amp;nbsp;You need
&lt;br&gt;to get past the highest LSN that has made it out to disk, and there is
&lt;br&gt;no good way to know what that is.
&lt;br&gt;&lt;br&gt;Unless we can get past this problem the whole thing seems a bit dead in
&lt;br&gt;the water :-(
&lt;br&gt;&lt;br&gt;&amp;gt;&amp;gt; * I'm a bit uncomfortable with the fact that the
&lt;br&gt;&amp;gt;&amp;gt; IsRecoveryProcessingMode flag is read and written with no lock.
&lt;br&gt;&lt;br&gt;&amp;gt; It's not a dynamic state, so I can fix that inside
&lt;br&gt;&amp;gt; IsRecoveryProcessingMode() with a local state to make check faster.
&lt;br&gt;&lt;br&gt;Erm, this code doesn't look like it can allow IsRecoveryProcessingMode
&lt;br&gt;to become locally true in the first place? &amp;nbsp;I guess you could fix it
&lt;br&gt;by initializing IsRecoveryProcessingMode to true, but that seems likely
&lt;br&gt;to break other places. &amp;nbsp;Maybe better is to have an additional local
&lt;br&gt;state variable showing whether the flag has ever been fetched from
&lt;br&gt;shared memory.
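&lt;br&gt;&lt;br&gt;Something along these lines, as a sketch only: the variable names are
&lt;br&gt;made up, a plain global stands in for the shared-memory flag, and it
&lt;br&gt;assumes the flag goes from true to false once and never back.

&lt;pre&gt;
```c
/* Hypothetical stand-in for the flag kept in shared memory. */
int shared_recovery_mode = 1;

/* Backend-local cache of the flag, plus "have we ever fetched it?" */
static int local_recovery_mode = 0;
static int local_flag_fetched = 0;

int
is_recovery_processing_mode(void)
{
    /* Once we have seen "false" locally, skip the shared read for good. */
    if (local_flag_fetched)
    {
        if (!local_recovery_mode)
            return 0;
    }

    /* Never fetched, or still in recovery: consult the shared flag again. */
    local_recovery_mode = shared_recovery_mode;
    local_flag_fetched = 1;
    return local_recovery_mode;
}
```
&lt;/pre&gt;
&lt;br&gt;The extra variable lets the local copy start out false without that being
&lt;br&gt;mistaken for a value already read from shared memory.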
&lt;br&gt;&lt;br&gt;The other issues don't seem worth arguing about ...
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; regards, tom lane
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Infrastructure-changes-for-recovery-tp19245508p19713828.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19690014</id>
	<title>Re: [HACKERS] get_relation_stats_hook()</title>
	<published>2008-09-26T07:51:53Z</published>
	<updated>2008-09-26T07:51:53Z</updated>
	<author>
		<name>Simon Riggs</name>
	</author>
	<content type="html">&lt;br&gt;On Wed, 2008-08-06 at 23:38 +0100, Simon Riggs wrote:
&lt;div class='shrinkable-quote'&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; On Wed, 2008-08-06 at 16:37 +0100, Simon Riggs wrote:
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; &amp;gt; I'll submit the fully working plugin once we've stabilised the API. It's
&lt;br&gt;&amp;gt; &amp;gt; designed as a contrib module, so it can go in pgfoundry or contrib.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; OK, here's fully working plugin, plus API patch.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; I expect to open a pgfoundry project for the plugin, but will wait for
&lt;br&gt;&amp;gt; the main patch review.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; I've tried the APIs three different ways and this seems cleanest and
&lt;br&gt;&amp;gt; most widely applicable approach. It's possible to add calls in more
&lt;br&gt;&amp;gt; places, but I haven't done this for reasons discussed previously.
&lt;/div&gt;&lt;/div&gt;New version of Postgres patch, v5. Implements suggested changes.
&lt;br&gt;Ready for review and apply.
&lt;br&gt;&lt;br&gt;New version of stats plugin, v3. Works with v5.
&lt;br&gt;Corrected problems:
&lt;br&gt;* now loads using shared_preload_libraries as well as LOAD
&lt;br&gt;* example test script fix 
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp;Simon Riggs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; www.2ndQuadrant.com
&lt;br&gt;&amp;nbsp;PostgreSQL Training, Services and Support
&lt;br&gt;&lt;br /&gt;&lt;tt&gt;[stat_hooks.v5.patch]&lt;/tt&gt;&lt;br /&gt;&lt;hr align=&quot;left&quot; width=&quot;300&quot; /&gt;&lt;tt&gt;Index: src/backend/utils/adt/selfuncs.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/utils/adt/selfuncs.c,v
&lt;br&gt;retrieving revision 1.253
&lt;br&gt;diff -c -r1.253 selfuncs.c
&lt;br&gt;*** src/backend/utils/adt/selfuncs.c	25 Aug 2008 22:42:34 -0000	1.253
&lt;br&gt;--- src/backend/utils/adt/selfuncs.c	25 Sep 2008 15:57:58 -0000
&lt;br&gt;***************
&lt;br&gt;*** 118,123 ****
&lt;br&gt;--- 118,126 ----
&lt;br&gt;&amp;nbsp; #include &amp;quot;utils/selfuncs.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;utils/syscache.h&amp;quot;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ /* Hook for plugins to get control when we ask for stats */
&lt;br&gt;+ get_relation_stats_hook_type get_relation_stats_hook = NULL;
&lt;br&gt;+ release_relation_stats_hook_type release_relation_stats_hook = NULL;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; static double var_eq_const(VariableStatData *vardata, Oid operator,
&lt;br&gt;&amp;nbsp; 			 Datum constval, bool constisnull,
&lt;br&gt;***************
&lt;br&gt;*** 3996,4005 ****
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;&amp;nbsp; 		else if (rte-&amp;gt;rtekind == RTE_RELATION)
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;! 			vardata-&amp;gt;statsTuple = SearchSysCache(STATRELATT,
&lt;br&gt;&amp;nbsp; 												 ObjectIdGetDatum(rte-&amp;gt;relid),
&lt;br&gt;&amp;nbsp; 												 Int16GetDatum(var-&amp;gt;varattno),
&lt;br&gt;&amp;nbsp; 												 0, 0);
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;&amp;nbsp; 		else
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;--- 3999,4017 ----
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;&amp;nbsp; 		else if (rte-&amp;gt;rtekind == RTE_RELATION)
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;! 			if (get_relation_stats_hook)
&lt;br&gt;! 				vardata-&amp;gt;statsTuple = (*get_relation_stats_hook) 
&lt;br&gt;! 										(rte-&amp;gt;relid, 
&lt;br&gt;! 										 var-&amp;gt;varattno,
&lt;br&gt;! 										 vardata-&amp;gt;freefunc);
&lt;br&gt;! 			if (!vardata-&amp;gt;statsTuple)
&lt;br&gt;! 			{
&lt;br&gt;! 				vardata-&amp;gt;statsTuple = SearchSysCache(STATRELATT,
&lt;br&gt;&amp;nbsp; 												 ObjectIdGetDatum(rte-&amp;gt;relid),
&lt;br&gt;&amp;nbsp; 												 Int16GetDatum(var-&amp;gt;varattno),
&lt;br&gt;&amp;nbsp; 												 0, 0);
&lt;br&gt;+ 				vardata-&amp;gt;freefunc = ReleaseSysCache;
&lt;br&gt;+ 			}
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;&amp;nbsp; 		else
&lt;br&gt;&amp;nbsp; 		{
&lt;br&gt;***************
&lt;br&gt;*** 4116,4125 ****
&lt;br&gt;&amp;nbsp; 							index-&amp;gt;indpred == NIL)
&lt;br&gt;&amp;nbsp; 							vardata-&amp;gt;isunique = true;
&lt;br&gt;&amp;nbsp; 						/* Has it got stats? */
&lt;br&gt;! 						vardata-&amp;gt;statsTuple = SearchSysCache(STATRELATT,
&lt;br&gt;&amp;nbsp; 										 &amp;nbsp; ObjectIdGetDatum(index-&amp;gt;indexoid),
&lt;br&gt;! 													 &amp;nbsp;Int16GetDatum(pos + 1),
&lt;br&gt;! 															 0, 0);
&lt;br&gt;&amp;nbsp; 						if (vardata-&amp;gt;statsTuple)
&lt;br&gt;&amp;nbsp; 							break;
&lt;br&gt;&amp;nbsp; 					}
&lt;br&gt;--- 4128,4147 ----
&lt;br&gt;&amp;nbsp; 							index-&amp;gt;indpred == NIL)
&lt;br&gt;&amp;nbsp; 							vardata-&amp;gt;isunique = true;
&lt;br&gt;&amp;nbsp; 						/* Has it got stats? */
&lt;br&gt;! 						if (get_relation_stats_hook)
&lt;br&gt;! 							vardata-&amp;gt;statsTuple = (*get_relation_stats_hook) 
&lt;br&gt;! 													(index-&amp;gt;indexoid, 
&lt;br&gt;! 													 pos + 1,
&lt;br&gt;! 													 vardata-&amp;gt;freefunc);
&lt;br&gt;! 						if (!vardata-&amp;gt;statsTuple)
&lt;br&gt;! 						{
&lt;br&gt;! 							vardata-&amp;gt;statsTuple = SearchSysCache(STATRELATT,
&lt;br&gt;&amp;nbsp; 										 &amp;nbsp; ObjectIdGetDatum(index-&amp;gt;indexoid),
&lt;br&gt;! 													 &amp;nbsp;	 Int16GetDatum(pos + 1),
&lt;br&gt;! 														 0, 0);
&lt;br&gt;! 							vardata-&amp;gt;freefunc = ReleaseSysCache;
&lt;br&gt;! 						}
&lt;br&gt;! 
&lt;br&gt;&amp;nbsp; 						if (vardata-&amp;gt;statsTuple)
&lt;br&gt;&amp;nbsp; 							break;
&lt;br&gt;&amp;nbsp; 					}
&lt;br&gt;***************
&lt;br&gt;*** 5551,5557 ****
&lt;br&gt;&amp;nbsp; 	double	 &amp;nbsp; *indexCorrelation = (double *) PG_GETARG_POINTER(7);
&lt;br&gt;&amp;nbsp; 	Oid			relid;
&lt;br&gt;&amp;nbsp; 	AttrNumber	colnum;
&lt;br&gt;! 	HeapTuple	tuple;
&lt;br&gt;&amp;nbsp; 	double		numIndexTuples;
&lt;br&gt;&amp;nbsp; 	List	 &amp;nbsp; *indexBoundQuals;
&lt;br&gt;&amp;nbsp; 	int			indexcol;
&lt;br&gt;--- 5573,5580 ----
&lt;br&gt;&amp;nbsp; 	double	 &amp;nbsp; *indexCorrelation = (double *) PG_GETARG_POINTER(7);
&lt;br&gt;&amp;nbsp; 	Oid			relid;
&lt;br&gt;&amp;nbsp; 	AttrNumber	colnum;
&lt;br&gt;! 	HeapTuple	tuple = NULL;
&lt;br&gt;! 	void		(*freefunc) (HeapTuple tuple) = NULL;
&lt;br&gt;&amp;nbsp; 	double		numIndexTuples;
&lt;br&gt;&amp;nbsp; 	List	 &amp;nbsp; *indexBoundQuals;
&lt;br&gt;&amp;nbsp; 	int			indexcol;
&lt;br&gt;***************
&lt;br&gt;*** 5756,5765 ****
&lt;br&gt;&amp;nbsp; 		colnum = 1;
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	tuple = SearchSysCache(STATRELATT,
&lt;br&gt;! 						 &amp;nbsp; ObjectIdGetDatum(relid),
&lt;br&gt;! 						 &amp;nbsp; Int16GetDatum(colnum),
&lt;br&gt;! 						 &amp;nbsp; 0, 0);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	if (HeapTupleIsValid(tuple))
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;--- 5779,5795 ----
&lt;br&gt;&amp;nbsp; 		colnum = 1;
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	if (get_relation_stats_hook)
&lt;br&gt;! 		tuple = (*get_relation_stats_hook) (relid, colnum, freefunc);
&lt;br&gt;! 
&lt;br&gt;! 	if (!tuple)
&lt;br&gt;! 	{
&lt;br&gt;! 		tuple = SearchSysCache(STATRELATT,
&lt;br&gt;! 							ObjectIdGetDatum(relid),
&lt;br&gt;! 							Int16GetDatum(colnum),
&lt;br&gt;! 							0, 0);
&lt;br&gt;! 		freefunc = ReleaseSysCache;
&lt;br&gt;! 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	if (HeapTupleIsValid(tuple))
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;***************
&lt;br&gt;*** 5800,5806 ****
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 			free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;! 		ReleaseSysCache(tuple);
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	PG_RETURN_VOID();
&lt;br&gt;--- 5830,5837 ----
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 			free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
&lt;br&gt;&amp;nbsp; 		}
&lt;br&gt;! 
&lt;br&gt;! 		ReleaseStatsTuple(tuple, freefunc);
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	PG_RETURN_VOID();
&lt;br&gt;Index: src/backend/utils/cache/lsyscache.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/utils/cache/lsyscache.c,v
&lt;br&gt;retrieving revision 1.159
&lt;br&gt;diff -c -r1.159 lsyscache.c
&lt;br&gt;*** src/backend/utils/cache/lsyscache.c	2 Aug 2008 21:32:00 -0000	1.159
&lt;br&gt;--- src/backend/utils/cache/lsyscache.c	25 Sep 2008 14:16:01 -0000
&lt;br&gt;***************
&lt;br&gt;*** 27,32 ****
&lt;br&gt;--- 27,33 ----
&lt;br&gt;&amp;nbsp; #include &amp;quot;catalog/pg_proc.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;catalog/pg_statistic.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;catalog/pg_type.h&amp;quot;
&lt;br&gt;+ #include &amp;quot;optimizer/plancat.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;miscadmin.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;nodes/makefuncs.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;utils/array.h&amp;quot;
&lt;br&gt;***************
&lt;br&gt;*** 35,40 ****
&lt;br&gt;--- 36,43 ----
&lt;br&gt;&amp;nbsp; #include &amp;quot;utils/lsyscache.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;utils/syscache.h&amp;quot;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ /* Hook for plugins to get control in get_attavgwidth() */
&lt;br&gt;+ get_attavgwidth_hook_type get_attavgwidth_hook = NULL;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*				---------- AMOP CACHES ----------						 */
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;***************
&lt;br&gt;*** 2492,2507 ****
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;&amp;nbsp; &amp;nbsp;*	 &amp;nbsp;Given the table and attribute number of a column, get the average
&lt;br&gt;&amp;nbsp; &amp;nbsp;*	 &amp;nbsp;width of entries in the column. &amp;nbsp;Return zero if no data available.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; int32
&lt;br&gt;&amp;nbsp; get_attavgwidth(Oid relid, AttrNumber attnum)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;&amp;nbsp; 	HeapTuple	tp;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	tp = SearchSysCache(STATRELATT,
&lt;br&gt;! 						ObjectIdGetDatum(relid),
&lt;br&gt;! 						Int16GetDatum(attnum),
&lt;br&gt;! 						0, 0);
&lt;br&gt;&amp;nbsp; 	if (HeapTupleIsValid(tp))
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;&amp;nbsp; 		int32		stawidth = ((Form_pg_statistic) GETSTRUCT(tp))-&amp;gt;stawidth;
&lt;br&gt;--- 2495,2524 ----
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;&amp;nbsp; &amp;nbsp;*	 &amp;nbsp;Given the table and attribute number of a column, get the average
&lt;br&gt;&amp;nbsp; &amp;nbsp;*	 &amp;nbsp;width of entries in the column. &amp;nbsp;Return zero if no data available.
&lt;br&gt;+ &amp;nbsp;*
&lt;br&gt;+ &amp;nbsp;*	 &amp;nbsp;Calling a hook at this point looks somewhat strange, but is required
&lt;br&gt;+ &amp;nbsp;* 	 &amp;nbsp;because the optimizer handles inheritance relations by calling for
&lt;br&gt;+ &amp;nbsp;*	 &amp;nbsp;the avg width later in the planner than get_relation_info_hook().
&lt;br&gt;+ &amp;nbsp;*	 &amp;nbsp;So the APIs and call points of hooks must match the optimizer.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; int32
&lt;br&gt;&amp;nbsp; get_attavgwidth(Oid relid, AttrNumber attnum)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;&amp;nbsp; 	HeapTuple	tp;
&lt;br&gt;+ 	int32		stawidth;
&lt;br&gt;+ 
&lt;br&gt;+ 	if (get_attavgwidth_hook)
&lt;br&gt;+ 	{
&lt;br&gt;+ 		stawidth = (*get_attavgwidth_hook) (relid, attnum);
&lt;br&gt;+ 		if (stawidth &amp;gt; 0)
&lt;br&gt;+ 			return stawidth;
&lt;br&gt;+ 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	tp = SearchSysCache(STATRELATT,
&lt;br&gt;! 					ObjectIdGetDatum(relid),
&lt;br&gt;! 					Int16GetDatum(attnum),
&lt;br&gt;! 					0, 0);
&lt;br&gt;! 
&lt;br&gt;&amp;nbsp; 	if (HeapTupleIsValid(tp))
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;&amp;nbsp; 		int32		stawidth = ((Form_pg_statistic) GETSTRUCT(tp))-&amp;gt;stawidth;
&lt;br&gt;Index: src/include/optimizer/plancat.h
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/optimizer/plancat.h,v
&lt;br&gt;retrieving revision 1.51
&lt;br&gt;diff -c -r1.51 plancat.h
&lt;br&gt;*** src/include/optimizer/plancat.h	16 Aug 2008 00:01:38 -0000	1.51
&lt;br&gt;--- src/include/optimizer/plancat.h	25 Sep 2008 15:57:06 -0000
&lt;br&gt;***************
&lt;br&gt;*** 14,19 ****
&lt;br&gt;--- 14,20 ----
&lt;br&gt;&amp;nbsp; #ifndef PLANCAT_H
&lt;br&gt;&amp;nbsp; #define PLANCAT_H
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ #include &amp;quot;access/htup.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;nodes/relation.h&amp;quot;
&lt;br&gt;&amp;nbsp; #include &amp;quot;utils/relcache.h&amp;quot;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;***************
&lt;br&gt;*** 24,29 ****
&lt;br&gt;--- 25,43 ----
&lt;br&gt;&amp;nbsp; 														 RelOptInfo *rel);
&lt;br&gt;&amp;nbsp; extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ /* Hooks for plugins to get control in lsyscache.c and selfuncs.c */
&lt;br&gt;+ typedef HeapTuple (*get_relation_stats_hook_type) (Oid relid, AttrNumber attnum, 
&lt;br&gt;+ 											void (*freefunc) (HeapTuple tuple));
&lt;br&gt;+ extern PGDLLIMPORT get_relation_stats_hook_type get_relation_stats_hook;
&lt;br&gt;+ 
&lt;br&gt;+ /* must match ReleaseSysCache call signature */
&lt;br&gt;+ typedef void (*release_relation_stats_hook_type) (HeapTuple tuple);
&lt;br&gt;+ extern PGDLLIMPORT release_relation_stats_hook_type release_relation_stats_hook;
&lt;br&gt;+ 
&lt;br&gt;+ typedef int32 (*get_attavgwidth_hook_type) (Oid relid, AttrNumber attnum);
&lt;br&gt;+ extern PGDLLIMPORT get_attavgwidth_hook_type get_attavgwidth_hook;
&lt;br&gt;+ 
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
&lt;br&gt;&amp;nbsp; 				 &amp;nbsp;bool inhparent, RelOptInfo *rel);
&lt;br&gt;Index: src/include/utils/selfuncs.h
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/utils/selfuncs.h,v
&lt;br&gt;retrieving revision 1.46
&lt;br&gt;diff -c -r1.46 selfuncs.h
&lt;br&gt;*** src/include/utils/selfuncs.h	16 Aug 2008 00:01:38 -0000	1.46
&lt;br&gt;--- src/include/utils/selfuncs.h	25 Sep 2008 15:55:05 -0000
&lt;br&gt;***************
&lt;br&gt;*** 74,85 ****
&lt;br&gt;&amp;nbsp; 	Oid			atttype;		/* type to pass to get_attstatsslot */
&lt;br&gt;&amp;nbsp; 	int32		atttypmod;		/* typmod to pass to get_attstatsslot */
&lt;br&gt;&amp;nbsp; 	bool		isunique;		/* true if matched to a unique index */
&lt;br&gt;&amp;nbsp; } VariableStatData;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; #define ReleaseVariableStats(vardata) &amp;nbsp;\
&lt;br&gt;&amp;nbsp; 	do { \
&lt;br&gt;&amp;nbsp; 		if (HeapTupleIsValid((vardata).statsTuple)) \
&lt;br&gt;! 			ReleaseSysCache((vardata).statsTuple); \
&lt;br&gt;&amp;nbsp; 	} while(0)
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;--- 74,102 ----
&lt;br&gt;&amp;nbsp; 	Oid			atttype;		/* type to pass to get_attstatsslot */
&lt;br&gt;&amp;nbsp; 	int32		atttypmod;		/* typmod to pass to get_attstatsslot */
&lt;br&gt;&amp;nbsp; 	bool		isunique;		/* true if matched to a unique index */
&lt;br&gt;+ 	void	(*freefunc) (HeapTuple tuple);	/* funct ptr to free statsTuple */
&lt;br&gt;&amp;nbsp; } VariableStatData;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;+ #define ReleaseStatsTuple(tuple, freefunc) &amp;nbsp;\
&lt;br&gt;+ 	do { \
&lt;br&gt;+ 		if (HeapTupleIsValid(tuple)) \
&lt;br&gt;+ 		{ \
&lt;br&gt;+ 			if (freefunc) \
&lt;br&gt;+ 				freefunc(tuple); \
&lt;br&gt;+ 			else \
&lt;br&gt;+ 				elog(ERROR, &amp;quot;unable to release variable stats correctly&amp;quot;); \
&lt;br&gt;+ 		} \
&lt;br&gt;+ 	} while(0)
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; #define ReleaseVariableStats(vardata) &amp;nbsp;\
&lt;br&gt;&amp;nbsp; 	do { \
&lt;br&gt;&amp;nbsp; 		if (HeapTupleIsValid((vardata).statsTuple)) \
&lt;br&gt;! 		{ \
&lt;br&gt;! 			if ((vardata).freefunc) \
&lt;br&gt;! 				(vardata).freefunc((vardata).statsTuple); \
&lt;br&gt;! 			else \
&lt;br&gt;! 				elog(ERROR, &amp;quot;unable to release variable stats correctly&amp;quot;); \
&lt;br&gt;! 		} \
&lt;br&gt;&amp;nbsp; 	} while(0)
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&lt;/tt&gt;&lt;hr align=&quot;left&quot; width=&quot;300&quot; /&gt;&lt;br /&gt;&lt;div class=&quot;small&quot;&gt;&lt;br/&gt;&lt;img src=&quot;http://old.nabble.com/images/icon_attachment.gif&quot; &gt; &lt;strong&gt;TOM.v3.tar&lt;/strong&gt; (41K) &lt;a href=&quot;http://old.nabble.com/attachment/19690014/0/TOM.v3.tar&quot; target=&quot;_top&quot;&gt;Download Attachment&lt;/a&gt;&lt;/div&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--get_relation_stats_hook%28%29-tp18203715p19690014.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19686849</id>
	<title>Re: [HACKERS] Infrastructure changes for recovery</title>
	<published>2008-09-26T04:41:55Z</published>
	<updated>2008-09-26T04:41:55Z</updated>
	<author>
		<name>Simon Riggs</name>
	</author>
	<content type="html">&lt;br&gt;On Fri, 2008-09-26 at 11:20 +0100, Simon Riggs wrote:
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; &amp;gt; After reading this for awhile, I realized that there is a rather
&lt;br&gt;&amp;gt; &amp;gt; fundamental problem with it: it switches into &amp;quot;consistent recovery&amp;quot;
&lt;br&gt;&amp;gt; &amp;gt; mode as soon as it's read WAL beyond ControlFile-&amp;gt;minRecoveryPoint.
&lt;br&gt;&amp;gt; &amp;gt; In a crash recovery situation that typically is before the last
&lt;br&gt;&amp;gt; &amp;gt; checkpoint (if indeed it's not still zero), and what that means is
&lt;br&gt;&amp;gt; &amp;gt; that this patch will activate the bgwriter and start letting in
&lt;br&gt;&amp;gt; &amp;gt; backends instantaneously after a crash, long before we can have any
&lt;br&gt;&amp;gt; &amp;gt; certainty that the DB state really is consistent.
&lt;br&gt;&amp;gt; &amp;gt; 
&lt;br&gt;&amp;gt; &amp;gt; In a normal crash recovery situation this would be easily fixed by
&lt;br&gt;&amp;gt; &amp;gt; simply not letting it go to &amp;quot;consistent recovery&amp;quot; state at all, but
&lt;br&gt;&amp;gt; &amp;gt; what about recovery from a restartpoint? &amp;nbsp;We don't want a slave that's
&lt;br&gt;&amp;gt; &amp;gt; crashed once to never let backends in again. &amp;nbsp;But I don't see how to
&lt;br&gt;&amp;gt; &amp;gt; determine that we're far enough past the restartpoint to be consistent
&lt;br&gt;&amp;gt; &amp;gt; again. &amp;nbsp;In crash recovery we assume (without proof ;-)) that we're
&lt;br&gt;&amp;gt; &amp;gt; consistent once we reach the end of valid-looking WAL, but that rule
&lt;br&gt;&amp;gt; &amp;gt; doesn't help for a slave that's following a continuing WAL sequence.
&lt;br&gt;&amp;gt; &amp;gt; 
&lt;br&gt;&amp;gt; &amp;gt; Perhaps something could be done based on noting when we have to pull in
&lt;br&gt;&amp;gt; &amp;gt; a WAL segment from the recovery_command, but it sounds like a pretty
&lt;br&gt;&amp;gt; &amp;gt; fragile assumption.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Seems like we just say we only signal the postmaster if
&lt;br&gt;&amp;gt; InArchiveRecovery. Archive recovery from a restartpoint is still archive
&lt;br&gt;&amp;gt; recovery, so this shouldn't be a problem in the way you mention. The
&lt;br&gt;&amp;gt; presence of recovery.conf overrides all other cases.
&lt;/div&gt;&lt;br&gt;Anticipating your possible responses, I would add this also:
&lt;br&gt;&lt;br&gt;There has long been an annoying hole in the PITR scheme which is the
&lt;br&gt;support of recovery using a crashed database. That is there to support
&lt;br&gt;split mirror snapshots, but it creates a loophole where we don't know
&lt;br&gt;the min recovery location, circumventing the care we (you!) took to put
&lt;br&gt;stop/start backup in place.
&lt;br&gt;&lt;br&gt;I think we need to add a parameter to recovery.conf that people can use
&lt;br&gt;to specify a minRecoveryPoint iff there is no backup label file. They
&lt;br&gt;can work out what this should be by following this procedure, which we
&lt;br&gt;should document:
&lt;br&gt;* split mirror, so you have offline copy of crashed database
&lt;br&gt;* copy database away to backup
&lt;br&gt;* go to running database and run pg_current_xlog_insert_location()
&lt;br&gt;* use the value to specify recovery_min_location
&lt;br&gt;&lt;br&gt;If they don't specify this, then bgwriter will not start and you cannot
&lt;br&gt;run in Hot Standby mode. Their choice, so we need not worry then about
&lt;br&gt;the loophole any more.
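&lt;br&gt;&lt;br&gt;A minimal sketch of the rule this would give us, not patch code. The
&lt;br&gt;function name, its arguments and the single-integer Lsn type are
&lt;br&gt;assumptions made purely for illustration (Lsn is a simplification of
&lt;br&gt;XLogRecPtr):
&lt;pre&gt;
```c
/* Hypothetical stand-in for a WAL location; a simplification of XLogRecPtr. */
typedef unsigned long long Lsn;

/*
 * Decide whether the startup process may signal that consistent recovery
 * has been reached. All parameter names are illustrative only.
 *
 * in_archive_recovery : recovery.conf is present
 * min_location_known  : a backup label or recovery_min_location supplied
 *                       a minimum recovery point
 * min_location        : that minimum recovery point
 * replayed_upto       : last WAL location replayed so far
 */
int
can_enter_consistent_recovery(int in_archive_recovery,
                              int min_location_known,
                              Lsn min_location,
                              Lsn replayed_upto)
{
    if (!in_archive_recovery)
        return 0;           /* plain crash recovery: no early access */
    if (!min_location_known)
        return 0;           /* no backup label, no recovery_min_location */
    return replayed_upto >= min_location;
}
```
&lt;/pre&gt;
&lt;br&gt;Under this sketch a split-mirror copy with no backup label and no
&lt;br&gt;recovery_min_location never reaches the point where backends are let in.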
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp;Simon Riggs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; www.2ndQuadrant.com
&lt;br&gt;&amp;nbsp;PostgreSQL Training, Services and Support
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Infrastructure-changes-for-recovery-tp19245508p19686849.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19685787</id>
	<title>Re: [HACKERS] Infrastructure changes for recovery</title>
	<published>2008-09-26T03:19:54Z</published>
	<updated>2008-09-26T03:19:54Z</updated>
	<author>
		<name>Simon Riggs</name>
	</author>
	<content type="html">&lt;br&gt;On Thu, 2008-09-25 at 18:28 -0400, Tom Lane wrote:
&lt;br&gt;&amp;gt; Simon Riggs &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19685787&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;simon@...&lt;/a&gt;&amp;gt; writes:
&lt;br&gt;&amp;gt; &amp;gt; Version 7
&lt;br&gt;&lt;br&gt;&amp;gt; Anyway, that's sufficiently bad that I'm bouncing the patch for
&lt;br&gt;&amp;gt; reconsideration. &amp;nbsp;
&lt;br&gt;&lt;br&gt;No problem, I understand this needs discussion. 
&lt;br&gt;&lt;br&gt;There's less detail here than first appears. There are some basic points
&lt;br&gt;to consider from which all else follows.
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; After reading this for awhile, I realized that there is a rather
&lt;br&gt;&amp;gt; fundamental problem with it: it switches into &amp;quot;consistent recovery&amp;quot;
&lt;br&gt;&amp;gt; mode as soon as it's read WAL beyond ControlFile-&amp;gt;minRecoveryPoint.
&lt;br&gt;&amp;gt; In a crash recovery situation that typically is before the last
&lt;br&gt;&amp;gt; checkpoint (if indeed it's not still zero), and what that means is
&lt;br&gt;&amp;gt; that this patch will activate the bgwriter and start letting in
&lt;br&gt;&amp;gt; backends instantaneously after a crash, long before we can have any
&lt;br&gt;&amp;gt; certainty that the DB state really is consistent.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; In a normal crash recovery situation this would be easily fixed by
&lt;br&gt;&amp;gt; simply not letting it go to &amp;quot;consistent recovery&amp;quot; state at all, but
&lt;br&gt;&amp;gt; what about recovery from a restartpoint? &amp;nbsp;We don't want a slave that's
&lt;br&gt;&amp;gt; crashed once to never let backends in again. &amp;nbsp;But I don't see how to
&lt;br&gt;&amp;gt; determine that we're far enough past the restartpoint to be consistent
&lt;br&gt;&amp;gt; again. &amp;nbsp;In crash recovery we assume (without proof ;-)) that we're
&lt;br&gt;&amp;gt; consistent once we reach the end of valid-looking WAL, but that rule
&lt;br&gt;&amp;gt; doesn't help for a slave that's following a continuing WAL sequence.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Perhaps something could be done based on noting when we have to pull in
&lt;br&gt;&amp;gt; a WAL segment from the recovery_command, but it sounds like a pretty
&lt;br&gt;&amp;gt; fragile assumption.
&lt;/div&gt;&lt;br&gt;Seems like we just say we only signal the postmaster if
&lt;br&gt;InArchiveRecovery. Archive recovery from a restartpoint is still archive
&lt;br&gt;recovery, so this shouldn't be a problem in the way you mention. The
&lt;br&gt;presence of recovery.conf overrides all other cases.
&lt;br&gt;&lt;br&gt;&amp;gt; Some other issues I noted before giving up:
&lt;br&gt;&lt;br&gt;All of the issues raised can be addressed, but I think the main
&lt;br&gt;decision we need to make is not so much about running other processes
&lt;br&gt;but about when they can start and when they have to change mode.
&lt;br&gt;&lt;br&gt;When they can start seems solvable, as above.
&lt;br&gt;&lt;br&gt;When/how they must change state from recovery to normal mode seems more
&lt;br&gt;difficult. State change must be atomic across all processes, but also
&lt;br&gt;done at a micro level so that XLogFlush tests for the state change. The
&lt;br&gt;overhead of continually checking seems high, so I am tempted to say let's
&lt;br&gt;just kick 'em all off and then let them back on again. That's easily
&lt;br&gt;accomplished for bgwriter without anybody noticing much. For Hot Standby
&lt;br&gt;that would mean that a failover would kick off all query backends. I can
&lt;br&gt;see why that would be both desirable and undesirable.
&lt;br&gt;&lt;br&gt;Anyway, from here I propose:
&lt;br&gt;* we keep the shutdown checkpoint
&lt;br&gt;* we kick off bgwriter (and any children) then let 'em back on again so
&lt;br&gt;they can initialise in a different mode.
&lt;br&gt;&lt;br&gt;To do that, I just need to dust off a previous version of the patch. So
&lt;br&gt;we can sort this out quickly if we have a clear way to proceed.
&lt;br&gt;&lt;br&gt;------------------------------------------------------------------
&lt;br&gt;other comments relate to this current patch, so further discussion of
&lt;br&gt;the points below may not be required, if we agree how to proceed as
&lt;br&gt;above.
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; * I'm a bit uncomfortable with the fact that the
&lt;br&gt;&amp;gt; IsRecoveryProcessingMode flag is read and written with no lock.
&lt;br&gt;&amp;gt; There's no atomicity or concurrent-write problem, sure, but on
&lt;br&gt;&amp;gt; a multiprocessor with weak memory ordering guarantees (eg PPC)
&lt;br&gt;&amp;gt; readers could get a fairly stale value of the flag. &amp;nbsp;The false
&lt;br&gt;&amp;gt; to true transition happens before anyone except the startup process is
&lt;br&gt;&amp;gt; running, so that's no problem; the net effect is then that backends
&lt;br&gt;&amp;gt; might think that recovery mode was still active for awhile after it
&lt;br&gt;&amp;gt; wasn't. &amp;nbsp;This seems a bit scary, eg in the patch as it stands that'll
&lt;br&gt;&amp;gt; disable XLogFlush calls that should have happened. &amp;nbsp;You could fix that
&lt;br&gt;&amp;gt; by grabbing/releasing some spinlock (any old one) around the accesses,
&lt;br&gt;&amp;gt; but are any of the call sites performance-critical? &amp;nbsp;The one in
&lt;br&gt;&amp;gt; XLogInsert looks like it is, if nothing else.
&lt;/div&gt;&lt;br&gt;Agreed.
&lt;br&gt;&lt;br&gt;It's not a dynamic state, so I can fix that inside
&lt;br&gt;IsRecoveryProcessingMode() with a local state to make the check faster.
&lt;br&gt;&lt;br&gt;/* process-local copy (assumed to start true); only ever goes true to false */
&lt;br&gt;static bool RecoveryProcessingMode = true;
&lt;br&gt;&lt;br&gt;bool
&lt;br&gt;IsRecoveryProcessingMode(void)
&lt;br&gt;{
&lt;br&gt;&amp;nbsp; &amp;nbsp; /* once recovery has been seen to end, skip the spinlock */
&lt;br&gt;&amp;nbsp; &amp;nbsp; if (!RecoveryProcessingMode)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return false;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /* use volatile pointer to prevent code rearrangement */
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; volatile XLogCtlData *xlogctl = XLogCtl;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; SpinLockAcquire(&amp;xlogctl-&amp;gt;mode_lck);
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; RecoveryProcessingMode = xlogctl-&amp;gt;IsRecoveryProcessingMode;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; SpinLockRelease(&amp;xlogctl-&amp;gt;mode_lck);
&lt;br&gt;&amp;nbsp; &amp;nbsp; }
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; return RecoveryProcessingMode;
&lt;br&gt;}
&lt;br&gt;&lt;br&gt;This only applies if we decide not to kick everybody off, change state
&lt;br&gt;and then let them back on again.
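&lt;br&gt;&lt;br&gt;The local-copy idea above is safe only because the flag moves one way,
&lt;br&gt;from true to false. A stand-alone sketch of that pattern follows. Every
&lt;br&gt;name in it is hypothetical, and a plain counter stands in for the spinlock,
&lt;br&gt;so this is an illustration and not the xlog.c code:
&lt;pre&gt;
```c
/* Stand-in for the flag kept in shared memory (hypothetical). */
static int shared_in_recovery = 1;

/* Counts how often the shared flag was read under the "lock". */
static int lock_acquisitions = 0;

/* Per-process cached copy; starts true and can only become false. */
static int local_in_recovery = 1;

/* Read the shared flag; the real code would hold a spinlock here. */
static int
read_shared_flag_locked(void)
{
    int value;

    lock_acquisitions++;            /* SpinLockAcquire() would go here */
    value = shared_in_recovery;
    /* SpinLockRelease() would go here */
    return value;
}

/* Cheap check: takes the lock only while the local copy is still true. */
int
is_recovery_mode(void)
{
    if (!local_in_recovery)
        return 0;                   /* recovery already seen to be over */
    local_in_recovery = read_shared_flag_locked();
    return local_in_recovery;
}

/* Called once by the startup process when recovery finishes. */
void
end_recovery(void)
{
    shared_in_recovery = 0;
}
```
&lt;/pre&gt;
&lt;br&gt;After the first call that observes false, later calls return without
&lt;br&gt;touching the lock, which is what matters for hot call sites like XLogInsert.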
&lt;br&gt;&lt;br&gt;&amp;gt; * I kinda think you broke XLogFlush anyway. &amp;nbsp;It's certainly clear
&lt;br&gt;&amp;gt; that the WARNING case at the bottom is unreachable with the patch,
&lt;br&gt;&amp;gt; and I think that means that you've messed up an important error
&lt;br&gt;&amp;gt; robustness behavior. &amp;nbsp;Is it still possible to get out of recovery mode
&lt;br&gt;&amp;gt; if there are any bad LSNs in the shared buffer pool?
&lt;br&gt;&lt;br&gt;Perhaps. But the WARNING could only occur during shutdown checkpoints.
&lt;br&gt;This patch specifically avoids those, so the case would never arise with
&lt;br&gt;this patch and needs no avoidance. Yes, you can still leave recovery
&lt;br&gt;mode if there are bad LSNs with this patch, but you won't know what they
&lt;br&gt;are because of the lack of the shutdown checkpoint. Probably an argument
&lt;br&gt;in favour of allowing shutdown checkpoints.
&lt;br&gt;&lt;br&gt;&amp;gt; * The use of InRecovery in CreateCheckPoint seems pretty bogus, since
&lt;br&gt;&amp;gt; that function can be called from the bgwriter in which the flag will
&lt;br&gt;&amp;gt; never be true. &amp;nbsp;Either this needs to be IsRecoveryProcessingMode(),
&lt;br&gt;&amp;gt; or it's useless because we'll never create ordinary checkpoints until
&lt;br&gt;&amp;gt; after subtrans.c is up anyway.
&lt;br&gt;&lt;br&gt;Exactly. bgwriter never needs this to be set because it writes
&lt;br&gt;restartpoints before this, using a different code path.
&lt;br&gt;&lt;br&gt;&amp;gt; * The patch moves the clearing of InRecovery from after to before
&lt;br&gt;&amp;gt; StartupCLOG, RecoverPreparedTransactions, etc. &amp;nbsp;Is that really a
&lt;br&gt;&amp;gt; good idea? &amp;nbsp;It makes it hard for those modules to know if they are
&lt;br&gt;&amp;gt; coming up after a normal or recovery startup. &amp;nbsp;I think they may not
&lt;br&gt;&amp;gt; care at the moment, but I'd leave that alone without a darn good
&lt;br&gt;&amp;gt; reason to change it.
&lt;br&gt;&lt;br&gt;I didn't move this as you say. It already was before StartupClog.
&lt;br&gt;&lt;br&gt;I moved it into exitRecovery() only, so it was unset in the same way
&lt;br&gt;exitArchiveRecovery sets InArchiveRecovery to false. So refactoring, no
&lt;br&gt;change of sequencing.
&lt;br&gt;&lt;br&gt;&amp;gt; * The comment about CheckpointLock being only pro forma is now wrong,
&lt;br&gt;&amp;gt; if you are going to use locking it to implement a delay in exitRecovery.
&lt;br&gt;&amp;gt; But I don't understand why the delay there. &amp;nbsp;If it's needed, seems like
&lt;br&gt;&amp;gt; the path where a restartpoint *isn't* in progress is wrong --- don't you
&lt;br&gt;&amp;gt; actually need to start one and wait for it? 
&lt;br&gt;&lt;br&gt;All of this ducking and diving is because of the bgwriter needing to
&lt;br&gt;perform a state change. That's the ball to keep our eye on.
&lt;br&gt;&lt;br&gt;After much thrashing, I decided that interrupting a restartpoint is too
&lt;br&gt;dangerous a thing to want to do. If we're in the middle of one, we
&lt;br&gt;finish it, if not there's no need to interrupt it.
&lt;br&gt;&lt;br&gt;&amp;gt; &amp;nbsp;Also I note that if the 
&lt;br&gt;&amp;gt; LWLockConditionalAcquire succeeds, you neglect to release the lock,
&lt;br&gt;&amp;gt; which surely can't be right.
&lt;br&gt;&lt;br&gt;Doh. Thanks.
&lt;br&gt;&lt;br&gt;&amp;gt; * The tail end of StartupXLOG() looks pretty unsafe to me. &amp;nbsp;Surely
&lt;br&gt;&amp;gt; we mustn't clear IsRecoveryProcessingMode until after we have
&lt;br&gt;&amp;gt; established the safe-recovery checkpoint. &amp;nbsp;The comments there seem to
&lt;br&gt;&amp;gt; be only vaguely related to the current state of the patch, too.
&lt;br&gt;&lt;br&gt;The whole point was to remove the ShutdownCheckpoint, but it sounds like
&lt;br&gt;you're not keen on that any more.
&lt;br&gt;&lt;br&gt;I'm neutral on this point: I can see why people would want it removed -
&lt;br&gt;it will speed up failover. I can see why people would want it kept -
&lt;br&gt;there is a slight window where if we crash we will need to go back to
&lt;br&gt;archive recovery.
&lt;br&gt;&lt;br&gt;I had a workable solution that kept it, so will revert to it.
&lt;br&gt;&lt;br&gt;&amp;gt; * Logging of recoveryLastXTime seems pretty bogus now. &amp;nbsp;It's supposed to
&lt;br&gt;&amp;gt; be associated with a restartpoint completion report, but now it's just
&lt;br&gt;&amp;gt; out in the ether somewhere and doesn't represent a guarantee that we're
&lt;br&gt;&amp;gt; synchronized that far.
&lt;br&gt;&lt;br&gt;That last one was there before so we knew where the log ended. It was
&lt;br&gt;not supposed to be associated with a restartpoint, just with end of log
&lt;br&gt;purely for information (by user request, for when a log file is
&lt;br&gt;corrupted and we need to know &amp;quot;when&amp;quot; we are up to).
&lt;br&gt;&lt;br&gt;&amp;gt; * backup.sgml needs to be updated to say that log_restartpoints is
&lt;br&gt;&amp;gt; obsolete.
&lt;br&gt;&lt;br&gt;Yes
&lt;br&gt;&lt;br&gt;&amp;gt; * There are a bunch of disturbing assumptions in the SLRU-related
&lt;br&gt;&amp;gt; modules about their StartUp calls being executed without any concurrent
&lt;br&gt;&amp;gt; access. &amp;nbsp;This isn't really a problem that needs to be dealt with in this
&lt;br&gt;&amp;gt; patch, I think, but that will all have to be cleaned up before we dare
&lt;br&gt;&amp;gt; allow any backends to run concurrently with recovery. &amp;nbsp;
&lt;br&gt;&lt;br&gt;Well spotted, thanks.
&lt;br&gt;&lt;br&gt;&amp;gt; btree's
&lt;br&gt;&amp;gt; suppression of relcache invals for metapage updates will be a problem
&lt;br&gt;&amp;gt; too.
&lt;br&gt;&lt;br&gt;Again thanks. This patch is stand-alone from later work, that's why.
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp;Simon Riggs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; www.2ndQuadrant.com
&lt;br&gt;&amp;nbsp;PostgreSQL Training, Services and Support
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Infrastructure-changes-for-recovery-tp19245508p19685787.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19679507</id>
	<title>Re: [HACKERS] Infrastructure changes for recovery</title>
	<published>2008-09-25T15:28:39Z</published>
	<updated>2008-09-25T15:28:39Z</updated>
	<author>
		<name>Tom Lane-2</name>
	</author>
	<content type="html">Simon Riggs &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19679507&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;simon@...&lt;/a&gt;&amp;gt; writes:
&lt;br&gt;&amp;gt; Version 7
&lt;br&gt;&lt;br&gt;After reading this for awhile, I realized that there is a rather
&lt;br&gt;fundamental problem with it: it switches into &amp;quot;consistent recovery&amp;quot;
&lt;br&gt;mode as soon as it's read WAL beyond ControlFile-&amp;gt;minRecoveryPoint.
&lt;br&gt;In a crash recovery situation that typically is before the last
&lt;br&gt;checkpoint (if indeed it's not still zero), and what that means is
&lt;br&gt;that this patch will activate the bgwriter and start letting in
&lt;br&gt;backends instantaneously after a crash, long before we can have any
&lt;br&gt;certainty that the DB state really is consistent.
&lt;br&gt;&lt;br&gt;In a normal crash recovery situation this would be easily fixed by
&lt;br&gt;simply not letting it go to &amp;quot;consistent recovery&amp;quot; state at all, but
&lt;br&gt;what about recovery from a restartpoint? &amp;nbsp;We don't want a slave that's
&lt;br&gt;crashed once to never let backends in again. &amp;nbsp;But I don't see how to
&lt;br&gt;determine that we're far enough past the restartpoint to be consistent
&lt;br&gt;again. &amp;nbsp;In crash recovery we assume (without proof ;-)) that we're
&lt;br&gt;consistent once we reach the end of valid-looking WAL, but that rule
&lt;br&gt;doesn't help for a slave that's following a continuing WAL sequence.
&lt;br&gt;&lt;br&gt;Perhaps something could be done based on noting when we have to pull in
&lt;br&gt;a WAL segment from the recovery_command, but it sounds like a pretty
&lt;br&gt;fragile assumption.
&lt;br&gt;&lt;br&gt;Anyway, that's sufficiently bad that I'm bouncing the patch for
&lt;br&gt;reconsideration. &amp;nbsp;Some other issues I noted before giving up:
&lt;br&gt;&lt;br&gt;* I'm a bit uncomfortable with the fact that the
&lt;br&gt;IsRecoveryProcessingMode flag is read and written with no lock.
&lt;br&gt;There's no atomicity or concurrent-write problem, sure, but on
&lt;br&gt;a multiprocessor with weak memory ordering guarantees (eg PPC)
&lt;br&gt;readers could get a fairly stale value of the flag. &amp;nbsp;The false
&lt;br&gt;to true transition happens before anyone except the startup process is
&lt;br&gt;running, so that's no problem; the net effect is then that backends
&lt;br&gt;might think that recovery mode was still active for awhile after it
&lt;br&gt;wasn't. &amp;nbsp;This seems a bit scary, eg in the patch as it stands that'll
&lt;br&gt;disable XLogFlush calls that should have happened. &amp;nbsp;You could fix that
&lt;br&gt;by grabbing/releasing some spinlock (any old one) around the accesses,
&lt;br&gt;but are any of the call sites performance-critical? &amp;nbsp;The one in
&lt;br&gt;XLogInsert looks like it is, if nothing else.
&lt;br&gt;&lt;br&gt;* I kinda think you broke XLogFlush anyway. &amp;nbsp;It's certainly clear
&lt;br&gt;that the WARNING case at the bottom is unreachable with the patch,
&lt;br&gt;and I think that means that you've messed up an important error
&lt;br&gt;robustness behavior. &amp;nbsp;Is it still possible to get out of recovery mode
&lt;br&gt;if there are any bad LSNs in the shared buffer pool?
&lt;br&gt;&lt;br&gt;* The use of InRecovery in CreateCheckPoint seems pretty bogus, since
&lt;br&gt;that function can be called from the bgwriter in which the flag will
&lt;br&gt;never be true. &amp;nbsp;Either this needs to be IsRecoveryProcessingMode(),
&lt;br&gt;or it's useless because we'll never create ordinary checkpoints until
&lt;br&gt;after subtrans.c is up anyway.
&lt;br&gt;&lt;br&gt;* The patch moves the clearing of InRecovery from after to before
&lt;br&gt;StartupCLOG, RecoverPreparedTransactions, etc. &amp;nbsp;Is that really a
&lt;br&gt;good idea? &amp;nbsp;It makes it hard for those modules to know if they are
&lt;br&gt;coming up after a normal or recovery startup. &amp;nbsp;I think they may not
&lt;br&gt;care at the moment, but I'd leave that alone without a darn good
&lt;br&gt;reason to change it.
&lt;br&gt;&lt;br&gt;* The comment about CheckpointLock being only pro forma is now wrong,
&lt;br&gt;if you are going to use locking it to implement a delay in exitRecovery.
&lt;br&gt;But I don't understand why the delay there. &amp;nbsp;If it's needed, seems like
&lt;br&gt;the path where a restartpoint *isn't* in progress is wrong --- don't you
&lt;br&gt;actually need to start one and wait for it? &amp;nbsp;Also I note that if the 
&lt;br&gt;LWLockConditionalAcquire succeeds, you neglect to release the lock,
&lt;br&gt;which surely can't be right.
&lt;br&gt;&lt;br&gt;* The tail end of StartupXLOG() looks pretty unsafe to me. &amp;nbsp;Surely
&lt;br&gt;we mustn't clear IsRecoveryProcessingMode until after we have
&lt;br&gt;established the safe-recovery checkpoint. &amp;nbsp;The comments there seem to
&lt;br&gt;be only vaguely related to the current state of the patch, too.
&lt;br&gt;&lt;br&gt;* Logging of recoveryLastXTime seems pretty bogus now. &amp;nbsp;It's supposed to
&lt;br&gt;be associated with a restartpoint completion report, but now it's just
&lt;br&gt;out in the ether somewhere and doesn't represent a guarantee that we're
&lt;br&gt;synchronized that far.
&lt;br&gt;&lt;br&gt;* backup.sgml needs to be updated to say that log_restartpoints is
&lt;br&gt;obsolete.
&lt;br&gt;&lt;br&gt;* There are a bunch of disturbing assumptions in the SLRU-related
&lt;br&gt;modules about their StartUp calls being executed without any concurrent
&lt;br&gt;access. &amp;nbsp;This isn't really a problem that needs to be dealt with in this
&lt;br&gt;patch, I think, but that will all have to be cleaned up before we dare
&lt;br&gt;allow any backends to run concurrently with recovery. &amp;nbsp;btree's
&lt;br&gt;suppression of relcache invals for metapage updates will be a problem
&lt;br&gt;too.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; regards, tom lane
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Infrastructure-changes-for-recovery-tp19245508p19679507.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19658661</id>
	<title>Re: [HACKERS] Subtransaction commits and Hot Standby</title>
	<published>2008-09-24T14:44:19Z</published>
	<updated>2008-09-24T14:44:19Z</updated>
	<author>
		<name>Simon Riggs</name>
	</author>
	<content type="html">&lt;br&gt;On Wed, 2008-09-24 at 13:48 +0100, Simon Riggs wrote:
&lt;br&gt;&amp;gt; On Tue, 2008-09-23 at 22:47 +0100, Simon Riggs wrote:
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; &amp;gt; I've tested this some more and am much happier with it now.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; The concept is fine, but I've found a coding bug in further testing.
&lt;br&gt;&amp;gt; Please wait now for new version before review.
&lt;br&gt;&lt;br&gt;OK, spent a long time testing various batching scenarios for this using a
&lt;br&gt;custom test harness to simulate various spreads of xids in transaction
&lt;br&gt;trees. All looks fine now.
&lt;br&gt;&lt;br&gt;The main work is done in new clog.c functions:
&lt;br&gt;TransactionIdSetTreeStatus(), which sets the whole tree atomically by calling
&lt;br&gt;TransactionIdSetPageStatus(), which in turn calls
&lt;br&gt;TransactionIdSetStatusBit() for each xid status change.
&lt;br&gt;&lt;br&gt;TransactionIdSetPageStatus() performs locking and handles the write_ok
&lt;br&gt;problem, as did the code it replaces. TransactionIdSetPageStatus() is called
&lt;br&gt;the theoretical minimum number of times for any transaction tree.
&lt;br&gt;&lt;br&gt;Patch slightly fumbles diff-ing new and replacement code, so there are
&lt;br&gt;two chunks that appear to show I'm removing locking. I'm not!!
&lt;br&gt;&lt;br&gt;Everything else is just API changes.
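&lt;br&gt;&lt;br&gt;As a rough illustration of the page grouping (not code from the patch):
&lt;br&gt;one page-level call is needed per distinct clog page touched by the tree,
&lt;br&gt;and the sub-committed phase is needed only when there is more than one
&lt;br&gt;such page. The function names here are hypothetical. The sketch assumes
&lt;br&gt;32768 xids per clog page (default 8 kB block, 2 status bits per xid),
&lt;br&gt;subxids sorted ascending and all larger than xid, and no xid wraparound
&lt;br&gt;inside the tree:
&lt;pre&gt;
```c
/* Assumes the default 8 kB block size and 2 status bits per xid. */
#define XACTS_PER_PAGE 32768u

typedef unsigned int Xid;

/* Map an xid to the clog page that stores its status bits. */
static int
xid_to_page(Xid xid)
{
    return (int) (xid / XACTS_PER_PAGE);
}

/*
 * Count the page-level calls needed for a transaction tree: one per
 * distinct page among the top-level xid and its subxids. Relies on
 * subxids being sorted ascending, so equal pages are always adjacent.
 * nsubxids must not be negative.
 */
int
count_page_calls(Xid xid, int nsubxids, const Xid *subxids)
{
    int calls = 1;                  /* page holding the top-level xid */
    int prev = xid_to_page(xid);
    int i;

    for (i = 0; i != nsubxids; i++)
    {
        int page = xid_to_page(subxids[i]);

        if (page != prev)
        {
            calls++;                /* tree spills onto another page */
            prev = page;
        }
    }
    return calls;
}

/* The sub-committed phase is only needed when the tree spans pages. */
int
needs_subcommit_phase(Xid xid, int nsubxids, const Xid *subxids)
{
    return count_page_calls(xid, nsubxids, subxids) > 1;
}
```
&lt;/pre&gt;
&lt;br&gt;For example, xid 99 with subxids 100, 32767, 32768 and 70000 touches pages
&lt;br&gt;0, 1 and 2, so three page-level calls and the sub-committed phase applies.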
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp;Simon Riggs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; www.2ndQuadrant.com
&lt;br&gt;&amp;nbsp;PostgreSQL Training, Services and Support
&lt;br&gt;&lt;br /&gt;&lt;tt&gt;[atomic_subxids.v4.patch]&lt;/tt&gt;&lt;br /&gt;&lt;hr align=&quot;left&quot; width=&quot;300&quot; /&gt;&lt;tt&gt;Index: src/backend/access/transam/README
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/README,v
&lt;br&gt;retrieving revision 1.11
&lt;br&gt;diff -c -r1.11 README
&lt;br&gt;*** src/backend/access/transam/README	21 Mar 2008 13:23:28 -0000	1.11
&lt;br&gt;--- src/backend/access/transam/README	24 Sep 2008 17:33:23 -0000
&lt;br&gt;***************
&lt;br&gt;*** 342,351 ****
&lt;br&gt;&amp;nbsp; an XID. &amp;nbsp;A transaction can be in progress, committed, aborted, or
&lt;br&gt;&amp;nbsp; &amp;quot;sub-committed&amp;quot;. &amp;nbsp;This last state means that it's a subtransaction that's no
&lt;br&gt;&amp;nbsp; longer running, but its parent has not updated its state yet (either it is
&lt;br&gt;! still running, or the backend crashed without updating its status). &amp;nbsp;A
&lt;br&gt;! sub-committed transaction's status will be updated again to the final value as
&lt;br&gt;! soon as the parent commits or aborts, or when the parent is detected to be
&lt;br&gt;! aborted.
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; Savepoints are implemented using subtransactions. &amp;nbsp;A subtransaction is a
&lt;br&gt;&amp;nbsp; transaction inside a transaction; its commit or abort status is not only
&lt;br&gt;--- 342,360 ----
&lt;br&gt;&amp;nbsp; an XID. &amp;nbsp;A transaction can be in progress, committed, aborted, or
&lt;br&gt;&amp;nbsp; &amp;quot;sub-committed&amp;quot;. &amp;nbsp;This last state means that it's a subtransaction that's no
&lt;br&gt;&amp;nbsp; longer running, but its parent has not updated its state yet (either it is
&lt;br&gt;! still running, or the backend crashed without updating its status). &amp;nbsp;Prior
&lt;br&gt;! to 8.4 we updated the status to sub-committed in clog as soon as
&lt;br&gt;! sub-commit had happened. &amp;nbsp;It was later realised that this is not actually
&lt;br&gt;! required for any purpose and the action can be deferred until transaction
&lt;br&gt;! commit. The main role of marking transactions as sub-committed is to 
&lt;br&gt;! provide an atomic commit protocol when transaction status is spread across
&lt;br&gt;! multiple clog pages. As a result whenever transaction status spreads
&lt;br&gt;! across multiple pages we must use a two-phase commit protocol. The first
&lt;br&gt;! phase is to mark the subtransactions as sub-committed, then we mark the
&lt;br&gt;! top level transaction and all its subtransactions committed (in that order).
&lt;br&gt;! So in 8.4 sub-committed state still exists, but as a transitory state as
&lt;br&gt;! part of final commit. Subtransaction abort is always marked in clog as
&lt;br&gt;! soon as it occurs, to allow locks to be released.
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; Savepoints are implemented using subtransactions. &amp;nbsp;A subtransaction is a
&lt;br&gt;&amp;nbsp; transaction inside a transaction; its commit or abort status is not only
&lt;br&gt;Index: src/backend/access/transam/clog.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/clog.c,v
&lt;br&gt;retrieving revision 1.47
&lt;br&gt;diff -c -r1.47 clog.c
&lt;br&gt;*** src/backend/access/transam/clog.c	1 Aug 2008 13:16:08 -0000	1.47
&lt;br&gt;--- src/backend/access/transam/clog.c	24 Sep 2008 21:24:02 -0000
&lt;br&gt;***************
&lt;br&gt;*** 80,89 ****
&lt;br&gt;&amp;nbsp; static bool CLOGPagePrecedes(int page1, int page2);
&lt;br&gt;&amp;nbsp; static void WriteZeroPageXlogRec(int pageno);
&lt;br&gt;&amp;nbsp; static void WriteTruncateXlogRec(int pageno);
&lt;br&gt;! 
&lt;br&gt;! 
&lt;br&gt;! /*
&lt;br&gt;! &amp;nbsp;* Record the final state of a transaction in the commit log.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* lsn must be the WAL location of the commit record when recording an async
&lt;br&gt;&amp;nbsp; &amp;nbsp;* commit.	For a synchronous commit it can be InvalidXLogRecPtr, since the
&lt;br&gt;--- 80,105 ----
&lt;br&gt;&amp;nbsp; static bool CLOGPagePrecedes(int page1, int page2);
&lt;br&gt;&amp;nbsp; static void WriteZeroPageXlogRec(int pageno);
&lt;br&gt;&amp;nbsp; static void WriteTruncateXlogRec(int pageno);
&lt;br&gt;! static void TransactionIdSetPageStatus(TransactionId xid, int nsubxids, 
&lt;br&gt;! 	TransactionId *subxids, XidStatus status, XLogRecPtr lsn, int pageno, bool subcommit);
&lt;br&gt;! static void TransactionIdSetStatusBit(TransactionId xid, XidStatus status, 
&lt;br&gt;! 	XLogRecPtr lsn, int slotno);
&lt;br&gt;! 
&lt;br&gt;! /*
&lt;br&gt;! &amp;nbsp;* TransactionIdSetTreeStatus
&lt;br&gt;! &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;* Record the final state of transaction entries in the commit log for
&lt;br&gt;! &amp;nbsp;* a transaction and its subtransaction tree. Take care to ensure this is
&lt;br&gt;! &amp;nbsp;* both atomic and efficient. Prior to 8.4, this capability was provided
&lt;br&gt;! &amp;nbsp;* by the non-atomic TransactionIdSetStatus, which is replaced by this
&lt;br&gt;! &amp;nbsp;* new atomic version.
&lt;br&gt;! &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;* xid is a single xid to set status for. This will typically be
&lt;br&gt;! &amp;nbsp;* the top level transactionid for a top level commit or abort. It can
&lt;br&gt;! &amp;nbsp;* also be a subtransaction when we record transaction aborts.
&lt;br&gt;! &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;* subxids is an array of xids of length nsubxids, representing subtransactions
&lt;br&gt;! &amp;nbsp;* in the tree of xid. In various cases nsubxids may be zero.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* lsn must be the WAL location of the commit record when recording an async
&lt;br&gt;&amp;nbsp; &amp;nbsp;* commit.	For a synchronous commit it can be InvalidXLogRecPtr, since the
&lt;br&gt;***************
&lt;br&gt;*** 91,107 ****
&lt;br&gt;&amp;nbsp; &amp;nbsp;* should be InvalidXLogRecPtr for abort cases, too.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* NB: this is a low-level routine and is NOT the preferred entry point
&lt;br&gt;! &amp;nbsp;* for most uses; TransactionLogUpdate() in transam.c is the intended caller.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; void
&lt;br&gt;! TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;- 	int			pageno = TransactionIdToPage(xid);
&lt;br&gt;- 	int			byteno = TransactionIdToByte(xid);
&lt;br&gt;- 	int			bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
&lt;br&gt;&amp;nbsp; 	int			slotno;
&lt;br&gt;! 	char	 &amp;nbsp; *byteptr;
&lt;br&gt;! 	char		byteval;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
&lt;br&gt;&amp;nbsp; 		 &amp;nbsp; status == TRANSACTION_STATUS_ABORTED ||
&lt;br&gt;--- 107,238 ----
&lt;br&gt;&amp;nbsp; &amp;nbsp;* should be InvalidXLogRecPtr for abort cases, too.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* NB: this is a low-level routine and is NOT the preferred entry point
&lt;br&gt;! &amp;nbsp;* for most uses; functions in transam.c are the intended callers.
&lt;br&gt;! &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;* Note that no locks are requested at this level, only in lower functions.
&lt;br&gt;! &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;* XXX Think about issuing FADVISE_WILLNEED on pages that we will need,
&lt;br&gt;! &amp;nbsp;* but aren't yet in cache, as well as hinting pages not to fall out of
&lt;br&gt;! &amp;nbsp;* cache yet.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; void
&lt;br&gt;! TransactionIdSetTreeStatus(TransactionId xid, int nsubxids, 
&lt;br&gt;! 				TransactionId *subxids, XidStatus status, XLogRecPtr lsn)
&lt;br&gt;! {
&lt;br&gt;! 	int		pageno = TransactionIdToPage(xid); /* get page of parent */
&lt;br&gt;! 	int 	i;
&lt;br&gt;! 
&lt;br&gt;! 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
&lt;br&gt;! 		 &amp;nbsp; status == TRANSACTION_STATUS_ABORTED);
&lt;br&gt;! 
&lt;br&gt;! 	/*
&lt;br&gt;! 	 * See how many subxids, if any, are on the same page as the parent.
&lt;br&gt;! 	 */
&lt;br&gt;! 	for (i = 0; i &amp;lt; nsubxids; i++)
&lt;br&gt;! 	{
&lt;br&gt;! 		if (TransactionIdToPage(subxids[i]) != pageno)
&lt;br&gt;! 			break;
&lt;br&gt;! 	}
&lt;br&gt;! 
&lt;br&gt;! 	/*
&lt;br&gt;! 	 * Do all items fit on a single page?
&lt;br&gt;! 	 */
&lt;br&gt;! 	if (i == nsubxids)
&lt;br&gt;! 	{
&lt;br&gt;! 		/*
&lt;br&gt;! 		 * Set the parent and any subtransactions on same page as it
&lt;br&gt;! 		 */
&lt;br&gt;! 		TransactionIdSetPageStatus(xid, nsubxids, subxids, status, lsn, 
&lt;br&gt;! 										pageno, true);
&lt;br&gt;! 	}
&lt;br&gt;! 	else
&lt;br&gt;! 	{
&lt;br&gt;! 		int		num_on_first_page = i;
&lt;br&gt;! 		int		num_on_page = 0;
&lt;br&gt;! 		int		offset;
&lt;br&gt;! 
&lt;br&gt;! 		if (status == TRANSACTION_STATUS_COMMITTED)
&lt;br&gt;! 		{
&lt;br&gt;! 			/* 
&lt;br&gt;! 			 * If this is a commit then we care about doing this atomically.
&lt;br&gt;! 			 * By here, we know we're updating more than one page of clog,
&lt;br&gt;! 			 * so we must mark entries that are *not* on the first page so
&lt;br&gt;! 			 * that they show as subcommitted before we then return to 
&lt;br&gt;! 			 * update the status to fully committed.
&lt;br&gt;! 			 * We don't mark the first page because we will be doing that
&lt;br&gt;! 			 * when we mark the main commit, so we wish to avoid touching
&lt;br&gt;! 			 * that page twice.
&lt;br&gt;! 			 */
&lt;br&gt;! 			num_on_page = 0;
&lt;br&gt;! 			i = offset = num_on_first_page;
&lt;br&gt;! 			pageno = TransactionIdToPage(subxids[num_on_first_page]);
&lt;br&gt;! 			while (i &amp;lt; nsubxids)
&lt;br&gt;! 			{
&lt;br&gt;! 				while (i &amp;lt; nsubxids &amp;&amp; TransactionIdToPage(subxids[i]) == pageno)
&lt;br&gt;! 				{
&lt;br&gt;! 					num_on_page++;
&lt;br&gt;! 					i++;
&lt;br&gt;! 				}
&lt;br&gt;! 
&lt;br&gt;! 				TransactionIdSetPageStatus(InvalidTransactionId, 
&lt;br&gt;! 								num_on_page, subxids + offset,
&lt;br&gt;! 								TRANSACTION_STATUS_SUB_COMMITTED, lsn, pageno, true);
&lt;br&gt;! 				offset = i;
&lt;br&gt;! 				num_on_page = 0;
&lt;br&gt;! 				pageno = TransactionIdToPage(subxids[offset]);
&lt;br&gt;! 			}
&lt;br&gt;! 		}
&lt;br&gt;! 
&lt;br&gt;! 		/*
&lt;br&gt;! 		 * Now set the parent and subtransactions on same page as it, if any
&lt;br&gt;! 		 */
&lt;br&gt;! 		pageno = TransactionIdToPage(xid);
&lt;br&gt;! 		TransactionIdSetPageStatus(xid, num_on_first_page, subxids, status, lsn, 
&lt;br&gt;! 											pageno, true);
&lt;br&gt;! 
&lt;br&gt;! 		/*
&lt;br&gt;! 		 * By now, all subtransactions have been subcommitted, so all calls
&lt;br&gt;! 		 * to TransactionIdSetPageStatus() will use subcommit=false after
&lt;br&gt;! 		 * this point for this transaction tree.
&lt;br&gt;! 		 */
&lt;br&gt;! 
&lt;br&gt;! 		/*
&lt;br&gt;! 		 * Now work through the rest of the subxids one clog page at a time,
&lt;br&gt;! 		 * starting from the second page onwards, like we did above.
&lt;br&gt;! 		 */
&lt;br&gt;! 		num_on_page = 0;
&lt;br&gt;! 		i = offset = num_on_first_page;
&lt;br&gt;! 		pageno = TransactionIdToPage(subxids[num_on_first_page]);
&lt;br&gt;! 		while (i &amp;lt; nsubxids)
&lt;br&gt;! 		{
&lt;br&gt;! 			while (i &amp;lt; nsubxids &amp;&amp; TransactionIdToPage(subxids[i]) == pageno)
&lt;br&gt;! 			{
&lt;br&gt;! 				num_on_page++;
&lt;br&gt;! 				i++;
&lt;br&gt;! 			}
&lt;br&gt;! 
&lt;br&gt;! 			TransactionIdSetPageStatus(InvalidTransactionId, 
&lt;br&gt;! 							num_on_page, subxids + offset,
&lt;br&gt;! 							status, lsn, pageno, false);
&lt;br&gt;! 			offset = i;
&lt;br&gt;! 			num_on_page = 0;
&lt;br&gt;! 			pageno = TransactionIdToPage(subxids[offset]);
&lt;br&gt;! 		}
&lt;br&gt;! 	}
&lt;br&gt;! }
&lt;br&gt;! 
&lt;br&gt;! /*
&lt;br&gt;! &amp;nbsp;* Record the final state of transaction entries in the commit log for
&lt;br&gt;! &amp;nbsp;* all entries on *one* page only. Atomic only on this page.
&lt;br&gt;! &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;* Otherwise API is same as TransactionIdSetTreeStatus()
&lt;br&gt;! &amp;nbsp;*/
&lt;br&gt;! static void
&lt;br&gt;! TransactionIdSetPageStatus(TransactionId xid, int nsubxids, 
&lt;br&gt;! 		TransactionId *subxids, XidStatus status, XLogRecPtr lsn, int pageno, bool subcommit)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;&amp;nbsp; 	int			slotno;
&lt;br&gt;! 	int 		i;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
&lt;br&gt;&amp;nbsp; 		 &amp;nbsp; status == TRANSACTION_STATUS_ABORTED ||
&lt;br&gt;***************
&lt;br&gt;*** 116,124 ****
&lt;br&gt;&amp;nbsp; 	 * mustn't let it reach disk until we've done the appropriate WAL flush.
&lt;br&gt;&amp;nbsp; 	 * But when lsn is invalid, it's OK to scribble on a page while it is
&lt;br&gt;&amp;nbsp; 	 * write-busy, since we don't care if the update reaches disk sooner than
&lt;br&gt;! 	 * we think. &amp;nbsp;Hence, pass write_ok = XLogRecPtrIsInvalid(lsn).
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
&lt;br&gt;&amp;nbsp; 	byteptr = ClogCtl-&amp;gt;shared-&amp;gt;page_buffer[slotno] + byteno;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Current state should be 0, subcommitted or target state */
&lt;br&gt;--- 247,303 ----
&lt;br&gt;&amp;nbsp; 	 * mustn't let it reach disk until we've done the appropriate WAL flush.
&lt;br&gt;&amp;nbsp; 	 * But when lsn is invalid, it's OK to scribble on a page while it is
&lt;br&gt;&amp;nbsp; 	 * write-busy, since we don't care if the update reaches disk sooner than
&lt;br&gt;! 	 * we think. &amp;nbsp;
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;&amp;nbsp; 	slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
&lt;br&gt;+ 
&lt;br&gt;+ 	/*
&lt;br&gt;+ 	 * If we sync-commit more than one xid on this page while it is write-busy,
&lt;br&gt;+ 	 * we might find that some of the bits go to disk and others don't.
&lt;br&gt;+ 	 * That would break atomicity, so if we haven't already subcommitted
&lt;br&gt;+ 	 * the xids for this commit, we do that first and then come back
&lt;br&gt;+ 	 * to start marking commits. If using async commit then we already
&lt;br&gt;+ 	 * waited for the write I/O to complete by this point, so nothing to do.
&lt;br&gt;+ 	 */
&lt;br&gt;+ 	if (subcommit &amp;&amp; status == TRANSACTION_STATUS_COMMITTED &amp;&amp;
&lt;br&gt;+ 			XLogRecPtrIsInvalid(lsn))
&lt;br&gt;+ 	{
&lt;br&gt;+ 		for (i = 0; i &amp;lt; nsubxids; i++)
&lt;br&gt;+ 		{
&lt;br&gt;+ 			Assert(ClogCtl-&amp;gt;shared-&amp;gt;page_number[slotno] == TransactionIdToPage(subxids[i]));
&lt;br&gt;+ 			TransactionIdSetStatusBit(subxids[i], 
&lt;br&gt;+ 						TRANSACTION_STATUS_SUB_COMMITTED, lsn, slotno);
&lt;br&gt;+ 		}
&lt;br&gt;+ 	}
&lt;br&gt;+ 
&lt;br&gt;+ 
&lt;br&gt;+ 	/* Set the main transaction id, if any */
&lt;br&gt;+ 	if (TransactionIdIsValid(xid))
&lt;br&gt;+ 		TransactionIdSetStatusBit(xid, status, lsn, slotno);
&lt;br&gt;+ 
&lt;br&gt;+ 	/* Set the subtransactions on this page only */
&lt;br&gt;+ 	for (i = 0; i &amp;lt; nsubxids; i++)
&lt;br&gt;+ 	{
&lt;br&gt;+ 		Assert(ClogCtl-&amp;gt;shared-&amp;gt;page_number[slotno] == TransactionIdToPage(subxids[i]));
&lt;br&gt;+ 		TransactionIdSetStatusBit(subxids[i], status, lsn, slotno);
&lt;br&gt;+ 	}
&lt;br&gt;+ 
&lt;br&gt;+ 	ClogCtl-&amp;gt;shared-&amp;gt;page_dirty[slotno] = true;
&lt;br&gt;+ 
&lt;br&gt;+ 	LWLockRelease(CLogControlLock);
&lt;br&gt;+ }
&lt;br&gt;+ 
&lt;br&gt;+ /*
&lt;br&gt;+ &amp;nbsp;* Must be called with CLogControlLock held
&lt;br&gt;+ &amp;nbsp;*/
&lt;br&gt;+ static void
&lt;br&gt;+ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
&lt;br&gt;+ {
&lt;br&gt;+ 	int			byteno = TransactionIdToByte(xid);
&lt;br&gt;+ 	int			bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
&lt;br&gt;+ 	char	 &amp;nbsp; *byteptr;
&lt;br&gt;+ 	char		byteval;
&lt;br&gt;+ 
&lt;br&gt;&amp;nbsp; 	byteptr = ClogCtl-&amp;gt;shared-&amp;gt;page_buffer[slotno] + byteno;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Current state should be 0, subcommitted or target state */
&lt;br&gt;***************
&lt;br&gt;*** 132,139 ****
&lt;br&gt;&amp;nbsp; 	byteval |= (status &amp;lt;&amp;lt; bshift);
&lt;br&gt;&amp;nbsp; 	*byteptr = byteval;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;- 	ClogCtl-&amp;gt;shared-&amp;gt;page_dirty[slotno] = true;
&lt;br&gt;- 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;&amp;nbsp; 	 * Update the group LSN if the transaction completion LSN is higher.
&lt;br&gt;&amp;nbsp; 	 *
&lt;br&gt;--- 311,316 ----
&lt;br&gt;***************
&lt;br&gt;*** 149,156 ****
&lt;br&gt;&amp;nbsp; 		if (XLByteLT(ClogCtl-&amp;gt;shared-&amp;gt;group_lsn[lsnindex], lsn))
&lt;br&gt;&amp;nbsp; 			ClogCtl-&amp;gt;shared-&amp;gt;group_lsn[lsnindex] = lsn;
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;- 
&lt;br&gt;- 	LWLockRelease(CLogControlLock);
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;--- 326,331 ----
&lt;br&gt;Index: src/backend/access/transam/transam.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/transam.c,v
&lt;br&gt;retrieving revision 1.76
&lt;br&gt;diff -c -r1.76 transam.c
&lt;br&gt;*** src/backend/access/transam/transam.c	26 Mar 2008 18:48:59 -0000	1.76
&lt;br&gt;--- src/backend/access/transam/transam.c	24 Sep 2008 17:33:23 -0000
&lt;br&gt;***************
&lt;br&gt;*** 40,54 ****
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /* Local functions */
&lt;br&gt;&amp;nbsp; static XidStatus TransactionLogFetch(TransactionId transactionId);
&lt;br&gt;- static void TransactionLogUpdate(TransactionId transactionId,
&lt;br&gt;- 					 XidStatus status, XLogRecPtr lsn);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /* ----------------------------------------------------------------
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		Postgres log access method interface
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		TransactionLogFetch
&lt;br&gt;! &amp;nbsp;*		TransactionLogUpdate
&lt;br&gt;&amp;nbsp; &amp;nbsp;* ----------------------------------------------------------------
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;--- 40,58 ----
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /* Local functions */
&lt;br&gt;&amp;nbsp; static XidStatus TransactionLogFetch(TransactionId transactionId);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /* ----------------------------------------------------------------
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		Postgres log access method interface
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		TransactionLogFetch
&lt;br&gt;! &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;* 		Prior to 8.4, we also had TransactionLogUpdate and 
&lt;br&gt;! &amp;nbsp;*		TransactionLogMultiUpdate. These have now been merged
&lt;br&gt;! &amp;nbsp;*		into a single function, TransactionIdSetTreeStatus(),
&lt;br&gt;! &amp;nbsp;*		though that is now part of clog.c because of the need
&lt;br&gt;! &amp;nbsp;*		for closer integration with clog code to achieve
&lt;br&gt;! &amp;nbsp;*		atomic clog updates for subtransactions.
&lt;br&gt;&amp;nbsp; &amp;nbsp;* ----------------------------------------------------------------
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;***************
&lt;br&gt;*** 100,140 ****
&lt;br&gt;&amp;nbsp; 	return xidstatus;
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;- /*
&lt;br&gt;- &amp;nbsp;*		TransactionLogUpdate
&lt;br&gt;- &amp;nbsp;*
&lt;br&gt;- &amp;nbsp;* Store the new status of a transaction. &amp;nbsp;The commit record LSN must be
&lt;br&gt;- &amp;nbsp;* passed when recording an async commit; else it should be InvalidXLogRecPtr.
&lt;br&gt;- &amp;nbsp;*/
&lt;br&gt;- static inline void
&lt;br&gt;- TransactionLogUpdate(TransactionId transactionId,
&lt;br&gt;- 					 XidStatus status, XLogRecPtr lsn)
&lt;br&gt;- {
&lt;br&gt;- 	/*
&lt;br&gt;- 	 * update the commit log
&lt;br&gt;- 	 */
&lt;br&gt;- 	TransactionIdSetStatus(transactionId, status, lsn);
&lt;br&gt;- }
&lt;br&gt;- 
&lt;br&gt;- /*
&lt;br&gt;- &amp;nbsp;* TransactionLogMultiUpdate
&lt;br&gt;- &amp;nbsp;*
&lt;br&gt;- &amp;nbsp;* Update multiple transaction identifiers to a given status.
&lt;br&gt;- &amp;nbsp;* Don't depend on this being atomic; it's not.
&lt;br&gt;- &amp;nbsp;*/
&lt;br&gt;- static inline void
&lt;br&gt;- TransactionLogMultiUpdate(int nxids, TransactionId *xids,
&lt;br&gt;- 						 &amp;nbsp;XidStatus status, XLogRecPtr lsn)
&lt;br&gt;- {
&lt;br&gt;- 	int			i;
&lt;br&gt;- 
&lt;br&gt;- 	Assert(nxids != 0);
&lt;br&gt;- 
&lt;br&gt;- 	for (i = 0; i &amp;lt; nxids; i++)
&lt;br&gt;- 		TransactionIdSetStatus(xids[i], status, lsn);
&lt;br&gt;- }
&lt;br&gt;- 
&lt;br&gt;- 
&lt;br&gt;&amp;nbsp; /* ----------------------------------------------------------------
&lt;br&gt;&amp;nbsp; &amp;nbsp;*						Interface functions
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;--- 104,109 ----
&lt;br&gt;***************
&lt;br&gt;*** 143,154 ****
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		========
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		 &amp;nbsp; these functions test the transaction status of
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		 &amp;nbsp; a specified transaction id.
&lt;br&gt;! &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;*		TransactionIdCommit
&lt;br&gt;! &amp;nbsp;*		TransactionIdAbort
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		========
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		 &amp;nbsp; these functions set the transaction status
&lt;br&gt;! &amp;nbsp;*		 &amp;nbsp; of the specified xid.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* See also TransactionIdIsInProgress, which once was in this module
&lt;br&gt;&amp;nbsp; &amp;nbsp;* but now lives in procarray.c.
&lt;br&gt;--- 112,125 ----
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		========
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		 &amp;nbsp; these functions test the transaction status of
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		 &amp;nbsp; a specified transaction id.
&lt;br&gt;! &amp;nbsp;* 
&lt;br&gt;! &amp;nbsp;*		TransactionIdCommitTree
&lt;br&gt;! &amp;nbsp;*		TransactionIdAsyncCommitTree
&lt;br&gt;! &amp;nbsp;*		TransactionIdAbortTree
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		========
&lt;br&gt;&amp;nbsp; &amp;nbsp;*		 &amp;nbsp; these functions set the transaction status
&lt;br&gt;! &amp;nbsp;*		 &amp;nbsp; of the specified transaction tree. As of 8.4, these
&lt;br&gt;! &amp;nbsp;*		 &amp;nbsp; are now atomic so we set the whole tree in a single call.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* See also TransactionIdIsInProgress, which once was in this module
&lt;br&gt;&amp;nbsp; &amp;nbsp;* but now lives in procarray.c.
&lt;br&gt;***************
&lt;br&gt;*** 287,374 ****
&lt;br&gt;&amp;nbsp; 	return false;
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;- 
&lt;br&gt;- /*
&lt;br&gt;- &amp;nbsp;* TransactionIdCommit
&lt;br&gt;- &amp;nbsp;*		Commits the transaction associated with the identifier.
&lt;br&gt;- &amp;nbsp;*
&lt;br&gt;- &amp;nbsp;* Note:
&lt;br&gt;- &amp;nbsp;*		Assumes transaction identifier is valid.
&lt;br&gt;- &amp;nbsp;*/
&lt;br&gt;- void
&lt;br&gt;- TransactionIdCommit(TransactionId transactionId)
&lt;br&gt;- {
&lt;br&gt;- 	TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED,
&lt;br&gt;- 						 InvalidXLogRecPtr);
&lt;br&gt;- }
&lt;br&gt;- 
&lt;br&gt;- /*
&lt;br&gt;- &amp;nbsp;* TransactionIdAsyncCommit
&lt;br&gt;- &amp;nbsp;*		Same as above, but for async commits. &amp;nbsp;The commit record LSN is needed.
&lt;br&gt;- &amp;nbsp;*/
&lt;br&gt;- void
&lt;br&gt;- TransactionIdAsyncCommit(TransactionId transactionId, XLogRecPtr lsn)
&lt;br&gt;- {
&lt;br&gt;- 	TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED, lsn);
&lt;br&gt;- }
&lt;br&gt;- 
&lt;br&gt;- /*
&lt;br&gt;- &amp;nbsp;* TransactionIdAbort
&lt;br&gt;- &amp;nbsp;*		Aborts the transaction associated with the identifier.
&lt;br&gt;- &amp;nbsp;*
&lt;br&gt;- &amp;nbsp;* Note:
&lt;br&gt;- &amp;nbsp;*		Assumes transaction identifier is valid.
&lt;br&gt;- &amp;nbsp;*		No async version of this is needed.
&lt;br&gt;- &amp;nbsp;*/
&lt;br&gt;- void
&lt;br&gt;- TransactionIdAbort(TransactionId transactionId)
&lt;br&gt;- {
&lt;br&gt;- 	TransactionLogUpdate(transactionId, TRANSACTION_STATUS_ABORTED,
&lt;br&gt;- 						 InvalidXLogRecPtr);
&lt;br&gt;- }
&lt;br&gt;- 
&lt;br&gt;- /*
&lt;br&gt;- &amp;nbsp;* TransactionIdSubCommit
&lt;br&gt;- &amp;nbsp;*		Marks the subtransaction associated with the identifier as
&lt;br&gt;- &amp;nbsp;*		sub-committed.
&lt;br&gt;- &amp;nbsp;*
&lt;br&gt;- &amp;nbsp;* Note:
&lt;br&gt;- &amp;nbsp;*		No async version of this is needed.
&lt;br&gt;- &amp;nbsp;*/
&lt;br&gt;- void
&lt;br&gt;- TransactionIdSubCommit(TransactionId transactionId)
&lt;br&gt;- {
&lt;br&gt;- 	TransactionLogUpdate(transactionId, TRANSACTION_STATUS_SUB_COMMITTED,
&lt;br&gt;- 						 InvalidXLogRecPtr);
&lt;br&gt;- }
&lt;br&gt;- 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* TransactionIdCommitTree
&lt;br&gt;! &amp;nbsp;*		Marks all the given transaction ids as committed.
&lt;br&gt;! &amp;nbsp;*
&lt;br&gt;! &amp;nbsp;* The caller has to be sure that this is used only to mark subcommitted
&lt;br&gt;! &amp;nbsp;* subtransactions as committed, and only *after* marking the toplevel
&lt;br&gt;! &amp;nbsp;* parent as committed. &amp;nbsp;Otherwise there is a race condition against
&lt;br&gt;! &amp;nbsp;* TransactionIdDidCommit.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; void
&lt;br&gt;! TransactionIdCommitTree(int nxids, TransactionId *xids)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;! 	if (nxids &amp;gt; 0)
&lt;br&gt;! 		TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
&lt;br&gt;! 								 &amp;nbsp;InvalidXLogRecPtr);
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* TransactionIdAsyncCommitTree
&lt;br&gt;! &amp;nbsp;*		Same as above, but for async commits. &amp;nbsp;The commit record LSN is needed.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; void
&lt;br&gt;! TransactionIdAsyncCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;! 	if (nxids &amp;gt; 0)
&lt;br&gt;! 		TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
&lt;br&gt;! 								 &amp;nbsp;lsn);
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;--- 258,284 ----
&lt;br&gt;&amp;nbsp; 	return false;
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* TransactionIdCommitTree
&lt;br&gt;! &amp;nbsp;*		Marks all the given transaction ids as committed, atomically.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; void
&lt;br&gt;! TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;! 	TransactionIdSetTreeStatus(xid, nxids, xids,
&lt;br&gt;! 							TRANSACTION_STATUS_COMMITTED, InvalidXLogRecPtr);
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* TransactionIdAsyncCommitTree
&lt;br&gt;! &amp;nbsp;*		Same as above, but for async commits, atomically. &amp;nbsp;The commit record 
&lt;br&gt;! &amp;nbsp;* 		LSN is needed.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; void
&lt;br&gt;! TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;! 	TransactionIdSetTreeStatus(xid, nxids, xids,
&lt;br&gt;! 							TRANSACTION_STATUS_COMMITTED, lsn);
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;***************
&lt;br&gt;*** 379,392 ****
&lt;br&gt;&amp;nbsp; &amp;nbsp;* will consider all the xacts as not-yet-committed anyway.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; void
&lt;br&gt;! TransactionIdAbortTree(int nxids, TransactionId *xids)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;! 	if (nxids &amp;gt; 0)
&lt;br&gt;! 		TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_ABORTED,
&lt;br&gt;! 								 &amp;nbsp;InvalidXLogRecPtr);
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;- 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* TransactionIdPrecedes --- is id1 logically &amp;lt; id2?
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;--- 289,300 ----
&lt;br&gt;&amp;nbsp; &amp;nbsp;* will consider all the xacts as not-yet-committed anyway.
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;&amp;nbsp; void
&lt;br&gt;! TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
&lt;br&gt;&amp;nbsp; {
&lt;br&gt;! 	TransactionIdSetTreeStatus(xid, nxids, xids, 
&lt;br&gt;! 							TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; /*
&lt;br&gt;&amp;nbsp; &amp;nbsp;* TransactionIdPrecedes --- is id1 logically &amp;lt; id2?
&lt;br&gt;&amp;nbsp; &amp;nbsp;*/
&lt;br&gt;Index: src/backend/access/transam/twophase.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/twophase.c,v
&lt;br&gt;retrieving revision 1.45
&lt;br&gt;diff -c -r1.45 twophase.c
&lt;br&gt;*** src/backend/access/transam/twophase.c	11 Aug 2008 11:05:10 -0000	1.45
&lt;br&gt;--- src/backend/access/transam/twophase.c	24 Sep 2008 17:33:23 -0000
&lt;br&gt;***************
&lt;br&gt;*** 1745,1753 ****
&lt;br&gt;&amp;nbsp; 	XLogFlush(recptr);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Mark the transaction committed in pg_clog */
&lt;br&gt;! 	TransactionIdCommit(xid);
&lt;br&gt;! 	/* to avoid race conditions, the parent must commit first */
&lt;br&gt;! 	TransactionIdCommitTree(nchildren, children);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Checkpoint can proceed now */
&lt;br&gt;&amp;nbsp; 	MyProc-&amp;gt;inCommit = false;
&lt;br&gt;--- 1745,1751 ----
&lt;br&gt;&amp;nbsp; 	XLogFlush(recptr);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Mark the transaction committed in pg_clog */
&lt;br&gt;! 	TransactionIdCommitTree(xid, nchildren, children);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Checkpoint can proceed now */
&lt;br&gt;&amp;nbsp; 	MyProc-&amp;gt;inCommit = false;
&lt;br&gt;***************
&lt;br&gt;*** 1822,1829 ****
&lt;br&gt;&amp;nbsp; 	 * Mark the transaction aborted in clog. &amp;nbsp;This is not absolutely necessary
&lt;br&gt;&amp;nbsp; 	 * but we may as well do it while we are here.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;! 	TransactionIdAbort(xid);
&lt;br&gt;! 	TransactionIdAbortTree(nchildren, children);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	END_CRIT_SECTION();
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;--- 1820,1826 ----
&lt;br&gt;&amp;nbsp; 	 * Mark the transaction aborted in clog. &amp;nbsp;This is not absolutely necessary
&lt;br&gt;&amp;nbsp; 	 * but we may as well do it while we are here.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;! 	TransactionIdAbortTree(xid, nchildren, children);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	END_CRIT_SECTION();
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;Index: src/backend/access/transam/xact.c
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xact.c,v
&lt;br&gt;retrieving revision 1.265
&lt;br&gt;diff -c -r1.265 xact.c
&lt;br&gt;*** src/backend/access/transam/xact.c	11 Aug 2008 11:05:10 -0000	1.265
&lt;br&gt;--- src/backend/access/transam/xact.c	24 Sep 2008 17:33:23 -0000
&lt;br&gt;***************
&lt;br&gt;*** 254,260 ****
&lt;br&gt;&amp;nbsp; static TransactionId RecordTransactionAbort(bool isSubXact);
&lt;br&gt;&amp;nbsp; static void StartTransaction(void);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;- static void RecordSubTransactionCommit(void);
&lt;br&gt;&amp;nbsp; static void StartSubTransaction(void);
&lt;br&gt;&amp;nbsp; static void CommitSubTransaction(void);
&lt;br&gt;&amp;nbsp; static void AbortSubTransaction(void);
&lt;br&gt;--- 254,259 ----
&lt;br&gt;***************
&lt;br&gt;*** 952,962 ****
&lt;br&gt;&amp;nbsp; 		 * Now we may update the CLOG, if we wrote a COMMIT record above
&lt;br&gt;&amp;nbsp; 		 */
&lt;br&gt;&amp;nbsp; 		if (markXidCommitted)
&lt;br&gt;! 		{
&lt;br&gt;! 			TransactionIdCommit(xid);
&lt;br&gt;! 			/* to avoid race conditions, the parent must commit first */
&lt;br&gt;! 			TransactionIdCommitTree(nchildren, children);
&lt;br&gt;! 		}
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 	else
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;--- 951,957 ----
&lt;br&gt;&amp;nbsp; 		 * Now we may update the CLOG, if we wrote a COMMIT record above
&lt;br&gt;&amp;nbsp; 		 */
&lt;br&gt;&amp;nbsp; 		if (markXidCommitted)
&lt;br&gt;! 			TransactionIdCommitTree(xid, nchildren, children);
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 	else
&lt;br&gt;&amp;nbsp; 	{
&lt;br&gt;***************
&lt;br&gt;*** 974,984 ****
&lt;br&gt;&amp;nbsp; 		 * flushed before the CLOG may be updated.
&lt;br&gt;&amp;nbsp; 		 */
&lt;br&gt;&amp;nbsp; 		if (markXidCommitted)
&lt;br&gt;! 		{
&lt;br&gt;! 			TransactionIdAsyncCommit(xid, XactLastRecEnd);
&lt;br&gt;! 			/* to avoid race conditions, the parent must commit first */
&lt;br&gt;! 			TransactionIdAsyncCommitTree(nchildren, children, XactLastRecEnd);
&lt;br&gt;! 		}
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;--- 969,975 ----
&lt;br&gt;&amp;nbsp; 		 * flushed before the CLOG may be updated.
&lt;br&gt;&amp;nbsp; 		 */
&lt;br&gt;&amp;nbsp; 		if (markXidCommitted)
&lt;br&gt;! 			TransactionIdAsyncCommitTree(xid, nchildren, children, XactLastRecEnd);
&lt;br&gt;&amp;nbsp; 	}
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/*
&lt;br&gt;***************
&lt;br&gt;*** 1156,1191 ****
&lt;br&gt;&amp;nbsp; 	s-&amp;gt;maxChildXids = 0;
&lt;br&gt;&amp;nbsp; }
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;- /*
&lt;br&gt;- &amp;nbsp;* RecordSubTransactionCommit
&lt;br&gt;- &amp;nbsp;*/
&lt;br&gt;- static void
&lt;br&gt;- RecordSubTransactionCommit(void)
&lt;br&gt;- {
&lt;br&gt;- 	TransactionId xid = GetCurrentTransactionIdIfAny();
&lt;br&gt;- 
&lt;br&gt;- 	/*
&lt;br&gt;- 	 * We do not log the subcommit in XLOG; it doesn't matter until the
&lt;br&gt;- 	 * top-level transaction commits.
&lt;br&gt;- 	 *
&lt;br&gt;- 	 * We must mark the subtransaction subcommitted in the CLOG if it had a
&lt;br&gt;- 	 * valid XID assigned.	If it did not, nobody else will ever know about
&lt;br&gt;- 	 * the existence of this subxact. &amp;nbsp;We don't have to deal with deletions
&lt;br&gt;- 	 * scheduled for on-commit here, since they'll be reassigned to our parent
&lt;br&gt;- 	 * (who might still abort).
&lt;br&gt;- 	 */
&lt;br&gt;- 	if (TransactionIdIsValid(xid))
&lt;br&gt;- 	{
&lt;br&gt;- 		/* XXX does this really need to be a critical section? */
&lt;br&gt;- 		START_CRIT_SECTION();
&lt;br&gt;- 
&lt;br&gt;- 		/* Record subtransaction subcommit */
&lt;br&gt;- 		TransactionIdSubCommit(xid);
&lt;br&gt;- 
&lt;br&gt;- 		END_CRIT_SECTION();
&lt;br&gt;- 	}
&lt;br&gt;- }
&lt;br&gt;- 
&lt;br&gt;&amp;nbsp; /* ----------------------------------------------------------------
&lt;br&gt;&amp;nbsp; &amp;nbsp;*						AbortTransaction stuff
&lt;br&gt;&amp;nbsp; &amp;nbsp;* ----------------------------------------------------------------
&lt;br&gt;--- 1147,1152 ----
&lt;br&gt;***************
&lt;br&gt;*** 1288,1301 ****
&lt;br&gt;&amp;nbsp; 	 * waiting for already-aborted subtransactions. &amp;nbsp;It is OK to do it without
&lt;br&gt;&amp;nbsp; 	 * having flushed the ABORT record to disk, because in event of a crash
&lt;br&gt;&amp;nbsp; 	 * we'd be assumed to have aborted anyway.
&lt;br&gt;- 	 *
&lt;br&gt;- 	 * The ordering here isn't critical but it seems best to mark the parent
&lt;br&gt;- 	 * first. &amp;nbsp;This assures an atomic transition of all the subtransactions to
&lt;br&gt;- 	 * aborted state from the point of view of concurrent
&lt;br&gt;- 	 * TransactionIdDidAbort calls.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;! 	TransactionIdAbort(xid);
&lt;br&gt;! 	TransactionIdAbortTree(nchildren, children);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	END_CRIT_SECTION();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;--- 1249,1256 ----
&lt;br&gt;&amp;nbsp; 	 * waiting for already-aborted subtransactions. &amp;nbsp;It is OK to do it without
&lt;br&gt;&amp;nbsp; 	 * having flushed the ABORT record to disk, because in event of a crash
&lt;br&gt;&amp;nbsp; 	 * we'd be assumed to have aborted anyway.
&lt;br&gt;&amp;nbsp; 	 */
&lt;br&gt;! 	TransactionIdAbortTree(xid, nchildren, children);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	END_CRIT_SECTION();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;***************
&lt;br&gt;*** 3791,3798 ****
&lt;br&gt;&amp;nbsp; 	/* Must CCI to ensure commands of subtransaction are seen as done */
&lt;br&gt;&amp;nbsp; 	CommandCounterIncrement();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	/* Mark subtransaction as subcommitted */
&lt;br&gt;! 	RecordSubTransactionCommit();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Post-commit cleanup */
&lt;br&gt;&amp;nbsp; 	if (TransactionIdIsValid(s-&amp;gt;transactionId))
&lt;br&gt;--- 3746,3757 ----
&lt;br&gt;&amp;nbsp; 	/* Must CCI to ensure commands of subtransaction are seen as done */
&lt;br&gt;&amp;nbsp; 	CommandCounterIncrement();
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! 	/* 
&lt;br&gt;! 	 * Prior to 8.4 we marked subcommit in clog at this point.
&lt;br&gt;! 	 * We now only perform that step, if required, as part of the
&lt;br&gt;! 	 * atomic update of the whole transaction tree at top level 
&lt;br&gt;! 	 * commit or abort.
&lt;br&gt;! 	 */
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Post-commit cleanup */
&lt;br&gt;&amp;nbsp; 	if (TransactionIdIsValid(s-&amp;gt;transactionId))
&lt;br&gt;***************
&lt;br&gt;*** 4259,4269 ****
&lt;br&gt;&amp;nbsp; 	TransactionId max_xid;
&lt;br&gt;&amp;nbsp; 	int			i;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;- 	TransactionIdCommit(xid);
&lt;br&gt;- 
&lt;br&gt;&amp;nbsp; 	/* Mark committed subtransactions as committed */
&lt;br&gt;&amp;nbsp; 	sub_xids = (TransactionId *) &amp;(xlrec-&amp;gt;xnodes[xlrec-&amp;gt;nrels]);
&lt;br&gt;! 	TransactionIdCommitTree(xlrec-&amp;gt;nsubxacts, sub_xids);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Make sure nextXid is beyond any XID mentioned in the record */
&lt;br&gt;&amp;nbsp; 	max_xid = xid;
&lt;br&gt;--- 4218,4226 ----
&lt;br&gt;&amp;nbsp; 	TransactionId max_xid;
&lt;br&gt;&amp;nbsp; 	int			i;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Mark committed subtransactions as committed */
&lt;br&gt;&amp;nbsp; 	sub_xids = (TransactionId *) &amp;(xlrec-&amp;gt;xnodes[xlrec-&amp;gt;nrels]);
&lt;br&gt;! 	TransactionIdCommitTree(xid, xlrec-&amp;gt;nsubxacts, sub_xids);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Make sure nextXid is beyond any XID mentioned in the record */
&lt;br&gt;&amp;nbsp; 	max_xid = xid;
&lt;br&gt;***************
&lt;br&gt;*** 4299,4309 ****
&lt;br&gt;&amp;nbsp; 	TransactionId max_xid;
&lt;br&gt;&amp;nbsp; 	int			i;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;- 	TransactionIdAbort(xid);
&lt;br&gt;- 
&lt;br&gt;&amp;nbsp; 	/* Mark subtransactions as aborted */
&lt;br&gt;&amp;nbsp; 	sub_xids = (TransactionId *) &amp;(xlrec-&amp;gt;xnodes[xlrec-&amp;gt;nrels]);
&lt;br&gt;! 	TransactionIdAbortTree(xlrec-&amp;gt;nsubxacts, sub_xids);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Make sure nextXid is beyond any XID mentioned in the record */
&lt;br&gt;&amp;nbsp; 	max_xid = xid;
&lt;br&gt;--- 4256,4264 ----
&lt;br&gt;&amp;nbsp; 	TransactionId max_xid;
&lt;br&gt;&amp;nbsp; 	int			i;
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Mark subtransactions as aborted */
&lt;br&gt;&amp;nbsp; 	sub_xids = (TransactionId *) &amp;(xlrec-&amp;gt;xnodes[xlrec-&amp;gt;nrels]);
&lt;br&gt;! 	TransactionIdAbortTree(xid, xlrec-&amp;gt;nsubxacts, sub_xids);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 	/* Make sure nextXid is beyond any XID mentioned in the record */
&lt;br&gt;&amp;nbsp; 	max_xid = xid;
&lt;br&gt;Index: src/include/access/clog.h
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/clog.h,v
&lt;br&gt;retrieving revision 1.21
&lt;br&gt;diff -c -r1.21 clog.h
&lt;br&gt;*** src/include/access/clog.h	1 Jan 2008 19:45:56 -0000	1.21
&lt;br&gt;--- src/include/access/clog.h	24 Sep 2008 17:33:23 -0000
&lt;br&gt;***************
&lt;br&gt;*** 32,38 ****
&lt;br&gt;&amp;nbsp; #define NUM_CLOG_BUFFERS	8
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! extern void TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn);
&lt;br&gt;&amp;nbsp; extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; extern Size CLOGShmemSize(void);
&lt;br&gt;--- 32,39 ----
&lt;br&gt;&amp;nbsp; #define NUM_CLOG_BUFFERS	8
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;! extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids, 
&lt;br&gt;! 					TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
&lt;br&gt;&amp;nbsp; extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;&amp;nbsp; extern Size CLOGShmemSize(void);
&lt;br&gt;Index: src/include/access/transam.h
&lt;br&gt;===================================================================
&lt;br&gt;RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/transam.h,v
&lt;br&gt;retrieving revision 1.65
&lt;br&gt;diff -c -r1.65 transam.h
&lt;br&gt;*** src/include/access/transam.h	11 Mar 2008 20:20:35 -0000	1.65
&lt;br&gt;--- src/include/access/transam.h	24 Sep 2008 17:33:23 -0000
&lt;br&gt;***************
&lt;br&gt;*** 139,151 ****
&lt;br&gt;&amp;nbsp; extern bool TransactionIdDidCommit(TransactionId transactionId);
&lt;br&gt;&amp;nbsp; extern bool TransactionIdDidAbort(TransactionId transactionId);
&lt;br&gt;&amp;nbsp; extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
&lt;br&gt;- extern void TransactionIdCommit(TransactionId transactionId);
&lt;br&gt;- extern void TransactionIdAsyncCommit(TransactionId transactionId, XLogRecPtr lsn);
&lt;br&gt;&amp;nbsp; extern void TransactionIdAbort(TransactionId transactionId);
&lt;br&gt;! extern void TransactionIdSubCommit(TransactionId transactionId);
&lt;br&gt;! extern void TransactionIdCommitTree(int nxids, TransactionId *xids);
&lt;br&gt;! extern void TransactionIdAsyncCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn);
&lt;br&gt;! extern void TransactionIdAbortTree(int nxids, TransactionId *xids);
&lt;br&gt;&amp;nbsp; extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
&lt;br&gt;&amp;nbsp; extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
&lt;br&gt;&amp;nbsp; extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
&lt;br&gt;--- 139,148 ----
&lt;br&gt;&amp;nbsp; extern bool TransactionIdDidCommit(TransactionId transactionId);
&lt;br&gt;&amp;nbsp; extern bool TransactionIdDidAbort(TransactionId transactionId);
&lt;br&gt;&amp;nbsp; extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
&lt;br&gt;&amp;nbsp; extern void TransactionIdAbort(TransactionId transactionId);
&lt;br&gt;! extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
&lt;br&gt;! extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
&lt;br&gt;! extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
&lt;br&gt;&amp;nbsp; extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
&lt;br&gt;&amp;nbsp; extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
&lt;br&gt;&amp;nbsp; extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
&lt;br&gt;&lt;/tt&gt;&lt;hr align=&quot;left&quot; width=&quot;300&quot; /&gt;&lt;br /&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Subtransaction-commits-and-Hot-Standby-tp19637765p19658661.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19652605</id>
	<title>Re: hash index improving v3</title>
	<published>2008-09-24T09:22:54Z</published>
	<updated>2008-09-24T09:22:54Z</updated>
	<author>
		<name>Simon Riggs</name>
	</author>
	<content type="html">&lt;br&gt;On Wed, 2008-09-24 at 12:04 -0400, Bruce Momjian wrote:
&lt;br&gt;&lt;br&gt;&amp;gt; Can we consider this hash thread closed/completed?
&lt;br&gt;&lt;br&gt;Please
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp;Simon Riggs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; www.2ndQuadrant.com
&lt;br&gt;&amp;nbsp;PostgreSQL Training, Services and Support
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/hash-index-improving-v3-tp19025117p19652605.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19652173</id>
	<title>Re: hash index improving v3</title>
	<published>2008-09-24T09:04:22Z</published>
	<updated>2008-09-24T09:04:22Z</updated>
	<author>
		<name>Bruce Momjian-5</name>
	</author>
	<content type="html">&lt;br&gt;Can we consider this hash thread closed/completed?
&lt;br&gt;&lt;br&gt;---------------------------------------------------------------------------
&lt;br&gt;&lt;br&gt;Tom Lane wrote:
&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Simon Riggs &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19652173&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;simon@...&lt;/a&gt;&amp;gt; writes:
&lt;br&gt;&amp;gt; &amp;gt; Thinks: Why not just sort all of the time and skip the debate entirely?
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; The sort is demonstrably a loser for smaller indexes. &amp;nbsp;Admittedly,
&lt;br&gt;&amp;gt; if the index is small then the sort can't cost all that much, but if
&lt;br&gt;&amp;gt; the (correct) threshold is some large fraction of shared_buffers then
&lt;br&gt;&amp;gt; it could still take awhile on installations with lots-o-buffers.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; The other side of that coin is that it's not clear this is really worth
&lt;br&gt;&amp;gt; arguing about, much less exposing a separate parameter for.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; 			regards, tom lane
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp; Bruce Momjian &amp;nbsp;&amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19652173&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;bruce@...&lt;/a&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://momjian.us&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://momjian.us&lt;/a&gt;&lt;br&gt;&amp;nbsp; EnterpriseDB &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://enterprisedb.com&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://enterprisedb.com&lt;/a&gt;&lt;br&gt;&lt;br&gt;&amp;nbsp; + If your life is a hard drive, Christ can be your backup. +
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/hash-index-improving-v3-tp19025117p19652173.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19650772</id>
	<title>Re: Solve a problem of LC_TIME of windows.</title>
	<published>2008-09-24T07:55:18Z</published>
	<updated>2008-09-24T07:55:18Z</updated>
	<author>
		<name>Hiroshi Saito-2</name>
	</author>
	<content type="html">Hi.
&lt;br&gt;&lt;br&gt;----- Original Message ----- 
&lt;br&gt;From: &amp;quot;Alvaro Herrera&amp;quot; &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19650772&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;alvherre@...&lt;/a&gt;&amp;gt;
&lt;br&gt;&amp;gt; What about this line?
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; #define STRLEN_MAX 255
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Are we really sure the strftime format cannot be longer than that?
&lt;br&gt;&lt;br&gt;Ahh, this is the place that would need to change; would MAX_L10N_DATA be suitable?
&lt;br&gt;--
&lt;br&gt;cache_locale_time(void)
&lt;br&gt;{
&lt;br&gt;..
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; char &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;buf[MAX_L10N_DATA];
&lt;br&gt;..
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; strftime(buf, MAX_L10N_DATA, &amp;quot;%a&amp;quot;, timeinfo);
&lt;br&gt;&lt;br&gt;Regards,
&lt;br&gt;Hiroshi Saito
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Solve-a-problem-of-LC_TIME-of-windows.-tp19493102p19650772.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19650765</id>
	<title>Re: Solve a problem of LC_TIME of windows.</title>
	<published>2008-09-24T07:55:10Z</published>
	<updated>2008-09-24T07:55:10Z</updated>
	<author>
		<name>Hiroshi Saito-2</name>
	</author>
	<content type="html">Hi.
&lt;br&gt;&lt;br&gt;----- Original Message ----- 
&lt;br&gt;From: &amp;quot;Magnus Hagander&amp;quot; &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19650765&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;magnus@...&lt;/a&gt;&amp;gt;
&lt;br&gt;&lt;br&gt;&amp;gt; In principle, I think this patch looks good.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Do you (or somebody else) have an example where this breaks in an
&lt;br&gt;&amp;gt; encoding where I can actually understand the characters, though ;-) That
&lt;br&gt;&amp;gt; would make testing a whole lot easier...
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Also, the patch needs error checking. strftime() can fail, and the
&lt;br&gt;&amp;gt; multibyte conversion functions can certainly fail. That will need to be
&lt;br&gt;&amp;gt; added.
&lt;br&gt;&lt;br&gt;Ok, thanks!
&lt;br&gt;&lt;br&gt;strftime returns 0 on failure.
&lt;br&gt;&lt;a href=&quot;http://msdn.microsoft.com/en-us/library/fe06s4ak(VS.71).aspx&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://msdn.microsoft.com/en-us/library/fe06s4ak(VS.71).aspx&lt;/a&gt;&lt;br&gt;MultiByteToWideChar and WideCharToMultiByte return 0 on failure; the error code is available from GetLastError.
&lt;br&gt;&lt;a href=&quot;http://msdn.microsoft.com/en-us/library/ms776413(VS.85).aspx&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://msdn.microsoft.com/en-us/library/ms776413(VS.85).aspx&lt;/a&gt;&lt;br&gt;&lt;br&gt;I will propose the next patch. :-)
&lt;br&gt;&lt;br&gt;BTW, this is the SQL I use for checking.
&lt;br&gt;&lt;a href=&quot;http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/LC_TIME_PATCH/DATECHECK.sql&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/LC_TIME_PATCH/DATECHECK.sql&lt;/a&gt;&lt;br&gt;Probably, all are included. 
&lt;br&gt;&lt;a href=&quot;http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/LC_TIME_PATCH/LC_TIME_CHECK_LOCALE.sql&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/LC_TIME_PATCH/LC_TIME_CHECK_LOCALE.sql&lt;/a&gt;&lt;br&gt;&lt;br&gt;Regards,
&lt;br&gt;Hiroshi Saito
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Solve-a-problem-of-LC_TIME-of-windows.-tp19493102p19650765.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19649885</id>
	<title>Re: Solve a problem of LC_TIME of windows.</title>
	<published>2008-09-24T07:15:19Z</published>
	<updated>2008-09-24T07:15:19Z</updated>
	<author>
		<name>Alvaro Herrera-7</name>
	</author>
	<content type="html">Magnus Hagander wrote:
&lt;br&gt;&lt;br&gt;&amp;gt; Also, the patch needs error checking. strftime() can fail, and the
&lt;br&gt;&amp;gt; multibyte conversion functions can certainly fail. That will need to be
&lt;br&gt;&amp;gt; added.
&lt;br&gt;&lt;br&gt;What about this line?
&lt;br&gt;&lt;br&gt;#define STRLEN_MAX 255
&lt;br&gt;&lt;br&gt;Are we really sure the strftime format cannot be longer than that?
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;Alvaro Herrera &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://www.CommandPrompt.com/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.CommandPrompt.com/&lt;/a&gt;&lt;br&gt;The PostgreSQL Company - Command Prompt, Inc.
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Solve-a-problem-of-LC_TIME-of-windows.-tp19493102p19649885.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19648033</id>
	<title>Re: [HACKERS] Subtransaction commits and Hot Standby</title>
	<published>2008-09-24T05:48:45Z</published>
	<updated>2008-09-24T05:48:45Z</updated>
	<author>
		<name>Simon Riggs</name>
	</author>
	<content type="html">&lt;br&gt;On Tue, 2008-09-23 at 22:47 +0100, Simon Riggs wrote:
&lt;br&gt;&lt;br&gt;&amp;gt; I've tested this some more and am much happier with it now.
&lt;br&gt;&lt;br&gt;The concept is fine, but I've found a coding bug in further testing.
&lt;br&gt;Please wait now for new version before review.
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;nbsp;Simon Riggs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; www.2ndQuadrant.com
&lt;br&gt;&amp;nbsp;PostgreSQL Training, Services and Support
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A--HACKERS--Subtransaction-commits-and-Hot-Standby-tp19637765p19648033.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19644914</id>
	<title>Re: Solve a problem of LC_TIME of windows.</title>
	<published>2008-09-24T02:11:41Z</published>
	<updated>2008-09-24T02:11:41Z</updated>
	<author>
		<name>Magnus Hagander-2</name>
	</author>
	<content type="html">Hiroshi Saito wrote:
&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Hi.
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; I have a problem with LC_TIME on Windows (CVS HEAD).
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; In version 8.3.3, the output is formatted incorrectly by gettext. (But it is
&lt;br&gt;&amp;gt; still displayed.)
&lt;br&gt;&amp;gt; &lt;a href=&quot;http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/LC_TIME_PATCH/pg8.3.3-to_char_gettext_format.png&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/LC_TIME_PATCH/pg8.3.3-to_char_gettext_format.png&lt;/a&gt;&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; As for version 8.4, Tom-san changed it to use the strftime of the
&lt;br&gt;&amp;gt; native Windows API. That is a good improvement! But the native strftime
&lt;br&gt;&amp;gt; returns its result in the code page of the Windows operating environment
&lt;br&gt;&amp;gt; for LC_TIME. In a Japanese environment the return value is SJIS (CP932),
&lt;br&gt;&amp;gt; and SJIS (CP932) cannot be chosen as SERVER_ENCODING. :-(
&lt;br&gt;&amp;gt; &lt;a href=&quot;http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/pg84beta-03-to_char.png&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/pg84beta-03-to_char.png&lt;/a&gt;&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; So I am proposing a patch. It solves the problem nicely.
&lt;br&gt;&amp;gt; &lt;a href=&quot;http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/LC_TIME_PATCH/DATECHECK_EUCJP.txt&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/LC_TIME_PATCH/DATECHECK_EUCJP.txt&lt;/a&gt;&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; &lt;a href=&quot;http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/LC_TIME_PATCH/DATECHECK_UTF8.txt&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/LC_TIME_PATCH/DATECHECK_UTF8.txt&lt;/a&gt;&lt;/div&gt;&lt;br&gt;In principle, I think this patch looks good.
&lt;br&gt;&lt;br&gt;Do you (or somebody else) have an example where this breaks in an
&lt;br&gt;encoding where I can actually understand the characters, though ;-) That
&lt;br&gt;would make testing a whole lot easier...
&lt;br&gt;&lt;br&gt;Also, the patch needs error checking. strftime() can fail, and the
&lt;br&gt;multibyte conversion functions can certainly fail. That will need to be
&lt;br&gt;added.
&lt;br&gt;&lt;br&gt;//Magnus
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Solve-a-problem-of-LC_TIME-of-windows.-tp19493102p19644914.html" />
</entry>

</feed>
