warm standby resume and take online problems

Hi,

I have a chain of warm standby servers. Let's say db-01 pushes updates to db-02, and they are then fetched to db-03. I decided to bring db-04 online, so I stopped the warm standby on db-03 with pg_ctl stop -m fast $PG_DATA and copied the data over from db-03 to db-04. So now I have a backup ("data + binaries") that was taken from the warm standby while it was shut down.

I created recovery.conf with restore_command, created recovery.sh (for the restore command), and adjusted postgresql.conf with the appropriate port and IP. recovery.sh is just a blind 'while' loop that looks for the trigger file and then ends.

Then I started. First I removed everything from pg_xlog on the copy that is going to go live.

pg_controldata output:

    pg_control version number: 822
    Catalog version number: 200611241
    Database system identifier: 5309237009736268543
    Database cluster state: in archive recovery
    pg_control last modified: Thu Oct 29 11:30:04 2009
    Current log file ID: 389
    Next log file segment: 225
    Latest checkpoint location: 2FA/BBA6B710
    Prior checkpoint location: 2FA/AE916D60
    Latest checkpoint's REDO location: 2FA/BBA38478
    Latest checkpoint's UNDO location: 0/0
    Latest checkpoint's TimeLineID: 1
    Latest checkpoint's NextXID: 3/824035978
    Latest checkpoint's NextOID: 59442871
    Latest checkpoint's NextMultiXactId: 510637
    Latest checkpoint's NextMultiOffset: 2076981
    Time of latest checkpoint: Thu Oct 29 09:02:31 2009
    Minimum recovery ending location: 186/80DCC48
    Maximum data alignment: 8
    Database block size: 8192
    Blocks per segment of large relation: 131072
    WAL block size: 8192
    Bytes per WAL segment: 16777216
    Maximum length of identifiers: 64
    Maximum columns in an index: 32
    Date/time type storage: floating-point numbers
    Maximum length of locale name: 128
    LC_COLLATE: en_US.UTF-8
    LC_CTYPE: en_US.UTF-8

First start (no WAL files in the wal_recovery directory):

    2009-11-01 16:09:10 PST : LOG: could not open file "pg_xlog/00000001000002FA000000BB" (log file 762, segment 187): No such file or directory
    2009-11-01 16:09:10 PST : LOG: invalid primary checkpoint record
    2009-11-01 16:09:10 PST : LOG: could not open file "pg_xlog/00000001000002FA000000AE" (log file 762, segment 174): No such file or directory
    2009-11-01 16:09:10 PST : LOG: invalid secondary checkpoint record
    2009-11-01 16:09:10 PST : PANIC: could not locate a valid checkpoint record
    2009-11-01 16:09:10 PST : LOG: startup process (PID 1651) was terminated by signal 6
    2009-11-01 16:09:10 PST : LOG: aborting startup due to startup process failure
    2009-11-01 16:09:10 PST : LOG: logger shutting down

I then put every WAL file from AE to BB into wal_recovery. It started in recovery mode and asked for more WAL files. I started applying WAL files and everything was OK; recovery was in progress. Once I had fed it files up to ..2FB.08 (roughly when the original data directory was copied from the warm standby server) and created the trigger file, it came online. I can connect and SELECT from some tables, but a SELECT on logging.agentpagehit (35 GB+) failed with this on the console:

    saturn=# select count(*) from logging.agentpagehit;
    ERROR: xlog flush request 2FB/45E1B8D0 is not satisfied --- flushed only to 2FB/8FFEA60
    CONTEXT: writing block 874822 of relation 1663/20863/21548

Now the log constantly shows:

    2009-11-04 04:57:39 PST : ERROR: XX000: xlog flush request 2FB/28CE63A8 is not satisfied --- flushed only to 2FB/8FFEA60
    2009-11-04 04:57:39 PST : CONTEXT: writing block 874937 of relation 1663/20863/21548
    2009-11-04 04:57:39 PST : LOCATION: XLogFlush, xlog.c:1865
    2009-11-04 04:57:39 PST : WARNING: 58030: could not write block 874937 of 1663/20863/21548
    2009-11-04 04:57:39 PST : DETAIL: Multiple failures --- write error may be permanent.
    2009-11-04 04:57:39 PST : LOCATION: AbortBufferIO, bufmgr.c:2129

What am I missing?

- Should I ship it more WAL files from the past or the future (and if from the future, up to which point)?
- Did the first start without WAL files break it?
- Did starting without the pg_xlog files break it?
- According to a post on the Web, "Minimum recovery ending location: 186/80DCC48" means I should ship it WAL files starting from 188..80. Is this correct?

I haven't yet checked which file it asks for first (%f) when started without any WAL files in wal_recovery. I will know in a few hours, since I am copying the data over once again.

Any thoughts?

Michal

--
Sent via pgsql-general mailing list (pgsql-general@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
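The WAL file names in the startup log follow directly from the locations pg_controldata prints (for example "log file 762, segment 187" is 2FA/BB.. in hex). A minimal Python sketch of that mapping, assuming the 16 MB segments and timeline 1 shown above and the TTTTTTTTXXXXXXXXSSSSSSSS naming (timeline, log file ID, segment number) visible in the log; parse_lsn, wal_file_name and lsn_less are hypothetical helpers, not PostgreSQL tools.

```python
from __future__ import annotations

# "Bytes per WAL segment: 16777216" in the pg_controldata output.
SEG_SIZE = 16 * 1024 * 1024


def parse_lsn(text: str) -> tuple[int, int]:
    """Split 'XLOGID/XRECOFF' (both hexadecimal) into two integers."""
    xlogid, xrecoff = text.strip().split("/")
    return int(xlogid, 16), int(xrecoff, 16)


def wal_file_name(lsn: str, timeline: int = 1, seg_size: int = SEG_SIZE) -> str:
    """Name of the WAL segment file that contains the given location."""
    xlogid, xrecoff = parse_lsn(lsn)
    segment = xrecoff // seg_size
    # Timeline, log file ID and segment number, each as 8 uppercase hex digits.
    return f"{timeline:08X}{xlogid:08X}{segment:08X}"


def lsn_less(a: str, b: str) -> bool:
    """True if WAL location a comes strictly before location b."""
    # Tuples compare the log file ID first, then the offset within it.
    return parse_lsn(a) < parse_lsn(b)


if __name__ == "__main__":
    for label, lsn in [
        ("Latest checkpoint", "2FA/BBA6B710"),
        ("Prior checkpoint", "2FA/AE916D60"),
        ("Latest checkpoint's REDO", "2FA/BBA38478"),
        ("Minimum recovery ending", "186/80DCC48"),
    ]:
        print(f"{label}: {lsn} -> {wal_file_name(lsn)}")
    # The flush error: the page needs WAL up to 2FB/45E1B8D0,
    # but WAL had only been flushed up to 2FB/8FFEA60.
    print(lsn_less("2FB/8FFEA60", "2FB/45E1B8D0"))
```

With these values the latest and prior checkpoints map to 00000001000002FA000000BB and 00000001000002FA000000AE, the two files the first start could not open, and 186/80DCC48 falls in 000000010000018600000008. lsn_less confirms that the flush request 2FB/45E1B8D0 lies beyond 2FB/8FFEA60.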
Re: warm standby resume and take online problems

On Wed, 4 Nov 2009, Michal Bicz wrote:
> Now it is saying constantly in log :
> 2009-11-04 04:57:39 PST : ERROR: XX000: xlog flush request 2FB/28CE63A8 is not satisfied --- flushed only to 2FB/8FFEA60
> 2009-11-04 04:57:39 PST : CONTEXT: writing block 874937 of relation 1663/20863/21548
> 2009-11-04 04:57:39 PST : LOCATION: XLogFlush, xlog.c:1865
> 2009-11-04 04:57:39 PST : WARNING: 58030: could not write block 874937 of 1663/20863/21548
> 2009-11-04 04:57:39 PST : DETAIL: Multiple failures --- write error may be permanent.
> 2009-11-04 04:57:39 PST : LOCATION: AbortBufferIO, bufmgr.c:2129

I think you can run into this if disk space on the xlog drive fills up, which is easy to do with complicated WAL shipping setups if you're not careful. You might want to double-check that, and check for general disk I/O errors too. A write error at this point is kind of odd even if you abused recovery a bit leading up to here. This might be a full disk or a bad block on the xlog drive instead of something more complicated.

--
* Greg Smith gsmith@... http://www.gregsmith.com Baltimore, MD
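The free-space part of that check is easy to script. A minimal Python sketch, assuming the 8.x layout where WAL lives in $PGDATA/pg_xlog (possibly a symlink to a separate drive); xlog_free_bytes, xlog_space_ok and the 1 GB threshold are hypothetical choices for illustration. It only reports free space; disk I/O errors still have to be checked separately, for example in the system logs.

```python
import os
import shutil

# Hypothetical alarm threshold; choose one that fits your WAL volume.
MIN_FREE_BYTES = 1024 ** 3  # 1 GB


def xlog_free_bytes(data_dir: str) -> int:
    """Free bytes on the filesystem that holds <data_dir>/pg_xlog."""
    xlog_dir = os.path.join(data_dir, "pg_xlog")
    # disk_usage() calls statvfs() on POSIX, which follows a pg_xlog symlink,
    # so this reports the drive the WAL is really written to.
    return shutil.disk_usage(xlog_dir).free


def xlog_space_ok(data_dir: str, min_free: int = MIN_FREE_BYTES) -> bool:
    """True if the xlog filesystem has at least min_free bytes available."""
    return xlog_free_bytes(data_dir) >= min_free
```

For example, xlog_space_ok("/path/to/standby/data") (a placeholder path) could run from cron on each server in the shipping chain, since any of them can fill up its xlog drive.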
Re: warm standby resume and take online problems

On Wed, 4 Nov 2009, Greg Smith wrote:
> On Wed, 4 Nov 2009, Michal Bicz wrote:
>
>> Now it is saying constantly in log :
>> 2009-11-04 04:57:39 PST : ERROR: XX000: xlog flush request 2FB/28CE63A8 is
>> not satisfied --- flushed only to 2FB/8FFEA60
>> 2009-11-04 04:57:39 PST : CONTEXT: writing block 874937 of relation
>> 1663/20863/21548
>> 2009-11-04 04:57:39 PST : LOCATION: XLogFlush, xlog.c:1865
>
> I think you can run into this if disk space on the xlog drive fills up, which
> is easy to do with complicated WAL shipping setups if you're not careful.
> You might want to double-check that, and check for general disk I/O errors
> too.

Looks like Michal's response didn't go on-list. For anyone wondering what the resolution was, he says:

"Thanks, but this is apparently neither a bad block nor space limits. I recreated the scenario, and apparently the warm standby server is set to be respawned every time it is seen stopped. That caused the data to become corrupted."

--
* Greg Smith gsmith@... http://www.gregsmith.com Baltimore, MD
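Given that resolution, one safeguard before copying a standby's data directory is to confirm that no postmaster is using it. A minimal, POSIX-only Python sketch, assuming the first line of postmaster.pid holds the server's PID (older releases store it negated for a standalone backend, so abs() is applied); postmaster_running is a hypothetical helper, and pg_ctl status performs a comparable check from the shell. It cannot stop a supervisor from restarting the server, so whatever respawns the standby has to be disabled first.

```python
import os


def postmaster_running(data_dir: str) -> bool:
    """Best-effort check whether a server process still holds data_dir."""
    pid_file = os.path.join(data_dir, "postmaster.pid")
    try:
        with open(pid_file) as f:
            first_line = f.readline().strip()
    except FileNotFoundError:
        # No lock file: the server was shut down cleanly (or never started).
        return False

    try:
        # A negative value marks a standalone backend in older releases.
        pid = abs(int(first_line))
    except ValueError:
        pid = 0
    if pid == 0:
        # Unreadable lock file: err on the side of "in use".
        return True

    try:
        os.kill(pid, 0)  # signal 0 only tests that the process exists
    except ProcessLookupError:
        # Stale lock file, e.g. left behind by a crash.
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    # A reused PID also ends up here, so the check errs toward "running".
    return True
```

Running it immediately before the copy and again right after gives some protection: a True result on either check means the copied files cannot be trusted. A False result only describes the moment of the check.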