Odd master/slave behavior for large domains

View: New views
6 Messages — Rating Filter:   Alert me  

Odd master/slave behavior for large domains

by thomas morgan-6 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I'm seeing some odd behavior in a traditional master and slave setup.

Both the master and slave are PowerDNS. I've tried 2.9.21.2 and 2.9.22  
and seem to get the same results. Memory usage and other figures below  
are from 2.9.22.

The backends are Postgres 8.3. Servers are amd64 Debian Lenny.

I created a single zone on the server and added 2 million host  
records. I know that's a bunch, but it is a specific use case, not  
just an attempt to break things.

Oddity #1:
The master seems to send 3-4 NOTIFYs when the zone is updated -- at  
least the slave is reporting in the logs to have received multiple  
NOTIFYs. They're pretty consistently spaced: in several instances, 4  
NOTIFYs with intervals 3sec, 5sec, 9sec.

Oddity #2:
The slave, upon receiving multiple NOTIFYs, initiates multiple AXFRs  
for the same zone. For a small zone this wouldn't be a big deal, but  
for a large zone it's fatal.

Oddity #3:
The master consumes a *huge* amount of RAM to do each AXFR: 283 MB  
worth per AXFR. Memory usage on the slave seems tolerable though.

Oddity #4:
If an AXFR doesn't finish properly, the memory is never released. I've  
managed to reproduce this using dig to perform the AXFR as well.

Strangely dig dies after 8600 records when piped to less, but makes it  
through all 2 million when piped to wc or just output to the terminal.  
This might be dig's problem, but it does make it easy to cause #4. The  
final line of dig's output when it fails is ";; communications error:  
end of file". When the AXFR completes, the log includes "AXFR of  
domain 'h0.com' to 64.<client> finished". When it doesn't (and memory  
isn't freed), this message never shows up.

I cannot tell, but it's possible the slave is eventually figuring out  
it has multiple transfers for the same zone going and aborts them. The  
logs (included at the bottom) give mixed readings on this (between the  
master and slave). It may also be the master puking and closing the  
connection.


On my test servers things usually end when the oomkiller rears its  
ugly head. ;)

The combination of these becomes a significant problem. #1 causes #2  
when then multiplies the effect of #3. #2 also increases the  
likelihood of #4.


Anything I'm missing or that I can do to help figure out what's wrong?  
Some logs are included below; they are not identical in every case,  
but this is representative.

--t


Master server:

Sep 11 02:07:45 pdns-master-test pdns[4156]: 1 domain for which we are  
master needs notifications
Sep 11 02:07:45 pdns-master-test pdns[4156]: Queued notification of  
domain 'h0.com' to 64.<master>
Sep 11 02:07:45 pdns-master-test pdns[4156]: Queued notification of  
domain 'h0.com' to 64.<slave>
Sep 11 02:07:45 pdns-master-test pdns[4156]: Received NOTIFY for  
h0.com from 64.<master> which is not a master
Sep 11 02:07:46 pdns-master-test pdns[4156]: Received unsuccesful  
notification report for 'h0.com' from 64.<master>, rcode: 5
Sep 11 02:07:46 pdns-master-test pdns[4156]: Removed from notification  
list: 'h0.com' to 64.<master>
Sep 11 02:07:46 pdns-master-test pdns[4156]: AXFR of domain 'h0.com'  
initiated by 64.<slave>
Sep 11 02:07:46 pdns-master-test pdns[4156]: gpgsql Connection succesful
Sep 11 02:07:48 pdns-master-test pdns[4156]: No master domains need  
notifications
Sep 11 02:08:13 pdns-master-test pdns[4156]: AXFR of domain 'h0.com'  
initiated by 64.<slave>
Sep 11 02:08:13 pdns-master-test pdns[4156]: gpgsql Connection succesful
Sep 11 02:08:23 pdns-master-test pdns[4156]: AXFR of domain 'h0.com'  
initiated by 64.<slave>
Sep 11 02:08:25 pdns-master-test pdns[4156]: gpgsql Connection succesful
Sep 11 02:08:33 pdns-master-test pdns[4156]: AXFR of domain 'h0.com'  
initiated by 64.<slave>
Sep 11 02:08:34 pdns-master-test pdns[4156]: gpgsql Connection succesful
Sep 11 02:09:21 pdns-master-test pdns[4156]: Communicator thread died  
because of error: GSQLBackend unable to retrieve list of slave  
domains: PostgreSQL failed to execute command: server closed the  
connection unexpectedly#012#011This probably means the server  
terminated abnormally#012#011before or while processing the request.
(The last line is where oomkiller decided to kill off postgres.)


Slave server:

Sep 11 02:07:45 pdns-slave-test pdns[4379]: Received valid NOTIFY for h0.com
  (id=1) from master 64.<master>: 1252634826 > 1252632174
Sep 11 02:07:46 pdns-slave-test pdns[4379]: gpgsql Connection succesful
Sep 11 02:07:48 pdns-slave-test pdns[4379]: Received valid NOTIFY for h0.com
  (id=1) from master 64.<master>: 1252634826 > 1252632174
Sep 11 02:07:51 pdns-slave-test pdns[4379]: AXFR started for 'h0.com',  
transaction started
Sep 11 02:07:53 pdns-slave-test pdns[4379]: Received valid NOTIFY for h0.com
  (id=1) from master 64.<master>: 1252634826 > 1252632174
Sep 11 02:08:02 pdns-slave-test pdns[4379]: Received valid NOTIFY for h0.com
  (id=1) from master 64.<master>: 1252634826 > 1252632174
Sep 11 02:08:13 pdns-slave-test pdns[4379]: Unable to AXFR zone  
'h0.com': Remote nameserver closed TCP connection
Sep 11 02:08:13 pdns-slave-test pdns[4379]: Aborting possible open  
transaction for domain 'h0.com' AXFR
Sep 11 02:08:13 pdns-slave-test pdns[4379]: gpgsql Connection succesful
Sep 11 02:08:23 pdns-slave-test pdns[4379]: Unable to AXFR zone  
'h0.com': Timeout waiting for answer from 64.<master> during AXFR
Sep 11 02:08:23 pdns-slave-test pdns[4379]: gpgsql Connection succesful
Sep 11 02:08:33 pdns-slave-test pdns[4379]: Unable to AXFR zone  
'h0.com': Timeout waiting for answer from 64.<master> during AXFR
Sep 11 02:08:33 pdns-slave-test pdns[4379]: gpgsql Connection succesful
Sep 11 02:08:43 pdns-slave-test pdns[4379]: Unable to AXFR zone  
'h0.com': Timeout waiting for answer from 64.<master> during AXFR
(nothing more until the master's postgres daemon was restarted)

The correlation between the "AXFR initated" on the master and the  
"Timeout waiting for answer" on the slaves is strange too.

--

_______________________________________________
Pdns-users mailing list
Pdns-users@...
http://mailman.powerdns.com/mailman/listinfo/pdns-users

Re: Odd master/slave behavior for large domains

by Kenneth Marshall-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Thomas,

I have read your post and your transfer problem is caused by
the OOM killer. You should not be running out of memory on
a database server. That is the first problem. Regarding
the AXFR transfers when PDNS is a slave, the process used
is to delete all the records in the zone and then re-insert
the new set of records retrieved from the PDNS master. For
an MVCC database like PostgreSQL this is a very expensive
process, particularly for a large zone since IXFR transfers
are not yet supported by PDNS.

We also use PostgreSQL as the backend database for PDNS. To
work around the lack of support for incremental updates, I
wrote a patch to the PDNS software to add an additional hook
to the database interface called gpgsql-commit-zone-axfr-query.
I posted a patch to the PDNS list several years ago and it is
still valid for the MySQL and PostgreSQL DB backends. The
purpose of this hook is to allow a user-defined SQL command
to be used to commit the records that have been received by
the PDNS slave to the DB backend. The default value of this
parameter is "COMMIT;", which is the hard-coded command that
is used in the plugins.

With the ability to control the SQL statement used to commit
the new zone information and custom definitions for gpgsql-
elete-zone-query and gpgsql-insert-record-query you can cause
the AXFR transfers to use a temp table for staging and then
only update the changed records in the actual table. Here are
the definitions that we are currently using in our pdns.conf:

#Update AXFR queries to use stage table for staging
gpgsql-delete-zone-query=CREATE TEMPORARY TABLE stage (id INT, domain_id INT, name VARCHAR(255), type VARCHAR(6), content VARCHAR(255), ttl INT, prio INT) ON COMMIT DROP; PREPARE axfrinsert (varchar, int, int, varchar, int, varchar) AS INSERT INTO stage (content,ttl,prio,type,domain_id,name) VALUES ($1,$2,$3,$4,$5,$6);

gpgsql-insert-record-query=EXECUTE axfrinsert ('%s',%d,%d,'%s',%d,'%s');

gpgsql-commit-zone-axfr-query=CREATE TEMPORARY TABLE axfrvars (name VARCHAR(6), value INT) ON COMMIT DROP; \
INSERT INTO axfrvars (name, value) VALUES ('domain', %d); \
DELETE FROM records \
WHERE domain_id = \
    (SELECT value FROM axfrvars WHERE name = 'domain') AND \
      records.id NOT IN \
          (SELECT records.id FROM records INNER JOIN stage \
           USING (domain_id, name, type, content, ttl, prio)); \
INSERT INTO records (domain_id, name, type, content, ttl, prio) \
SELECT domain_id, name, type, content, ttl, prio FROM stage \
WHERE stage.domain_id = \
    (SELECT value FROM axfrvars WHERE name = 'domain') EXCEPT \
        SELECT domain_id, name, type, content, ttl, prio from records \
        WHERE records.domain_id = \
            (SELECT value FROM axfrvars WHERE name = 'domain'); \
COMMIT; DEALLOCATE axfrinsert;

I included a couple of simpler variations in my original post
as well as the version we use in production. I was unable to
make a convincing case with the developer to get this commit-hook
added in to the default code so I have been re-patching ever since
because of the huge performance improvements offered in the absence
of IXFR support. My C++ coding ability is not at the level needed
to add that support. The use of this patch increased the speed of
large zone AXFRs by more than an order of magnitude. Let me know
if you have any questions. I think that shutting your OOM off will
solve your AXFR transfer failure, but you will need my patch to
get "reasonable" zone AXFR performance.

Regards,
Ken

On Thu, Sep 10, 2009 at 11:14:18PM -0600, thomas morgan wrote:

> I'm seeing some odd behavior in a traditional master and slave setup.
>
> Both the master and slave are PowerDNS. I've tried 2.9.21.2 and 2.9.22 and
> seem to get the same results. Memory usage and other figures below are from
> 2.9.22.
>
> The backends are Postgres 8.3. Servers are amd64 Debian Lenny.
>
> I created a single zone on the server and added 2 million host records. I
> know that's a bunch, but it is a specific use case, not just an attempt to
> break things.
>
> Oddity #1:
> The master seems to send 3-4 NOTIFYs when the zone is updated -- at least
> the slave is reporting in the logs to have received multiple NOTIFYs.
> They're pretty consistently spaced: in several instances, 4 NOTIFYs with
> intervals 3sec, 5sec, 9sec.
>
> Oddity #2:
> The slave, upon receiving multiple NOTIFYs, initiates multiple AXFRs for
> the same zone. For a small zone this wouldn't be a big deal, but for a
> large zone it's fatal.
>
> Oddity #3:
> The master consumes a *huge* amount of RAM to do each AXFR: 283 MB worth
> per AXFR. Memory usage on the slave seems tolerable though.
>
> Oddity #4:
> If an AXFR doesn't finish properly, the memory is never released. I've
> managed to reproduce this using dig to perform the AXFR as well.
>
> Strangely dig dies after 8600 records when piped to less, but makes it
> through all 2 million when piped to wc or just output to the terminal. This
> might be dig's problem, but it does make it easy to cause #4. The final
> line of dig's output when it fails is ";; communications error: end of
> file". When the AXFR completes, the log includes "AXFR of domain 'h0.com'
> to 64.<client> finished". When it doesn't (and memory isn't freed), this
> message never shows up.
>
> I cannot tell, but it's possible the slave is eventually figuring out it
> has multiple transfers for the same zone going and aborts them. The logs
> (included at the bottom) give mixed readings on this (between the master
> and slave). It may also be the master puking and closing the connection.
>
>
> On my test servers things usually end when the oomkiller rears its ugly
> head. ;)
>
> The combination of these becomes a significant problem. #1 causes #2 when
> then multiplies the effect of #3. #2 also increases the likelihood of #4.
>
>
> Anything I'm missing or that I can do to help figure out what's wrong? Some
> logs are included below; they are not identical in every case, but this is
> representative.
>
> --t
>
>
> Master server:
>
> Sep 11 02:07:45 pdns-master-test pdns[4156]: 1 domain for which we are
> master needs notifications
> Sep 11 02:07:45 pdns-master-test pdns[4156]: Queued notification of domain
> 'h0.com' to 64.<master>
> Sep 11 02:07:45 pdns-master-test pdns[4156]: Queued notification of domain
> 'h0.com' to 64.<slave>
> Sep 11 02:07:45 pdns-master-test pdns[4156]: Received NOTIFY for h0.com
> from 64.<master> which is not a master
> Sep 11 02:07:46 pdns-master-test pdns[4156]: Received unsuccesful
> notification report for 'h0.com' from 64.<master>, rcode: 5
> Sep 11 02:07:46 pdns-master-test pdns[4156]: Removed from notification
> list: 'h0.com' to 64.<master>
> Sep 11 02:07:46 pdns-master-test pdns[4156]: AXFR of domain 'h0.com'
> initiated by 64.<slave>
> Sep 11 02:07:46 pdns-master-test pdns[4156]: gpgsql Connection succesful
> Sep 11 02:07:48 pdns-master-test pdns[4156]: No master domains need
> notifications
> Sep 11 02:08:13 pdns-master-test pdns[4156]: AXFR of domain 'h0.com'
> initiated by 64.<slave>
> Sep 11 02:08:13 pdns-master-test pdns[4156]: gpgsql Connection succesful
> Sep 11 02:08:23 pdns-master-test pdns[4156]: AXFR of domain 'h0.com'
> initiated by 64.<slave>
> Sep 11 02:08:25 pdns-master-test pdns[4156]: gpgsql Connection succesful
> Sep 11 02:08:33 pdns-master-test pdns[4156]: AXFR of domain 'h0.com'
> initiated by 64.<slave>
> Sep 11 02:08:34 pdns-master-test pdns[4156]: gpgsql Connection succesful
> Sep 11 02:09:21 pdns-master-test pdns[4156]: Communicator thread died
> because of error: GSQLBackend unable to retrieve list of slave domains:
> PostgreSQL failed to execute command: server closed the connection
> unexpectedly#012#011This probably means the server terminated
> abnormally#012#011before or while processing the request.
> (The last line is where oomkiller decided to kill off postgres.)
>
>
> Slave server:
>
> Sep 11 02:07:45 pdns-slave-test pdns[4379]: Received valid NOTIFY for
> h0.com (id=1) from master 64.<master>: 1252634826 > 1252632174
> Sep 11 02:07:46 pdns-slave-test pdns[4379]: gpgsql Connection succesful
> Sep 11 02:07:48 pdns-slave-test pdns[4379]: Received valid NOTIFY for
> h0.com (id=1) from master 64.<master>: 1252634826 > 1252632174
> Sep 11 02:07:51 pdns-slave-test pdns[4379]: AXFR started for 'h0.com',
> transaction started
> Sep 11 02:07:53 pdns-slave-test pdns[4379]: Received valid NOTIFY for
> h0.com (id=1) from master 64.<master>: 1252634826 > 1252632174
> Sep 11 02:08:02 pdns-slave-test pdns[4379]: Received valid NOTIFY for
> h0.com (id=1) from master 64.<master>: 1252634826 > 1252632174
> Sep 11 02:08:13 pdns-slave-test pdns[4379]: Unable to AXFR zone 'h0.com':
> Remote nameserver closed TCP connection
> Sep 11 02:08:13 pdns-slave-test pdns[4379]: Aborting possible open
> transaction for domain 'h0.com' AXFR
> Sep 11 02:08:13 pdns-slave-test pdns[4379]: gpgsql Connection succesful
> Sep 11 02:08:23 pdns-slave-test pdns[4379]: Unable to AXFR zone 'h0.com':
> Timeout waiting for answer from 64.<master> during AXFR
> Sep 11 02:08:23 pdns-slave-test pdns[4379]: gpgsql Connection succesful
> Sep 11 02:08:33 pdns-slave-test pdns[4379]: Unable to AXFR zone 'h0.com':
> Timeout waiting for answer from 64.<master> during AXFR
> Sep 11 02:08:33 pdns-slave-test pdns[4379]: gpgsql Connection succesful
> Sep 11 02:08:43 pdns-slave-test pdns[4379]: Unable to AXFR zone 'h0.com':
> Timeout waiting for answer from 64.<master> during AXFR
> (nothing more until the master's postgres daemon was restarted)
>
> The correlation between the "AXFR initated" on the master and the "Timeout
> waiting for answer" on the slaves is strange too.
>
> --
>
> _______________________________________________
> Pdns-users mailing list
> Pdns-users@...
> http://mailman.powerdns.com/mailman/listinfo/pdns-users
>
_______________________________________________
Pdns-users mailing list
Pdns-users@...
http://mailman.powerdns.com/mailman/listinfo/pdns-users

Re: Odd master/slave behavior for large domains

by bert hubert-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Sep 11, 2009 at 7:14 AM, thomas morgan <tm@...> wrote:
> I created a single zone on the server and added 2 million host records. I
> know that's a bunch, but it is a specific use case, not just an attempt to
> break things.

Thomas,

Many thanks for your interesting and detailed bug report! I've done
some initial investigation, and my guess is that this is a deficiency
in our PostgreSQL support, or perhaps in the PostgreSQL client library
(unlikely I think).

Sadly I am very busy right now, but you could help debugging this if
you could reproduce this issue (or perhaps, fail to) using the MySQL
backend or the BIND backend.

If this clears up the issue, I know where to look.

>
> Oddity #1:
> The master seems to send 3-4 NOTIFYs when the zone is updated -- at least
> the slave is reporting in the logs to have received multiple NOTIFYs.
> They're pretty consistently spaced: in several instances, 4 NOTIFYs with
> intervals 3sec, 5sec, 9sec.

This is just laziness - we keep sending NOTIFY packets until they are
acknowledged, which PowerDNS only does after the AXFR succeeded.

> Oddity #2:
> The slave, upon receiving multiple NOTIFYs, initiates multiple AXFRs for the
> same zone. For a small zone this wouldn't be a big deal, but for a large
> zone it's fatal.

This one is stupid on our side.

> Oddity #3:
> The master consumes a *huge* amount of RAM to do each AXFR: 283 MB worth per
> AXFR. Memory usage on the slave seems tolerable though.
>
> Oddity #4:
> If an AXFR doesn't finish properly, the memory is never released. I've
> managed to reproduce this using dig to perform the AXFR as well.

3 and 4 should go away if we debug the (probable) PostgreSQL support
problem in PowerDNS.

> Anything I'm missing or that I can do to help figure out what's wrong? Some
> logs are included below; they are not identical in every case, but this is
> representative.

Reproducing with the BIND backend should help find the cause of this.
You can just ask it to load the output of dig -t AXFR > zone.

Please let me know! From there we'll look at the other issues you found.

    Bert
_______________________________________________
Pdns-users mailing list
Pdns-users@...
http://mailman.powerdns.com/mailman/listinfo/pdns-users

Re: Odd master/slave behavior for large domains

by Kenneth Marshall-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Bert,

I do not know if you saw my post on this problem earlier today,
but I do not think that there is a bug in the PDNS PostgreSQL
support, per se. I spent a bit of time with pdns's PostgreSQL
backend debugging the AXFR transfer problem that I described
in my earlier E-mail today. I performed a pretty thorough
analysis of the PostgreSQL backend versus the BIND and MySQL
backends. The performance problems stem from the MVCC nature
of the PostgreSQL database versus the file based nature of
the BIND backend and the in memory update nature of the MySQL
MyISM backend. The textbook translation of the AXFR process
while technically correct has bad performance implications
when used with the PostgreSQL backend. I will comment inline
on the source of the problems below.

On Fri, Sep 11, 2009 at 08:56:27PM +0200, bert hubert wrote:

> On Fri, Sep 11, 2009 at 7:14 AM, thomas morgan <tm@...> wrote:
> > I created a single zone on the server and added 2 million host records. I
> > know that's a bunch, but it is a specific use case, not just an attempt to
> > break things.
>
> Thomas,
>
> Many thanks for your interesting and detailed bug report! I've done
> some initial investigation, and my guess is that this is a deficiency
> in our PostgreSQL support, or perhaps in the PostgreSQL client library
> (unlikely I think).
>
> Sadly I am very busy right now, but you could help debugging this if
> you could reproduce this issue (or perhaps, fail to) using the MySQL
> backend or the BIND backend.
>

The actual transfer problem Thomas documents is caused by the Linux
OOM process killing the database process before it can complete. This
is not the fault of the PDNS PostgreSQL module or the PostgreSQL database.
This is a system misconfiguration that is exacerbated by the basic PDNS
AXFR implementation. In particular, the info-zone-query on the master
performs "select id,name,master,last_check,notified_serial,type from domains
where name='%s'". This has the effect of materializing the entire zone in
the memory footprint of the database. For this large zone, this results in
a substantially larger amount of memory used on the server. The usual
method to avoid this huge increase in memory resources used is to use
a CURSOR to retrieve the zone entries in smaller chunks. The OOM process
also targets processes using more memory and this increase can cause it
to trigger and the AXFR will fail.

> If this clears up the issue, I know where to look.
>
> >
> > Oddity #1:
> > The master seems to send 3-4 NOTIFYs when the zone is updated -- at least
> > the slave is reporting in the logs to have received multiple NOTIFYs.
> > They're pretty consistently spaced: in several instances, 4 NOTIFYs with
> > intervals 3sec, 5sec, 9sec.
>
> This is just laziness - we keep sending NOTIFY packets until they are
> acknowledged, which PowerDNS only does after the AXFR succeeded.
>
This is not really a problem as you have mentioned.

> > Oddity #2:
> > The slave, upon receiving multiple NOTIFYs, initiates multiple AXFRs for the
> > same zone. For a small zone this wouldn't be a big deal, but for a large
> > zone it's fatal.
>
> This one is stupid on our side.
>
If the zone's serial number is included in the NOTIFY, it should be
possible to tell if another AXFR is needed or not.

> > Oddity #3:
> > The master consumes a *huge* amount of RAM to do each AXFR: 283 MB worth per
> > AXFR. Memory usage on the slave seems tolerable though.
> >
> > Oddity #4:
> > If an AXFR doesn't finish properly, the memory is never released. I've
> > managed to reproduce this using dig to perform the AXFR as well.
>
> 3 and 4 should go away if we debug the (probable) PostgreSQL support
> problem in PowerDNS.
>
3 and 4 are caused by the select * from the domain and the triggering
of the Linux OOM process killer. Thomas did not mention it, but AXFR's
for large zones are very slow to a PDNS slave with a PostgreSQL backend.
Adding a simple hook to the commit of the transfer of the zone allows
for a much more efficient AXFR transfer. I would certainly like to help
with any improvements that can be made to the PostgreSQL support. I can
bring my commit-zone-axfr-query hook patch up to date if that would
help. Thank you again for a great name server.

Regards,
Ken

> > Anything I'm missing or that I can do to help figure out what's wrong? Some
> > logs are included below; they are not identical in every case, but this is
> > representative.
>
> Reproducing with the BIND backend should help find the cause of this.
> You can just ask it to load the output of dig -t AXFR > zone.
>
> Please let me know! From there we'll look at the other issues you found.
>
>     Bert
> _______________________________________________
> Pdns-users mailing list
> Pdns-users@...
> http://mailman.powerdns.com/mailman/listinfo/pdns-users
>
_______________________________________________
Pdns-users mailing list
Pdns-users@...
http://mailman.powerdns.com/mailman/listinfo/pdns-users

Re: Odd master/slave behavior for large domains

by thomas morgan-6 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Ken--

Thanks much for your detailed reply.

I am quite intrigued by what you've done on the slave side and the  
corresponding performance benefit in consuming AXFRs. I will be going  
to dig up that patch and consider applying it here.

--t



On Sep 11, 2009, at 7:07 AM, Kenneth Marshall wrote:

> Hi Thomas,
>
> I have read your post and your transfer problem is caused by
> the OOM killer. You should not be running out of memory on
> a database server. That is the first problem. Regarding
> the AXFR transfers when PDNS is a slave, the process used
> is to delete all the records in the zone and then re-insert
> the new set of records retrieved from the PDNS master. For
> an MVCC database like PostgreSQL this is a very expensive
> process, particularly for a large zone since IXFR transfers
> are not yet supported by PDNS.
>
> We also use PostgreSQL as the backend database for PDNS. To
> work around the lack of support for incremental updates, I
> wrote a patch to the PDNS software to add an additional hook
> to the database interface called gpgsql-commit-zone-axfr-query.
> I posted a patch to the PDNS list several years ago and it is
> still valid for the MySQL and PostgreSQL DB backends. The
> purpose of this hook is to allow a user-defined SQL command
> to be used to commit the records that have been received by
> the PDNS slave to the DB backend. The default value of this
> parameter is "COMMIT;", which is the hard-coded command that
> is used in the plugins.
>
> With the ability to control the SQL statement used to commit
> the new zone information and custom definitions for gpgsql-
> elete-zone-query and gpgsql-insert-record-query you can cause
> the AXFR transfers to use a temp table for staging and then
> only update the changed records in the actual table. Here are
> the definitions that we are currently using in our pdns.conf:
>
> #Update AXFR queries to use stage table for staging
> gpgsql-delete-zone-query=CREATE TEMPORARY TABLE stage (id INT,  
> domain_id INT, name VARCHAR(255), type VARCHAR(6), content VARCHAR
> (255), ttl INT, prio INT) ON COMMIT DROP; PREPARE axfrinsert  
> (varchar, int, int, varchar, int, varchar) AS INSERT INTO stage  
> (content,ttl,prio,type,domain_id,name) VALUES ($1,$2,$3,$4,$5,$6);
>
> gpgsql-insert-record-query=EXECUTE axfrinsert ('%s',%d,%d,'%s',
> %d,'%s');
>
> gpgsql-commit-zone-axfr-query=CREATE TEMPORARY TABLE axfrvars (name  
> VARCHAR(6), value INT) ON COMMIT DROP; \
> INSERT INTO axfrvars (name, value) VALUES ('domain', %d); \
> DELETE FROM records \
> WHERE domain_id = \
>    (SELECT value FROM axfrvars WHERE name = 'domain') AND \
>      records.id NOT IN \
>          (SELECT records.id FROM records INNER JOIN stage \
>           USING (domain_id, name, type, content, ttl, prio)); \
> INSERT INTO records (domain_id, name, type, content, ttl, prio) \
> SELECT domain_id, name, type, content, ttl, prio FROM stage \
> WHERE stage.domain_id = \
>    (SELECT value FROM axfrvars WHERE name = 'domain') EXCEPT \
>        SELECT domain_id, name, type, content, ttl, prio from records \
>        WHERE records.domain_id = \
>            (SELECT value FROM axfrvars WHERE name = 'domain'); \
> COMMIT; DEALLOCATE axfrinsert;
>
> I included a couple of simpler variations in my original post
> as well as the version we use in production. I was unable to
> make a convincing case with the developer to get this commit-hook
> added in to the default code so I have been re-patching ever since
> because of the huge performance improvements offered in the absence
> of IXFR support. My C++ coding ability is not at the level needed
> to add that support. The use of this patch increased the speed of
> large zone AXFRs by more than an order of magnitude. Let me know
> if you have any questions. I think that shutting your OOM off will
> solve your AXFR transfer failure, but you will need my patch to
> get "reasonable" zone AXFR performance.
>
> Regards,
> Ken
>
> On Thu, Sep 10, 2009 at 11:14:18PM -0600, thomas morgan wrote:
>> I'm seeing some odd behavior in a traditional master and slave setup.
>>
>> Both the master and slave are PowerDNS. I've tried 2.9.21.2 and  
>> 2.9.22 and
>> seem to get the same results. Memory usage and other figures below  
>> are from
>> 2.9.22.
>>
>> The backends are Postgres 8.3. Servers are amd64 Debian Lenny.
>>
>> I created a single zone on the server and added 2 million host  
>> records. I
>> know that's a bunch, but it is a specific use case, not just an  
>> attempt to
>> break things.
>>
>> Oddity #1:
>> The master seems to send 3-4 NOTIFYs when the zone is updated -- at  
>> least
>> the slave is reporting in the logs to have received multiple NOTIFYs.
>> They're pretty consistently spaced: in several instances, 4 NOTIFYs  
>> with
>> intervals 3sec, 5sec, 9sec.
>>
>> Oddity #2:
>> The slave, upon receiving multiple NOTIFYs, initiates multiple  
>> AXFRs for
>> the same zone. For a small zone this wouldn't be a big deal, but  
>> for a
>> large zone it's fatal.
>>
>> Oddity #3:
>> The master consumes a *huge* amount of RAM to do each AXFR: 283 MB  
>> worth
>> per AXFR. Memory usage on the slave seems tolerable though.
>>
>> Oddity #4:
>> If an AXFR doesn't finish properly, the memory is never released.  
>> I've
>> managed to reproduce this using dig to perform the AXFR as well.
>>
>> Strangely dig dies after 8600 records when piped to less, but makes  
>> it
>> through all 2 million when piped to wc or just output to the  
>> terminal. This
>> might be dig's problem, but it does make it easy to cause #4. The  
>> final
>> line of dig's output when it fails is ";; communications error: end  
>> of
>> file". When the AXFR completes, the log includes "AXFR of domain  
>> 'h0.com'
>> to 64.<client> finished". When it doesn't (and memory isn't freed),  
>> this
>> message never shows up.
>>
>> I cannot tell, but it's possible the slave is eventually figuring  
>> out it
>> has multiple transfers for the same zone going and aborts them. The  
>> logs
>> (included at the bottom) give mixed readings on this (between the  
>> master
>> and slave). It may also be the master puking and closing the  
>> connection.
>>
>>
>> On my test servers things usually end when the oomkiller rears its  
>> ugly
>> head. ;)
>>
>> The combination of these becomes a significant problem. #1 causes  
>> #2 when
>> then multiplies the effect of #3. #2 also increases the likelihood  
>> of #4.
>>
>>
>> Anything I'm missing or that I can do to help figure out what's  
>> wrong? Some
>> logs are included below; they are not identical in every case, but  
>> this is
>> representative.
>>
>> --t
>>
>>
>> Master server:
>>
>> Sep 11 02:07:45 pdns-master-test pdns[4156]: 1 domain for which we  
>> are
>> master needs notifications
>> Sep 11 02:07:45 pdns-master-test pdns[4156]: Queued notification of  
>> domain
>> 'h0.com' to 64.<master>
>> Sep 11 02:07:45 pdns-master-test pdns[4156]: Queued notification of  
>> domain
>> 'h0.com' to 64.<slave>
>> Sep 11 02:07:45 pdns-master-test pdns[4156]: Received NOTIFY for h0.com
>> from 64.<master> which is not a master
>> Sep 11 02:07:46 pdns-master-test pdns[4156]: Received unsuccesful
>> notification report for 'h0.com' from 64.<master>, rcode: 5
>> Sep 11 02:07:46 pdns-master-test pdns[4156]: Removed from  
>> notification
>> list: 'h0.com' to 64.<master>
>> Sep 11 02:07:46 pdns-master-test pdns[4156]: AXFR of domain 'h0.com'
>> initiated by 64.<slave>
>> Sep 11 02:07:46 pdns-master-test pdns[4156]: gpgsql Connection  
>> succesful
>> Sep 11 02:07:48 pdns-master-test pdns[4156]: No master domains need
>> notifications
>> Sep 11 02:08:13 pdns-master-test pdns[4156]: AXFR of domain 'h0.com'
>> initiated by 64.<slave>
>> Sep 11 02:08:13 pdns-master-test pdns[4156]: gpgsql Connection  
>> succesful
>> Sep 11 02:08:23 pdns-master-test pdns[4156]: AXFR of domain 'h0.com'
>> initiated by 64.<slave>
>> Sep 11 02:08:25 pdns-master-test pdns[4156]: gpgsql Connection  
>> succesful
>> Sep 11 02:08:33 pdns-master-test pdns[4156]: AXFR of domain 'h0.com'
>> initiated by 64.<slave>
>> Sep 11 02:08:34 pdns-master-test pdns[4156]: gpgsql Connection  
>> succesful
>> Sep 11 02:09:21 pdns-master-test pdns[4156]: Communicator thread died
>> because of error: GSQLBackend unable to retrieve list of slave  
>> domains:
>> PostgreSQL failed to execute command: server closed the connection
>> unexpectedly#012#011This probably means the server terminated
>> abnormally#012#011before or while processing the request.
>> (The last line is where oomkiller decided to kill off postgres.)
>>
>>
>> Slave server:
>>
>> Sep 11 02:07:45 pdns-slave-test pdns[4379]: Received valid NOTIFY for
>> h0.com (id=1) from master 64.<master>: 1252634826 > 1252632174
>> Sep 11 02:07:46 pdns-slave-test pdns[4379]: gpgsql Connection  
>> succesful
>> Sep 11 02:07:48 pdns-slave-test pdns[4379]: Received valid NOTIFY for
>> h0.com (id=1) from master 64.<master>: 1252634826 > 1252632174
>> Sep 11 02:07:51 pdns-slave-test pdns[4379]: AXFR started for  
>> 'h0.com',
>> transaction started
>> Sep 11 02:07:53 pdns-slave-test pdns[4379]: Received valid NOTIFY for
>> h0.com (id=1) from master 64.<master>: 1252634826 > 1252632174
>> Sep 11 02:08:02 pdns-slave-test pdns[4379]: Received valid NOTIFY for
>> h0.com (id=1) from master 64.<master>: 1252634826 > 1252632174
>> Sep 11 02:08:13 pdns-slave-test pdns[4379]: Unable to AXFR zone  
>> 'h0.com':
>> Remote nameserver closed TCP connection
>> Sep 11 02:08:13 pdns-slave-test pdns[4379]: Aborting possible open
>> transaction for domain 'h0.com' AXFR
>> Sep 11 02:08:13 pdns-slave-test pdns[4379]: gpgsql Connection  
>> succesful
>> Sep 11 02:08:23 pdns-slave-test pdns[4379]: Unable to AXFR zone  
>> 'h0.com':
>> Timeout waiting for answer from 64.<master> during AXFR
>> Sep 11 02:08:23 pdns-slave-test pdns[4379]: gpgsql Connection  
>> succesful
>> Sep 11 02:08:33 pdns-slave-test pdns[4379]: Unable to AXFR zone  
>> 'h0.com':
>> Timeout waiting for answer from 64.<master> during AXFR
>> Sep 11 02:08:33 pdns-slave-test pdns[4379]: gpgsql Connection  
>> succesful
>> Sep 11 02:08:43 pdns-slave-test pdns[4379]: Unable to AXFR zone  
>> 'h0.com':
>> Timeout waiting for answer from 64.<master> during AXFR
>> (nothing more until the master's postgres daemon was restarted)
>>
>> The correlation between the "AXFR initated" on the master and the  
>> "Timeout
>> waiting for answer" on the slaves is strange too.
>>
>> --
>>
>> _______________________________________________
>> Pdns-users mailing list
>> Pdns-users@...
>> http://mailman.powerdns.com/mailman/listinfo/pdns-users
>>

_______________________________________________
Pdns-users mailing list
Pdns-users@...
http://mailman.powerdns.com/mailman/listinfo/pdns-users

Re: Odd master/slave behavior for large domains

by thomas morgan-6 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sep 11, 2009, at 12:56 PM, bert hubert wrote:

> On Fri, Sep 11, 2009 at 7:14 AM, thomas morgan <tm@...> wrote:
>> I created a single zone on the server and added 2 million host  
>> records. I
>> know that's a bunch, but it is a specific use case, not just an  
>> attempt to
>> break things.
>
> Thomas,
>
> Many thanks for your interesting and detailed bug report! I've done
> some initial investigation, and my guess is that this is a deficiency
> in our PostgreSQL support, or perhaps in the PostgreSQL client library
> (unlikely I think).
>
> Sadly I am very busy right now, but you could help debugging this if
> you could reproduce this issue (or perhaps, fail to) using the MySQL
> backend or the BIND backend.
>
> If this clears up the issue, I know where to look.

Bert--

I tried it with the bind backend. Notes on that and other items below.

>> Oddity #1:
>> The master seems to send 3-4 NOTIFYs when the zone is updated -- at  
>> least
>> the slave is reporting in the logs to have received multiple NOTIFYs.
>> They're pretty consistently spaced: in several instances, 4 NOTIFYs  
>> with
>> intervals 3sec, 5sec, 9sec.
>
> This is just laziness - we keep sending NOTIFY packets until they are
> acknowledged, which PowerDNS only does after the AXFR succeeded.
>
>> Oddity #2:
>> The slave, upon receiving multiple NOTIFYs, initiates multiple  
>> AXFRs for the
>> same zone. For a small zone this wouldn't be a big deal, but for a  
>> large
>> zone it's fatal.
>
> This one is stupid on our side.

Thinking on this some more, in addition to keeping the slave from  
initiating multiple, identical AXFRs, what about the possibility of  
the master limiting the number of simultaneous AXFRs from a remote IP?  
I'm thinking something configurable. An overall cap on AXFRs might be  
appropriate too -- for a server with zones of any larger size, this  
would seem like a fairly easy DoS vector due to the memory usage (even  
if memory usage is substantially reduced).

Maintaining a single in-memory copy of the AXFR even for multiple  
transfers from multiple IPs would also potentially be useful.

Ken's suggestion of using a cursor when preparing the AXFR might be  
another way to improve memory use in these situations.

>> Oddity #3:
>> The master consumes a *huge* amount of RAM to do each AXFR: 283 MB  
>> worth per
>> AXFR. Memory usage on the slave seems tolerable though.
>>
>> Oddity #4:
>> If an AXFR doesn't finish properly, the memory is never released.  
>> I've
>> managed to reproduce this using dig to perform the AXFR as well.
>
> 3 and 4 should go away if we debug the (probable) PostgreSQL support
> problem in PowerDNS.
>
>> Anything I'm missing or that I can do to help figure out what's  
>> wrong? Some
>> logs are included below; they are not identical in every case, but  
>> this is
>> representative.
>
> Reproducing with the BIND backend should help find the cause of this.
> You can just ask it to load the output of dig -t AXFR > zone.
>
> Please let me know! From there we'll look at the other issues you  
> found.


I did as suggested and used the dig AXFR output as the source zone file.

The bind backend does not significantly increase memory usage for an  
AXFR. This seems expected since the zone is already loaded into  
memory. Memory sits at 156m after start and zone load -- still quite a  
bit less than the 283m when building the zone from the PG database for  
an AXFR.

However, after that things get interesting. There appears to be a  
memory leak that shows up when an AXFR (possibly any query,  
unconfirmed) is attempted during a zone reload. This might be a  
separate leak from the leak during an aborted PG-backed AXFR.

Reloads take 20-30sec on this box for the 2 million host records, so  
there's a decent sized time window for a triggering query to occur.  
The exact size of the leak seems to vary somewhat, although I did see  
PDNS up to 248m a couple of times and 299m a couple of times. It did  
seem to vary; perhaps related to the point the reload was at when the  
triggering query came in.

A completed or failed AXFR (even several at once) didn't seem to  
trigger this -- it only seemed to happen when timed during the zone  
reload.


Additionally, twice, when the slave's AXFR request came in during the  
reload, PDNS aborted and had to be respawned by the guardian:  
"Communicator thread died because of error: Zone for 'h0.com' in '/
root/pdns/h0.com' temporarily not available (file missing, or master  
dead)". Most of the time it just returned SERVFAIL while reloading  
(which appears to be correct from the documentation).


Somewhat separate I think, during this round of testing I never did  
get the slave (still running PG) to successfully complete an AXFR,  
although I did yesterday. An AXFR via dig could complete just fine. I  
suspect this is related to what Ken has written about -- with the  
slave having trouble writing all that data to the DB inefficiently.


So to summarize, the memory use when preparing for an AXFR (and leak  
when aborted) seems to happen when backed by PG but not when backed by  
bind files. I haven't tried MySQL, but may this weekend if I find some  
spare time.

The leak and crash during bind-backed zone reload is probably bind  
backend specific, although maybe not and it's just more likely to  
happen because of the large zone reload time.

While I haven't tried it yet, the hooks Ken has added for slave AXFR  
commits seem like they would be a useful addition to improving PG-
backed (and maybe any DB backed) AXFR consumption.


Let me know what other info I can provide, what else I can test, etc.

--t


_______________________________________________
Pdns-users mailing list
Pdns-users@...
http://mailman.powerdns.com/mailman/listinfo/pdns-users