Error with master slave replication

by Bill Pitz

Hi,

We are having trouble with master/slave replication with a couple of our
large zone files.  This is PDNS-native replication, not database
replication.

Replication is working fine to one of our slaves, but not to the other.  
The broken slave continuously throws these errors when the AXFR of the
zone is attempted from the master:

Aug 26 14:46:09 dns2 pdns[16674]: AXFR started for 'xyzzy.net', transaction started
Aug 26 14:46:31 dns2 pdns[16674]: Unable to AXFR zone 'xyzzy.net' from remote 'XX.XX.XX.XX': Remote nameserver closed TCP connection
Aug 26 14:46:31 dns2 pdns[16674]: Aborting possible open transaction for domain 'xyzzy.net' AXFR

(XX.XX.XX.XX is the IP of the master.)

This happens repeatedly, and the zone transfer never succeeds.  We are
running pdns 2.9.22, built from source on OpenSuSE 10.2.

If I manually transfer the zone (using host -al <zone> <server>), the
transfer succeeds every time.  But the pdns server does not seem to be
able to complete it successfully.


Any help with this issue would be greatly appreciated.


Thanks,

-Bill
_______________________________________________
Pdns-users mailing list
Pdns-users@...
http://mailman.powerdns.com/mailman/listinfo/pdns-users

Re: Error with master slave replication

by Piotr Przybył

Hello Bill


> We are having trouble with master/slave replication with a couple of our
> large zone files.  This is PDNS-native replication, not database
> replication.
>
> Replication is working fine to one of our slaves, but not to the other.
> The broken slave continuously throws these errors when the AXFR of the
> zone is attempted from the master:

Could you specify if this happens for every zone for this single slave
or only for "large" ones?
Have you checked if this slave is allowed to do AXFR? Master's config,
allow-axfr-ips?

You could log in to the slave machine and try: dig @XX.XX.XX.XX xyzzy.net AXFR

>
> Aug 26 14:46:09 dns2 pdns[16674]: AXFR started for 'xyzzy.net', transaction started
> Aug 26 14:46:31 dns2 pdns[16674]: Unable to AXFR zone 'xyzzy.net' from remote 'XX.XX.XX.XX': Remote nameserver closed TCP connection
> Aug 26 14:46:31 dns2 pdns[16674]: Aborting possible open transaction for domain 'xyzzy.net' AXFR

Take a look at the master's logs as well. They may show why it
closes the connection.

Hope that helps.

Regards,
--
Piotr Przybył

Software Developer
Power Media S.A.

Phone: +48 71 341 06 96 ext. 703
callto://powermedia_pprzybyl (Skype)
http://www.power.com.pl





Re: Error with master slave replication

by Bill Pitz

Piotr Przybył wrote:
>> We are having trouble with master/slave replication with a couple of our
>> large zone files.  This is PDNS-native replication, not database
>> replication.
>>
>> Replication is working fine to one of our slaves, but not to the other.
>> The broken slave continuously throws these errors when the AXFR of the
>> zone is attempted from the master:
>
> Could you specify if this happens for every zone for this single slave
> or only for "large" ones?
> Have you checked if this slave is allowed to do AXFR? Master's config,
> allow-axfr-ips?

The problem occurs for the "large" zone.  The smaller zones update just fine.

And yes, this slave is allowed to AXFR -- it retrieved the larger zone
fine initially, and a few times after that.  Then it stopped.  I set up
an additional slave and it has the same problem.  As it stands now, two
of the slaves are in the same datacenter as the master and one is in a
different facility.  Only the slave in the different facility is able
to transfer the large zone regularly without issue.  I have a feeling
that there is some sort of timeout occurring with database access when
the slave is local, but the remote slave has a natural "throttling" due
to its distance from the master.

> You may login to slave machine and try to dig @XX.XX.XX.XX xyzzy.net AXFR

Yes, manual AXFR requests work from all of the slaves, every time.  No
errors.

>> Aug 26 14:46:09 dns2 pdns[16674]: AXFR started for 'xyzzy.net', transaction started
>> Aug 26 14:46:31 dns2 pdns[16674]: Unable to AXFR zone 'xyzzy.net' from remote 'XX.XX.XX.XX': Remote nameserver closed TCP connection
>> Aug 26 14:46:31 dns2 pdns[16674]: Aborting possible open transaction for domain 'xyzzy.net' AXFR
>
> Take a look at master's logs as well. You may find a reason why it
> closes the connection.

I've checked there as well, and unfortunately nothing of interest
appears in the master's logs, even with logging increased to the maximum.

Any other ideas?

Thanks,

-Bill

Re: Error with master slave replication

by Ton van Rosmalen

Hi Bill,

Bill Pitz wrote:

> Piotr Przybył wrote:
>>  
>>> We are having trouble with master/slave replication with a couple of our
>>> large zone files.  This is PDNS-native replication, not database
>>> replication.
>>>
>>> Replication is working fine to one of our slaves, but not to the other.
>>> The broken slave continuously throws these errors when the AXFR of the
>>> zone is attempted from the master:
>>>    
>>
>> Could you specify if this happens for every zone for this single slave
>> or only for "large" ones?
>> Have you checked if this slave is allowed to do AXFR? Master's config,
>> allow-axfr-ips?
>>  
> The problem occurs for the "large" zone.  The smaller zones update
> just fine.
>
> And yes, this slave is allowed to AXFR -- it retrieved the larger zone
> fine initially, and a few times after that.  Then it stopped.  I set
> up an additional slave and it has the same problem.  As it stands now,
> two of the slaves are in the same datacenter with the master and one
> is in a different facility.  Only the slave in a different facility is
> able to regularly transfer the large zone without issue.  I have a
> feeling that there is some sort of timeout occurring with database
> access when the slave is local, but the remote slave has a natural
> "throttling" due to its distance from the master.
>> You may login to slave machine and try to dig @XX.XX.XX.XX xyzzy.net AXFR
>>  
> Yes, manual AXFR requests work from all of the slaves, every time.  No
> errors.
>>> Aug 26 14:46:09 dns2 pdns[16674]: AXFR started for 'xyzzy.net', transaction started
>>> Aug 26 14:46:31 dns2 pdns[16674]: Unable to AXFR zone 'xyzzy.net' from remote 'XX.XX.XX.XX': Remote nameserver closed TCP connection
>>> Aug 26 14:46:31 dns2 pdns[16674]: Aborting possible open transaction for domain 'xyzzy.net' AXFR
>>>    
>>
>> Take a look at master's logs as well. You may find a reason why it
>> closes the connection.
>>  
> I've checked there as well, and unfortunately nothing of interest
> appears in the master's logs, even with logging increased to the maximum.
>
> Any other ideas?
Is it possible to 'force' the error by issuing:
dig +tcp @<masternameserver ip> xyzzy.net AXFR

The error indicates a problem with the TCP connection to the supermaster.

Regards,

Ton

Re: Error with master slave replication

by Bill Pitz

Hi,
> Is it possible to 'force' the error by issuing:
> dig +tcp @<masternameserver ip> xyzzy.net AXFR
>
> The error indicates a problem in tcp-connections to the supermaster.

Thanks for your reply.  It is not possible to reproduce the error by
doing manual zone transfers with dig or host.  These work and complete
successfully, every time, within a few seconds.

When the pdns server does the AXFRs, it takes longer (30+ seconds) and
then fails with the error I originally posted.  I ran some packet
captures as well, and it appears that the transmit window on the master
server fills up, there are some packets with adjusted window size, and
then the transfer fails.
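One way to test the idea that a slow reader on the slave side is what makes the master drop the connection is to imitate the slave from a script: request the AXFR over TCP, then pause after every response message, as if each chunk were being written to the database before the next one is read. The Python sketch below does that with only the standard library. It is a hypothetical diagnostic, not part of PowerDNS; the function names, the fixed per-message delay and the 60-second socket timeout are assumptions for illustration. It relies on standard DNS-over-TCP framing (each message is preceded by a 2-byte length, RFC 1035 section 4.2.2) and treats the transfer as complete once the zone's SOA record has been seen a second time, since an AXFR response starts and ends with it.

```python
import socket
import struct
import time

AXFR, CLASS_IN, TYPE_SOA = 252, 1, 6


def build_axfr_query(zone: str, query_id: int = 0x1234) -> bytes:
    """Minimal DNS query: 12-byte header with QDCOUNT=1, then QNAME, QTYPE, QCLASS."""
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in zone.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", AXFR, CLASS_IN)


def frame(message: bytes) -> bytes:
    """DNS over TCP: prefix each message with its length as a 2-byte big-endian integer."""
    return struct.pack("!H", len(message)) + message


def read_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, failing if the master closes the connection early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("master closed the TCP connection mid-transfer")
        buf += chunk
    return buf


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return offset + 2
        offset += 1
        if length == 0:  # root label terminates the name
            return offset
        offset += length


def count_soa_records(msg: bytes) -> int:
    """Count SOA records in the answer section of one DNS message."""
    qdcount, ancount = struct.unpack("!HH", msg[4:8])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    soa = 0
    for _ in range(ancount):
        offset = skip_name(msg, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        if rtype == TYPE_SOA:
            soa += 1
        offset += 10 + rdlength
    return soa


def slow_axfr(master_ip: str, zone: str, delay_per_message: float) -> tuple[int, float]:
    """Fetch a zone by AXFR, sleeping after each message to imitate slow database inserts.

    Returns (number of messages received, elapsed seconds).
    """
    start = time.monotonic()
    messages = soa_seen = 0
    with socket.create_connection((master_ip, 53), timeout=60) as sock:
        sock.sendall(frame(build_axfr_query(zone)))
        while soa_seen < 2:  # the transfer opens and closes with the zone's SOA
            (length,) = struct.unpack("!H", read_exact(sock, 2))
            message = read_exact(sock, length)
            rcode = message[3] & 0x0F
            if rcode != 0:
                raise RuntimeError(f"master answered with RCODE {rcode}")
            messages += 1
            soa_seen += count_soa_records(message)
            time.sleep(delay_per_message)  # stand-in for per-chunk database work
    return messages, time.monotonic() - start
```

For example, running `slow_axfr("XX.XX.XX.XX", "xyzzy.net", 0.0)` and then the same call with a growing delay from one of the local slaves would show whether the master only drops the connection once the reader falls behind; if so, that would point at a send-side timeout on the master rather than an AXFR permission problem.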

This leads to my next question:

When the backend on the slave is PostgreSQL, how does the AXFR process
actually work?  Is the zone loaded into a buffer in the slave pdns
server and then inserted into the database, or does it attempt to insert
it into the database in real time?  It seems like the size of the zone
causes some delay in completing the database inserts (a full zone
transfer by PowerDNS takes 40+ seconds when it succeeds, while it only
takes 4-5 seconds with dig or host), and that this ultimately triggers
some sort of timeout on the master server that causes it to drop the
connection.
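The thread does not settle which of the two approaches pdns 2.9.22 uses. For comparison only, here is what the buffer-first approach looks like: the records are collected completely from the master, and only then is the old zone content replaced in a single database transaction, so no database work happens while the TCP connection is open. The sketch uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the `records` table and the `replace_zone` function are hypothetical simplifications, not the PowerDNS schema or code, and nothing here claims to describe what PowerDNS actually does.

```python
import sqlite3
from typing import Iterable, Tuple

Record = Tuple[str, str, str, int]  # (name, type, content, ttl)


def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical, simplified records table (not the PowerDNS schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        " zone TEXT NOT NULL, name TEXT NOT NULL, type TEXT NOT NULL,"
        " content TEXT NOT NULL, ttl INTEGER NOT NULL)"
    )


def replace_zone(conn: sqlite3.Connection, zone: str, records: Iterable[Record]) -> None:
    """Swap in a zone that has already been received in full.

    Buffering first means the network read is finished before any database
    work starts. Deleting the old rows and inserting the new ones in one
    transaction means a failed write leaves the previous copy of the zone intact.
    """
    rows = [(zone, *record) for record in records]  # whole zone held in memory
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("DELETE FROM records WHERE zone = ?", (zone,))
        conn.executemany(
            "INSERT INTO records (zone, name, type, content, ttl) VALUES (?, ?, ?, ?, ?)",
            rows,
        )
```

The trade-off is memory: a buffer-first slave holds the whole zone in memory before writing it, whereas inserting while reading keeps memory use flat but ties the speed at which the slave drains the TCP connection to the speed of the database.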


Thanks,

-Bill
