Possible DNS DOS?


Possible DNS DOS?

by Chris Modesitt

I have an interesting problem that has been happening for about two weeks. First, a little about my setup. I am currently running the following:

Debian 5.0 (Lenny)
pdns-server 2.9.22-1
pdns-backend-mysql 2.9.21.2-1
pdns-recursor 3.1.7-1

The hardware platform is a Dell 1850 (dual processor) with 8 GB of RAM, running a VMware virtualized environment.

We host about 100 forward lookup domains and a lot of reverse delegation zones (we are an ISP and currently manage about 40,000 IP addresses).

Our system is fairly busy, but under normal traffic I very rarely see much load on the processors or network cards.

This server is the primary, and we have a few MySQL slaves that replicate from its database. Under normal circumstances (four or five days in a row) the database queue averages 0 and spikes to 2, so the database is keeping up just fine.

What I have recently been seeing in the logs is:

Jun 22 09:09:38 dns1 pdns[10948]: 5003 questions waiting for database attention. Limit is 5000, respawning

Jun 22 09:09:39 dns1 pdns[2538]: Our pdns instance exited with code 1

Jun 22 09:09:39 dns1 pdns[2538]: Respawning

Jun 22 09:09:39 dns1 kernel: [724751.668503] UDP: bad checksum. From 71.113.153.36:61250 to 208.187.180.2:53 ulen 46

Jun 22 09:09:40 dns1 pdns[10957]: Guardian is launching an instance

Jun 22 09:09:40 dns1 pdns[10957]: Reading random entropy from '/dev/urandom'

Jun 22 09:09:40 dns1 pdns[10957]: This is module gmysqlbackend.so reporting

Jun 22 09:09:40 dns1 pdns[10957]: This is a guarded instance of pdns

Jun 22 09:09:40 dns1 pdns[10957]: It is advised to bind to explicit addresses with the --local-address option

Jun 22 09:09:40 dns1 pdns[10957]: UDP server bound to 0.0.0.0:53

Jun 22 09:09:40 dns1 pdns[10957]: TCP server bound to 0.0.0.0:53

Jun 22 09:09:40 dns1 pdns[10957]: PowerDNS 2.9.22 (C) 2001-2009 PowerDNS.COM BV (Mar 22 2009, 16:58:52, gcc 4.3.2) starting up

Jun 22 09:09:40 dns1 pdns[10957]: PowerDNS comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it according to the terms of the GPL version 2.

Jun 22 09:09:40 dns1 pdns[10957]: DNS Proxy launched, local port 24312, remote 127.0.0.1:5300

Jun 22 09:09:40 dns1 pdns[10957]: Master/slave communicator launching

Jun 22 09:09:40 dns1 pdns[10957]: Creating backend connection for TCP

Jun 22 09:09:40 dns1 pdns[10957]: gmysql Connection succesful

Jun 22 09:09:40 dns1 pdns[10957]: gmysql Connection succesful

Jun 22 09:09:40 dns1 pdns[10957]: About to create 3 backend threads for UDP

Jun 22 09:09:40 dns1 pdns[10957]: gmysql Connection succesful

Jun 22 09:09:40 dns1 pdns[10957]: All slave domains are fresh

Jun 22 09:09:40 dns1 pdns[10957]: gmysql Connection succesful

Jun 22 09:09:40 dns1 pdns[10957]: gmysql Connection succesful

Jun 22 09:09:40 dns1 pdns[10957]: Done launching threads, ready to distribute questions

 

I see this sequence 11 or 12 times in less than a minute. Network traffic and eth0 interrupts spike quickly during this time, which feels a little like a DNS denial of service. After about 11 repetitions I see the following in the logs:

 

Jun 22 09:09:41 dns1 pdns[10957]: 5029 questions waiting for database attention. Limit is 5000, respawning

Jun 22 09:09:41 dns1 pdns[10957]: Got a signal 11, attempting to print trace:

Jun 22 09:09:41 dns1 pdns[10957]: /usr/sbin/pdns_server-instance [0x80ba397]

Jun 22 09:09:41 dns1 pdns[10957]: [0xb7f83400]

Jun 22 09:09:41 dns1 pdns[10957]: /usr/sbin/pdns_server-instance(_ZN5boost11multi_index6detail13ordered_indexINS0_13composite_keyIN11PacketCache10CacheEntryENS0_6memberIS5_SsXadL_ZNS5_5qnameEEEEENS6_IS5_tXadL_ZNS5_5qtypeEEEEENS6_IS5_tXadL_ZNS5_5ctypeEEEEENS6_IS5_iXadL_ZNS5_6zoneIDEEEEENS6_IS5_bXadL_ZNS5_15meritsRecursionEEEEENS_6tuples9null_typeESD_SD_SD_SD_EENS0_21composite_key_compareI24CIBackwardsStringCompareSt4lessItESI_SH_IiESH_IbESD_SD_SD_SD_SD_EENS1_9nth_layerILi1ES5_NS0_10indexed_byINS0_14ordered_uniqueISE_SL_N4mpl_2naEEENS0_9sequencedINS0_3tagISQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_EEEESQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_EESaIS5_EEENS_3mpl7vector0ISQ_EENS1_18ordered_unique_tagEE10link_pointERKNS0_20composite_key_resultISE_EERNS13_9link_infoES12_+0x286) [0x809f606]

Jun 22 09:09:41 dns1 pdns[10957]: /usr/sbin/pdns_server-instance(_ZN11PacketCache6insertERKSsRK5QTypeNS_14CacheEntryTypeES1_jib+0x103) [0x809a3c3]

Jun 22 09:09:41 dns1 pdns[10957]: /usr/sbin/pdns_server-instance(_ZN12UeberBackend11addNegCacheERKNS_8QuestionE+0x8e) [0x80c32de]

Jun 22 09:09:41 dns1 pdns[10957]: /usr/sbin/pdns_server-instance(_ZN12UeberBackend3getER17DNSResourceRecord+0x12f) [0x80c351f]

 

After this entry PDNS is down and stays down.

 

So, a couple of questions for the group. I already have Wireshark running a long-term capture, so I can see what is being sent to the server. However, is there a way for PDNS to send an email or other notification when it dies and does not come back? Also, what kind of logging should I enable to further diagnose or troubleshoot the issue?

 

Any help/feedback is greatly appreciated.

 

Thanks

 

--Chris


_______________________________________________
Pdns-users mailing list
Pdns-users@...
http://mailman.powerdns.com/mailman/listinfo/pdns-users

Re: Possible DNS DOS?

by Brad Dameron (Contractor)

Look at using monit. It can monitor services and email or even restart the service for you.
 
Brad Dameron

(425)216-4691 Desk
(360)340-7431 Mobile
IM: serpent6877@...

 


From: pdns-users-bounces@... [mailto:pdns-users-bounces@...] On Behalf Of Chris Modesitt
Sent: Monday, June 22, 2009 3:28 PM
To: pdns-users@...
Subject: [Pdns-users] Possible DNS DOS?


Re: Possible DNS DOS?

by Pascal-42

> Jun 22 09:09:41 dns1 pdns[10957]: /usr/sbin/pdns_server-instance(_ZN12UeberBackend11addNegCacheERKNS_8QuestionE+0x8e) [0x80c32de]
> Jun 22 09:09:41 dns1 pdns[10957]: /usr/sbin/pdns_server-instance(_ZN12UeberBackend3getER17DNSResourceRecord+0x12f) [0x80c351f]
>
> After this entry PDNS is down and stays down.

We have not experienced problems like this (and I have managed DNS servers with 100k+ domains, so a lot of queries there).

But I am curious: does the guardian not restart it automagically? Or is the guardian even running at all?

For me, the guardian keeps respawning forever, or until the problem is fixed. Not that this happens a lot; it is usually because the database is (intentionally) down.

Apart from this, we monitor all systems with Nagios. Through an event handler it can also restart services.
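
For illustration, here is a minimal sketch of the kind of check Nagios could run against the server. It is not an official plugin. The function name check_dns and the DIG override variable are made up for this example, and it assumes dig is installed. It follows the usual plugin convention of exit status 0 for OK and 2 for CRITICAL.

```shell
#!/bin/sh
# Nagios-style check: return 0 (OK) if the server answers an SOA query
# for the zone, 2 (CRITICAL) otherwise.
# DIG is a hypothetical override so the lookup command can be swapped out.
check_dns() {
    server=$1 zone=$2
    # dig exits non-zero when no server could be reached, and +short
    # prints nothing when the answer section is empty.
    if answer=$(${DIG:-dig} +short +time=2 +tries=1 "@$server" "$zone" SOA 2>/dev/null) \
        && [ -n "$answer" ]; then
        echo "DNS OK - $server answered for $zone"
        return 0
    fi
    echo "DNS CRITICAL - no answer from $server for $zone"
    return 2
}

# Example (not run here): check_dns 127.0.0.1 example.com
```

An event handler that restarts the service would be configured separately in Nagios; this only covers the detection half.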


Cheers,
Pascal



Re: Possible DNS DOS?

by bert hubert-2

On Tue, Jun 23, 2009 at 12:27 AM, Chris Modesitt<chris@...> wrote:
> What I have been seeing recently show up in the logs is:
> Jun 22 09:09:38 dns1 pdns[10948]: 5003 questions waiting for database
> attention. Limit is 5000, respawning

This is very consistent with a (brief) spike in queries.


> Jun 22 09:09:41 dns1 pdns[10957]: Got a signal 11, attempting to print
> trace:
>
> Jun 22 09:09:41 dns1 pdns[10957]: /usr/sbin/pdns_server-instance [0x80ba397]
>
> Jun 22 09:09:41 dns1 pdns[10957]: [0xb7f83400]
>
> Jun 22 09:09:41 dns1 pdns[10957]:
> /usr/sbin/pdns_server-instance(_ZN5boost11multi_index6detail13ordered_indexINS0_13composite_keyIN11PacketCache10CacheEntryENS0_6memberIS5_SsXadL_ZNS5_5qnameEEEEENS6_IS5_tXadL_

This is the second message in two days reporting a crash in this
place. Something interesting must be going on there, will look into
it.

> After this entry PDNS is down and stays down.

If you see this happening again, can you check if all PowerDNS
processes are gone, or if one is 'hanging around', preventing a
restart?
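
One way to do that check, as a rough sketch: it assumes pgrep (procps) is available and that the processes are named pdns_server and pdns_server-instance, as in the trace above. The function name leftover_processes is made up for this example.

```shell
#!/bin/sh
# List any processes whose name matches the pattern (default: pdns_server).
# Returns 0 if none are left, 1 if something is still running.
leftover_processes() {
    pattern=${1:-pdns_server}
    # pgrep -l prints "PID name" per match and exits non-zero on no match.
    if procs=$(pgrep -l "$pattern"); then
        echo "still running:"
        echo "$procs"
        return 1
    fi
    echo "no processes matching '$pattern'"
    return 0
}

# Example (not run here): leftover_processes pdns_server
```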

> So a couple of questions for the group, I already have a wire shark up doing
> a long term capture (so I can see what is being sent at the server).
> However is there a way PDNS can email/notify when it dies and does not come
> back?  Also what type of information/logging should I be enabling the system
> to further diagnose or troubleshoot the issue?

Other messages had good suggestions. In general, I'd advise running
monitoring tools that provide graphs of query rates.
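
If no graphing tool is in place yet, a crude sampler can produce the raw numbers. This is only a sketch: it assumes that `pdns_control show udp-queries` prints a bare cumulative counter on this version, and the function names are invented for the example.

```shell
#!/bin/sh
# Queries per second between two counter readings taken $3 seconds apart.
rate_per_second() {
    echo $(( ($2 - $1) / $3 ))
}

# Print one "unix-timestamp qps" line per interval, for feeding to a grapher.
# The counter restarts from zero when PowerDNS respawns, so a negative
# value marks a restart rather than a real rate.
sample_query_rate() {
    interval=${1:-10}
    prev=$(pdns_control show udp-queries) || return 1
    while sleep "$interval"; do
        cur=$(pdns_control show udp-queries) || return 1
        echo "$(date +%s) $(rate_per_second "$prev" "$cur" "$interval")"
        prev=$cur
    done
}

# Example (not run here): sample_query_rate 10 >> /var/tmp/pdns-qps.log
```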

Another trick is to run PowerDNS like this:
# while true; do pdns_server --daemon=no ; done
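
A slightly more talkative variant of the same loop, as a sketch only: it logs every exit with a timestamp and pauses between restarts so a crash loop does not spin. The function name supervise and the MAX_RESTARTS and RESTART_DELAY variables are made up for this example; MAX_RESTARTS exists only so the loop can be tried out safely.

```shell
#!/bin/sh
# Re-run a foreground command whenever it exits, logging each exit to stderr.
# MAX_RESTARTS: stop after this many runs (default 0 = never stop).
# RESTART_DELAY: seconds to wait between runs (default 1).
supervise() {
    runs=0
    while true; do
        "$@"
        status=$?
        runs=$((runs + 1))
        echo "$(date '+%Y-%m-%d %H:%M:%S') '$1' exited with status $status (run $runs)" >&2
        if [ "${MAX_RESTARTS:-0}" -gt 0 ] && [ "$runs" -ge "$MAX_RESTARTS" ]; then
            return "$status"
        fi
        sleep "${RESTART_DELAY:-1}"
    done
}

# Example (not run here): supervise pdns_server --daemon=no
```

The stderr lines can be piped to a mailer or to logger if a notification is wanted on every restart.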

But this really should not be necessary of course. I'm looking into the crash.

    Bert

Re: Possible DNS DOS?

by bert hubert-2

On Tue, Jun 23, 2009 at 12:27 AM, Chris Modesitt<chris@...> wrote:
> I have an interesting problem that has been happening for about 2 weeks.
> First a little about my setup, currently I am running the following:

Ok - this issue has probably been fixed in commits 1364 and 1365.

What happened was that during the attempt to restart PowerDNS it
either crashed, or blocked.

This in turn happened because a restart attempts to do a full cleanup
of the packet cache, which had problems if it happened under high
query load (which does not stop for the restart).

I've now modified the code not to do a full cleanup when attempting to
restart for such purposes. In addition, the packet cache should now be
able to deal with a cleanup in progress during queries, but that
should not happen anymore.

The quick workaround is to raise the 5000 query limit to 50000, and
hope the problem goes away, or to run one of the 2.9.23 snapshots I'll
be generating shortly.
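
As a sketch of the workaround: the message above does not name the setting, so this assumes the limit is the max-queue-length option in pdns.conf (its default of 5000 matches the log line). The helper name set_conf_option is made up for this example, and the path shown is the Debian default.

```shell
#!/bin/sh
# Set key=value in a pdns.conf-style file: replace the existing line if
# there is one, otherwise append a new one.
set_conf_option() {
    file=$1 key=$2 value=$3
    if grep -q "^$key=" "$file"; then
        # Go through a temp file because `sed -i` is not portable.
        # Note that mv gives the file the temp file's owner and mode.
        sed "s|^$key=.*|$key=$value|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    else
        echo "$key=$value" >> "$file"
    fi
}

# Example (not run here; restart PowerDNS afterwards):
#   set_conf_option /etc/powerdns/pdns.conf max-queue-length 50000
```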

   Bert