mta hanging reading smtpaccess.dat

by Marcus Pereira

Hi,
  I run an email system with more than 35000 accounts that processes around 380000 messages a day on 4 servers. For more than a year I have been facing random hangs on the SMTP server.

  At a random time (it may be hours, days or weeks) the main couriertcpd keeps running and accepting connections (until the maximum number of clients is reached), but the child processes never exit.
  Today I got some useful strace output that may help to solve the problem.
  The child process gets stuck in an infinite loop reading smtpaccess.dat. Every child couriertcpd I straced was in the same loop.
  smtpaccess.dat was not modified.

  The problem occurs on all servers. I have already reinstalled some of them. Some run 32-bit Debian, some 64-bit Debian. Some are fresh installs, some are old installs. But the problem happens on all of them.

  I tried to debug the source code to find where the problem is, but it seems too complex for me. Maybe you (Sam?) can tell me what I can do to solve this problem?

Marcus

 
1) strace for a child couriertcpd process during normal operation
------------------------------------------------
17:46:05.482106 rt_sigaction(SIGCHLD, {SIG_DFL}, {0x404ae0, [CHLD], SA_RESTORER|SA_RESTART, 0x7f4144f8af60}, 8) = 0
17:46:05.482278 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
17:46:05.482683 brk(0x2503000) = 0x2503000
17:46:05.482974 brk(0x2524000)               = 0x2524000
17:46:05.483454 brk(0x2545000)               = 0x2545000
17:46:05.483729 lseek(4, 8192, SEEK_SET)               = 8192
17:46:05.484210 read(4, "\0\0\0\0\r\0\0\0\31\0\0\0\271\3627 216.\2474\0\0\0\0\0\0\16\0\0\0\31"..., 4096) = 4096
17:46:05.484686 getsockname(5, {sa_family=AF_INET6, sin6_port=htons(25), inet_pton(AF_INET6, "::ffff:74.86.76.42", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
17:46:05.485206 open("/etc/resolv.conf", O_RDONLY)               = 6
.
.
. normal process end..
---------------------------------------------
 
 
2) strace for a child couriertcpd process while on start of the lock
----------------------------------------------
17:46:43.742191 rt_sigaction(SIGCHLD, {SIG_DFL}, {0x404ae0, [CHLD], SA_RESTORER|SA_RESTART, 0x7f4144f8af60}, 8) = 0
17:46:43.743147 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
17:46:43.748307 brk(0x2503000) = 0x2503000
17:46:43.753416 brk(0x2524000)               = 0x2524000
17:46:43.753745 brk(0x2545000) = 0x2545000
17:46:43.754507 lseek(4, 8192, SEEK_SET) = 8192
17:46:43.754836 read(4, "201.17.129.83allow,SIZELIMIT=5242"..., 4096) = 1650
17:46:43.755069 read(4, ""..., 2446) = 0
17:46:43.756458 read(4, ""..., 2446) = 0
17:46:43.756556 read(4, ""..., 2446) = 0
17:46:43.756994 read(4, ""..., 2446) = 0
17:46:43.757080 read(4, ""..., 2446)   = 0
17:46:43.757188 read(4, ""..., 2446) = 0
17:46:43.757276 read(4, ""..., 2446)    = 0
17:46:43.757367 read(4, ""..., 2446)    = 0
17:46:43.757452 read(4, ""..., 2446)     = 0
17:46:43.757534 read(4, ""..., 2446) = 0
17:46:43.757617 read(4, ""..., 2446) = 0
17:46:43.757703 read(4, ""..., 2446)    = 0
17:46:43.757794 read(4, ""..., 2446)     = 0
17:46:43.757877 read(4, ""..., 2446) = 0
17:46:43.757960 read(4, ""..., 2446) = 0
17:46:43.758047 read(4, ""..., 2446) = 0
17:46:43.758155 read(4, ""..., 2446) = 0
17:46:43.758260 read(4, ""..., 2446) = 0
17:46:43.758570 read(4, ""..., 2446) = 0
17:46:43.758654 read(4, ""..., 2446) = 0
17:46:43.758762 read(4, "\1\0\0\0\0\0\0\0\216\t\0\0\0\0\0\0r6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2446) = 2446
17:46:43.759082 getsockname(5, {sa_family=AF_INET6, sin6_port=htons(25), inet_pton(AF_INET6, "::ffff:74.86.76.42", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
17:46:43.759258 open("/etc/resolv.conf", O_RDONLY) = 6
.
.
.
. normal process end..
----------------------------------------------
 
 
3) strace for a child couriertcpd during the smtp hang
---------------------------------------------
17:46:49.944384 rt_sigaction(SIGCHLD, {SIG_DFL}, {0x404ae0, [CHLD], SA_RESTORER|SA_RESTART, 0x7f4144f8af60}, 8) = 0
17:46:49.950918 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
17:46:49.951420 brk(0x2503000)          = 0x2503000
17:48:05.794865 brk(0x2524000) = 0x2524000
17:48:05.795256 brk(0x2545000) = 0x2545000
17:48:45.322594 lseek(4, 8192, SEEK_SET)               = 8192
17:48:45.322770 read(4, "201.17.129.83allow,SIZELIMIT=5242"..., 4096) = 1650
17:48:45.322967 read(4, ""..., 2446) = 0
17:48:45.323220 read(4, ""..., 2446)   = 0
17:48:45.323390 read(4, ""..., 2446) = 0
17:48:45.323529 read(4, ""..., 2446)    = 0
17:48:45.323675 read(4, ""..., 2446)    = 0
17:48:45.323812 read(4, ""..., 2446)     = 0
17:48:45.323937 read(4, ""..., 2446) = 0
17:48:45.324074 read(4, ""..., 2446) = 0
17:48:45.324215 read(4, ""..., 2446)    = 0
17:48:45.324377 read(4, ""..., 2446)     = 0
.
.
. until I restart courier-mta
---------------------------------------------
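Editor's note: traces 2 and 3 differ mainly in how long the run of zero-byte read() calls lasts. As a rough aid for spotting that pattern in a saved trace, here is a small Python sketch that reports the longest run of consecutive read() calls returning 0. The function name longest_zero_read_run and the regular expression are illustrative assumptions; neither is part of strace or Courier.

```python
import re

# Matches strace lines such as:
#   17:46:43.755069 read(4, ""..., 2446) = 0
# The \b keeps pread()/readv() lines from matching.
READ_RE = re.compile(r'\bread\((\d+),.*\)\s*=\s*(-?\d+)')

def longest_zero_read_run(lines):
    """Return (fd, count) for the longest run of consecutive read() = 0 calls."""
    best_fd, best = None, 0
    cur_fd, cur = None, 0
    for line in lines:
        m = READ_RE.search(line)
        if m and int(m.group(2)) == 0:
            fd = int(m.group(1))
            # Extend the run only if it is the same descriptor as before.
            cur = cur + 1 if fd == cur_fd else 1
            cur_fd = fd
            if cur > best:
                best_fd, best = cur_fd, cur
        else:
            # Any other syscall, or a read that returned data, ends the run.
            cur_fd, cur = None, 0
    return best_fd, best
```

A trace saved with "strace -tt -o trace.txt -p <pid>" could then be checked with longest_zero_read_run(open("trace.txt")). A run of a few dozen matches trace 2; a run that never ends matches trace 3.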


------------------------------------------------------------------------------
SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, Nevada.
The future of the web can't happen without you.  Join us at MIX09 to help
pave the way to the Next Web now. Learn more and register at
http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/
_______________________________________________
courier-users mailing list
courier-users@...
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-users

Re: mta hanging reading smtpaccess.dat

by Sam Varshavchik

Marcus Pereira writes:

>
>   The problem occour in all servers. I already reinstall some them. Some
> run Debian 32bits, Some Debian 64bits. Some are fresh install, some are
> old install. But the problem happens in all them.

Sounds like a corrupted smtpaccess.dat file. A simple reinstall won't make a
difference. Run 'makesmtpaccess' to rebuild the smtpaccess.dat file. That
should take care of it.



Re: mta hanging reading smtpaccess.dat

by Gordon Messmer

Sam Varshavchik wrote:
> Marcus Pereira writes:
>>
>>   The problem occour in all servers. I already reinstall some them.
>> Some run Debian 32bits, Some Debian 64bits. Some are fresh install,
>> some are old install. But the problem happens in all them.
>
> Sounds like a corrupted smtpaccess.dat file. A simple reinstall won't
> make a difference. Run 'makesmtpaccess' to rebuild the smtpaccess.dat
> file. That should take care of it.

Except that it's been happening for more than a year, and occurs
randomly.  My guess is a db library bug.  I'd try a different library
(probably gdbm).

Re: mta hanging reading smtpaccess.dat

by Ale2008

Marcus Pereira wrote:

>   At a random time  (may be hours, days or weeks) the main couriertcpd
> keeps running and accepting connections (until the max clients are
> reached) but the childs processess never ends.
> [...]
> 2) strace for a child couriertcpd process while on start of the lock
> [...]
> 17:46:43.758570 read(4, ""..., 2446) = 0
> 17:46:43.758654 read(4, ""..., 2446) = 0
> 17:46:43.758762 read(4,
> "\1\0\0\0\0\0\0\0\216\t\0\0\0\0\0\0r6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 2446) = 2446
> 17:46:43.759082 getsockname(5, {sa_family=AF_INET6, sin6_port=htons(25),

If that is called before getsockname, it means it is in bdbobj_open,
right? Are processes starving because of some locking mechanism?

Is it bdb4? Is it NFS mounted?
(http://www.oracle.com/technology/documentation/berkeley-db/db/ref/env/remote.html)

Gordon Messmer wrote:
> Except that it's been happening for more than a year, and occurs
> randomly.  My guess is a db library bug.  I'd try a different library
> (probably gdbm).

Is bdb buggy?

Re: mta hanging reading smtpaccess.dat

by Marcus Pereira

> Sam Varshavchik wrote:
>> Marcus Pereira writes:
>>>
>>>   The problem occour in all servers. I already reinstall some them.
>>> Some run Debian 32bits, Some Debian 64bits. Some are fresh install,
>>> some are old install. But the problem happens in all them.
>>
>> Sounds like a corrupted smtpaccess.dat file. A simple reinstall won't
>> make a difference. Run 'makesmtpaccess' to rebuild the smtpaccess.dat
>> file. That should take care of it.

It's not a corrupted smtpaccess.dat. I have already changed it and rebuilt it
many times.
Last week I did a cleanup and left only around 40 lines in the smtpaccess
file.

I rebuilt it again today and I have already hit the problem again.
For some reason the hangs have become more frequent this month.

> Except that it's been happening for more than a year, and occurs
> randomly.  My guess is a db library bug.  I'd try a different library
> (probably gdbm).

I could do that, but how?
I am using libgdbm 1.8.3

Marcus Pereira





Re: mta hanging reading smtpaccess.dat

by Paweł Tęcza

Marcus Pereira wrote:
>> Sam Varshavchik wrote:
[...]
>> Except that it's been happening for more than a year, and occurs
>> randomly.  My guess is a db library bug.  I'd try a different library
>> (probably gdbm).
>
> I could do that, but how?
> I am using libgdbm 1.8.3

Hello Marcus,

We have also seen this problem with the smtpaccess.dat file many times. We are
using libgdbm 1.8.3 (from the Ubuntu package libgdbm3 1.8.3-3), like you.
It's the most recent version of libgdbm for Debian and Ubuntu.

My best regards,

Pawel


Re: mta hanging reading smtpaccess.dat

by Marcus Pereira

>>   At a random time  (may be hours, days or weeks) the main couriertcpd
>> keeps running and accepting connections (until the max clients are
>> reached) but the childs processess never ends.
>> [...]
>> 2) strace for a child couriertcpd process while on start of the lock
>> [...]
>> 17:46:43.758570 read(4, ""..., 2446) = 0
>> 17:46:43.758654 read(4, ""..., 2446) = 0
>> 17:46:43.758762 read(4,
>> "\1\0\0\0\0\0\0\0\216\t\0\0\0\0\0\0r6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 2446) = 2446
>> 17:46:43.759082 getsockname(5, {sa_family=AF_INET6, sin6_port=htons(25),
>
> If that is called before getsockname, it means it is in bdbobj_open,
> right? Are processes starving because of some locking mechanism?

I think at this point the db (smtpaccess.dat) is already open; the hang
happens when the process makes queries.

As far as I could trace:
  . tcpd/tcpd.c:
       function "accepted" calls "allowaccess"
       function "allowaccess" calls "doallowaccess"
       function "doallowaccess" calls "chkaccess"
  . tcpd/tcpdaccess.c:
       function "chkaccess" calls "dbobj_fetch"
  . bdbobj/bdbobj.c:
       function "dbobj_fetch" calls "doquery"
       ** The process gets stuck in an infinite loop in the "doquery"
          function ( for (;;) )
       function "doquery" calls "dofetch"
       function "dofetch" calls "(*obj->dbf->get)"
   From here I could not trace any further, but I guess it is a call into the
gdbm library.
   The fetch never returns successfully, so the function stays stuck in
the loop.

   Maybe it is some lock on the file or a bug in the library, but since I
could not trace any further I am sending this message to the list.

> Is it bdb4? Is it NFS mounted?
> (http://www.oracle.com/technology/documentation/berkeley-db/db/ref/env/remote.html)

No, all smtpaccess.dat files are local.
Some mailboxes are NFS mounted, but at the point of the hang no NFS mounted
file is accessed.

Marcus
 


Re: mta hanging reading smtpaccess.dat

by Marcus Pereira

>>> Except that it's been happening for more than a year, and occurs
>>> randomly.  My guess is a db library bug.  I'd try a different library
>>> (probably gdbm).
>>
>> I could do that, but how?
>> I am using libgdbm 1.8.3
>
> Hello Marcus,
>
> We also noticed that problem with smtpaccess.dat file many times. We are
> using libgdbm 1.8.3 (from Ubuntu package libgdbm3 1.8.3-3) like you.
> It's the most recently version of libgdbm for Debian and Ubuntu.
>
> My best regards,
>
> Pawel

Hi Pawel,
   I have servers using Debian libgdbm3 1.8.3-3 and 1.8.3-4.

  Studying the libgdbm3 package on Debian I found this bug report:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=274417
  I am suspicious of the fix they made. I removed the patch and rebuilt
the package. It is now running on 4 of my servers with no problems at all, but
it is still too early to say this was the problem.
  If you want to try my package:
     http://dl.task.net.br/libgdbm3_1.8.3-4.1_amd64.deb   for Debian 64.
     http://dl.task.net.br/libgdbm3_1.8.3-4.1_i386.deb   for Debian 32.

Marcus Pereira


Re: mta hanging reading smtpaccess.dat

by Gordon Messmer

Marcus Pereira wrote:
>
>   Studing libdbm3 package on Debian I found this bug report:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=274417
>   I feel suspecious about the fix they made. I removed the patch and rebuild
> the package. Its now running on 4 of my servers, no problem at all but still
> too early to say this is the problem.

That's pretty odd.  Are your dat files also on Reiser filesystems?


Re: mta hanging reading smtpaccess.dat

by Marcus Pereira

>>   Studing libdbm3 package on Debian I found this bug report:
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=274417
>>   I feel suspecious about the fix they made. I removed the patch and
>> rebuild
>> the package. Its now running on 4 of my servers, no problem at all but
>> still
>> too early to say this is the problem.
>
> That's pretty odd.  Are your dat files also on Reiser filesystems?

No, it's on ext3.


Re: mta hanging reading smtpaccess.dat

by Paweł Tęcza

Marcus Pereira wrote:

> Hi Pawel,
>    I have servers using Debian libdbdm3 1.8.3-3 and 1.8.3-4.
>
>   Studing libdbm3 package on Debian I found this bug report:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=274417
>   I feel suspecious about the fix they made. I removed the patch and rebuild
> the package. Its now running on 4 of my servers, no problem at all but still
> too early to say this is the problem.
>   If you want to try my package:
>      http://dl.task.net.br/libgdbm3_1.8.3-4.1_amd64.deb   for Debian 64.
>      http://dl.task.net.br/libgdbm3_1.8.3-4.1_i386.deb   for Debian 32.

Thanks for the tip, Marcus! I can take a look at your packages, but I
don't know when the problem will occur again, of course.

BTW, we also have ext3 filesystem :)

Good night,

P.

PS. Here in Poland we have a TASK too. It's the Polish acronym for the Tri-City
Academic Computer Network :D


Re: mta hanging reading smtpaccess.dat

by Ale2008

Marcus Pereira wrote:

>>>   At a random time  (may be hours, days or weeks) the main couriertcpd
>>> keeps running and accepting connections (until the max clients are
>>> reached) but the childs processess never ends.
>>> [...]
>>> 2) strace for a child couriertcpd process while on start of the lock
>>> [...]
>>> 17:46:43.758570 read(4, ""..., 2446) = 0
>>> 17:46:43.758654 read(4, ""..., 2446) = 0
>>> 17:46:43.758762 read(4,
>>> "\1\0\0\0\0\0\0\0\216\t\0\0\0\0\0\0r6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>>> 2446) = 2446
>>> 17:46:43.759082 getsockname(5, {sa_family=AF_INET6, sin6_port=htons(25),
>>
>> If that is called before getsockname, it means it is in bdbobj_open,
>> right? Are processes starving because of some locking mechanism?
>
> I think at this point the db (smtpaccess.dat) is already open, the hang is
> when the process makes queries.
>
> As I could trace:
>   .  tcpd/tcpd.c:
>        function "accepted" calls "allowaccess"

I saw that "sox_getsockname" is called before "allowaccess", hence my
guess that it was in the former call. However, yes, there is a further
call to "sox_getsockname" in "run", after "allowaccess" in the child.

>        function "allowaccess" calls "doallowaccess"
>        function  "doallowaccess" calls "chkaccess"
>   . tcpd/tcpdaccess.c:
>       function "chkacess" calls "dbobj_fetch"

Mind that dbobj_fetch is #defined twice in dbobj.h

>   . bdbobj/bdbobj.c:
>       function "dbobj_fetch" calls "doquery"
>       ** The process get locked on a infinity loop at "doquery" function (
> for (;;) )
>       function "doquery"  calls "dofetch"
>       function "dofetch" calls "(*obj->dbf->get)"
>    From here I could not trace anymore, but I guess it is a call for the
> gdbm library.

The _db-4_ library, actually. dbf->get gets mapped to an interface
function in the call to db_create (e.g. "__db_get_pp").

However, your further posts imply you are using gdbm, thus you should
check gdbmobj/gdbmobj.c: function "gdbmobj_fetch". Or have you been
switching from bdb to gdbm during the weekend?

>    The fetch is never returning successfully. So the function get locked on
> the loop.

I don't understand those repeated read() calls returning 0. It should
mean end of file, thus there should be no point in insisting. For
EINTR, read should return a negative number, and strace should report
that. (My strace looks quite similar to your "normal operation" one,
however I have pread rather than read, and no lseek, with bdb4.4)
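
Editor's note: to illustrate the point about end of file, a read that starts at the end of a regular file returns 0 bytes every time, so retrying cannot make progress unless another process extends the file. The Python sketch below shows this with a temporary 1650-byte file standing in for a database that is shorter than the reader expects. The helper name reads_at_eof is made up for the example; os.read wraps the same read(2) call and returns an empty bytes object where C returns 0.

```python
import os
import tempfile

def reads_at_eof(path, offset, size, attempts=3):
    """Seek to offset, call read() several times, and return the byte counts."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return [len(os.read(fd, size)) for _ in range(attempts)]
    finally:
        os.close(fd)

# Build a 1650-byte stand-in file, then ask for 4096 bytes three times.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 1650)
    demo_path = f.name

counts = reads_at_eof(demo_path, 0, 4096)  # [1650, 0, 0]
os.unlink(demo_path)
```

The first call returns the 1650 bytes that exist and every later call returns 0 with no error, which is the same shape as the "= 1650" followed by "= 0" lines in the trace.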

Good luck

Re: mta hanging reading smtpaccess.dat

by Marcus Pereira

>>> If that is called before getsockname, it means it is in bdbobj_open,
>>> right? Are processes starving because of some locking mechanism?
>>
>> I think at this point the db (smtpaccess.dat) is already open, the hang
>> is
>> when the process makes queries.
>>
>> As I could trace:
>>   .  tcpd/tcpd.c:
>>        function "accepted" calls "allowaccess"
>
> I saw "sox_getsockname" is called before "allowaccess", thence my
> guess that it was in the former call. However, yes, there is a further
> call to "sox_getsockname" in "run", after "allowaccess" in the child.

The call to "sox_getsockname" before "allowaccess" only happens if
"accesslocal" is set, and it is not set on my system.

>>        function "allowaccess" calls "doallowaccess"
>>        function  "doallowaccess" calls "chkaccess"
>>   . tcpd/tcpdaccess.c:
>>       function "chkacess" calls "dbobj_fetch"
>
> Mind that you have twice #define dbobj_fetch in dbobj.h
>
>>   . bdbobj/bdbobj.c:
>>       function "dbobj_fetch" calls "doquery"
>>       ** The process get locked on a infinity loop at "doquery" function
>> (
>> for (;;) )
>>       function "doquery"  calls "dofetch"
>>       function "dofetch" calls "(*obj->dbf->get)"
>>    From here I could not trace anymore, but I guess it is a call for the
>> gdbm library.
>
> The _db-4_ library, actually. dbf->get gets mapped to an interface
> function in the call to db_create (e.g. "__db_get_pp").
>
> However, your further posts imply you are using gdbm, thus you should
> check gdbmobj/gdbmobj.c: function "gdbmobj_fetch". Or have you been
> switching from bdb to gdbm during the weekend?

I have always used gdbm. I do not know yet how to switch to bdb, but if my fix
to gdbm does not work I will try to move.

>>    The fetch is never returning successfully. So the function get locked
>> on
>> the loop.
>
> I don't understand those repeated read() calls returning 0. It should
> mean end of file, thus there should be no point in insisting. For
> EINTR, read should return a negative number, and strace should report
> that. (My strace looks quite similar to your "normal operation" one,
> however I have pread rather than read, and no lseek, with bdb4.4)

I think this is the bug in the Debian gdbm package. The patch they applied to
the official release may ignore EINTR. I will test my package for a few more
days and if there are no more hangs I will report it to the Debian bug tracker.

Thanks for pointing me in the right direction.

Marcus




Re: mta hanging reading smtpaccess.dat

by Gordon Messmer

Alessandro Vesely wrote:
> I don't understand those repeated read() calls returning 0. It should
> mean end of file, thus there should be no point in insisting. For
> EINTR, read should return a negative number, and strace should report
> that.

I think you're right.  You can see Debian's patch here:
http://patch-tracking.debian.net/patch/series/view/gdbm/1.8.3-3/05_handle-short-read

They've introduced a while loop that will continue if bytes were read,
or if errno == EINTR.  However, they check errno even if the return
value of read() doesn't indicate that they should.  Since read() won't
reset errno on EOF, this creates an infinite loop if errno was already
EINTR.
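
Editor's note: the loop described above can be modelled in a few lines. This is a Python model of the logic as this message describes it, not the text of Debian's patch. FakeFile, buggy_full_read, fixed_full_read and the max_calls cap (which only exists so the demonstration terminates) are all made up for illustration, and the errno attribute stands in for the C global.

```python
import errno

class FakeFile:
    """Stand-in for read(2): returns data until EOF, then 0, never touching errno."""
    def __init__(self, data, stale_errno=0):
        self.data = data
        self.errno = stale_errno  # models a value left over from an earlier call

    def read(self, size):
        chunk, self.data = self.data[:size], self.data[size:]
        return len(chunk)  # 0 at EOF; errno is left exactly as it was

def buggy_full_read(f, size, max_calls=50):
    """Retry whenever errno == EINTR, even if read() did not fail."""
    total = calls = 0
    while total < size and calls < max_calls:
        n = f.read(size - total)
        calls += 1
        if n > 0:
            total += n
        elif f.errno == errno.EINTR:
            continue  # bug: stale errno consulted after a successful 0-byte read
        else:
            break
    return total, calls

def fixed_full_read(f, size, max_calls=50):
    """Consult errno only when read() reports failure; stop on EOF."""
    total = calls = 0
    while total < size and calls < max_calls:
        n = f.read(size - total)
        calls += 1
        if n > 0:
            total += n
        elif n < 0 and f.errno == errno.EINTR:
            continue  # genuinely interrupted, so retry
        else:
            break  # n == 0 is EOF; anything else is a real error
    return total, calls
```

With a 1650-byte FakeFile, a 4096-byte request and a stale EINTR, buggy_full_read keeps asking for the remaining 2446 bytes until the cap stops it, while fixed_full_read gives up after the first 0-byte read. The 2446 is simply 4096 minus 1650, the same size as the repeated requests in the trace.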

I'm still not enlightened as to the cause of the problem, but it seems
clear that gdbm on Debian is broken.  An interested party should file a
new bug report and ask them to fix this properly or take it out, and to
push the changes to the gdbm maintainer for review.  Like their SSL
blunder, I believe that someone who knows what they're doing might be
able to set them straight.

Re: mta hanging reading smtpaccess.dat

by Gordon Messmer

Marcus Pereira wrote:
>>>   Studing libdbm3 package on Debian I found this bug report:
>>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=274417
>> That's pretty odd.  Are your dat files also on Reiser filesystems?
> No, its on ext3.

Do you have any cron jobs that call "makesmtpaccess"?  I wonder if they
might be stomping on each other's output.
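
Editor's note: if overlapping rebuilds turn out to be a factor, one way to rule them out is to have every job take an exclusive lock before running the rebuild. The sketch below is a generic Python wrapper, not anything shipped with Courier. The name run_exclusively and any lock file path are assumptions, and the thread does not establish whether makesmtpaccess already serializes itself.

```python
import fcntl
import subprocess
import sys

def run_exclusively(lock_path, command):
    """Run command while holding an exclusive flock on lock_path.

    Returns the command's exit status, or None (without running it)
    when another process already holds the lock.
    """
    # Append mode creates the lock file if needed without truncating it.
    with open(lock_path, "a") as lock_file:
        try:
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is dropped when lock_file is closed on leaving the block.
        return subprocess.call(command)

if __name__ == "__main__" and len(sys.argv) > 2:
    # Example: wrapper.py /path/to/lockfile makesmtpaccess
    result = run_exclusively(sys.argv[1], sys.argv[2:])
    sys.exit(0 if result is None else result)
```

Each cron entry would call the wrapper instead of calling makesmtpaccess directly, so a second job that starts while the first is still running skips its turn rather than writing at the same time.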

Zombies blocking my smtp

by Marcus Pereira

Hi.

For some weeks I have been having issues with incoming SMTP connections.

esmtpd MAXDAEMONS is set to 350, and at peak times I used to reach at most 150 simultaneous courieresmtpd processes.
But now I am reaching the limit many times a day, and then my servers stop accepting new connections.

After collecting some data on this problem, it seems to be related to the rising number of infected machines running SMTP zombies. These zombies keep trying to send spam and, even if I deny every attempt, they never close the SMTP connection. The courieresmtpd process keeps running for hours. At some point the submit process kills itself at the hard 1800-second alarm timeout. But even with submit in the defunct state, the courieresmtpd process keeps running.

My only way to keep the number of courieresmtpd processes low is to block the originating IP at the firewall or with a smtpaccess DENY in couriertcpd. If the sender is blocked at some other point (BLACKLIST, BOTH, SPF, smtpaccess BLOCK, RELAY DENY or USER UNKNOWN), the submit process has already started and the courieresmtpd process stays stuck for hours. I am losing this game. I have already blocked more than 500 /24 networks, but the number of stuck connections keeps rising until I restart esmtpd.
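
Editor's note: when deciding which /24 networks to block, it can help to count offending addresses per network first. The Python sketch below does only that counting. The function name count_by_slash24 is made up, and how the list of addresses is collected (logs, netstat, ps output) is left open because the thread does not say.

```python
import ipaddress
from collections import Counter

def count_by_slash24(addresses):
    """Count IPv4 addresses per enclosing /24 network, most frequent first."""
    counts = Counter()
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        if ip.version == 6 and ip.ipv4_mapped is not None:
            # IPv4-mapped form such as ::ffff:192.0.2.1
            ip = ip.ipv4_mapped
        if ip.version != 4:
            continue  # plain IPv6 peers are ignored in this sketch
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        counts[str(net)] += 1
    return counts.most_common()
```

For example, count_by_slash24(["192.0.2.10", "192.0.2.77", "::ffff:198.51.100.5"]) returns [("192.0.2.0/24", 2), ("198.51.100.0/24", 1)], using documentation address ranges rather than real offenders.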
 
Zombies are what I suspect, but it seems incoming SMTP connections are getting stuck for other reasons too.
Another strange issue that has started happening is lots of "writev: Broken pipe" errors. Maybe it is related to the zombies, but I have seen this error on normal SMTP connections too.
 
Is this happening to anyone else?

Does anyone have an idea how to fix this problem? It seems I need some configuration for the courieresmtpd process that sets a faster timeout, a limit on the number of errors each SMTP connection can have, or a time limit within which an SMTP connection must start the DATA transfer.
 
Below is a lot of debugging information I got. Sorry for the long post.
 
Marcus Pereira
 
 
***************************
Lots of submit processes in the defunct state:
# ps f -o pid,ppid,etime,cmd -C courieresmtpd,submit | grep -B1 defunct
18207 12590       00:09 /usr/sbin/courieresmtpd
18587 18207       00:01  \_ [submit] <defunct>
--
 8586 12590       06:43 /usr/sbin/courieresmtpd
17753  8586       00:23  \_ [submit] <defunct>
--
32301 12590       30:38 /usr/sbin/courieresmtpd
32324 32301       30:38  \_ [submit] <defunct>
--
17572 12590       39:25 /usr/sbin/courieresmtpd
17700 17572       39:21  \_ [submit] <defunct>
17383 12590       39:32 /usr/sbin/courieresmtpd
17397 17383       39:31  \_ [submit] <defunct>
15410 12590       40:29 /usr/sbin/courieresmtpd
15497 15410       40:26  \_ [submit] <defunct>
13700 12590       41:26 /usr/sbin/courieresmtpd
13746 13700       41:24  \_ [submit] <defunct>
13438 12590       41:33 /usr/sbin/courieresmtpd
13449 13438       41:33  \_ [submit] <defunct>
10004 12590       43:27 /usr/sbin/courieresmtpd
10013 10004       43:27  \_ [submit] <defunct>
 8705 12590       44:17 /usr/sbin/courieresmtpd
 8887  8705       44:11  \_ [submit] <defunct>
 7493 12590       45:10 /usr/sbin/courieresmtpd
 7716  7493       45:00  \_ [submit] <defunct>
 7172 12590       45:27 /usr/sbin/courieresmtpd
 7195  7172       45:26  \_ [submit] <defunct>
 6222 12590       46:02 /usr/sbin/courieresmtpd
 6304  6222       45:58  \_ [submit] <defunct>
 5028 12590       46:51 /usr/sbin/courieresmtpd
 5032  5028       46:51  \_ [submit] <defunct>
 5025 12590       46:51 /usr/sbin/courieresmtpd
 5031  5025       46:51  \_ [submit] <defunct>
 4328 12590       47:15 /usr/sbin/courieresmtpd
 4355  4328       47:14  \_ [submit] <defunct>
 2213 12590       48:42 /usr/sbin/courieresmtpd
 2269  2213       48:40  \_ [submit] <defunct>
 2045 12590       48:46 /usr/sbin/courieresmtpd
 2178  2045       48:44  \_ [submit] <defunct>
 1804 12590       48:55 /usr/sbin/courieresmtpd
 1856  1804       48:53  \_ [submit] <defunct>
 1726 12590       48:59 /usr/sbin/courieresmtpd
 1740  1726       48:58  \_ [submit] <defunct>
  980 12590       49:22 /usr/sbin/courieresmtpd
 2634   980       48:25  \_ [submit] <defunct>
  581 12590       49:34 /usr/sbin/courieresmtpd
  593   581       49:33  \_ [submit] <defunct>
31665 12590       50:29 /usr/sbin/courieresmtpd
31725 31665       50:28  \_ [submit] <defunct>
29477 12590       51:47 /usr/sbin/courieresmtpd
16372 29477       01:04  \_ [submit] <defunct>
28991 12590       52:06 /usr/sbin/courieresmtpd
29053 28991       52:05  \_ [submit] <defunct>
28727 12590       52:18 /usr/sbin/courieresmtpd
28740 28727       52:17  \_ [submit] <defunct>
28483 12590       52:29 /usr/sbin/courieresmtpd
28518 28483       52:27  \_ [submit] <defunct>
28121 12590       52:44 /usr/sbin/courieresmtpd
31412 28121       50:41  \_ [submit] <defunct>
27343 12590       53:11 /usr/sbin/courieresmtpd
27397 27343       53:09  \_ [submit] <defunct>
27338 12590       53:11 /usr/sbin/courieresmtpd
27347 27338       53:11  \_ [submit] <defunct>
25839 12590       54:14 /usr/sbin/courieresmtpd
25880 25839       54:13  \_ [submit] <defunct>
24281 12590       55:13 /usr/sbin/courieresmtpd
24392 24281       55:11  \_ [submit] <defunct>
23157 12590       55:54 /usr/sbin/courieresmtpd
23210 23157       55:52  \_ [submit] <defunct>
21497 12590       56:58 /usr/sbin/courieresmtpd
21502 21497       56:57  \_ [submit] <defunct>
20528 12590       57:40 /usr/sbin/courieresmtpd
20564 20528       57:38  \_ [submit] <defunct>
20255 12590       57:50 /usr/sbin/courieresmtpd
20256 20255       57:50  \_ [submit] <defunct>
19662 12590       58:16 /usr/sbin/courieresmtpd
19685 19662       58:15  \_ [submit] <defunct>
18499 12590       58:56 /usr/sbin/courieresmtpd
18562 18499       58:55  \_ [submit] <defunct>
16709 12590    01:00:03 /usr/sbin/courieresmtpd
16770 16709    01:00:01  \_ [submit] <defunct>
16433 12590    01:00:16 /usr/sbin/courieresmtpd
16447 16433    01:00:16  \_ [submit] <defunct>
15893 12590    01:00:45 /usr/sbin/courieresmtpd
15927 15893    01:00:44  \_ [submit] <defunct>
15785 12590    01:00:49 /usr/sbin/courieresmtpd
15845 15785    01:00:47  \_ [submit] <defunct>
15079 12590    01:01:11 /usr/sbin/courieresmtpd
15150 15079    01:01:09  \_ [submit] <defunct>
14685 12590    01:01:28 /usr/sbin/courieresmtpd
14720 14685    01:01:27  \_ [submit] <defunct>
13389 12590    01:02:21 /usr/sbin/courieresmtpd
13426 13389    01:02:20  \_ [submit] <defunct>
12599 12590    01:03:00 /usr/sbin/courieresmtpd
 9451 12599       43:50  \_ [submit] <defunct>
###############################################################
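
A listing like the one above can be summarised by script rather than by eye. The following is a minimal Python sketch, assuming exactly the `ps -o pid,ppid,etime,cmd` column layout shown here. The function names and the 10-minute threshold are illustrative choices, not anything Courier provides.

```python
def etime_to_seconds(etime: str) -> int:
    """Convert a ps etime value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:          # pad missing hours with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def stuck_parents(ps_output: str, min_age: int = 600):
    """Return [(pid, age_seconds)] for courieresmtpd processes that are at
    least min_age seconds old and are directly followed by a defunct child."""
    found = []
    lines = ps_output.splitlines()
    for line, following in zip(lines, lines[1:]):
        fields = line.split()
        if len(fields) < 4 or not fields[3].endswith("courieresmtpd"):
            continue
        child = following.split()
        if "<defunct>" not in child or len(child) < 2:
            continue
        if child[1] != fields[0]:  # the child's PPID must be this PID
            continue
        age = etime_to_seconds(fields[2])
        if age >= min_age:
            found.append((int(fields[0]), age))
    return found


# A few rows copied from the listing above.
SAMPLE = r"""
18207 12590       00:09 /usr/sbin/courieresmtpd
18587 18207       00:01  \_ [submit] <defunct>
--
20528 12590       57:40 /usr/sbin/courieresmtpd
20564 20528       57:38  \_ [submit] <defunct>
16709 12590    01:00:03 /usr/sbin/courieresmtpd
16770 16709    01:00:01  \_ [submit] <defunct>
"""

for pid, age in stuck_parents(SAMPLE):
    print(pid, age)
```

Run periodically, a check like this gives a count of long-lived courieresmtpd processes that can be compared against MAXDAEMONS.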
 
For example, the strace for courieresmtpd process 20528:
# tail -n 27 strace.20528
10:52:21 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
10:52:21 nanosleep({128, 0}, {128, 0})              = 0
10:54:29 sendto(3, "<19>Apr 17 10:54:29 courieresmtpd: error,relay=::f"..., 189, MSG_NOSIGNAL, NULL, 0)                    = 189
10:54:29 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 55) = -1 EPIPE (Broken pipe)
10:54:29 --- SIGPIPE (Broken pipe) @ 0 (0) ---
10:54:29 write(5, "pessoal@...", 36) = 36
10:54:29 read(6, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 4096) = 54
10:54:29 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
10:54:29 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
10:54:29 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
10:54:29 nanosleep({128, 0}, {128, 0}) = 0
10:56:37 sendto(3, "<19>Apr 17 10:56:37 courieresmtpd: error,relay=::f"..., 189, MSG_NOSIGNAL, NULL, 0) = 189
10:56:37 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 55) = -1 EPIPE (Broken pipe)
10:56:37 --- SIGPIPE (Broken pipe) @ 0 (0) ---
10:56:37 write(5, "elaine@...", 35) = 35
10:56:37 read(6, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 4096) = 54
10:56:37 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
10:56:37 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
10:56:37 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
10:56:37 nanosleep({128, 0}, {128, 0}) = 0
10:58:45 sendto(3, "<19>Apr 17 10:58:45 courieresmtpd: error,relay=::f"..., 188, MSG_NOSIGNAL, NULL, 0) = 188
10:58:45 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 55) = -1 EPIPE (Broken pipe)
10:58:45 --- SIGPIPE (Broken pipe) @ 0 (0) ---
10:58:45 sendto(3, "<22>Apr 17 10:58:45 courieresmtpd: error,relay=::f"..., 107, MSG_NOSIGNAL, NULL, 0) = 107
10:58:45 select(2, NULL, [1], NULL, {4800, 0})    = ? ERESTARTNOHAND (To be restarted)
11:18:10 --- SIGCHLD (Child exited) @ 0 (0) ---
11:18:10 select(2, NULL, [1], NULL, {3634, 984000}
#########################
 
And for its child submit process, pid 20564:
# tail -n 27 strace.20564
10:54:29 lseek(6, 0, SEEK_SET) = 0
10:54:29 read(6, "rootfs / rootfs rw 0 0\nnone /sys sysfs rw,nosuid,n"..., 1024) = 874
10:54:29 stat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
10:54:29 close(6) = 0
10:54:29 munmap(0x7f368cbd3000, 4096) = 0
10:54:29 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 54) = 54
10:54:29 poll([{fd=0, events=POLLIN|POLLERR|POLLHUP}], 1, -1)                        = 1 ([{fd=0, revents=POLLIN}])
10:56:37 read(0, "elaine@...", 8192) = 35
10:56:37 statfs(".", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=34151844, f_bfree=32936550, f_bavail=31201734, f_files=8675328, f_ffree=8559136, f_fsid={162176456, -721074552}, f_namelen=255, f_frsize=4096}) = 0
10:56:37 stat(".", {st_mode=S_IFDIR|0770, st_size=4096, ...}) = 0
10:56:37 open("/proc/mounts", O_RDONLY) = 6
10:56:37 fstat(6, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
10:56:37 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f368cbd3000
10:56:37 read(6, "rootfs / rootfs rw 0 0\nnone /sys sysfs rw,nosuid,n"..., 1024) = 874
10:56:37 read(6, "", 1024) = 0
10:56:37 lseek(6, 0, SEEK_SET)                      = 0
10:56:37 read(6, "rootfs / rootfs rw 0 0\nnone /sys sysfs rw,nosuid,n"..., 1024) = 874
10:56:37 stat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
10:56:37 close(6) = 0
10:56:37 munmap(0x7f368cbd3000, 4096) = 0
10:56:37 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 54) = 54
10:56:37 poll([{fd=0, events=POLLIN|POLLERR|POLLHUP}], 1, -1) = ? ERESTART_RESTARTBLOCK (To be restarted)
11:18:10 --- SIGALRM (Alarm clock) @ 0 (0) ---
11:18:10 rt_sigaction(SIGINT, {SIG_DFL, [INT], SA_RESTORER|SA_RESTART, 0x7f368b8ac190}, {0x411620, [INT], SA_RESTORER|SA_RESTART, 0x7f368b8ac190}, 8) = 0
11:18:10 getpid()                       = 20564
11:18:10 kill(20564, SIGKILL)           = 0
11:18:10 +++ killed by SIGKILL +++
###############################
mail.log for this IP:
# grep 190.42.107 /var/log/mail.log
Apr 17 10:48:08 cuma courieresmtpd: started,ip=[::ffff:190.42.107.41]
Apr 17 10:48:20 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:48:36 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>,to=<caroline@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:49:08 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:50:13 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:52:21 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:54:29 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:56:37 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:58:45 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:58:45 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,msg="502 ESMTP command error",cmd: DATA
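
The spacing of these log entries is worth a look. The short Python sketch below takes the timestamps of the "started" line and the eight 511 rejections above and prints the gap between each consecutive pair. Only the timestamps come from the log; the helper name is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the mail.log excerpt above:
# the "started" line followed by the eight 511 rejections.
STAMPS = [
    "10:48:08", "10:48:20", "10:48:36", "10:49:08", "10:50:13",
    "10:52:21", "10:54:29", "10:56:37", "10:58:45",
]


def gaps(stamps):
    """Return the number of seconds between consecutive HH:MM:SS stamps."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(gaps(STAMPS))  # [12, 16, 32, 65, 128, 128, 128, 128]
```

The gaps roughly double and then settle at 128 seconds, which matches the `nanosleep({128, 0}, ...)` calls in the courieresmtpd strace. This suggests, without proving it, that much of the connection's lifetime is spent in the server's own delay after each rejected command.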
##################################
Broken pipe logs. I am getting thousands of these messages per hour:
.
.
.
Apr 17 11:04:05 cuma courieresmtpd: error,relay=::ffff:194.0.252.241,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:04:18 cuma courieresmtpd: error,relay=::ffff:201.20.24.36,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:05:13 cuma courieresmtpd: error,relay=::ffff:201.17.146.71,msg="writev: Broken pipe",cmd: RSET
Apr 17 11:05:13 cuma courieresmtpd: error,relay=::ffff:81.184.51.141,msg="writev: Broken pipe",cmd: DATA
Apr 17 11:05:32 cuma courieresmtpd: error,relay=::ffff:201.17.146.71,msg="writev: Broken pipe",cmd: RSET
Apr 17 11:06:10 cuma courieresmtpd: error,relay=::ffff:201.17.146.71,msg="writev: Broken pipe",cmd: RSET
Apr 17 11:06:49 cuma courieresmtpd: error,relay=::ffff:208.43.194.2,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:07:07 cuma courieresmtpd: error,relay=::ffff:64.118.95.132,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:07:49 cuma courieresmtpd: error,relay=::ffff:194.0.252.241,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:11:04 cuma courieresmtpd: error,relay=::ffff:200.162.176.8,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:11:30 cuma courieresmtpd: error,relay=::ffff:174.133.25.2,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:11:35 cuma courieresmtpd: error,relay=::ffff:194.0.252.241,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:12:28 cuma courieresmtpd: error,relay=::ffff:72.232.211.2,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:12:35 cuma courieresmtpd: error,relay=::ffff:72.232.211.2,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:13:09 cuma courieresmtpd: error,relay=::ffff:81.215.80.190,msg="writev: Broken pipe",cmd: QUIT
.
.
.
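
To see which SMTP command was in progress when the pipe broke, the entries can be tallied by their `cmd:` field. This is a minimal Python sketch, assuming the log format shown above; the regular expression and function name are illustrative.

```python
import re
from collections import Counter

BROKEN_PIPE_RE = re.compile(r'msg="writev: Broken pipe",cmd: (\w+)')


def tally_commands(log_text: str) -> Counter:
    """Count 'writev: Broken pipe' entries by the SMTP command being served."""
    return Counter(BROKEN_PIPE_RE.findall(log_text))


# The first five entries from the excerpt above.
SAMPLE = """\
Apr 17 11:04:05 cuma courieresmtpd: error,relay=::ffff:194.0.252.241,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:04:18 cuma courieresmtpd: error,relay=::ffff:201.20.24.36,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:05:13 cuma courieresmtpd: error,relay=::ffff:201.17.146.71,msg="writev: Broken pipe",cmd: RSET
Apr 17 11:05:13 cuma courieresmtpd: error,relay=::ffff:81.184.51.141,msg="writev: Broken pipe",cmd: DATA
Apr 17 11:05:32 cuma courieresmtpd: error,relay=::ffff:201.17.146.71,msg="writev: Broken pipe",cmd: RSET
"""

print(tally_commands(SAMPLE))
```

Across the fifteen entries listed above, eleven are QUIT, three are RSET and one is DATA.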
 
 

------------------------------------------------------------------------------
Stay on top of everything new and different, both inside and
around Java (TM) technology - register by April 22, and save
$200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco.
300 plus technical and hands-on sessions. Register today.
Use priority code J9JMT32. http://p.sf.net/sfu/p
_______________________________________________
courier-users mailing list
courier-users@...
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-users

Re: Zombies blocking my smtp

by Sam Varshavchik

Marcus Pereira writes:

> and the courieresmtpd process stay locked for hours. I am loosing this
> game. I already blocked more 500 /24 nets but the number of locked
> connections keeps raising until I restart esmtpd.

I have several thousand spam sources blacklisted. I am just a couple of
vanity domains. You are a large provider.

The recommended way to address this situation is to set up a script that
monitors syslogs and flags IP addresses that log more than X errors in Y
minutes, then firewalls them. With Linux, you can insert an iptables rule
that rejects *out*bound packets to the blacklisted IPs in such a way that the
sender of those packets, in this case courieresmtpd, thinks that the
connection is broken and exits, releasing the socket. Meanwhile the remote
client remains tarpitted, thinking that the socket is still alive.

That, and a lengthy timeout before the iptables entry gets removed, so that
the iptables does not grow without bounds, should do the trick.
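
The "more than X errors in Y minutes, with a lengthy expiry" logic can be sketched in Python as follows. This is only an illustration under stated assumptions: it parses the `courieresmtpd: error,relay=...` syslog format quoted earlier in this thread, and the class name, method names and year default are hypothetical. It deliberately stops short of touching the firewall: `feed()` returns an address when a block should be added, and `expired()` returns addresses whose block should be removed.

```python
import ipaddress
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

ERROR_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ courieresmtpd: error,relay=([^,]+),"
)


def normalize(relay: str) -> str:
    """Turn an IPv4-mapped address such as ::ffff:1.2.3.4 into 1.2.3.4."""
    try:
        addr = ipaddress.ip_address(relay)
    except ValueError:
        return relay
    mapped = getattr(addr, "ipv4_mapped", None)
    return str(mapped or addr)


class ErrorRateFlagger:
    def __init__(self, max_errors: int, window: timedelta, block_for: timedelta):
        self.max_errors = max_errors      # X
        self.window = window              # Y
        self.block_for = block_for        # how long a block should last
        self.recent = defaultdict(deque)  # ip -> timestamps of recent errors
        self.blocked_until = {}           # ip -> expiry time

    def feed(self, line: str, year: int = 2009):
        """Process one syslog line; return an IP if it should now be blocked."""
        m = ERROR_RE.match(line)
        if not m:
            return None
        # Syslog timestamps carry no year, so one has to be assumed.
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        ip = normalize(m.group(2))
        errors = self.recent[ip]
        errors.append(when)
        while errors and when - errors[0] > self.window:
            errors.popleft()              # drop errors older than the window
        if len(errors) > self.max_errors and ip not in self.blocked_until:
            self.blocked_until[ip] = when + self.block_for
            return ip                     # caller adds the firewall rule here
        return None

    def expired(self, now: datetime):
        """Return IPs whose block has run out and forget them."""
        done = [ip for ip, until in self.blocked_until.items() if until <= now]
        for ip in done:
            del self.blocked_until[ip]
            self.recent.pop(ip, None)
        return done
```

A driver would call `feed()` for each new log line and `expired()` on a timer, invoking the firewall tool of choice on the returned addresses. Suitable values for the thresholds and the block duration are a local policy decision and are not given in this thread.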

> Other strange issue that start happening is lots of "writev: Broken pipe"
> errors. May be its related to the Zombies, but I have seen this error
> happening on normal smtp connections too.

This is harmless. It means that the sender dropped the connection while
Courier still had something to send it. Poorly written SMTP senders are
sending you a QUIT command, but don't bother to wait for Courier to
acknowledge the QUIT and shut down the socket in an orderly way.




Re: Zombies blocking my smtp

by J Potter

Sam -- I've been seeing similar problems in the past few weeks on our  
servers.

Would it be possible to have a config option for max idletime of an  
smtp connection? I.e., if no data comes in or goes out for a period  
of, say, 120 seconds, drop the connection and have the submit process  
exit? (Equivalent to Apache's "Timeout" setting.)

I understand that there's no graceful way to tell the sender that the  
connection is being quit, but at this point, it's a denial-of-service  
problem.  (And I keep increasing the MAXDAEMONS setting, but it hasn't  
helped. 200 to 300 to 400, the connections just keep stacking up.)
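
For illustration only, the kind of idle timeout being asked for looks roughly like this on a plain socket. This Python sketch is not Courier code and does not reflect any existing Courier option; the function name and return values are invented, and it watches only inbound data.

```python
import select
import socket


def read_until_idle(conn: socket.socket, idle_timeout: float = 120.0,
                    bufsize: int = 4096):
    """Read from conn until the peer closes or nothing arrives for
    idle_timeout seconds. Returns (reason, data_received)."""
    received = bytearray()
    while True:
        # select() returns an empty list if the timeout elapses first.
        readable, _, _ = select.select([conn], [], [], idle_timeout)
        if not readable:
            return "idle-timeout", bytes(received)
        chunk = conn.recv(bufsize)
        if not chunk:                      # orderly shutdown by the peer
            return "peer-closed", bytes(received)
        received += chunk
```

On "idle-timeout" the caller would close the socket and let the worker exit, which frees its slot instead of holding it until the peer goes away.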

-Jeff


Re: Zombies blocking my smtp

by Ale2008

Sam Varshavchik wrote:

> Marcus Pereira writes:
>
>> and the courieresmtpd process stay locked for hours. I am loosing this
>> game. I already blocked more 500 /24 nets but the number of locked
>> connections keeps raising until I restart esmtpd.
>
> I have several thousand spam sources blacklisted. I am just a couple of
> vanity domains. You are a large provider.
>
> The recommended way to address this situation is to set up a script that
> monitors syslogs, and flags IP addresses that log more than X errors in
> Y minutes, then firewall them.

Do you mean _any_ error? I'm only catching imap/pop3 errors: since I
see the SMTP daemon does its own tarpitting, I thought it better not
to interfere... Actually, I've tried blocking spammers' sync requests
after they send to a spamtrap, but their footprint is so light that I
don't gain much that way.

However, since you recommend firewalling, wouldn't it be advisable to
configure some kind of callback for flagging IP addresses? It seems
that monitoring syslogs is where one spends most of the cpu time.

> With Linux, you can insert an iptables
> rule that rejects *out*bound packets to the blacklisted IPs in a manner
> that the sender, in this case courieresmtpd, thinks that the connection
> is broken and exits out, releasing the socket. Meanwhile the sender
> still remains tarpitted, thinking that the socket is still alive.

Hm... blocking outbound so as to induce tarpitting apparently only
affects what the server inflicts in retribution. If I understand well,
blocking just (future) sync requests from the same IPs would result in
similar figures for the attempted connections to port 25. Correct?

> That, and a lengthy timeout before the iptables entry gets removed, so
> that the iptables does not grow without bounds, should do the trick.

I guess you mean more than a few minutes timeout... Hours? Days?

>> Other strange issue that start happening is lots of "writev: Broken
>> pipe" errors. May be its related to the Zombies, but I have seen this
>> error happening on normal smtp connections too.
>
> This is harmless. It means that the sender dropped the connection while
> Courier still had something to send it. Poorly written SMTP senders are
> sending you a QUIT command, but don't bother wait for Courier to
> acknowledge the QUIT and orderly shut down the socket.

Does that mean one shouldn't block based on this error? (Well,
blocking outbound packets should be a no-op at this point, but what
about blocking future sync requests?)

Finally, it looks like you use netfilter. May I ask which userspace
program you run to issue verdicts? I had been looking for such a
utility, but eventually had to roll my own...

