mta hanging reading smtpaccess.dat

Hi,

I have an email system with more than 35000 accounts that processes around 380000 messages a day on 4 servers. For more than a year I have been facing random hangs on the smtp server.

At a random time (it may be hours, days or weeks) the main couriertcpd keeps running and accepting connections (until the max clients limit is reached), but the child processes never end.

Today I got some useful strace output that may help to solve the problem.

The child process gets stuck in an infinite loop reading smtpaccess.dat. All the child couriertcpd processes I straced are in the same loop. smtpaccess.dat was not modified.

The problem occurs on all servers. I have already reinstalled some of them. Some run 32-bit Debian, some 64-bit Debian. Some are fresh installs, some are old installs. But the problem happens on all of them.

Marcus
1) strace for a child couriertcpd process during normal operation
------------------------------------------------
17:46:05.482106 rt_sigaction(SIGCHLD, {SIG_DFL}, {0x404ae0, [CHLD], SA_RESTORER|SA_RESTART, 0x7f4144f8af60}, 8) = 0
17:46:05.482278 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
17:46:05.482683 brk(0x2503000) = 0x2503000
17:46:05.482974 brk(0x2524000) = 0x2524000
17:46:05.483454 brk(0x2545000) = 0x2545000
17:46:05.483729 lseek(4, 8192, SEEK_SET) = 8192
17:46:05.484210 read(4, "\0\0\0\0\r\0\0\0\31\0\0\0\271\3627 216.\2474\0\0\0\0\0\0\16\0\0\0\31"..., 4096) = 4096
17:46:05.484686 getsockname(5, {sa_family=AF_INET6, sin6_port=htons(25), inet_pton(AF_INET6, "::ffff:74.86.76.42", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
17:46:05.485206 open("/etc/resolv.conf", O_RDONLY) = 6
.
.
. normal process end..
---------------------------------------------
2) strace for a child couriertcpd process at the start of the lock
----------------------------------------------
17:46:43.742191 rt_sigaction(SIGCHLD, {SIG_DFL}, {0x404ae0, [CHLD], SA_RESTORER|SA_RESTART, 0x7f4144f8af60}, 8) = 0
17:46:43.743147 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
17:46:43.748307 brk(0x2503000) = 0x2503000
17:46:43.753416 brk(0x2524000) = 0x2524000
17:46:43.753745 brk(0x2545000) = 0x2545000
17:46:43.754507 lseek(4, 8192, SEEK_SET) = 8192
17:46:43.754836 read(4, "201.17.129.83allow,SIZELIMIT=5242"..., 4096) = 1650
17:46:43.755069 read(4, ""..., 2446) = 0
17:46:43.756458 read(4, ""..., 2446) = 0
17:46:43.756556 read(4, ""..., 2446) = 0
17:46:43.756994 read(4, ""..., 2446) = 0
17:46:43.757080 read(4, ""..., 2446) = 0
17:46:43.757188 read(4, ""..., 2446) = 0
17:46:43.757276 read(4, ""..., 2446) = 0
17:46:43.757367 read(4, ""..., 2446) = 0
17:46:43.757452 read(4, ""..., 2446) = 0
17:46:43.757534 read(4, ""..., 2446) = 0
17:46:43.757617 read(4, ""..., 2446) = 0
17:46:43.757703 read(4, ""..., 2446) = 0
17:46:43.757794 read(4, ""..., 2446) = 0
17:46:43.757877 read(4, ""..., 2446) = 0
17:46:43.757960 read(4, ""..., 2446) = 0
17:46:43.758047 read(4, ""..., 2446) = 0
17:46:43.758155 read(4, ""..., 2446) = 0
17:46:43.758260 read(4, ""..., 2446) = 0
17:46:43.758570 read(4, ""..., 2446) = 0
17:46:43.758654 read(4, ""..., 2446) = 0
17:46:43.758762 read(4, "\1\0\0\0\0\0\0\0\216\t\0\0\0\0\0\0r6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2446) = 2446
17:46:43.759082 getsockname(5, {sa_family=AF_INET6, sin6_port=htons(25), inet_pton(AF_INET6, "::ffff:74.86.76.42", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
17:46:43.759258 open("/etc/resolv.conf", O_RDONLY) = 6
.
.
. normal process end..
----------------------------------------------
3) strace for a child couriertcpd during the smtp hang
---------------------------------------------
17:46:49.944384 rt_sigaction(SIGCHLD, {SIG_DFL}, {0x404ae0, [CHLD], SA_RESTORER|SA_RESTART, 0x7f4144f8af60}, 8) = 0
17:46:49.950918 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
17:46:49.951420 brk(0x2503000) = 0x2503000
17:48:05.794865 brk(0x2524000) = 0x2524000
17:48:05.795256 brk(0x2545000) = 0x2545000
17:48:45.322594 lseek(4, 8192, SEEK_SET) = 8192
17:48:45.322770 read(4, "201.17.129.83allow,SIZELIMIT=5242"..., 4096) = 1650
17:48:45.322967 read(4, ""..., 2446) = 0
17:48:45.323220 read(4, ""..., 2446) = 0
17:48:45.323390 read(4, ""..., 2446) = 0
17:48:45.323529 read(4, ""..., 2446) = 0
17:48:45.323675 read(4, ""..., 2446) = 0
17:48:45.323812 read(4, ""..., 2446) = 0
17:48:45.323937 read(4, ""..., 2446) = 0
17:48:45.324074 read(4, ""..., 2446) = 0
17:48:45.324215 read(4, ""..., 2446) = 0
17:48:45.324377 read(4, ""..., 2446) = 0
.
.
. until I restart courier-mta
---------------------------------------------
------------------------------------------------------------------------------ SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, Nevada. The future of the web can't happen without you. Join us at MIX09 to help pave the way to the Next Web now. Learn more and register at http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ _______________________________________________ courier-users mailing list courier-users@... Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-users |
|
|
Re: mta hanging reading smtpaccess.dat

Marcus Pereira writes:

> The problem occurs on all servers. I have already reinstalled some of
> them. Some run 32-bit Debian, some 64-bit Debian. Some are fresh
> installs, some are old installs. But the problem happens on all of them.

Sounds like a corrupted smtpaccess.dat file. A simple reinstall won't make a difference. Run 'makesmtpaccess' to rebuild the smtpaccess.dat file. That should take care of it.
|
|
Re: mta hanging reading smtpaccess.dat

Sam Varshavchik wrote:

> Marcus Pereira writes:
>>
>> The problem occurs on all servers. I have already reinstalled some of
>> them. Some run 32-bit Debian, some 64-bit Debian. Some are fresh
>> installs, some are old installs. But the problem happens on all of them.
>
> Sounds like a corrupted smtpaccess.dat file. A simple reinstall won't
> make a difference. Run 'makesmtpaccess' to rebuild the smtpaccess.dat
> file. That should take care of it.

Except that it's been happening for more than a year, and occurs randomly. My guess is a db library bug. I'd try a different library (probably gdbm).
|
|
Re: mta hanging reading smtpaccess.dat

Marcus Pereira wrote:

> At a random time (it may be hours, days or weeks) the main couriertcpd
> keeps running and accepting connections (until the max clients limit is
> reached), but the child processes never end.
> [...]
> 2) strace for a child couriertcpd process at the start of the lock
> [...]
> 17:46:43.758570 read(4, ""..., 2446) = 0
> 17:46:43.758654 read(4, ""..., 2446) = 0
> 17:46:43.758762 read(4, "\1\0\0\0\0\0\0\0\216\t\0\0\0\0\0\0r6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2446) = 2446
> 17:46:43.759082 getsockname(5, {sa_family=AF_INET6, sin6_port=htons(25),

If that is called before getsockname, it means it is in bdbobj_open, right? Are processes starving because of some locking mechanism?

Is it bdb4? Is it NFS mounted?
(http://www.oracle.com/technology/documentation/berkeley-db/db/ref/env/remote.html)

Gordon Messmer wrote:

> Except that it's been happening for more than a year, and occurs
> randomly. My guess is a db library bug. I'd try a different library
> (probably gdbm).

Is bdb buggy?
|
|
Re: mta hanging reading smtpaccess.dat

> Sam Varshavchik wrote:
>> Marcus Pereira writes:
>>>
>>> The problem occurs on all servers. I have already reinstalled some of
>>> them. Some run 32-bit Debian, some 64-bit Debian. Some are fresh
>>> installs, some are old installs. But the problem happens on all of them.
>>
>> Sounds like a corrupted smtpaccess.dat file. A simple reinstall won't
>> make a difference. Run 'makesmtpaccess' to rebuild the smtpaccess.dat
>> file. That should take care of it.

It's not a corrupted smtpaccess.dat. I have already changed and rebuilt it lots of times. Last week I did a cleanup and left only around 40 lines in the smtpaccess file. I rebuilt it today and have already hit the problem again. For some reason the hangs have become more frequent this month.

> Except that it's been happening for more than a year, and occurs
> randomly. My guess is a db library bug. I'd try a different library
> (probably gdbm).

I could do that, but how?
I am using libgdbm 1.8.3

Marcus Pereira
|
|
Re: mta hanging reading smtpaccess.dat

Marcus Pereira wrote:

>> Sam Varshavchik wrote:
[...]
>> Except that it's been happening for more than a year, and occurs
>> randomly. My guess is a db library bug. I'd try a different library
>> (probably gdbm).
>
> I could do that, but how?
> I am using libgdbm 1.8.3

Hello Marcus,

We have also noticed that problem with the smtpaccess.dat file many times. We are using libgdbm 1.8.3 (from the Ubuntu package libgdbm3 1.8.3-3) like you. It's the most recent version of libgdbm for Debian and Ubuntu.

My best regards,

Pawel
|
|
Re: mta hanging reading smtpaccess.dat

>> At a random time (it may be hours, days or weeks) the main couriertcpd
>> keeps running and accepting connections (until the max clients limit is
>> reached), but the child processes never end.
>> [...]
>> 2) strace for a child couriertcpd process at the start of the lock
>> [...]
>> 17:46:43.758570 read(4, ""..., 2446) = 0
>> 17:46:43.758654 read(4, ""..., 2446) = 0
>> 17:46:43.758762 read(4, "\1\0\0\0\0\0\0\0\216\t\0\0\0\0\0\0r6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2446) = 2446
>> 17:46:43.759082 getsockname(5, {sa_family=AF_INET6, sin6_port=htons(25),
>
> If that is called before getsockname, it means it is in bdbobj_open,
> right? Are processes starving because of some locking mechanism?

I think at this point the db (smtpaccess.dat) is already open; the hang happens when the process makes queries.

As far as I could trace:

. tcpd/tcpd.c:
    function "accepted" calls "allowaccess"
    function "allowaccess" calls "doallowaccess"
    function "doallowaccess" calls "chkaccess"
. tcpd/tcpdaccess.c:
    function "chkaccess" calls "dbobj_fetch"
. bdbobj/bdbobj.c:
    function "dbobj_fetch" calls "doquery"
    ** The process gets stuck in an infinite loop in the "doquery" function ( for (;;) )
    function "doquery" calls "dofetch"
    function "dofetch" calls "(*obj->dbf->get)"

From here I could not trace any further, but I guess it is a call into the gdbm library. The fetch never returns successfully, so the function gets stuck in the loop. Maybe it's some lock on the file or a bug in the library, but since I could not trace further I am sending this message to the list.

> Is it bdb4? Is it NFS mounted?
> (http://www.oracle.com/technology/documentation/berkeley-db/db/ref/env/remote.html)

No, all smtpaccess.dat files are local. Some mailboxes are NFS mounted, but at the point of the hang no NFS-mounted file is accessed.

Marcus
|
|
Re: mta hanging reading smtpaccess.dat

>>> Except that it's been happening for more than a year, and occurs
>>> randomly. My guess is a db library bug. I'd try a different library
>>> (probably gdbm).
>>
>> I could do that, but how?
>> I am using libgdbm 1.8.3
>
> Hello Marcus,
>
> We have also noticed that problem with the smtpaccess.dat file many
> times. We are using libgdbm 1.8.3 (from the Ubuntu package libgdbm3
> 1.8.3-3) like you. It's the most recent version of libgdbm for Debian
> and Ubuntu.
>
> My best regards,
>
> Pawel

Hi Pawel,

I have servers using Debian libgdbm3 1.8.3-3 and 1.8.3-4.

Studying the libgdbm3 package on Debian I found this bug report:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=274417

I am suspicious of the fix they made. I removed the patch and rebuilt the package. It is now running on 4 of my servers with no problems at all, but it is still too early to say this is the cause.

If you want to try my package:
http://dl.task.net.br/libgdbm3_1.8.3-4.1_amd64.deb for Debian 64.
http://dl.task.net.br/libgdbm3_1.8.3-4.1_i386.deb for Debian 32.

Marcus Pereira
|
|
Re: mta hanging reading smtpaccess.dat

Marcus Pereira wrote:

> Studying the libgdbm3 package on Debian I found this bug report:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=274417
> I am suspicious of the fix they made. I removed the patch and rebuilt
> the package. It is now running on 4 of my servers with no problems at
> all, but it is still too early to say this is the cause.

That's pretty odd. Are your dat files also on Reiser filesystems?
|
|
Re: mta hanging reading smtpaccess.dat

>> Studying the libgdbm3 package on Debian I found this bug report:
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=274417
>> I am suspicious of the fix they made. I removed the patch and rebuilt
>> the package. It is now running on 4 of my servers with no problems at
>> all, but it is still too early to say this is the cause.
>
> That's pretty odd. Are your dat files also on Reiser filesystems?

No, it's on ext3.
|
|
Re: mta hanging reading smtpaccess.dat

Marcus Pereira wrote:

> Hi Pawel,
> I have servers using Debian libgdbm3 1.8.3-3 and 1.8.3-4.
>
> Studying the libgdbm3 package on Debian I found this bug report:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=274417
> I am suspicious of the fix they made. I removed the patch and rebuilt
> the package. It is now running on 4 of my servers with no problems at
> all, but it is still too early to say this is the cause.
> If you want to try my package:
> http://dl.task.net.br/libgdbm3_1.8.3-4.1_amd64.deb for Debian 64.
> http://dl.task.net.br/libgdbm3_1.8.3-4.1_i386.deb for Debian 32.

Thanks for the tip, Marcus! I can take a look at your packages, but of course I don't know when the problem will occur again.

BTW, we also have an ext3 filesystem :)

Good night,

P.

PS. Here in Poland we have TASK too. It's the Polish acronym for Tri-City Academic Computer Network :D
|
|
Re: mta hanging reading smtpaccess.dat

Marcus Pereira wrote:

>>> At a random time (it may be hours, days or weeks) the main couriertcpd
>>> keeps running and accepting connections (until the max clients limit is
>>> reached), but the child processes never end.
>>> [...]
>>> 2) strace for a child couriertcpd process at the start of the lock
>>> [...]
>>> 17:46:43.758570 read(4, ""..., 2446) = 0
>>> 17:46:43.758654 read(4, ""..., 2446) = 0
>>> 17:46:43.758762 read(4, "\1\0\0\0\0\0\0\0\216\t\0\0\0\0\0\0r6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2446) = 2446
>>> 17:46:43.759082 getsockname(5, {sa_family=AF_INET6, sin6_port=htons(25),
>>
>> If that is called before getsockname, it means it is in bdbobj_open,
>> right? Are processes starving because of some locking mechanism?
>
> I think at this point the db (smtpaccess.dat) is already open; the hang
> happens when the process makes queries.
>
> As far as I could trace:
> . tcpd/tcpd.c:
>     function "accepted" calls "allowaccess"

I saw that "sox_getsockname" is called before "allowaccess", hence my guess that it was in the former call. However, yes, there is a further call to "sox_getsockname" in "run", after "allowaccess" in the child.

>     function "allowaccess" calls "doallowaccess"
>     function "doallowaccess" calls "chkaccess"
> . tcpd/tcpdaccess.c:
>     function "chkaccess" calls "dbobj_fetch"

Mind that dbobj_fetch is #defined twice in dbobj.h

> . bdbobj/bdbobj.c:
>     function "dbobj_fetch" calls "doquery"
>     ** The process gets stuck in an infinite loop in the "doquery"
>     function ( for (;;) )
>     function "doquery" calls "dofetch"
>     function "dofetch" calls "(*obj->dbf->get)"
> From here I could not trace any further, but I guess it is a call into
> the gdbm library.

The _db-4_ library, actually. dbf->get gets mapped to an interface function in the call to db_create (e.g. "__db_get_pp").

However, your further posts imply you are using gdbm, thus you should check gdbmobj/gdbmobj.c: function "gdbmobj_fetch". Or have you been switching from bdb to gdbm during the weekend?

> The fetch never returns successfully, so the function gets stuck in
> the loop.

I don't understand those repeated read() calls returning 0. It should mean end of file, thus there should be no point in insisting. For EINTR, read should return a negative number, and strace should report that. (My strace looks quite similar to your "normal operation" one, however I have pread rather than read, and no lseek, with bdb4.4)

Good luck
|
|
Re: mta hanging reading smtpaccess.dat

>>> If that is called before getsockname, it means it is in bdbobj_open,
>>> right? Are processes starving because of some locking mechanism?
>>
>> I think at this point the db (smtpaccess.dat) is already open; the hang
>> happens when the process makes queries.
>>
>> As far as I could trace:
>> . tcpd/tcpd.c:
>>     function "accepted" calls "allowaccess"
>
> I saw that "sox_getsockname" is called before "allowaccess", hence my
> guess that it was in the former call. However, yes, there is a further
> call to "sox_getsockname" in "run", after "allowaccess" in the child.

The call to "sox_getsockname" before "allowaccess" only happens if "accesslocal" is set, and it is not set on my system.

>>     function "allowaccess" calls "doallowaccess"
>>     function "doallowaccess" calls "chkaccess"
>> . tcpd/tcpdaccess.c:
>>     function "chkaccess" calls "dbobj_fetch"
>
> Mind that dbobj_fetch is #defined twice in dbobj.h
>
>> . bdbobj/bdbobj.c:
>>     function "dbobj_fetch" calls "doquery"
>>     ** The process gets stuck in an infinite loop in the "doquery"
>>     function ( for (;;) )
>>     function "doquery" calls "dofetch"
>>     function "dofetch" calls "(*obj->dbf->get)"
>> From here I could not trace any further, but I guess it is a call into
>> the gdbm library.
>
> The _db-4_ library, actually. dbf->get gets mapped to an interface
> function in the call to db_create (e.g. "__db_get_pp").
>
> However, your further posts imply you are using gdbm, thus you should
> check gdbmobj/gdbmobj.c: function "gdbmobj_fetch". Or have you been
> switching from bdb to gdbm during the weekend?

I have always used gdbm. I do not know yet how to switch to bdb, but if my fix to gdbm does not work I will try to move.

>> The fetch never returns successfully, so the function gets stuck in
>> the loop.
>
> I don't understand those repeated read() calls returning 0. It should
> mean end of file, thus there should be no point in insisting. For
> EINTR, read should return a negative number, and strace should report
> that. (My strace looks quite similar to your "normal operation" one,
> however I have pread rather than read, and no lseek, with bdb4.4)

I think this is the bug in the Debian gdbm package. The patch they made to the official release may ignore EINTR. I will test my package for a few more days and, if there are no more hangs, I will report it to the Debian bug tracker.

Thanks for pointing me the way.

Marcus
|
|
Re: mta hanging reading smtpaccess.dat

Alessandro Vesely wrote:

> I don't understand those repeated read() calls returning 0. It should
> mean end of file, thus there should be no point in insisting. For
> EINTR, read should return a negative number, and strace should report
> that.

I think you're right. You can see Debian's patch here:

http://patch-tracking.debian.net/patch/series/view/gdbm/1.8.3-3/05_handle-short-read

They've introduced a while loop that will continue if bytes were read, or if errno == EINTR. However, they check errno even if the return value of read() doesn't indicate that they should. Since read() won't reset errno on EOF, this creates an infinite loop if errno was already EINTR.

I'm still not enlightened as to the cause of the problem, but it seems clear that gdbm on Debian is broken. An interested party should file a new bug report and ask them to fix this properly or take it out, and to push the changes to the gdbm maintainer for review. Like their SSL blunder, I believe that someone who knows what they're doing might be able to set them straight.
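
The errno pitfall described in this message can be sketched in C. This is only an illustration under stated assumptions, not the gdbm source or the Debian patch: `read_full` is a hypothetical helper name. The point it shows is that errno is consulted only when read() returns -1, so a stale EINTR left over from some earlier call cannot keep the loop spinning once read() reports end of file with 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not gdbm's code): try to read `len` bytes from `fd`.
 * Returns the number of bytes actually read (short only at end of file),
 * or -1 on a real error.
 */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);

        if (n > 0) {            /* progress: keep going after a short read */
            done += (size_t)n;
            continue;
        }
        if (n == 0)             /* end of file: errno is not meaningful here */
            break;
        if (errno == EINTR)     /* n == -1: only now is errno valid */
            continue;
        return -1;              /* any other error is reported to the caller */
    }
    return (ssize_t)done;
}
```

A loop that instead tested `errno == EINTR` after a 0 return would retry forever at end of file whenever errno already held EINTR, which matches the endless `read(...) = 0` lines in the strace.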
|
|
Re: mta hanging reading smtpaccess.dat

Marcus Pereira wrote:

>>> Studying the libgdbm3 package on Debian I found this bug report:
>>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=274417
>> That's pretty odd. Are your dat files also on Reiser filesystems?
> No, it's on ext3.

Do you have any cron jobs that call "makesmtpaccess"? I wonder if they might be stomping on each other's output.
|
|
Zombies blocking my smtp

Hi.

I have been having issues with incoming smtp connections for some weeks.

esmtpd MAXDAEMONS is set to 350, and at peak times I used to reach at most 150 simultaneous courieresmtpd processes.

But now I am reaching the top limit many times a day, and then my servers stop accepting new connections.

After collecting some data on this problem, it seems to be related to the rising number of infected machines running smtp Zombies. These Zombies keep trying to send spam and, even if I deny every attempt, they never close the smtp connection. The courieresmtpd process keeps running for hours. At some point the submit process kills itself at the hard 1800 seconds alarm timeout. But even with submit in the defunct state, the courieresmtpd process keeps running.

My only way to keep the number of courieresmtpd processes low is to block the originating IP at the firewall or with an smtpaccess DENY at couriertcpd. If the sender is blocked at some other point (BLACKLIST, BOTH, SPF, smtpaccess BLOCK, RELAY DENY or USER UNKNOWN), the submit process has already started and the courieresmtpd process stays locked for hours. I am losing this game. I have already blocked more than 500 /24 nets, but the number of locked connections keeps rising until I restart esmtpd.

Zombies are what I suspect, but it seems incoming smtp connections are getting stuck for other reasons too.

Another strange issue that has started happening is lots of "writev: Broken pipe" errors. Maybe it's related to the Zombies, but I have seen this error happen on normal smtp connections too.

Is this happening to anyone else?

Does anyone have an idea of how to get this problem fixed? It seems I need some configuration for the courieresmtpd process: a faster timeout, a limit on the number of errors each smtp connection can have, or a time limit within which an smtp connection should start the DATA transfer.

Below is a lot of debugging information I got. Sorry for the long post.

Marcus Pereira
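
As background for the "defunct" submit processes mentioned in this message: a process shows up as `<defunct>` in ps when it has exited but its parent has not yet collected its exit status. The following is a generic C sketch of how a parent can reap such children without blocking. It is not courieresmtpd's code, and `reap_children` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>   /* fork(), _exit() for callers that spawn children */

/*
 * Hypothetical helper: collect every child that has already exited,
 * without blocking. Returns how many children were reaped.
 */
int reap_children(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);

        if (pid > 0) {          /* one exited child collected; look for more */
            reaped++;
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */

        /* pid == 0: children exist but none has exited yet.
           pid == -1 with ECHILD: there are no children at all. */
        assert(pid == 0 || errno == ECHILD);
        break;
    }
    return reaped;
}
```

A parent would typically call something like this from its main loop after noticing SIGCHLD. Reaping only clears the defunct entry; it does not by itself end a parent that is still waiting on its client connection.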
***************************
Showing lots of submit processes in the defunct state:

# ps f -o pid,ppid,etime,cmd -C courieresmtpd,submit | grep -B1 defunct
18207 12590 00:09 /usr/sbin/courieresmtpd
18587 18207 00:01 \_ [submit] <defunct>
--
8586 12590 06:43 /usr/sbin/courieresmtpd
17753 8586 00:23 \_ [submit] <defunct>
--
32301 12590 30:38 /usr/sbin/courieresmtpd
32324 32301 30:38 \_ [submit] <defunct>
--
17572 12590 39:25 /usr/sbin/courieresmtpd
17700 17572 39:21 \_ [submit] <defunct>
17383 12590 39:32 /usr/sbin/courieresmtpd
17397 17383 39:31 \_ [submit] <defunct>
15410 12590 40:29 /usr/sbin/courieresmtpd
15497 15410 40:26 \_ [submit] <defunct>
13700 12590 41:26 /usr/sbin/courieresmtpd
13746 13700 41:24 \_ [submit] <defunct>
13438 12590 41:33 /usr/sbin/courieresmtpd
13449 13438 41:33 \_ [submit] <defunct>
10004 12590 43:27 /usr/sbin/courieresmtpd
10013 10004 43:27 \_ [submit] <defunct>
8705 12590 44:17 /usr/sbin/courieresmtpd
8887 8705 44:11 \_ [submit] <defunct>
7493 12590 45:10 /usr/sbin/courieresmtpd
7716 7493 45:00 \_ [submit] <defunct>
7172 12590 45:27 /usr/sbin/courieresmtpd
7195 7172 45:26 \_ [submit] <defunct>
6222 12590 46:02 /usr/sbin/courieresmtpd
6304 6222 45:58 \_ [submit] <defunct>
5028 12590 46:51 /usr/sbin/courieresmtpd
5032 5028 46:51 \_ [submit] <defunct>
5025 12590 46:51 /usr/sbin/courieresmtpd
5031 5025 46:51 \_ [submit] <defunct>
4328 12590 47:15 /usr/sbin/courieresmtpd
4355 4328 47:14 \_ [submit] <defunct>
2213 12590 48:42 /usr/sbin/courieresmtpd
2269 2213 48:40 \_ [submit] <defunct>
2045 12590 48:46 /usr/sbin/courieresmtpd
2178 2045 48:44 \_ [submit] <defunct>
1804 12590 48:55 /usr/sbin/courieresmtpd
1856 1804 48:53 \_ [submit] <defunct>
1726 12590 48:59 /usr/sbin/courieresmtpd
1740 1726 48:58 \_ [submit] <defunct>
980 12590 49:22 /usr/sbin/courieresmtpd
2634 980 48:25 \_ [submit] <defunct>
581 12590 49:34 /usr/sbin/courieresmtpd
593 581 49:33 \_ [submit] <defunct>
31665 12590 50:29 /usr/sbin/courieresmtpd
31725 31665 50:28 \_ [submit] <defunct>
29477 12590 51:47 /usr/sbin/courieresmtpd
16372 29477 01:04 \_ [submit] <defunct>
28991 12590 52:06 /usr/sbin/courieresmtpd
29053 28991 52:05 \_ [submit] <defunct>
28727 12590 52:18 /usr/sbin/courieresmtpd
28740 28727 52:17 \_ [submit] <defunct>
28483 12590 52:29 /usr/sbin/courieresmtpd
28518 28483 52:27 \_ [submit] <defunct>
28121 12590 52:44 /usr/sbin/courieresmtpd
31412 28121 50:41 \_ [submit] <defunct>
27343 12590 53:11 /usr/sbin/courieresmtpd
27397 27343 53:09 \_ [submit] <defunct>
27338 12590 53:11 /usr/sbin/courieresmtpd
27347 27338 53:11 \_ [submit] <defunct>
25839 12590 54:14 /usr/sbin/courieresmtpd
25880 25839 54:13 \_ [submit] <defunct>
24281 12590 55:13 /usr/sbin/courieresmtpd
24392 24281 55:11 \_ [submit] <defunct>
23157 12590 55:54 /usr/sbin/courieresmtpd
23210 23157 55:52 \_ [submit] <defunct>
21497 12590 56:58 /usr/sbin/courieresmtpd
21502 21497 56:57 \_ [submit] <defunct>
20528 12590 57:40 /usr/sbin/courieresmtpd
20564 20528 57:38 \_ [submit] <defunct>
20255 12590 57:50 /usr/sbin/courieresmtpd
20256 20255 57:50 \_ [submit] <defunct>
19662 12590 58:16 /usr/sbin/courieresmtpd
19685 19662 58:15 \_ [submit] <defunct>
18499 12590 58:56 /usr/sbin/courieresmtpd
18562 18499 58:55 \_ [submit] <defunct>
16709 12590 01:00:03 /usr/sbin/courieresmtpd
16770 16709 01:00:01 \_ [submit] <defunct>
16433 12590 01:00:16 /usr/sbin/courieresmtpd
16447 16433 01:00:16 \_ [submit] <defunct>
15893 12590 01:00:45 /usr/sbin/courieresmtpd
15927 15893 01:00:44 \_ [submit] <defunct>
15785 12590 01:00:49 /usr/sbin/courieresmtpd
15845 15785 01:00:47 \_ [submit] <defunct>
15079 12590 01:01:11 /usr/sbin/courieresmtpd
15150 15079 01:01:09 \_ [submit] <defunct>
14685 12590 01:01:28 /usr/sbin/courieresmtpd
14720 14685 01:01:27 \_ [submit] <defunct>
13389 12590 01:02:21 /usr/sbin/courieresmtpd
13426 13389 01:02:20 \_ [submit] <defunct>
12599 12590 01:03:00 /usr/sbin/courieresmtpd
9451 12599 43:50 \_ [submit] <defunct>
###############################################################
Getting, for example, strace for courieresmtpd process 20528
# tail -n 27 strace.20528
10:52:21 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
10:52:21 nanosleep({128, 0}, {128, 0}) = 0
10:54:29 sendto(3, "<19>Apr 17 10:54:29 courieresmtpd: error,relay=::f"..., 189, MSG_NOSIGNAL, NULL, 0) = 189
10:54:29 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 55) = -1 EPIPE (Broken pipe)
10:54:29 --- SIGPIPE (Broken pipe) @ 0 (0) ---
10:54:29 write(5, "pessoal@...", 36) = 36
10:54:29 read(6, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 4096) = 54
10:54:29 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
10:54:29 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
10:54:29 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
10:54:29 nanosleep({128, 0}, {128, 0}) = 0
10:56:37 sendto(3, "<19>Apr 17 10:56:37 courieresmtpd: error,relay=::f"..., 189, MSG_NOSIGNAL, NULL, 0) = 189
10:56:37 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 55) = -1 EPIPE (Broken pipe)
10:56:37 --- SIGPIPE (Broken pipe) @ 0 (0) ---
10:56:37 write(5, "elaine@...", 35) = 35
10:56:37 read(6, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 4096) = 54
10:56:37 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
10:56:37 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
10:56:37 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
10:56:37 nanosleep({128, 0}, {128, 0}) = 0
10:58:45 sendto(3, "<19>Apr 17 10:58:45 courieresmtpd: error,relay=::f"..., 188, MSG_NOSIGNAL, NULL, 0) = 188
10:58:45 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 55) = -1 EPIPE (Broken pipe)
10:58:45 --- SIGPIPE (Broken pipe) @ 0 (0) ---
10:58:45 sendto(3, "<22>Apr 17 10:58:45 courieresmtpd: error,relay=::f"..., 107, MSG_NOSIGNAL, NULL, 0) = 107
10:58:45 select(2, NULL, [1], NULL, {4800, 0}) = ? ERESTARTNOHAND (To be restarted)
11:18:10 --- SIGCHLD (Child exited) @ 0 (0) ---
11:18:10 select(2, NULL, [1], NULL, {3634, 984000}
#########################
And for the child submit process, pid 20564
# tail -n 27 strace.20564
10:54:29 lseek(6, 0, SEEK_SET) = 0
10:54:29 read(6, "rootfs / rootfs rw 0 0\nnone /sys sysfs rw,nosuid,n"..., 1024) = 874
10:54:29 stat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
10:54:29 close(6) = 0
10:54:29 munmap(0x7f368cbd3000, 4096) = 0
10:54:29 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 54) = 54
10:54:29 poll([{fd=0, events=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1 ([{fd=0, revents=POLLIN}])
10:56:37 read(0, "elaine@...", 8192) = 35
10:56:37 statfs(".", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=34151844, f_bfree=32936550, f_bavail=31201734, f_files=8675328, f_ffree=8559136, f_fsid={162176456, -721074552}, f_namelen=255, f_frsize=4096}) = 0
10:56:37 stat(".", {st_mode=S_IFDIR|0770, st_size=4096, ...}) = 0
10:56:37 open("/proc/mounts", O_RDONLY) = 6
10:56:37 fstat(6, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
10:56:37 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f368cbd3000
10:56:37 read(6, "rootfs / rootfs rw 0 0\nnone /sys sysfs rw,nosuid,n"..., 1024) = 874
10:56:37 read(6, "", 1024) = 0
10:56:37 lseek(6, 0, SEEK_SET) = 0
10:56:37 read(6, "rootfs / rootfs rw 0 0\nnone /sys sysfs rw,nosuid,n"..., 1024) = 874
10:56:37 stat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
10:56:37 close(6) = 0
10:56:37 munmap(0x7f368cbd3000, 4096) = 0
10:56:37 write(1, "511 http://www.spamhaus.org/query/bl?ip=190.42.107"..., 54) = 54
10:56:37 poll([{fd=0, events=POLLIN|POLLERR|POLLHUP}], 1, -1) = ? ERESTART_RESTARTBLOCK (To be restarted)
11:18:10 --- SIGALRM (Alarm clock) @ 0 (0) ---
11:18:10 rt_sigaction(SIGINT, {SIG_DFL, [INT], SA_RESTORER|SA_RESTART, 0x7f368b8ac190}, {0x411620, [INT], SA_RESTORER|SA_RESTART, 0x7f368b8ac190}, 8) = 0
11:18:10 getpid() = 20564
11:18:10 kill(20564, SIGKILL) = 0
11:18:10 +++ killed by SIGKILL +++
###############################
mail.log for this IP:
# grep 190.42.107 /var/log/mail.log
Apr 17 10:48:08 cuma courieresmtpd: started,ip=[::ffff:190.42.107.41]
Apr 17 10:48:20 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:48:36 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>,to=<caroline@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:49:08 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:50:13 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:52:21 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:54:29 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:56:37 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:58:45 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,from=<krsalleyly@...>: 511 http://www.spamhaus.org/query/bl?ip=190.42.107.41
Apr 17 10:58:45 cuma courieresmtpd: error,relay=::ffff:190.42.107.41,msg="502 ESMTP command error",cmd: DATA
##################################
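The spacing between the 511 rejections in the log above can be checked with a few lines of Python. This is only an illustrative sketch added for readers, not something from the original post: the timestamps are copied from the mail.log excerpt and the helper name `gaps_in_seconds` is made up here. The gaps roughly double and then settle at 128 seconds, which is consistent with the `nanosleep({128, 0}, ...)` calls in the strace of process 20528.

```python
from datetime import datetime

# Timestamps of the 511 rejections logged for 190.42.107.41 in the excerpt above.
stamps = ["10:48:20", "10:48:36", "10:49:08", "10:50:13",
          "10:52:21", "10:54:29", "10:56:37", "10:58:45"]


def gaps_in_seconds(times):
    """Return the number of seconds between each pair of consecutive timestamps."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(parsed, parsed[1:])]


gaps = gaps_in_seconds(stamps)
print(gaps)  # [16, 32, 65, 128, 128, 128, 128]
```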
Broken pipe logs. I am getting thousands of these messages per hour:
.
.
.
Apr 17 11:04:05 cuma courieresmtpd: error,relay=::ffff:194.0.252.241,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:04:18 cuma courieresmtpd: error,relay=::ffff:201.20.24.36,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:05:13 cuma courieresmtpd: error,relay=::ffff:201.17.146.71,msg="writev: Broken pipe",cmd: RSET
Apr 17 11:05:13 cuma courieresmtpd: error,relay=::ffff:81.184.51.141,msg="writev: Broken pipe",cmd: DATA
Apr 17 11:05:32 cuma courieresmtpd: error,relay=::ffff:201.17.146.71,msg="writev: Broken pipe",cmd: RSET
Apr 17 11:06:10 cuma courieresmtpd: error,relay=::ffff:201.17.146.71,msg="writev: Broken pipe",cmd: RSET
Apr 17 11:06:49 cuma courieresmtpd: error,relay=::ffff:208.43.194.2,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:07:07 cuma courieresmtpd: error,relay=::ffff:64.118.95.132,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:07:49 cuma courieresmtpd: error,relay=::ffff:194.0.252.241,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:11:04 cuma courieresmtpd: error,relay=::ffff:200.162.176.8,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:11:30 cuma courieresmtpd: error,relay=::ffff:174.133.25.2,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:11:35 cuma courieresmtpd: error,relay=::ffff:194.0.252.241,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:12:28 cuma courieresmtpd: error,relay=::ffff:72.232.211.2,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:12:35 cuma courieresmtpd: error,relay=::ffff:72.232.211.2,msg="writev: Broken pipe",cmd: QUIT
Apr 17 11:13:09 cuma courieresmtpd: error,relay=::ffff:81.215.80.190,msg="writev: Broken pipe",cmd: QUIT
.
.
.
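To see which SMTP command and which relays account for most of the "writev: Broken pipe" entries, the log can be tallied. The following Python sketch is an illustration added here, not part of the original post. The regular expression assumes the exact line format shown in the excerpt above, the function name `tally_broken_pipes` is hypothetical, and the sample contains only four of the quoted lines.

```python
import re
from collections import Counter

# Assumes the courieresmtpd line format shown in the excerpt above.
BROKEN_PIPE = re.compile(
    r'error,relay=(?:::ffff:)?([0-9.]+),msg="writev: Broken pipe",cmd: (\w+)'
)


def tally_broken_pipes(lines):
    """Count broken-pipe errors per SMTP command and per relay IP."""
    by_cmd, by_ip = Counter(), Counter()
    for line in lines:
        match = BROKEN_PIPE.search(line)
        if match:
            ip, cmd = match.groups()
            by_ip[ip] += 1
            by_cmd[cmd] += 1
    return by_cmd, by_ip


# Four lines taken from the excerpt above.
sample = [
    'Apr 17 11:04:05 cuma courieresmtpd: error,relay=::ffff:194.0.252.241,msg="writev: Broken pipe",cmd: QUIT',
    'Apr 17 11:05:13 cuma courieresmtpd: error,relay=::ffff:201.17.146.71,msg="writev: Broken pipe",cmd: RSET',
    'Apr 17 11:05:13 cuma courieresmtpd: error,relay=::ffff:81.184.51.141,msg="writev: Broken pipe",cmd: DATA',
    'Apr 17 11:07:49 cuma courieresmtpd: error,relay=::ffff:194.0.252.241,msg="writev: Broken pipe",cmd: QUIT',
]

by_cmd, by_ip = tally_broken_pipes(sample)
print(by_cmd.most_common())
print(by_ip.most_common(1))
```

On a real server the lines would come from /var/log/mail.log rather than a hard-coded list.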
------------------------------------------------------------------------------ Stay on top of everything new and different, both inside and around Java (TM) technology - register by April 22, and save $200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco. 300 plus technical and hands-on sessions. Register today. Use priority code J9JMT32. http://p.sf.net/sfu/p _______________________________________________ courier-users mailing list courier-users@... Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-users |
|
|
Re: Zombies blocking my smtp
Marcus Pereira writes:

> and the courieresmtpd process stays locked for hours. I am losing this
> game. I already blocked more than 500 /24 nets but the number of locked
> connections keeps rising until I restart esmtpd.

I have several thousand spam sources blacklisted. I am just a couple of vanity domains. You are a large provider.

The recommended way to address this situation is to set up a script that monitors syslogs, and flags IP addresses that log more than X errors in Y minutes, then firewall them. With Linux, you can insert an iptables rule that rejects *out*bound packets to the blacklisted IPs in such a way that the local process, in this case courieresmtpd, thinks that the connection is broken and exits, releasing the socket. Meanwhile the remote sender still remains tarpitted, thinking that the socket is still alive.

That, and a lengthy timeout before the iptables entry gets removed, so that the iptables rule set does not grow without bound, should do the trick.

> Another strange issue that started happening is lots of "writev: Broken pipe"
> errors. Maybe it's related to the zombies, but I have seen this error
> happening on normal smtp connections too.

This is harmless. It means that the sender dropped the connection while Courier still had something to send it. Poorly written SMTP senders are sending you a QUIT command, but don't bother to wait for Courier to acknowledge the QUIT and shut down the socket in an orderly way.
|
|
Re: Zombies blocking my smtp
Sam --

I've been seeing similar problems in the past few weeks on our servers. Would it be possible to have a config option for max idletime of an smtp connection? I.e., if no data comes in or goes out for a period of, say, 120 seconds, drop the connection and have the submit process exit? (Equivalent to Apache's "Timeout" setting.)

I understand that there's no graceful way to tell the sender that the connection is being quit, but at this point, it's a denial-of-service problem. (And I keep increasing the MAXDAEMONS setting, but it hasn't helped. 200 to 300 to 400, the connections just keep stacking up.)

-Jeff
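To make the requested behaviour concrete, here is a minimal Python sketch of an idle timeout on a socket. It only illustrates the semantics Jeff describes and says nothing about how Courier handles timeouts internally. The function name `read_until_idle` is hypothetical, and the demo uses a 0.2-second limit on a local socket pair instead of 120 seconds on a real SMTP connection.

```python
import socket


def read_until_idle(conn, idle_seconds):
    """Read from conn until the peer closes or stays silent for idle_seconds."""
    conn.settimeout(idle_seconds)
    received = b""
    while True:
        try:
            chunk = conn.recv(4096)
        except socket.timeout:
            # No data arrived within the idle limit: drop the connection.
            conn.close()
            return received, "idle timeout"
        if not chunk:
            conn.close()
            return received, "peer closed"
        received += chunk


# Demo: the "client" sends one command and then goes silent.
server_side, client_side = socket.socketpair()
client_side.sendall(b"HELO example\r\n")
data, reason = read_until_idle(server_side, idle_seconds=0.2)
client_side.close()
print(data, reason)
```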
|
|
Re: Zombies blocking my smtp
Sam Varshavchik wrote:
> Marcus Pereira writes:
>
>> and the courieresmtpd process stays locked for hours. I am losing this
>> game. I already blocked more than 500 /24 nets but the number of locked
>> connections keeps rising until I restart esmtpd.
>
> I have several thousand spam sources blacklisted. I am just a couple of
> vanity domains. You are a large provider.
>
> The recommended way to address this situation is to set up a script that
> monitors syslogs, and flags IP addresses that log more than X errors in
> Y minutes, then firewall them.

Do you mean _any_ error? I'm only catching imap/pop3 errors: since I see the SMTP daemon does its own tarpitting, I thought it better not to interfere... Actually, I've tried blocking spammers' SYN requests after they send to a spamtrap, but their footprint is so light that I don't gain much that way.

However, since you recommend firewalling, wouldn't it be advisable to configure some kind of callback for flagging IP addresses? It seems that monitoring syslogs is where one spends most of the CPU time.

> With Linux, you can insert an iptables
> rule that rejects *out*bound packets to the blacklisted IPs in such a way
> that the local process, in this case courieresmtpd, thinks that the
> connection is broken and exits, releasing the socket. Meanwhile the remote
> sender still remains tarpitted, thinking that the socket is still alive.

Hm... blocking outbound so as to induce tarpitting apparently only affects what the server inflicts in retribution. If I understand correctly, blocking just (future) SYN requests from the same IPs would result in similar figures for the attempted connections to port 25. Correct?

> That, and a lengthy timeout before the iptables entry gets removed, so
> that the iptables rule set does not grow without bound, should do the trick.

I guess you mean more than a few minutes' timeout... Hours? Days?

>> Another strange issue that started happening is lots of "writev: Broken
>> pipe" errors. Maybe it's related to the zombies, but I have seen this
>> error happening on normal smtp connections too.
>
> This is harmless. It means that the sender dropped the connection while
> Courier still had something to send it. Poorly written SMTP senders are
> sending you a QUIT command, but don't bother to wait for Courier to
> acknowledge the QUIT and shut down the socket in an orderly way.

Does that mean one shouldn't block based on this error? (Well, blocking outbound packets should be a no-op at this point, but what about blocking future SYN requests?)

Finally, it looks like you use netfilter. May I ask which userspace program you run to issue verdicts? I had been looking for such a utility, but eventually had to roll my own...
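The expiry question above (hours or days?) is left open in the thread. Whatever value is chosen, the bookkeeping can stay simple. The Python sketch below is an illustration only: `BanTable` and `delete_rule` are hypothetical names, the 24-hour lifetime is an arbitrary assumption and not a recommendation from the thread, and the iptables arguments assume a ban was inserted as an OUTPUT rule with REJECT and tcp-reset for the same address. The delete command is only built as a list and never run.

```python
class BanTable:
    """Track banned IPs and report which bans have outlived their lifetime."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.banned_at = {}

    def ban(self, ip, now):
        # Re-banning refreshes the timestamp, so repeat offenders stay blocked.
        self.banned_at[ip] = now

    def expire(self, now):
        """Remove and return the IPs whose ban is at least ttl_seconds old."""
        expired = [ip for ip, since in self.banned_at.items()
                   if now - since >= self.ttl]
        for ip in expired:
            del self.banned_at[ip]
        return expired


def delete_rule(ip, port=25):
    """Build (but do not run) the iptables command that removes a ban."""
    return ["iptables", "-D", "OUTPUT", "-d", ip, "-p", "tcp",
            "--sport", str(port), "-j", "REJECT", "--reject-with", "tcp-reset"]


# Times are plain seconds so the example does not depend on the wall clock.
table = BanTable(ttl_seconds=24 * 3600)   # 24 hours: an assumed value
table.ban("190.42.107.41", now=0)
table.ban("201.17.146.71", now=3600)

released = table.expire(now=24 * 3600)
print(released)
print([" ".join(delete_rule(ip)) for ip in released])
```

Running `expire` periodically keeps the rule set bounded: only addresses that misbehaved within the chosen lifetime remain in the table.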