"connection died" errors and Postini - patch

by Tregaron Bayly

Not long ago we noticed on our outbound mail servers that a surprising
number of qmail-remote processes were living much longer than expected
before throwing a "connected to <foo> but connection died" error
and deferring the message.  We were able to trace these to processes
communicating with Google Postini and captured an strace that showed a
bug in Google's mail server.  After receiving the DATA, Google intends
to reject the message with a 571 code, but sends this:

read(3, "571 Message Refused\r", 128)

Unfortunately the reply line really needs to end with both a CR and an
LF (\r\n) according to RFC 821 (section 4.1.2):  "The argument field
consists of a variable length character string ending with the character
sequence <CRLF>. The receiver is to take no action until this sequence
is received."
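
To make the distinction concrete, here is a minimal sketch (not taken from
qmail; the enum and the function name classify_reply_end are made up for
illustration) that classifies how a received reply buffer ends.  The bytes
from the strace above fall into the "bare CR" case:

```c
#include <stddef.h>

/* Hypothetical classification of how a reply buffer is terminated. */
enum reply_end { END_CRLF, END_BARE_CR, END_BARE_LF, END_NONE };

/* Inspect only the last one or two bytes of the buffer. */
static enum reply_end classify_reply_end(const char *buf, size_t len)
{
    if (len >= 2 && buf[len - 2] == '\r' && buf[len - 1] == '\n')
        return END_CRLF;      /* properly terminated reply line */
    if (len >= 1 && buf[len - 1] == '\r')
        return END_BARE_CR;   /* what the strace shows for the 571 reply */
    if (len >= 1 && buf[len - 1] == '\n')
        return END_BARE_LF;
    return END_NONE;
}
```

A client that waits for '\n' treats END_CRLF and END_BARE_LF as a finished
line, but keeps reading in the END_BARE_CR case.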

I brought this to Google's attention and they said "we don't have such
issue with any of our customers" and "sending server will know what to
do from there", which seems bogus - qmail, sendmail and postfix all
expect RFC-compliant responses and choke on this.  On one hand I feel
that Google should fix this rather than the MTA (why should everyone
else work around their broken software?).  On the other hand, putting
this patch on our qmail servers shrank our outgoing mail queue by more
than a third.

Here's the patch:

--- qmail-1.03/qmail-remote.c   1998-06-15 04:53:16.000000000 -0600
+++ qmail-1.03-571/qmail-remote.c       2011-04-25 11:39:07.513583926 -0600
@@ -158,7 +158,32 @@
     get(&ch);
     get(&ch);
   }
-  while (ch != '\n') get(&ch);
+  while (ch != '\n')
+  {
+    // Postini can return a 571 SMTP code terminated with a '\r' but no
+    // '\n'.  The result is that we are trapped in this while loop until
+    // saferead() times out and the message is drop()ed. This means a
+    // message that should be a permanent failure is instead requeued
+    // continually until it ages out.  Try to catch and prevent this:
+    if (ch == '\r' && code == 571)
+    {
+
+     // Cap smtptext with a '\n' so future uses of the string look right
+      unsigned char LF;
+      LF = '\n';
+      if (!stralloc_append(&smtptext,&LF)) temp_nomem();
+
+     // Returning here potentially leaves a '\n' in the fd (if someone
+     // later correctly implements a 571 error with CRLF), but since
+     // this means we will be quitting before we read() again we'll
+     // not worry about it.
+     return code;
+    }
+    else
+    {
+      get(&ch);
+    }
+  }
 
   return code;
 }
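
For anyone who wants to see the behavior outside of qmail, the following is
a rough standalone sketch of the same idea.  It is not qmail's code: the
struct reply_src, next_byte() and parse_reply() names are hypothetical, the
"socket" is just an in-memory buffer, and multi-line replies are ignored.
Running out of buffer stands in for the read that would block until the
timeout fires.

```c
#include <stddef.h>

/* Hypothetical in-memory stand-in for the socket qmail-remote reads. */
struct reply_src {
    const char *data;
    size_t len;
    size_t pos;
};

/* Return the next byte, or -1 when the buffer is exhausted.  On a real
   socket this is where the read would block until the timeout. */
static int next_byte(struct reply_src *s)
{
    if (s->pos >= s->len)
        return -1;
    return (unsigned char)s->data[s->pos++];
}

/* Parse one single-line reply and return its three-digit code, or -1 if
   the input ran out before a line terminator was seen.  When
   tolerate_571_cr is nonzero, a bare '\r' after a 571 code is accepted as
   the end of the reply, which is the workaround the patch applies. */
static long parse_reply(struct reply_src *s, int tolerate_571_cr)
{
    long code = 0;
    int ch;
    int i;

    for (i = 0; i < 3; i++) {
        ch = next_byte(s);
        if (ch < '0' || ch > '9')
            return -1;
        code = code * 10 + (ch - '0');
    }
    for (;;) {
        ch = next_byte(s);
        if (ch < 0)
            return -1;            /* never saw the end of the line */
        if (ch == '\n')
            return code;          /* normal, compliant termination */
        if (ch == '\r' && tolerate_571_cr && code == 571)
            return code;          /* workaround: stop at the bare CR */
    }
}
```

As in the patch, a compliant "571 ...\r\n" reply would leave the trailing
'\n' unread when the workaround is enabled, since the function returns as
soon as it sees the '\r'.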



Re: "connection died" errors and Postini - patch

by Andy Bradford-62

Thus said Tregaron Bayly on Tue, 19 Jul 2011 11:13:09 MDT:

> I brought this to Google's attention and they said "we don't have such
> issue with any of our customers" and "sending server will know what to
> do from there", which seems bogus - qmail, sendmail and postfix all
> expect RFC-compliant responses and choke on this.

Why do companies think this is an acceptable response to broken software
and poor software design? This is probably an attempt on their part at
some kind of elaborate anti-spam technique. I can see no other reason
why they would reject an email with a 5xx permanent failure but not give
the complete \r\n to terminate the response. Yes, I'm giving Google
developers the benefit of the doubt... I hear that they hire the best.

Andy