Plans for email 6.0

by Barry Warsaw

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello everyone.

Today's the last day of Pycon 2009 sprints and I'm eager to return  
home and see my family.  Chris Withers and I had a good day sprinting  
on the email package before he had to jet out, and although we only  
closed one bug in Python 2.7 (this is where Chris's mantra "backport,  
backport" begins :) we had a lot of good discussions about how and  
where to fix outstanding problems in email.

I have lots of ideas on how to improve the email package.  I plan on  
creating a bit of space on the Python wiki to consolidate my thoughts  
and to coordinate implementation.  I'm hoping some of you will be  
interested enough to help with design, testing, use cases, and coding.

We have a few older pages in the wiki covering the email package:

http://wiki.python.org/moin/EmailSigSprint
http://wiki.python.org/moin/EmailSprint

Some of this we've accomplished.  Here's a rambling of some of my  
thoughts on things we should do.

* Turn all header values into Header instances.  It's difficult and  
error prone to have to manage both strings and Headers as values, so  
they should always be Header instances.  We should add a registry of  
Header subclasses, based on the lower cased header name, for allowing  
higher level semantic folding of header strings.

* Implement a Message subclass registry for parsing.  This would allow  
the parser to create custom subclasses based on the Content-Type found  
while parsing the message.

* Bytes and string interfaces.  This is the trickiest one.  I think  
that internally, header names and values, and payloads should all be  
represented as bytes.  But APIs should accept bytes and strings,  
converting to bytes on input, and provide APIs to extract information  
as either bytes or strings.  I've thought about a few ways to do this  
cleanly, but haven't found anything I particularly like yet.  Remember  
that email in Py2 is horribly broken in its discrimination between
bytes and strings, but Py3 forces us to make a choice (which is a good  
thing).

* Clean up the API.  Where possible, simple attribute access should be  
the norm.  Let's get rid of dumb API decisions (like str(msg)  
including the Unix-From).  Let's fix the whole  
get_payload(decode=True) debacle.  Let's fix stuff like needing to  
specify unicode encodings twice in the same call.  Etc.

* Add an external storage API so that messages with huge binary  
payloads don't need to be fully stored in memory.

* Let's target Python 3.1 (coming very soon) if possible, or Python  
3.2 if not.  We should back port email 6.0 to Python 2.x, though we'll  
have to decide how far back we should go (my suggestion: no earlier  
than Python 2.5).

* Fix the myriad of bugs in the tracker!
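
As a rough illustration of the header registry idea in the list above, here is a minimal sketch. Every name in it (`BaseHeader`, `register_header`, `header_for`, `AddressHeader`) is hypothetical and not part of the email package; it only shows a lookup keyed on the lower-cased header name. A Message subclass registry keyed on Content-Type could presumably follow the same pattern.

```python
# Hypothetical sketch: none of these names exist in the email package.
_header_registry = {}


class BaseHeader:
    """Generic header value; subclasses may add per-field semantics."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __str__(self):
        return "%s: %s" % (self.name, self.value)


def register_header(name):
    """Class decorator that registers a subclass under a lower-cased name."""
    def decorator(cls):
        _header_registry[name.lower()] = cls
        return cls
    return decorator


@register_header("To")
class AddressHeader(BaseHeader):
    """Example subclass that treats its value as an address list."""

    @property
    def addresses(self):
        # Naive comma split for illustration; real address parsing is
        # considerably more involved.
        return [a.strip() for a in self.value.split(",") if a.strip()]


def header_for(name, value):
    """Pick the registered class by lower-cased name, else BaseHeader."""
    cls = _header_registry.get(name.lower(), BaseHeader)
    return cls(name, value)
```

With this sketch, `header_for("TO", "a@example.com, b@example.com")` would return an `AddressHeader`, while an unregistered name such as "Subject" falls back to `BaseHeader`.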

That's it for now.  I'll figure out a place in the wiki for this and  
we can start capturing our thoughts there.  One thing I've heard  
pretty consistently is that while the email package has its problems,  
it's one of the best email packages available for any language.  Let's  
make it rock.

Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSdS1cHEjvBPtnXfVAQL7egQAk4LQpdfruSdW3R+Egz7dqAWfbftBnQio
dGdyZT/X8cyjGVO9wwcwo2u2c7+JPElpnvBnYZc9oMSFErfUvgumXZo3mEORaGpm
hj/+s0vG8c79SzA9Jz5wB1sBj50c7xN1L7kDCR3Ncwhz4vJSkO8nLvOqaJiccuF8
7s76zNewnO8=
=Dayc
-----END PGP SIGNATURE-----
_______________________________________________
Email-SIG mailing list
Email-SIG@...
Your options: http://mail.python.org/mailman/options/email-sig/lists%40nabble.com

Re: Plans for email 6.0

by Tony Nelson

At 07:54 -0500 2009/04/02, Barry Warsaw wrote:
 ...
>...Here's a rambling of some of my thoughts on things we should do.
 ...

>* Bytes and string interfaces.  This is the trickiest one.  I think
>that internally, header names and values, and payloads should all be
>represented as bytes.  But APIs should accept bytes and strings,
>converting to bytes on input, and provide APIs to extract information
>as either bytes or strings.  I've thought about a few ways to do this
>cleanly, but haven't found anything I particularly like yet.  Remember
>that email in Py2 is horribly broken in its discrimination between
>bytes and strings, but Py3 forces us to make a choice (which is a good
>thing).

AIUI, this or something like it must be done soon, as the email package is
broken on 3.x now.


>* Clean up the API.  Where possible, simple attribute access should be
>the norm.  Let's get rid of dumb API decisions (like str(msg)
>including the Unix-From).  Let's fix the whole
>get_payload(decode=True) debacle.  Let's fix stuff like needing to
>specify unicode encodings twice in the same call.  Etc.

Sounds good.  I'd like __setitem__ (msg[hdr] = foo) to act more like a
mapping, and not just append new header fields, with .replace_header() and
.add_header() folded together as .set_header().


>* Add an external storage API so that messages with huge binary
>payloads don't need to be fully stored in memory.
>
>* Let's target Python 3.1 (coming very soon) if possible, or Python
>3.2 if not.  We should back port email 6.0 to Python 2.x, though we'll
>have to decide how far back we should go (my suggestion: no earlier
>than Python 2.5).

Python 3.1 should have a working email package, and a simple way for users
needing more to get a better replacement (which they'd install as a
site-package).  I think that a sane split between bytes and string (or
string and Unicode on 2.x) is most needed.


>* Fix the myriad of bugs in the tracker!

Sure, I'm game!  We 2.x users would benefit.  Again, a place for users to
get an "official" current package is needed, as 2.7 is a ways off.
--
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson@...>
      '                              <http://www.georgeanelson.com/>

Re: Plans for email 6.0

by Barry Warsaw

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Apr 2, 2009, at 10:16 AM, Tony Nelson wrote:

>> * Bytes and string interfaces.  This is the trickiest one.  I think
>> that internally, header names and values, and payloads should all be
>> represented as bytes.  But APIs should accept bytes and strings,
>> converting to bytes on input, and provide APIs to extract information
>> as either bytes or strings.  I've thought about a few ways to do this
>> cleanly, but haven't found anything I particularly like yet.  Remember
>> that email in Py2 is horribly broken in its discrimination between
>> bytes and strings, but Py3 forces us to make a choice (which is a good
>> thing).
>
> AIUI, this or something like it must be done soon, as the email package is
> broken on 3.x now.

Indeed.

>> * Clean up the API.  Where possible, simple attribute access should be
>> the norm.  Let's get rid of dumb API decisions (like str(msg)
>> including the Unix-From).  Let's fix the whole
>> get_payload(decode=True) debacle.  Let's fix stuff like needing to
>> specify unicode encodings twice in the same call.  Etc.
>
> Sounds good.  I'd like __setitem__ (msg[hdr] = foo) to act more like a
> mapping, and not just append new header fields, with .replace_header() and
> .add_header() folded together as .set_header().

Is there a reason for this?  This is one part of the API that I've  
found where practicality beats purity.

>> * Add an external storage API so that messages with huge binary
>> payloads don't need to be fully stored in memory.
>>
>> * Let's target Python 3.1 (coming very soon) if possible, or Python
>> 3.2 if not.  We should back port email 6.0 to Python 2.x, though we'll
>> have to decide how far back we should go (my suggestion: no earlier
>> than Python 2.5).
>
> Python 3.1 should have a working email package, and a simple way for users
> needing more to get a better replacement (which they'd install as a
> site-package).  I think that a sane split between bytes and string (or
> string and Unicode on 2.x) is most needed.

Unfortunately, it's a /very/ tricky problem.  This pervades every  
aspect of the package.  I'm slowly byte-ifying the internals as I  
refactor the tests.  That's the first step IMO, but it doesn't make  
for a very convenient API.
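
To make the "bytes inside, bytes or strings at the edges" idea concrete, here is a minimal sketch under stated assumptions. The class name `HeaderValue` and its methods are hypothetical, not the package's API, and the sketch ignores RFC 2047 encoded words entirely; it only shows input normalised to bytes and output offered in both forms.

```python
class HeaderValue:
    """Hypothetical value type: stored as bytes, accepts bytes or str."""

    def __init__(self, value, charset="ascii"):
        if isinstance(value, str):
            # Convert text to bytes once, on input.
            value = value.encode(charset)
        elif not isinstance(value, bytes):
            raise TypeError("expected bytes or str, got %r" % type(value))
        self._raw = value
        self._charset = charset

    def as_bytes(self):
        """Return the stored bytes unchanged."""
        return self._raw

    def as_string(self):
        """Decode the stored bytes with the charset given at construction."""
        return self._raw.decode(self._charset)
```

The inconvenience mentioned above shows up even here: every caller has to decide which of the two accessors it wants, and which charset applies.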

>> * Fix the myriad of bugs in the tracker!
>
> Sure, I'm game!  We 2.x users would benefit.  Again, a place for users to
> get an "official" current package is needed, as 2.7 is a ways off.

We will definitely make standalone packages available on the  
Cheeseshop for Python 2.x and 3.x.  The question of what goes into 3.1  
is still up in the air I think.

Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSdjqsHEjvBPtnXfVAQJZSwP/fABeQG7Q1c4LOZhwCZBcb41Gh4ybZVoK
tZFM2Q1UTdq0bvaEG5xKMkGPHd1S/+AovrwtC4qTIL531p/RJZp3KaDvucGLfWJ3
w61Mk75Zj6yTEbg2GtJwKiY1Zj7oYZgod0NEQ6vgaBAchLAWrnwsE52ap3w+9K7M
wzmppfl/r/I=
=sxwD
-----END PGP SIGNATURE-----

Re: Plans for email 6.0

by Tony Nelson

Traffic!

At 13:30 -0400 04/05/2009, Barry Warsaw wrote:
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>On Apr 2, 2009, at 10:16 AM, Tony Nelson wrote:

>>>* Clean up the API. Where possible, simple attribute access should be
>>>the norm. Let's get rid of dumb API decisions (like str(msg) including
>>>the Unix-From). Let's fix the whole get_payload(decode=True) debacle.
>>>Let's fix stuff like needing to specify unicode encodings twice in the
>>>same call. Etc.
>>
>>Sounds good. I'd like __setitem__ (msg[hdr] = foo) to act more like a
>>mapping, and not just append new header fields, with .replace_header()
>>and .add_header() folded together as .set_header().
>
>Is there a reason for this?  This is one part of the API that I've
>found where practicality beats purity.

What part of saying:

    msg["Subject"] = "new subject line"

and getting a second Subject: header field is practical?  For those times
when you really want more than one instance of a header field:

    msg.append_header("Subject", "new subject line")
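
For reference, the append behaviour under discussion can be reproduced with the existing `email.message.Message` class, as in the sketch below. The `set_header` function is a hypothetical helper written for this illustration (it is not a method of the package), and it relies on `del msg[name]` removing every occurrence of a field without raising when the field is absent.

```python
from email.message import Message

msg = Message()
msg["Subject"] = "old subject line"
msg["Subject"] = "new subject line"   # appends a second field

# Both Subject fields are now present, in insertion order.
subjects = msg.get_all("Subject")


def set_header(msg, name, value):
    """Hypothetical helper: replace all fields called name, or add one.

    Unlike Message.replace_header(), this moves the field to the end of
    the header list and does not raise KeyError when it is missing.
    """
    del msg[name]        # removes every occurrence; no error if absent
    msg[name] = value


set_header(msg, "Subject", "final subject line")
```

After the last call, `msg.get_all("Subject")` holds a single value.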

In general, users of the email package must currently be familiar with all
the mail RFCs in order to properly use the package to create or manipulate
any but the simplest messages, and having "[]" mean "append" isn't helping.
Your suggestion that header fields should always be represented as Header
objects is urgently needed.  Those Header objects will need to be smart
about the header field they represent, and apply all the various encodings
etc. as necessary.


 ...

>>>* Let's target Python 3.1 (coming very soon) if possible, or Python 3.2
>>>if not. We should back port email 6.0 to Python 2.x, though we'll have
>>>to decide how far back we should go (my suggestion: no earlier than
>>>Python 2.5).
>>
>>Python 3.1 should have a working email package, and a simple way for
>>users needing more to get a better replacement (which they'd install as a
>>site-package). I think that a sane split between bytes and string (or
>>string and Unicode on 2.x) is most needed.
>
>Unfortunately, it's a /very/ tricky problem.

I assume you mean "working email package", not "a simple way for users ...
to get a better replacement".

>This pervades every
>aspect of the package.  I'm slowly byte-ifying the internals as I
>refactor the tests.  That's the first step IMO, but it doesn't make
>for a very convenient API.

So it goes.  It may make more sense as you get farther along.  What parts
of that work can you farm out?  Do you need a RFC-compliant header parser?
I could write one in a few days, I think.


>>> * Fix the myriad of bugs in the tracker!
>>
>>Sure, I'm game! We 2.x users would benefit. Again, a place for users to
>>get an "official" current package is needed, as 2.7 is a ways off.
>
>We will definitely make standalone packages available on the
>Cheeseshop for Python 2.x and 3.x.  The question of what goes into 3.1
>is still up in the air I think.

Well, I think that the bugs I've worked on so far should go into 2.6, 2.7,
and 3.1 (unless 3.1 makes a lot of progress and renders some of the bugs
obsolete).

    [issue5610] email feedparser.py CRLFLF bug: $ vs \Z
    [issue5638] test_httpservers fails CGI tests if --enable-shared
    [issue1555570] email parser incorrectly breaks headers with a CRLF
        at 8192
    [issue3169] email/header.py doesn't handle Base64 headers that have
        been insufficiently padded.
    [issue4487] Add utf8 alias for email charsets
    [issue1079] decode_header does not follow RFC 2047

(There's some argument on the last one, where R. David Murray doesn't want
any header that might not conform to the RFCs to be decoded, and I want any
header that might conform to be decoded -- I cite Postel's law in another
issue, and I think it applies here as well.  A full header parser and
Header implementation would solve the problem properly, but only for Python
3.2 or later.)
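
For context, this is what the existing `email.header` functions do with a well-formed RFC 2047 encoded word on Python 3. It is only a sketch of the conformant case; it does not settle how strict or lenient decoding of borderline input should be.

```python
from email.header import decode_header, make_header

raw = "=?iso-8859-1?q?p=F6stal?="

# decode_header() returns a list of (decoded_part, charset) pairs.
parts = decode_header(raw)

# make_header() rebuilds a Header from those pairs; str() gives the text.
text = str(make_header(parts))
```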
--
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson@...>
      '                              <http://www.georgeanelson.com/>

Re: Plans for email 6.0

by Stephen J. Turnbull

Tony Nelson writes:

 > In general, users of the email package must currently be familiar with all
 > the mail RFCs in order to properly use the package to create or manipulate
 > any but the simplest messages,

IMHO, that's a problem with the mail RFCs, not with the email
package.  Internet messaging is inherently complex because of the
backward and Microsoft compatibility requirements.

 > and having "[]" mean "append" isn't helping.

That's probably true, but that's because in Python mapping semantics
are invariably replace rather than append in this circumstance.  It
has nothing to do with the RFCs per se.

 > Your suggestion that header fields should always be represented as
 > Header objects is urgently needed.  Those Header objects will need
 > to be smart about the header field they represent, and apply all
 > the various encodings etc. as necessary.

That's not a good idea.  Header methods should be strict about what
encodings are allowed, but all too often the decisions between
quoted-printable and base64 transfer encodings, and among various
possible text encodings (Japanese alone has 4 major ones in *daily*
use, with different ones typically used in the header and body! and
Chinese isn't much better) are dependent on content or receiver and/or
sender.

It's reasonable for email to have "recommendations", perhaps
implemented as defaults, for each situation, but programmers should be
reminded that the text they provide to the Header class etc is
being munged as it gets inserted into the message.  For simple
situations, of course it makes sense to provide a high-level
interface, such as a string:contents dictionary for headers.

headers = { "From" : [("Stephen J. Turnbull", "stephen@...")],
            "To" : [("Email SIG", "email-sig@..."),
                    ("da FLUFL", "barry@...")],
            "Subject" : "Don't DO that!",
            "Summary" : "This could go on forever but doesn't." }

body = """I just wanted you to know that I
don't think it's a good idea.

Just-yer-neighborhood-busybody-ly y'rs
"""

ready_for_sendmail = email.format_simple_message(headers, body)

And that would be encoded in some lowest-common-denominator charset
like ASCII, ISO-8859-15, ISO-8859-1, or UTF-8 with the earliest
feasible one used, and some heuristic like minimum encoded size or
fraction of non-ASCII used to determine content-transfer-encoding.

But it should be implemented by .format_simple_message, not Header,
IMHO.
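
One possible shape for such a function, built on the existing `Message`, `Header` and `formataddr` APIs, is sketched below. `format_simple_message` and `pick_charset` are hypothetical names, the charset order is simply the one listed above, and the content-transfer-encoding heuristic is left to the package defaults rather than implemented. Address strings must be ASCII in this sketch.

```python
from email.header import Header
from email.message import Message
from email.utils import formataddr

# Tried in order; the first charset that can encode the text wins.
CANDIDATE_CHARSETS = ("us-ascii", "iso-8859-15", "iso-8859-1", "utf-8")


def pick_charset(text):
    """Return the earliest candidate charset able to encode text."""
    for charset in CANDIDATE_CHARSETS:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    return "utf-8"  # not reached for valid text; utf-8 encodes it all


def format_simple_message(headers, body):
    """Hypothetical helper: build a message and return it as a string.

    headers maps a field name to either a str or a list of
    (display name, address) pairs.
    """
    msg = Message()
    for name, value in headers.items():
        if isinstance(value, str):
            msg[name] = Header(value, pick_charset(value))
        else:
            msg[name] = ", ".join(formataddr(pair) for pair in value)
    # set_payload() with a charset also adds the MIME headers.
    msg.set_payload(body, pick_charset(body))
    return msg.as_string()
```

Keeping the charset choice in the helper rather than in Header leaves Header itself strict, which is the division of labour argued for above.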


Re: Plans for email 6.0

by Glenn Linderman

On approximately 4/6/2009 10:22 PM, came the following characters from
the keyboard of Stephen J. Turnbull:
> IMHO, that's a problem with the mail RFCs, not with the email
> package.  Internet messaging is inherently complex because of the
> backward and Microsoft compatibility requirements.

I agree that Internet messaging, particularly some of the character
encodings, is inherently complex due to backward compatibility requirements.

I'm not surprised that you mention Microsoft issues, as I've found quite
a few cases of messages from Microsoft email clients that do not conform
to the RFCs.  Apple Mail violates a number of them, also, especially
with MIME constructions.  But I've never attempted to track the
Microsoft violations of the RFCs... do you have or know of a list of such?

--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking


Re: Plans for email 6.0

by Stephen J. Turnbull

Glenn Linderman writes:

 > I'm not surprised that you mention Microsoft issues, as I've found
 > quite a few cases of messages from Microsoft email clients that do
 > not conform to the RFCs.  Apple Mail violates a number of them,
 > also, especially with MIME constructions.  But I've never attempted
 > to track the Microsoft violations of the RFCs... do you have or
 > know of a list of such?

No, I don't.  For me it's not been worth keeping one, but if email is
going to be the world-beating email library, it might be worth keeping
one.  I mean, just how many people would fall in love with Mailman if
there were a "select your broken MUA here" in the personal user's
page, and selecting actually got you a personalized message that
didn't display Sender in the From field in Outlook Express? :-)
