Plans for email 6.0

Hello everyone.

Today's the last day of Pycon 2009 sprints and I'm eager to return home and see my family. Chris Withers and I had a good day sprinting on the email package before he had to jet out, and although we only closed one bug in Python 2.7 (this is where Chris's mantra "backport, backport" begins :) we had a lot of good discussions about how and where to fix outstanding problems in email.

I have lots of ideas on how to improve the email package. I plan on creating a bit of space on the Python wiki to consolidate my thoughts and to coordinate implementation. I'm hoping some of you will be interested enough to help with design, testing, use cases, and coding.

We have a few older pages in the wiki covering the email package:

http://wiki.python.org/moin/EmailSigSprint
http://wiki.python.org/moin/EmailSprint

Some of this we've accomplished. Here's a rambling of some of my thoughts on things we should do.

* Turn all header values into Header instances. It's difficult and error prone to have to manage both strings and Headers as values, so they should always be Header instances. We should add a registry of Header subclasses, based on the lower cased header name, for allowing higher level semantic folding of header strings.

* Implement a Message subclass registry for parsing. This would allow the parser to create custom subclasses based on the Content-Type found while parsing the message.

* Bytes and string interfaces. This is the trickiest one. I think that internally, header names and values, and payloads should all be represented as bytes. But APIs should accept bytes and strings, converting to bytes on input, and provide APIs to extract information as either bytes or strings. I've thought about a few ways to do this cleanly, but haven't found anything I particularly like yet. Remember that email in Py2 is horribly broken in its discrimination between bytes and strings, but Py3 forces us to make a choice (which is a good thing).

* Clean up the API. Where possible, simple attribute access should be the norm. Let's get rid of dumb API decisions (like str(msg) including the Unix-From). Let's fix the whole get_payload(decode=True) debacle. Let's fix stuff like needing to specify unicode encodings twice in the same call. Etc.

* Add an external storage API so that messages with huge binary payloads don't need to be fully stored in memory.

* Let's target Python 3.1 (coming very soon) if possible, or Python 3.2 if not. We should back port email 6.0 to Python 2.x, though we'll have to decide how far back we should go (my suggestion: no earlier than Python 2.5).

* Fix the myriad of bugs in the tracker!

That's it for now. I'll figure out a place in the wiki for this and we can start capturing our thoughts there. One thing I've heard pretty consistently is that while the email package has its problems, it's one of the best email packages available for any language. Let's make it rock.

Barry

_______________________________________________
Email-SIG mailing list
Email-SIG@...
Your options: http://mail.python.org/mailman/options/email-sig/lists%40nabble.com
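A minimal sketch of the "registry of Header subclasses, based on the lower cased header name" idea from the first bullet. Every class and function name here (`BaseHeader`, `AddressHeader`, `register`, `make_header`) is hypothetical and not part of the email package; only `email.utils.getaddresses` is a real standard-library call.

```python
from email.utils import getaddresses

# Hypothetical registry: lower-cased header name -> Header subclass.
_registry = {}


class BaseHeader:
    """Fallback for header names that have no registered subclass."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


def register(name):
    """Class decorator that files a subclass under the lower-cased name."""
    def decorator(cls):
        _registry[name.lower()] = cls
        return cls
    return decorator


@register("Subject")
class SubjectHeader(BaseHeader):
    pass


@register("To")
class AddressHeader(BaseHeader):
    def addresses(self):
        # Header-specific semantics: a list of (real name, address) pairs.
        return getaddresses([self.value])


def make_header(name, value):
    """Look the name up case-insensitively, falling back to BaseHeader."""
    cls = _registry.get(name.lower(), BaseHeader)
    return cls(name, value)
```

With this shape, `make_header("TO", ...)` and `make_header("to", ...)` both produce an `AddressHeader`, while an unregistered name such as `X-Custom` falls back to `BaseHeader`.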
|
|
Re: Plans for email 6.0

At 07:54 -0500 2009/04/02, Barry Warsaw wrote:
 ...
>...Here's a rambling of some of my thoughts on things we should do.
 ...
>* Bytes and string interfaces. This is the trickiest one. I think
>that internally, header names and values, and payloads should all be
>represented as bytes. But APIs should accept bytes and strings,
>converting to bytes on input, and provide APIs to extract information
>as either bytes or strings. I've thought about a few ways to do this
>cleanly, but haven't found anything I particularly like yet. Remember
>that in email in Py2 is horribly broken in its discrimination between
>bytes and strings, but Py3 forces us to make a choice (which is a good
>thing).

AIUI, this or something like it must be done soon, as the email package is broken on 3.x now.

>* Clean up the API. Where possible, simple attribute access should be
>the norm. Let's get rid of dumb API decisions (like str(msg)
>including the Unix-From). Let's fix the whole
>get_payload(decode=True) debacle. Let's fix stuff like needing to
>specify unicode encodings twice in the same call. Etc.

Sounds good. I'd like __setitem__ (msg[hdr] = foo) to act more like a mapping, and not just append new header fields, with .replace_header() and .add_header() folded together as .set_header().

>* Add an external storage API so that messages with huge binary
>payloads don't need to be fully stored in memory.
>
>* Let's target Python 3.1 (coming very soon) if possible, or Python
>3.2 if not. We should back port email 6.0 to Python 2.x, though we'll
>have to decide how far back we should go (my suggestion: no earlier
>than Python 2.5).

Python 3.1 should have a working email package, and a simple way for users needing more to get a better replacement (which they'd install as a site-package). I think that a sane split between bytes and string (or string and Unicode on 2.x) is most needed.

>* Fix the myriad of bugs in the tracker!

Sure, I'm game! We 2.x users would benefit. Again, a place for users to get an "official" current package is needed, as 2.7 is a ways off.
--
____________________________________________________________________
TonyN.:' <mailto:tonynelson@...>
      ' <http://www.georgeanelson.com/>
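The `__setitem__` behaviour discussed above can be reproduced with the legacy `email.message.Message` class. The `set_header` helper is only a hypothetical sketch of the proposed folded-together method, not an existing API; `get_all`, `__delitem__` and `__setitem__` are real `Message` methods.

```python
from email.message import Message

msg = Message()
msg["Subject"] = "first"
msg["Subject"] = "second"  # appends a second field instead of replacing

# Both Subject fields are now present, in insertion order.
subjects = msg.get_all("Subject")


def set_header(msg, name, value):
    """Hypothetical helper: replace every existing field called `name`.

    Deleting a header that is absent does not raise, so this works for
    both the "add" and the "replace" case. Note that the new field is
    appended at the end, so the original position is not preserved.
    """
    del msg[name]
    msg[name] = value
```

After `set_header(msg, "Subject", "only")`, `msg.get_all("Subject")` holds a single value. The existing `replace_header()` keeps the field's position but raises `KeyError` when no such field exists, which is the split the proposed `.set_header()` would hide.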
|
|
Re: Plans for email 6.0

On Apr 2, 2009, at 10:16 AM, Tony Nelson wrote:

>> * Bytes and string interfaces. This is the trickiest one. I think
>> that internally, header names and values, and payloads should all be
>> represented as bytes. But APIs should accept bytes and strings,
>> converting to bytes on input, and provide APIs to extract information
>> as either bytes or strings. I've thought about a few ways to do this
>> cleanly, but haven't found anything I particularly like yet. Remember
>> that in email in Py2 is horribly broken in its discrimination between
>> bytes and strings, but Py3 forces us to make a choice (which is a
>> good thing).
>
> AIUI, this or something like it must be done soon, as the email
> package is broken on 3.x now.

Indeed.

>> * Clean up the API. Where possible, simple attribute access should be
>> the norm. Let's get rid of dumb API decisions (like str(msg)
>> including the Unix-From). Let's fix the whole
>> get_payload(decode=True) debacle. Let's fix stuff like needing to
>> specify unicode encodings twice in the same call. Etc.
>
> Sounds good. I'd like __setitem__ (msg[hdr] = foo) to act more like a
> mapping, and not just append new header fields, with .replace_header()
> and .add_header() folded together as .set_header().

Is there a reason for this? This is one part of the API that I've found where practicality beats purity.

>> * Add an external storage API so that messages with huge binary
>> payloads don't need to be fully stored in memory.
>>
>> * Let's target Python 3.1 (coming very soon) if possible, or Python
>> 3.2 if not. We should back port email 6.0 to Python 2.x, though we'll
>> have to decide how far back we should go (my suggestion: no earlier
>> than Python 2.5).
>
> Python 3.1 should have a working email package, and a simple way for
> users needing more to get a better replacement (which they'd install
> as a site-package). I think that a sane split between bytes and string
> (or string and Unicode on 2.x) is most needed.

Unfortunately, it's a /very/ tricky problem. This pervades every aspect of the package. I'm slowly byte-ifying the internals as I refactor the tests. That's the first step IMO, but it doesn't make for a very convenient API.

>> * Fix the myriad of bugs in the tracker!
>
> Sure, I'm game! We 2.x users would benefit. Again, a place for users
> to get an "official" current package is needed, as 2.7 is a ways off.

We will definitely make standalone packages available on the Cheeseshop for Python 2.x and 3.x. The question of what goes into 3.1 is still up in the air I think.

Barry
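One way to picture the "bytes inside, bytes or strings at the edges" approach under discussion is the sketch below. It is only an illustration under assumptions: `DualValue`, its methods, and the UTF-8 default are all hypothetical, and the thread itself says no clean design had been found yet.

```python
class DualValue:
    """Hypothetical value holder: stores bytes, accepts and returns both."""

    def __init__(self, value, charset="utf-8"):
        # Convert to bytes on input, as proposed in the thread.
        if isinstance(value, str):
            value = value.encode(charset)
        elif not isinstance(value, bytes):
            raise TypeError("expected str or bytes")
        self._raw = value
        self._charset = charset

    def as_bytes(self):
        """Return the internal representation unchanged."""
        return self._raw

    def as_text(self):
        """Decode on the way out; the caller picks which view it wants."""
        return self._raw.decode(self._charset)
```

The inconvenience mentioned above shows up quickly: every accessor has to exist in two flavours, and the charset used for decoding has to be carried alongside the raw bytes.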
|
|
Re: Plans for email 6.0

Traffic!

At 13:30 -0400 04/05/2009, Barry Warsaw wrote:
>On Apr 2, 2009, at 10:16 AM, Tony Nelson wrote:

>>>* Clean up the API. Where possible, simple attribute access should be
>>>the norm. Let's get rid of dumb API decisions (like str(msg) including
>>>the Unix-From). Let's fix the whole get_payload(decode=True) debacle.
>>>Let's fix stuff like needing to specify unicode encodings twice in the
>>>same call. Etc.
>>
>>Sounds good. I'd like __setitem__ (msg[hdr] = foo) to act more like a
>>mapping, and not just append new header fields, with .replace_header()
>>and .add_header() folded together as .set_header().
>
>Is there a reason for this? This is one part of the API that I've
>found where practicality beats purity.

What part of saying:

    msg["Subject"] = "new subject line"

and getting a second Subject: header field is practical? For those times when you really want more than one instance of a header field:

    msg.append_header("Subject", "new subject line")

In general, users of the email package must currently be familiar with all the mail RFCs in order to properly use the package to create or manipulate any but the simplest messages, and having "[]" mean "append" isn't helping.

Your suggestion that header fields should always be represented as Header objects is urgently needed. Those Header objects will need to be smart about the header field they represent, and apply all the various encodings etc. as necessary.
 ...

>>>* Let's target Python 3.1 (coming very soon) if possible, or Python 3.2
>>>if not. We should back port email 6.0 to Python 2.x, though we'll have
>>>to decide how far back we should go (my suggestion: no earlier than
>>>Python 2.5).
>>
>>Python 3.1 should have a working email package, and a simple way for
>>users needing more to get a better replacement (which they'd install as a
>>site-package). I think that a sane split between bytes and string (or
>>string and Unicode on 2.x) is most needed.
>
>Unfortunately, it's a /very/ tricky problem.

I assume you mean "working email package", not "a simple way for users ... to get a better replacement".

>This pervades every
>aspect of the package. I'm slowly byte-ifying the internals as I
>refactor the tests. That's the first step IMO, but it doesn't make
>for a very convenient API.

So it goes. It may make more sense as you get farther along.

What parts of that work can you farm out? Do you need a RFC-compliant header parser? I could write one in a few days, I think.

>>> * Fix the myriad of bugs in the tracker!
>>
>>Sure, I'm game! We 2.x users would benefit. Again, a place for users to
>>get an "official" current package is needed, as 2.7 is a ways off.
>
>We will definitely make standalone packages available on the
>Cheeseshop for Python 2.x and 3.x. The question of what goes into 3.1
>is still up in the air I think.

Well, I think that the bugs I've worked on so far should go into 2.6, 2.7, and 3.1 (unless 3.1 makes a lot of progress and renders some of the bugs obsolete).

[issue5610] email feedparser.py CRLFLF bug: $ vs \Z
[issue5638] test_httpservers fails CGI tests if --enable-shared
[issue1555570] email parser incorrectly breaks headers with a CRLF at 8192
[issue3169] email/header.py doesn't handle Base64 headers that have been insufficiently padded.
[issue4487] Add utf8 alias for email charsets
[issue1079] decode_header does not follow RFC 2047

(There's some argument on the last one, where R. David Murray doesn't want any header that might not conform to the RFCs to be decoded, and I want any header that might conform to be decoded -- I cite Postel's law in another issue, and I think it applies here as well. A full header parser and Header implementation would solve the problem properly, but only for Python 3.2 or later.)
--
____________________________________________________________________
TonyN.:' <mailto:tonynelson@...>
      ' <http://www.georgeanelson.com/>
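To make the "insufficiently padded" Base64 case from issue3169 concrete, here is a small sketch of the liberal-in-what-you-accept approach that Postel's law suggests. `lenient_b64decode` is a hypothetical helper and is not claimed to be the fix that was applied in the tracker; `base64.b64decode` is the real standard-library call.

```python
import base64


def lenient_b64decode(data: bytes) -> bytes:
    """Decode Base64 whose trailing '=' padding was dropped by the sender.

    Base64 text is processed in groups of four characters, so the
    missing padding can be restored from the length alone. Input whose
    length is one more than a multiple of four cannot be repaired this
    way, because no amount of padding makes it valid Base64.
    """
    missing = -len(data) % 4
    return base64.b64decode(data + b"=" * missing)
```

For example, `b"aGVsbG8"` (seven characters, one `=` short) decodes to `b"hello"`, exactly like the correctly padded `b"aGVsbG8="`.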
|
|
Re: Plans for email 6.0

Tony Nelson writes:

 > In general, users of the email package must currently be familiar with all
 > the mail RFCs in order to properly use the package to create or manipulate
 > any but the simplest messages,

IMHO, that's a problem with the mail RFCs, not with the email package. Internet messaging is inherently complex because of the backward and Microsoft compatibility requirements.

 > and having "[]" mean "append" isn't helping.

That's probably true, but that's because in Python mapping semantics are invariably replace rather than append in this circumstance. It has nothing to do with the RFCs per se.

 > Your suggestion that header fields should always be represented as
 > Header objects is urgently needed. Those Header objects will need
 > to be smart about the header field they represent, and apply all
 > the various encodings etc. as necessary.

That's not a good idea. Header methods should be strict about what encodings are allowed, but all too often the decisions between quoted-printable and base64 transfer encodings, and among various possible text encodings (Japanese alone has 4 major ones in *daily* use, with different ones typically used in the header and body! and Chinese isn't much better) are dependent on content or receiver and/or sender.

It's reasonable for email to have "recommendations", perhaps implemented as defaults, for each situation, but programmers should be reminded that the text they provide to the Header class etc is being munged as it gets inserted into the message.

For simple situations, of course it makes sense to provide a high-level interface, such as a string:contents dictionary for headers.

    headers = {
        "From" : [("Stephen J. Turnbull", "stephen@...")],
        "To" : [("Email SIG", "email-sig@..."),
                ("da FLUFL", "barry@...")],
        "Subject" : "Don't DO that!",
        "Summary" : "This could go on forever but doesn't."
        }
    body = """I just wanted you to know that I don't think it's a good idea.
    Just-yer-neighborhood-busybody-ly y'rs
    """
    ready_for_sendmail = email.format_simple_message(headers, body)

And that would be encoded in some lowest-common-denominator charset like ASCII, ISO-8859-15, ISO-8859-1, or UTF-8 with the earliest feasible one used, and some heuristic like minimum encoded size or fraction of non-ASCII used to determine content-transfer-encoding. But it should be implemented by .format_simple_message, not Header, IMHO.
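The proposed high-level interface could look roughly like the sketch below. It assumes today's standard-library `email.message.EmailMessage` rather than the package as it stood in 2009, `format_simple_message` and `pick_charset` are hypothetical names taken from or invented for the proposal, and the content-transfer-encoding heuristic is left to the library defaults instead of the size-based rule described above.

```python
from email.message import EmailMessage
from email.utils import formataddr


def pick_charset(text, candidates=("us-ascii", "iso-8859-15",
                                   "iso-8859-1", "utf-8")):
    """Return the earliest candidate charset that can encode `text`."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    raise ValueError("no candidate charset can encode the text")


def format_simple_message(headers, body):
    """Hypothetical helper: build a message from a plain dict and a body.

    String values are used as they are; lists of (real name, address)
    pairs are joined into a single address-list header.
    """
    msg = EmailMessage()
    for name, value in headers.items():
        if isinstance(value, str):
            msg[name] = value
        else:
            msg[name] = ", ".join(formataddr(pair) for pair in value)
    msg.set_content(body, charset=pick_charset(body))
    return msg.as_string()
```

Called with a dictionary shaped like the one in the message above (using placeholder addresses such as `stephen@example.org`, since the real ones are elided in the archive), this returns the serialized message text with the charset chosen in the order the message lists.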
|
|
Re: Plans for email 6.0

On approximately 4/6/2009 10:22 PM, came the following characters from the keyboard of Stephen J. Turnbull:
> IMHO, that's a problem with the mail RFCs, not with the email
> package. Internet messaging is inherently complex because of the
> backward and Microsoft compatibility requirements.

I agree that Internet messaging, particularly some of the character encodings, is inherently complex due to backward compatibility requirements.

I'm not surprised that you mention Microsoft issues, as I've found quite a few cases of messages from Microsoft email clients that do not conform to the RFCs. Apple Mail violates a number of them, also, especially with MIME constructions. But I've never attempted to track the Microsoft violations of the RFCs... do you have or know of a list of such?

--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
|
|
Re: Plans for email 6.0

Glenn Linderman writes:

 > I'm not surprised that you mention Microsoft issues, as I've found
 > quite a few cases of messages from Microsoft email clients that do
 > not conform to the RFCs. Apple Mail violates a number of them,
 > also, especially with MIME constructions. But I've never attempted
 > to track the Microsoft violations of the RFCs... do you have or
 > know of a list of such?

No, I don't. For me it's not been worth keeping one, but if email is going to be the world-beating email library, it might be worth keeping one.

I mean, just how many people would fall in love with Mailman if there were a "select your broken MUA here" in the personal user's page, and selecting actually got you a personalized message that didn't display Sender in the From field in Outlook Express? :-)