Re: [Python-Dev] Dropping bytes "support" in json

At 22:26 -0400 04/09/2009, Barry Warsaw wrote:
>There are really two ways to look at an email message. It's either an
>unstructured blob of bytes, or it's a structured tree of objects.
>Those objects have headers and payload. The payload can be of any
>type, though I think it generally breaks down into "strings" for text/*
>types and bytes for anything else (not counting multiparts).
>
>The email package isn't a perfect mapping to this, which is something
>I want to improve. That aside, I think storing a message in a
>database means storing some or all of the headers separately from the
>byte stream (or text?) of its payload. That's for non-multipart
>types. It would be more complicated to represent a message tree of
>course.

Storing an email message in a database does mean storing some of the header fields as database fields, but the set of email header fields is open, so any "unused" fields in a message must be stored elsewhere. It isn't useful to just have a bag of name/value pairs in a table. General message MIME payload trees don't map well to a database either, unless one wants to get very relational. Sometimes the database needs to represent the entire email message, header fields and MIME tree, but only if it is an email program and usually not even then. Usually, the database has a specific purpose, and can be designed for the data it cares about; it may choose to keep the original message as bytes.

>It does seem to make sense to think about headers as text header names
>and text header values. Of course, header values can contain almost
>anything and there's an encoding to bring it back to 7-bit ASCII, but
>again, you really have two views of a header value. Which you want
>really depends on your application.

I think of header fields as having text-like names (the set of allowed characters is more than just text, though defined headers don't make use of that), but the data is either bytes or it should be parsed into something appropriate: text for unstructured fields like Subject:, a list of addresses for address fields like To:. Many of the structured header fields have a reasonable mapping to text; certainly this is true for address header fields. Content-Type header fields are barely text, they can be so convolutedly structured, but I suppose one could flatten one of them to text instead of bytes if the user wanted. It's not very useful, though, except for debugging (either by the programmer or the recipient who wants to know what was cleaned from the message).

>Maybe you just care about the text of both the header name and value.
>In that case, I think you want the values as unicodes, and probably
>the headers as unicodes containing only ASCII. So your table would be
>strings in both cases. OTOH, maybe your application cares about the
>raw underlying encoded data, in which case the header names are
>probably still strings of ASCII-ish unicodes and the values are
>bytes. It's this distinction (and I think the competing use cases)
>that make a true Python 3.x API for email more complicated.

If a database stores the Subject: header field, it would be as text. The various recipient address fields are a mapping from one message to many names and addresses, and need a related table of name/address fields, with each field being text. The original message (or whatever part of it one preserves) should be bytes. I don't think this complicates the email package API; rather, it just shows where generality is needed.

>Thinking about this stuff makes me nostalgic for the sloppy happy days
>of Python 2.x

You now have the opportunity to finally unsnarl that mess. It is not an insurmountable opportunity.
--
____________________________________________________________________
TonyN.:' <mailto:tonynelson@...>
' <http://www.georgeanelson.com/>
_______________________________________________
Email-SIG mailing list
Email-SIG@...
Your options: http://mail.python.org/mailman/options/email-sig/lists%40nabble.com
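The storage layout described in the message above (Subject: as text, recipients in a related table, the original message kept as bytes) could be sketched roughly as follows. This is only an illustration under stated assumptions: SQLite is an arbitrary choice, and the table names, column names, and the `create_schema` and `store` helpers are hypothetical and not part of any proposal in the thread.

```python
import sqlite3
from email import message_from_bytes
from email.header import decode_header, make_header
from email.utils import getaddresses

# Sample wire-format message (hypothetical data for illustration).
RAW = (b"From: Alice <alice@example.com>\r\n"
       b"To: Bob <bob@example.com>, carol@example.com\r\n"
       b"Cc: =?utf-8?q?Dav=C3=A9?= <dave@example.com>\r\n"
       b"Subject: =?utf-8?q?Caf=C3=A9_menu?=\r\n"
       b"\r\n"
       b"See attached.\r\n")


def create_schema(conn):
    # Hypothetical schema: one row per message, one row per recipient.
    conn.execute("CREATE TABLE message ("
                 "id INTEGER PRIMARY KEY, subject TEXT, raw BLOB)")
    conn.execute("CREATE TABLE recipient ("
                 "id INTEGER PRIMARY KEY, message_id INTEGER, "
                 "field TEXT, name TEXT, address TEXT)")


def decode_text(value):
    # Turn an RFC 2047 encoded header value into plain text.
    return str(make_header(decode_header(value)))


def store(conn, raw):
    msg = message_from_bytes(raw)
    subject = decode_text(msg.get("Subject", ""))
    # The original message is preserved untouched, as bytes.
    cur = conn.execute("INSERT INTO message (subject, raw) VALUES (?, ?)",
                       (subject, raw))
    message_id = cur.lastrowid
    # One message maps to many (name, address) pairs.
    for field in ("To", "Cc"):
        for name, address in getaddresses(msg.get_all(field, [])):
            conn.execute(
                "INSERT INTO recipient (message_id, field, name, address) "
                "VALUES (?, ?, ?, ?)",
                (message_id, field, decode_text(name), address))
    return message_id


conn = sqlite3.connect(":memory:")
create_schema(conn)
store(conn, RAW)
```

Header fields the application does not care about are simply left inside the `raw` column here, which is one way to handle the open-ended set of fields without a bag of name/value pairs.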
|
|
Re: [Python-Dev] Dropping bytes "support" in json

Barry Warsaw writes:
> There are really two ways to look at an email message. It's either an
> unstructured blob of bytes, or it's a structured tree of objects.

Indeed!

> Those objects have headers and payload. The payload can be of any
> type, though I think it generally breaks down into "strings" for text/*
> types and bytes for anything else (not counting multiparts).

*sigh* Why are you back-tracking?

The payload should be of an appropriate *object* type. Atomic object types will have their content stored as string or bytes [nb I use Python 3 terminology throughout]. Composite types (multipart/*) won't need string or bytes attributes AFAICS.

Start by implementing the application/octet-stream and text/plain;charset=utf-8 object types, of course.

> It does seem to make sense to think about headers as text header names
> and text header values.

I disagree. IMHO, structured header types should have object values, and something like

    message['to'] = "Barry 'da FLUFL' Warsaw <barry@...>"

should be smart enough to detect that it's a string and attempt to (flexibly) parse it into a fullname and a mailbox, adding escapes, etc. Whether these should be structured objects or they can be strings or bytes, I'm not sure (probably bytes, not strings, though -- see next example). OTOH

    message['to'] = b'''"Barry 'da.FLUFL' Warsaw" <barry@...>'''

should assume that the client knows what they are doing, and should parse it strictly (and I mean "be a real bastard", eg, raise an exception on any non-ASCII octet), merely dividing it into fullname and mailbox, and caching the bytes for later insertion in a wire-format message.

> In that case, I think you want the values as unicodes, and probably
> the headers as unicodes containing only ASCII. So your table would be
> strings in both cases. OTOH, maybe your application cares about the
> raw underlying encoded data, in which case the header names are
> probably still strings of ASCII-ish unicodes and the values are
> bytes. It's this distinction (and I think the competing use cases)
> that make a true Python 3.x API for email more complicated.

I don't see why you can't have the email API be specific, with message['to'] always returning a structured_header object (or maybe even more specifically an address_header object), and methods like

    message['to'].build_header_as_text()

which returns

    """To: "Barry 'da.FLUFL' Warsaw" <barry@...>"""

and

    message['to'].build_header_in_wire_format()

which returns

    b"""To: "Barry 'da.FLUFL' Warsaw" <barry@...>"""

Then have email.textview.Message and email.wireview.Message which provide a simple interface where message['to'] would invoke .build_header_as_text() and .build_header_in_wire_format() respectively.

> Thinking about this stuff makes me nostalgic for the sloppy happy days
> of Python 2.x

Er, yeah.

Nostalgic-for-the-BITNET-days-where-everything-was-Just-EBCDIC-ly y'rs,
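As a rough illustration of the interface proposed above, here is a minimal sketch. Everything in it is hypothetical: `AddressHeader`, `Message`, `TextViewMessage`, and `WireViewMessage` are stand-ins for the proposed address_header object and the email.textview / email.wireview classes, none of which exist in the email package. Only `parseaddr`, `formataddr`, and `quote` from `email.utils` are real. The sketch handles a single address, does not cache the original bytes, and is "strict" on bytes input only in the sense of rejecting non-ASCII octets.

```python
from email.utils import formataddr, parseaddr, quote


class AddressHeader:
    """Hypothetical structured value for a single-address header."""

    def __init__(self, name, value):
        self.name = name
        if isinstance(value, bytes):
            # Bytes input: the caller claims to know the wire format, so
            # any non-ASCII octet raises UnicodeDecodeError here.
            value = value.decode("ascii")
        # Split the value into a display name and a mailbox.
        self.fullname, self.mailbox = parseaddr(value)

    def build_header_as_text(self):
        # Text view: unencoded str, display name always quoted.
        if self.fullname:
            return '%s: "%s" <%s>' % (
                self.name, quote(self.fullname), self.mailbox)
        return "%s: %s" % (self.name, self.mailbox)

    def build_header_in_wire_format(self):
        # Wire view: formataddr() RFC 2047-encodes a non-ASCII display
        # name, so the result can always be encoded as ASCII bytes.
        text = "%s: %s" % (
            self.name, formataddr((self.fullname, self.mailbox)))
        return text.encode("ascii")


class Message:
    """Hypothetical message whose headers are structured objects."""

    def __init__(self):
        self._headers = {}

    def __setitem__(self, name, value):
        self._headers[name.lower()] = AddressHeader(name.title(), value)

    def __getitem__(self, name):
        return self._headers[name.lower()]


class TextViewMessage(Message):
    def __getitem__(self, name):
        return super().__getitem__(name).build_header_as_text()


class WireViewMessage(Message):
    def __getitem__(self, name):
        return super().__getitem__(name).build_header_in_wire_format()


text_msg = TextViewMessage()
text_msg['to'] = "Barry 'da FLUFL' Warsaw <barry@example.com>"
wire_msg = WireViewMessage()
wire_msg['to'] = b'''"Barry 'da.FLUFL' Warsaw" <barry@example.com>'''
print(text_msg['to'])
print(wire_msg['to'])
```

The design point is that the stored value is the same structured object in both views; only the accessor decides whether the caller sees str or bytes.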
|
|
Re: [Python-Dev] the email module, text, and bytes (was Re: Dropping bytes "support" in json)

Barry Warsaw <barry@...> wrote:
> In that case, we really need the
> bytes-in-bytes-out-bytes-in-the-chewy-center API first, and build
> things on top of that.

Yep.

Bill
|
|
Re: [Python-Dev] Dropping bytes "support" in json

On Apr 9, 2009, at 11:59 PM, Tony Nelson wrote:
>> Thinking about this stuff makes me nostalgic for the sloppy happy
>> days of Python 2.x
>
> You now have the opportunity to finally unsnarl that mess. It is
> not an insurmountable opportunity.

No, it's just a full time job <wink>. Now where did I put that hack-drink-coffee-twitter clone?

-Barry
|
|
Re: [Python-Dev] Dropping bytes "support" in json

On Apr 10, 2009, at 1:22 AM, Stephen J. Turnbull wrote:
>> Those objects have headers and payload. The payload can be of any
>> type, though I think it generally breaks down into "strings" for
>> text/* types and bytes for anything else (not counting multiparts).
>
> *sigh* Why are you back-tracking?

I'm not. Sleep deprivation only makes it seem like that.

> The payload should be of an appropriate *object* type. Atomic object
> types will have their content stored as string or bytes [nb I use
> Python 3 terminology throughout]. Composite types (multipart/*) won't
> need string or bytes attributes AFAICS.

Yes, agreed.

> Start by implementing the application/octet-stream and
> text/plain;charset=utf-8 object types, of course.

Yes. See my lament about using inheritance for this.

>> It does seem to make sense to think about headers as text header
>> names and text header values.
>
> I disagree. IMHO, structured header types should have object values,
> and something like

While I agree, there's still a need for a higher level API that makes it easy to do the simple things.

> message['to'] = "Barry 'da FLUFL' Warsaw <barry@...>"
>
> should be smart enough to detect that it's a string and attempt to
> (flexibly) parse it into a fullname and a mailbox adding escapes, etc.
> Whether these should be structured objects or they can be strings or
> bytes, I'm not sure (probably bytes, not strings, though -- see next
> example). OTOH
>
> message['to'] = b'''"Barry 'da.FLUFL' Warsaw" <barry@...>'''
>
> should assume that the client knows what they are doing, and should
> parse it strictly (and I mean "be a real bastard", eg, raise an
> exception on any non-ASCII octet), merely dividing it into fullname
> and mailbox, and caching the bytes for later insertion in a
> wire-format message.

I agree that the Message class needs to be strict. A parser needs to be lenient; see the .defects attribute introduced in the current email package. Oh, and this reminds me that we still haven't talked about idempotency. That's an important principle in the current email package, but do we need to give up on that?

>> In that case, I think you want the values as unicodes, and probably
>> the headers as unicodes containing only ASCII. So your table would
>> be strings in both cases. OTOH, maybe your application cares about the
>> raw underlying encoded data, in which case the header names are
>> probably still strings of ASCII-ish unicodes and the values are
>> bytes. It's this distinction (and I think the competing use cases)
>> that make a true Python 3.x API for email more complicated.
>
> I don't see why you can't have the email API be specific, with
> message['to'] always returning a structured_header object (or maybe
> even more specifically an address_header object), and methods like
>
> message['to'].build_header_as_text()
>
> which returns
>
> """To: "Barry 'da.FLUFL' Warsaw" <barry@...>"""
>
> and
>
> message['to'].build_header_in_wire_format()
>
> which returns
>
> b"""To: "Barry 'da.FLUFL' Warsaw" <barry@...>"""
>
> Then have email.textview.Message and email.wireview.Message which
> provide a simple interface where message['to'] would invoke
> .build_header_as_text() and .build_header_in_wire_format()
> respectively.

This seems similar to Glyph's basic idea, but with a different spelling.

>> Thinking about this stuff makes me nostalgic for the sloppy happy
>> days of Python 2.x
>
> Er, yeah.
>
> Nostalgic-for-the-BITNET-days-where-everything-was-Just-EBCDIC-ly
> y'rs,

Can I have my uucp address back now?

-Barry
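For readers unfamiliar with the .defects attribute mentioned above, the following sketch shows the lenient behaviour in the standard library's email package: the parser accepts a malformed message and records what was wrong instead of raising. The sample message and the `parse_leniently` and `parse_strictly` helper names are made up for illustration. The strict variant relies on `email.policy.strict`, which was added to the package well after this thread, so it is shown only as one possible way to get the opposite behaviour.

```python
import email
import email.policy
from email import errors

# Hypothetical malformed input: it claims to be multipart but gives no
# boundary parameter, so the body cannot be split into parts.
BROKEN = (
    "From: alice@example.com\n"
    "Content-Type: multipart/mixed\n"
    "\n"
    "just some text\n"
)


def parse_leniently(text):
    # Default behaviour: problems are recorded on msg.defects.
    return email.message_from_string(text)


def parse_strictly(text):
    # With raise_on_defect enabled, the first defect is raised instead.
    return email.message_from_string(text, policy=email.policy.strict)


msg = parse_leniently(BROKEN)
for defect in msg.defects:
    print("recorded:", type(defect).__name__)

try:
    parse_strictly(BROKEN)
except errors.MessageDefect as exc:
    print("raised:", type(exc).__name__)
```

The message object from the lenient parse is still usable (for example, `msg["From"]` works), which is the point of being lenient with data that arrives off the wire.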
|
|
Re: [Python-Dev] Dropping bytes "support" in json

Shouldn't this thread move lock stock and .signature to email-sig?

Barry Warsaw writes:
> >> It does seem to make sense to think about headers as text header
> >> names and text header values.
> >
> > I disagree. IMHO, structured header types should have object values,
> > and something like
>
> While I agree, there's still a need for a higher level API that make
> it easy to do the simple things.

Sure. I'm suggesting that the way to determine whether something is simple or not is by whether it falls out naturally from correct structure. Ie, no operations that only a Cirque du Soleil juggler can perform are allowed.

> I agree that the Message class needs to be strict. A parser needs to
> be lenient;

Not always. The Postel Principle only applies to stuph coming in off the wire. But we're *also* going to be parsing pseudo-email components that are being handed to us by applications (eg, the perennial control-character-in-the-unremovable-address Mailman bug). Our parser should Just Say No to that crap.

> see the .defects attribute introduced in the current email
> package. Oh, and this reminds me that we still haven't talked about
> idempotency. That's an important principle in the current email
> package, but do we need to give up on that?

"Idempotency"? I'm not sure what that means in the context of the email package ... multiplication by zero?<wink> Do you mean that .parse().to_wire() should be idempotent? Yes, I think that's a good idea, and it shouldn't be too hard to implement by (optionally?) caching the whole original message or individual components (headers with all whitespace including folding cached verbatim, etc). I think caching has to be done, since stuff like "did the original fold with a leading tab or a leading space, and at what column" and so on seems kind of pointless to encode as attributes on Header objects.

[Description of MessageTextView and MessageWireView elided.]

> This seems similar to Glyph's basic idea, but with a different spelling.

Yes. I don't much care which way it's done, and Glyph's style of spelling is more explicit. But I was thinking in terms of the number of people who are surely going to sing "Mama don' 'low no Unicodes roun' here" and squeal "codec WTF?! outta mah face, man!"
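The caching idea above (keep the original bytes verbatim so that parse followed by to_wire reproduces the input) might look something like this. It is a sketch only: `CachedHeader`, `from_wire`, `to_wire`, and the `keep_raw` switch are hypothetical names, not an existing or proposed API, and the regeneration path does no folding or encoding at all.

```python
class CachedHeader:
    """Hypothetical header that remembers its original wire form."""

    def __init__(self, name, value, raw=None):
        self.name = name
        self._value = value
        # Bytes exactly as received (folding included), or None.
        self._raw = raw

    @classmethod
    def from_wire(cls, raw, keep_raw=True):
        # Split at the first colon and unfold the value for API users.
        name, _, rest = raw.partition(b":")
        value = rest.replace(b"\r\n", b"").strip().decode("ascii")
        return cls(name.decode("ascii"), value,
                   raw=raw if keep_raw else None)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        # The cached bytes no longer describe this header.
        self._raw = None

    def to_wire(self):
        if self._raw is not None:
            # Untouched since parsing: return the original bytes, so
            # tab-versus-space folding and fold columns survive.
            return self._raw
        return ("%s: %s\r\n" % (self.name, self._value)).encode("ascii")


original = b"Subject: a rather long\r\n\tfolded subject\r\n"
header = CachedHeader.from_wire(original)
round_tripped = header.to_wire()
```

Because the raw bytes are consulted only while the header is unmodified, nothing about the original folding needs to be modelled as attributes on the header object.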
|
|
Re: [Python-Dev] the email module, text, and bytes (was Re: Dropping bytes "support" in json)

Bill Janssen writes:
> Barry Warsaw <barry@...> wrote:
>
> > In that case, we really need the
> > bytes-in-bytes-out-bytes-in-the-chewy-center API first, and build
> > things on top of that.
>
> Yep.

Uh, I hate to rain on a parade, but isn't that how we arrived at the *current* email package?
|
|
Re: [Python-Dev] the email module, text, and bytes (was Re: Dropping bytes "support" in json)

On Apr 10, 2009, at 3:06 PM, Stephen J. Turnbull wrote:
> Bill Janssen writes:
>> Barry Warsaw <barry@...> wrote:
>>
>>> In that case, we really need the
>>> bytes-in-bytes-out-bytes-in-the-chewy-center API first, and build
>>> things on top of that.
>>
>> Yep.
>
> Uh, I hate to rain on a parade, but isn't that how we arrived at the
> *current* email package?

[...] about the distinction.

I'm going to remove python-dev from subsequent follow ups. Please join us at email-sig for further discussion.

Barry
|
|
Re: Dropping bytes "support" in json

Stephen J. Turnbull wrote:
>Shouldn't this thread move lock stock and .signature to email-sig?

I'm doing my part :)

>"Idempotency"? I'm not sure what that means in the context of the
>email package ... multiplication by zero?<wink> Do you mean that
>.parse().to_wire() should be idempotent? Yes, I think that's a good
>idea, and it shouldn't be too hard to implement by (optionally?)
>caching the whole original message or individual components (headers
>with all whitespace including folding cached verbatim, etc). I think
>caching has to be done, since stuff like "did the original fold with a
>leading tab or a leading space, and at what column" and so on seems
>kind of pointless to encode as attributes on Header objects.

My response here is probably OT, but RFC 822 is the only RFC that talks about folding by *inserting* whitespace. Both RFC 2822 and RFC 5322 say folding is done by inserting <CRLF> ahead of *existing* whitespace and unfolding is done by removing the <CRLF> (only). Thus, the question of whether folding was with <tab> or <space> should not arise.

Of course, in terms of trying to reconstruct the original on_the_wire message exactly, the question of where the folding occurred is still relevant. But if we're doing the right thing, the question of what character should follow the <CRLF> is not.

--
Mark Sapiro <mark@...> The highway is for gamblers,
San Francisco Bay Area, California better use your sense - B. Dylan
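The folding rule described above can be sketched in a few lines: fold by inserting CRLF in front of whitespace that is already there, and unfold by removing only the CRLF. The `fold` and `unfold` function names and the greedy choice of fold points are assumptions made for illustration; the sketch ignores edge cases such as runs of consecutive whitespace and lines with no whitespace to fold at.

```python
import re


def unfold(header: bytes) -> bytes:
    # Remove each CRLF that is immediately followed by a space or tab.
    # The whitespace itself is kept, since it belongs to the value.
    return re.sub(rb"\r\n(?=[ \t])", b"", header)


def fold(header: bytes, max_len: int = 78) -> bytes:
    # Greedy folding: break before the last existing space or tab that
    # keeps the current line within max_len. No whitespace is added.
    lines = []
    current = header
    while len(current) > max_len:
        cut = max(current.rfind(b" ", 1, max_len + 1),
                  current.rfind(b"\t", 1, max_len + 1))
        if cut <= 0:
            break  # nowhere to fold; leave the long line alone
        lines.append(current[:cut])
        current = current[cut:]  # keeps its leading whitespace
    lines.append(current)
    return b"\r\n".join(lines)


subject = b"Subject: " + b" ".join(b"word%d" % i for i in range(30))
folded = fold(subject, 40)
restored = unfold(folded)
```

Since the character after each CRLF is whatever whitespace the original value contained, a space-versus-tab choice never has to be made or remembered; only the fold positions carry extra information.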
|
|
Re: Dropping bytes "support" in json

On Apr 10, 2009, at 3:34 PM, Mark Sapiro wrote:
> My response here is probably OT, but RFC 822 is the only RFC that
> talks about folding by *inserting* whitespace. both RFC 2822 and
> RFC 5322 say folding is done by inserting <CRLF> ahead of *existing*
> whitespace and unfolding is done by removing the <CRLF> (only). Thus,
> the question of whether folding was with <tab> or <space> should not
> arise.
>
> Of course, in terms of trying to reconstruct the original on_the_wire
> message exactly, the question of where the folding occurred is still
> relevant. but if we're doing the right thing, the question of what
> character should follow the <CRLF> is not.

I /think/ the email package in Python 3.0 DTRT here, or well, at least does better than the one in 2.6.

Barry
|
|
Re: [Python-Dev] Dropping bytes "support" in json

On Apr 10, 2009, at 3:04 PM, Stephen J. Turnbull wrote:
> Shouldn't this thread move lock stock and .signature to email-sig?

Yep. I'll try to be more conscientious about removing python-dev from the CC.

> "Idempotency"? I'm not sure what that means in the context of the
> email package ... multiplication by zero?<wink> Do you mean that
> .parse().to_wire() should be idempotent? Yes, I think that's a good
> idea, and it shouldn't be too hard to implement by (optionally?)
> caching the whole original message or individual components (headers
> with all whitespace including folding cached verbatim, etc). I think
> caching has to be done, since stuff like "did the original fold with a
> leading tab or a leading space, and at what column" and so on seems
> kind of pointless to encode as attributes on Header objects.

I tend to agree. I'd also be happy if there's a way to tell the parser that an application doesn't care about that. All that extra caching will have a memory overhead that you should only pay for if you care.

-Barry
|
|
Re: [Python-Dev] Dropping bytes "support" in json

At 10:18 -0400 04/13/2009, Barry Warsaw wrote:
>On Apr 10, 2009, at 3:04 PM, Stephen J. Turnbull wrote:
...
>> "Idempotency"? I'm not sure what that means in the context of the
>> email package ... multiplication by zero?<wink> Do you mean that
>> .parse().to_wire() should be idempotent? Yes, I think that's a good
>> idea, and it shouldn't be too hard to implement by (optionally?)
>> caching the whole original message or individual components (headers
>> with all whitespace including folding cached verbatim, etc). I think
>> caching has to be done, since stuff like "did the original fold with a
>> leading tab or a leading space, and at what column" and so on seems
>> kind of pointless to encode as attributes on Header objects.
>
>I tend to agree. I'm also happy of there's a way to tell say the
>parser that an application doesn't care about that. All that extra
>caching will have a memory overhead that you should only pay for if
>you care.

I'd expect the caching to have very low overhead. Message bodies will not be cached (an extra time), only some headers (when the Header isn't idempotent already) and the preamble and epilogue around message bodies.
--
____________________________________________________________________
TonyN.:' <mailto:tonynelson@...>
' <http://www.georgeanelson.com/>