Re: [Python-Dev] Dropping bytes "support" in json

At 22:38 -0400 04/09/2009, Barry Warsaw wrote:
...
>So, what I'm really asking is this. Let's say you agree that there
>are use cases for accessing a header value as either the raw encoded
>bytes or the decoded unicode. What should this return:
>
>    >>> message['Subject']
>
>The raw bytes or the decoded unicode?

That's an easy one: Subject: is an unstructured header, so it must be text, thus Unicode. We're looking at a high-level representation of an email message, with parsed header fields and a MIME message tree.

>Okay, so you've picked one. Now how do you spell the other way?

message.get_header_bytes('Subject')

Oh, I see that's what you picked.

>The Message class probably has these explicit methods:
>
>    >>> Message.get_header_bytes('Subject')
>    >>> Message.get_header_string('Subject')
>
>(or better names... it's late and I'm tired ;). One of those maps to
>message['Subject'] but which is the more obvious choice?

Structured header fields are more of a problem. Any header with addresses should return a list of addresses. I think the default return type should depend on the data type. To get an explicit bytes or string or list of addresses, be explicit; otherwise, for convenience, return the appropriate type for the particular header field name.

>Now, setting headers. Sometimes you have some unicode thing and
>sometimes you have some bytes. You need to end up with bytes in the
>ASCII range and you'd like to leave the header value unencoded if so.
>But in both cases, you might have bytes or characters outside that
>range, so you need an explicit encoding, defaulting to utf-8 probably.

Never for header fields. The default is always RFC 2047, unless it isn't, say for params.

The Message class should create an object of the appropriate subclass of Header based on the name (or use the existing object, see other discussion), and that should inspect its argument and DTRT or complain.

>    >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
>    >>> Message.set_header('Subject', b'Some bytes')
>
>One of those maps to
>
>    >>> message['Subject'] = ???

The expected data type should depend on the header field. For Subject:, it should be bytes to be parsed or verbatim text. For To:, it should be a list of addresses or bytes or text to be parsed.

The email package should be pythonic, and not require deep understanding of dozens of RFCs to use properly. Users don't need to know about the raw bytes; that's the whole point of MIME and any email package. It should be easy to set header fields with their natural data types, and doing it with bad data should produce an error. This may require a bit more care in the message parser, to always produce a parsed message with defects.
--
____________________________________________________________________
TonyN.:' <mailto:tonynelson@...>
      ' <http://www.georgeanelson.com/>
_______________________________________________
Email-SIG mailing list
Email-SIG@...
Your options: http://mail.python.org/mailman/options/email-sig/lists%40nabble.com
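Editorial sketch: the idea of setting header fields "with their natural data types", with bad data producing an error, might look roughly like the following. The function name `format_header_value` and the `ADDRESS_FIELDS` set are hypothetical and are not part of the email package; only `email.header.Header` and `email.utils.formataddr` are real standard-library calls. For brevity the sketch accepts only address pairs for address fields, not "bytes or text to be parsed".

```python
from email.header import Header
from email.utils import formataddr

# Hypothetical, non-exhaustive set of address-bearing field names.
ADDRESS_FIELDS = {"from", "to", "cc", "bcc", "reply-to", "sender"}


def format_header_value(name, value):
    """Turn a 'natural' Python value into an ASCII header value, or complain."""
    if name.lower() in ADDRESS_FIELDS:
        # Sketch only: accept a list of (display name, address) pairs.
        if isinstance(value, (str, bytes)):
            raise TypeError(f"{name}: expected a list of (name, address) pairs")
        return ", ".join(formataddr(pair) for pair in value)
    if not isinstance(value, str):
        raise TypeError(f"{name}: expected text, got {type(value).__name__}")
    # Leave pure-ASCII text unencoded; otherwise use RFC 2047 encoded-words.
    charset = "us-ascii" if value.isascii() else "utf-8"
    return Header(value, charset).encode()
```

The field name alone selects the expected type here, which is the behaviour argued for above; whether a real API should also accept unparsed text for address fields is left open.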
Re: [Python-Dev] Dropping bytes "support" in json

On Apr 9, 2009, at 11:41 PM, Tony Nelson wrote:

> At 22:38 -0400 04/09/2009, Barry Warsaw wrote:
> ...
>> So, what I'm really asking is this. Let's say you agree that there
>> are use cases for accessing a header value as either the raw encoded
>> bytes or the decoded unicode. What should this return:
>>
>>     >>> message['Subject']
>>
>> The raw bytes or the decoded unicode?
>
> That's an easy one: Subject: is an unstructured header, so it must be
> text, thus Unicode. We're looking at a high-level representation of an
> email message, with parsed header fields and a MIME message tree.

the message['Subject'] API for backward compatibility, but in that case it really should be a bytes API.

>> (or better names... it's late and I'm tired ;). One of those maps to
>> message['Subject'] but which is the more obvious choice?
>
> Structured header fields are more of a problem. Any header with addresses
> should return a list of addresses. I think the default return type should
> depend on the data type. To get an explicit bytes or string or list of
> addresses, be explicit; otherwise, for convenience, return the appropriate
> type for the particular header field name.

Knight makes some excellent points, which I agree with. However the email package obviously cannot support every type of structured header possible. It must support this through extensibility.

The obvious way is through inheritance (i.e. subclasses of Header), but in my experience, using inheritance of the Message class really doesn't work very well. You need to pass around factories to parsing functions and your application tends to have its own hierarchy of subclasses for whatever extra things it needs. ISTM that subclassing is simply not the right pattern to support extensibility in the Message objects or Header objects. Yes, this leads me to think that all the MIME* subclasses are essentially /wrong/.

Having said all that, the email package must support structured headers. Look at the insanity which is the current folding whitespace splitting and the impossibility of the current code to do the right thing for say Subject headers and Received headers, and you begin to see why it must be possible to extend this stuff.

>> Now, setting headers. Sometimes you have some unicode thing and
>> sometimes you have some bytes. You need to end up with bytes in the
>> ASCII range and you'd like to leave the header value unencoded if so.
>> But in both cases, you might have bytes or characters outside that
>> range, so you need an explicit encoding, defaulting to utf-8 probably.
>
> Never for header fields. The default is always RFC 2047, unless it isn't,
> say for params.
>
> The Message class should create an object of the appropriate subclass of
> Header based on the name (or use the existing object, see other
> discussion), and that should inspect its argument and DTRT or complain.

>>     >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
>>     >>> Message.set_header('Subject', b'Some bytes')
>>
>> One of those maps to
>>
>>     >>> message['Subject'] = ???
>
> The expected data type should depend on the header field. For Subject:, it
> should be bytes to be parsed or verbatim text. For To:, it should be a
> list of addresses or bytes or text to be parsed.

> The email package should be pythonic, and not require deep understanding of
> dozens of RFCs to use properly. Users don't need to know about the raw
> bytes; that's the whole point of MIME and any email package. It should be
> easy to set header fields with their natural data types, and doing it with
> bad data should produce an error. This may require a bit more care in the
> message parser, to always produce a parsed message with defects.

to compose email messages, and probably easy-ish to parse a byte stream into an email message tree. But we can't build those without the lower level raw support. I'm also convinced that this lower level will be the domain of those crazy enough to have the RFCs tattooed to the back of their eyelids.

-Barry
Re: [Python-Dev] Dropping bytes "support" in json

On approximately 4/10/2009 9:56 AM, came the following characters from the keyboard of Barry Warsaw:
> On Apr 10, 2009, at 1:19 AM, glyph@... wrote:
>> On 02:38 am, barry@... wrote:
>>> So, what I'm really asking is this. Let's say you agree that there
>>> are use cases for accessing a header value as either the raw encoded
>>> bytes or the decoded unicode. What should this return:
>>>
>>>     >>> message['Subject']
>>>
>>> The raw bytes or the decoded unicode?
>>
>> My personal preference would be to just deprecate this API, and
>> get rid of it, replacing it with a slightly more explicit one.
>>
>>     message.headers['Subject']
>>     message.bytes_headers['Subject']
>
> This is pretty darn clever Glyph. Stop that! :)
>
> I'm not 100% sure I like the name .bytes_headers or that .headers
> should be the decoded header (rather than have .headers return the
> bytes thingie and say .decoded_headers return the decoded thingies),
> but I do like the general approach.

If one name has to be longer than the other, it should be the bytes version. Real user code is more likely to want to use the text version, and hopefully there will be more of that type of code than implementations using bytes.

Of course, one could use message.header and message.bythdr and they'd be the same length.

--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
Re: [Python-Dev] Dropping bytes "support" in json

On Apr 10, 2009, at 2:00 PM, Glenn Linderman wrote:

> If one name has to be longer than the other, it should be the bytes
> version. Real user code is more likely to want to use the text
> version, and hopefully there will be more of that type of code than
> implementations using bytes.

I'm not sure we know that yet, actually. Nothing written for Python 2 counts, and email is too broken in 3 for any sane person to be writing such code for Python 3.

> Of course, one could use message.header and message.bythdr and
> they'd be the same length.

I was trying to figure out what a 'thdr' was that we'd want to index 'by' it. :)

-Barry
Re: [Python-Dev] Dropping bytes "support" in json

On Apr 10, 2009, at 2:00 PM, Glenn Linderman wrote:

> If one name has to be longer than the other, it should be the bytes
> version. Real user code is more likely to want to use the text
> version, and hopefully there will be more of that type of code than
> implementations using bytes.
>
> Of course, one could use message.header and message.bythdr and
> they'd be the same length.

Actually, thinking about this over the weekend, it's much better for message['subject'] to return a Header instance in all cases. Use bytes(header) to get the raw bytes.

A good API for getting the parsed and decoded header values needs to take into account that it won't always be a string. For unstructured headers like Subject, str(header) would work just fine. For an Originator or Destination address, what does str(header) return? And what would be the API for getting the set of realname/addresses out of the header?

-Barry
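Editorial sketch: a header object that answers both `bytes(header)` and `str(header)` for an unstructured field could be as small as the following. The class name `SketchHeader` is hypothetical; `decode_header` and `make_header` are the existing `email.header` helpers for RFC 2047 encoded-words. Raw bytes outside ASCII are replaced rather than interpreted, which is a simplifying assumption of the sketch.

```python
from email.header import decode_header, make_header


class SketchHeader:
    """Hypothetical header object holding the raw wire bytes of one field value."""

    def __init__(self, name, raw):
        self.name = name
        self._raw = bytes(raw)

    def __bytes__(self):
        # bytes(header): the raw value, exactly as it was received.
        return self._raw

    def __str__(self):
        # str(header): decode any RFC 2047 encoded-words for presentation.
        text = self._raw.decode("ascii", errors="replace")
        return str(make_header(decode_header(text)))
```

This covers only the unstructured case; the question raised above about what `str(header)` should return for address fields is not answered by it.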
Re: [Python-Dev] headers api for email package

On Mon, 13 Apr 2009 at 10:28, Barry Warsaw wrote:
> On Apr 11, 2009, at 8:39 AM, Chris Withers wrote:
>
>> Barry Warsaw wrote:
>> >    >>> message['Subject']
>> > The raw bytes or the decoded unicode?
>>
>> A header object.
>
> Yep. You got there before I did. :)

+1

>> > Okay, so you've picked one. Now how do you spell the other way?
>>
>> str(message['Subject'])
>
> Yes for unstructured headers like Subject. For structured headers... hmm.

Some "reasonable" printable interpretation that has no semantic meaning?

>> bytes(message['Subject'])
>
> Yes.
>
>> > Now, setting headers. Sometimes you have some unicode thing and
>> > sometimes you have some bytes. You need to end up with bytes in the
>> > ASCII range and you'd like to leave the header value unencoded if so.
>> > But in both cases, you might have bytes or characters outside that range,
>> > so you need an explicit encoding, defaulting to utf-8 probably.
>> >    >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
>> >    >>> Message.set_header('Subject', b'Some bytes')
>>
>> Where you just want "a damned valid email and stop making my life hard!":
>>
>> Message['Subject']='Some text'
>
> Yes. In which case I propose we guess the encoding as 1) ascii, 2) utf-8, 3)
> wtf?

Given some usenet postings I've just dealt with, (3) appears to sometimes be spelled 'x-unknown' and sometimes (in the most recent case) 'unknown-8bit'. A quick google turns up a hit on RFC1428 for the latter, and a bunch of trouble tickets for the former...so I think 'wtf' is correctly spelled 'unknown-8bit'.

However, it's not supposed to be used by mail composers, who are expected to know the encoding. It's for mail gateways that are transforming something and don't know the encoding. I'm not sure what this means for the email module, which certainly will be used in mail gateways....maybe it's the responsibility of the application code to explicitly say 'unknown encoding'?

>> Where you care about what encoding is used:
>>
>> Message['Subject']=Header('Some text',encoding='utf-8')
>
> Yes.
>
>> If you have bytes, for whatever reason:
>>
>> Message['Subject']=b'some bytes'.decode('utf-8')
>>
>> ...because only you know what encoding those bytes use!
>
> So you're saying that __setitem__() should not accept raw bytes?

If I'm understanding things correctly, if it did accept bytes the person using that interface would need to do whatever encoding (eg: encoded-word) was needed, so the interface should check that the byte string is 8 bit clean. But having some sort of 'setraw' method on Header might be better for that case.

--David
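Editorial sketch: the "1) ascii, 2) utf-8, 3) wtf?" guessing order, applied to bytes whose encoding is not known, can be written as a short fallback chain. The function name `guess_charset` is hypothetical; the label 'unknown-8bit' is the one discussed above. As noted in the message, a mail composer is expected to know its encoding, so this would only make sense for gateway-style code.

```python
def guess_charset(raw):
    """Return the first charset label, in the proposed order, that decodes raw."""
    for charset in ("ascii", "utf-8"):
        try:
            raw.decode(charset)
        except UnicodeDecodeError:
            continue
        return charset
    # Neither guess fits: label the bytes as undeclared 8-bit data.
    return "unknown-8bit"
```

A successful utf-8 decode does not prove the bytes were written as utf-8, so the result is a guess in the literal sense.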
Re: [Python-Dev] Dropping bytes "support" in json

At 10:11 -0400 04/13/2009, Barry Warsaw wrote:
>On Apr 10, 2009, at 11:08 AM, James Y Knight wrote:
>
>> Until you write a parser for every header, you simply cannot decode
>> to unicode. The only sane choices are:
>> 1) raw bytes
>> 2) parsed structured data
>
>The email package does not need a parser for every header, but it
>should provide a framework that applications (or third party
>libraries) can use to extend the built-in header parsers. A bare
>minimum for functionality requires a Content-Type parser. I think the
>email package should also include an address header (Originator,
>Destination) parser, and a Message-ID header parser. Possibly
>others. The default would probably be some unstructured parser for
>headers like Subject.

I think the email package should have a parser for every header. All the headers defined in normal mail RFCs should have their own parser, and there would be a default parser for unhandled headers, probably the Unstructured parser. Users could add their own, probably by importing some module that knew how to add its parsing to the email package parsers.
--
____________________________________________________________________
TonyN.:' <mailto:tonynelson@...>
      ' <http://www.georgeanelson.com/>
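Editorial sketch: one way to picture "a framework that applications can use to extend the built-in header parsers", with an unstructured default, is a registry keyed by field name. Everything named here (`register_parser`, `parse_header`, `parse_unstructured`, `parse_addresses`) is hypothetical; the sketch only leans on the existing `email.header` and `email.utils` helpers.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses

# Hypothetical registry mapping lower-cased field names to parser callables.
_PARSERS = {}


def register_parser(*names):
    """Decorator that registers a parser for one or more header field names."""
    def decorator(func):
        for name in names:
            _PARSERS[name.lower()] = func
        return func
    return decorator


def parse_unstructured(value):
    """Default parser: decode RFC 2047 encoded-words and return text."""
    return str(make_header(decode_header(value)))


@register_parser("from", "to", "cc")
def parse_addresses(value):
    """Address parser: return a list of (display name, address) pairs."""
    return getaddresses([value])


def parse_header(name, value):
    """Dispatch on field name, falling back to the unstructured parser."""
    return _PARSERS.get(name.lower(), parse_unstructured)(value)
```

A third-party module could then add support for its own field simply by calling `register_parser` at import time, which matches the "importing some module" suggestion without requiring Message or Header subclasses.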
Re: [Python-Dev] Dropping bytes "support" in json

At 10:14 -0400 04/13/2009, Barry Warsaw wrote:
...
>Actually, thinking about this over the weekend, it's much better for
>message['subject'] to return a Header instance in all cases. Use
>bytes(header) to get the raw bytes.

I don't agree. I'd want it to return the appropriate type for that header: string for Subject:, a list of addresses for To:, and so on. Either the user knows what to expect, or they'll learn immediately. If they get a Header, they have to then extract the appropriate data from it, based on its type (but they only know the name).

OK, Header instances could have a .useful field that returned the useful data in all instances. But in any case, the email package should guide users in the correct usage, rather than leaving every choice seeming equal, when only one choice is correct.

>A good API for getting the parsed and decoded header values needs to
>take into account that it won't always be a string. For unstructured
>headers like Subject, str(header) would work just fine. For an
>Originator or Destination address, what does str(header) return? And
>what would be the API for getting the set of realname/addresses out of
>the header?

msg[<headername>] would be the preferred way.

msg.get_header(<headername>).useful would return the useful data form of any header.

msg.get_header(<headername>).addresses would return the address list from any address Header, and raise AttributeError with other Headers.
--
____________________________________________________________________
TonyN.:' <mailto:tonynelson@...>
      ' <http://www.georgeanelson.com/>
Re: [Python-Dev] headers api for email package

Barry Warsaw writes:
 > On Apr 11, 2009, at 8:39 AM, Chris Withers wrote:
 >
 > > Barry Warsaw wrote:
 > >>    >>> message['Subject']
 > >> The raw bytes or the decoded unicode?
 > >
 > > A header object.
 >
 > Yep. You got there before I did. :)
 >
 > >> Okay, so you've picked one. Now how do you spell the other way?
 > >
 > > str(message['Subject'])
 >
 > Yes for unstructured headers like Subject. For structured headers...
 > hmm.

Well, suppose we get really radical here. *People* see email as (rich-)text. So ... message['Subject'] returns an object, partly to be consistent with more complex headers' APIs, but partly to remind us that nothing in email is as simple as it seems. Now, str(message['Subject']) is really for presentation to the user, right? OK, so let's make it a presentation function! Decode the MIME-words, optionally unfold folded lines, optionally compress spaces, etc. This by default returns the subject field as a single, possibly quite long, line. Then a higher-level API can rewrap it, add fonts etc, for fancy presentation. This also suggests that we don't want the field tag (ie, "Subject") to be part of this value.

Of course a *really* smart higher-level API would access structured headers based on their structure, not on the one-size-fits-all str() conversion.

Then MTAs see email as a string of octets. So guess what:

 > > bytes(message['Subject'])

gives wire format. Yow! I think I'm just joking. Right?

 > >> Now, setting headers. Sometimes you have some unicode thing and
 > >> sometimes you have some bytes. You need to end up with bytes in
 > >> the ASCII range and you'd like to leave the header value unencoded
 > >> if so. But in both cases, you might have bytes or characters
 > >> outside that range, so you need an explicit encoding, defaulting to
 > >> utf-8 probably.
 > >>    >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
 > >>    >>> Message.set_header('Subject', b'Some bytes')
 > >
 > > Where you just want "a damned valid email and stop making my life
 > > hard!":

-1

I mean, yeah, Brother, I feel your pain but it just isn't that easy. If that were feasible, it would be *criminal* to have a .set_header() method at all! In fact,

 > > Message['Subject']='Some text'

is going to (a) need to take *only* unicodes, or (b) raise Exceptions at the slightest provocation when handed bytes. And things only get worse if you try to provide this interface for say "From" (let alone "Content-Type"). Is it really worth doing the mapping interface if it's only usable with free-form headers (ie, only Subject among the commonly used headers)?

 > Yes. In which case I propose we guess the encoding as 1) ascii, 2)
 > utf-8, 3) wtf?

Uh, what guessing? If you don't know what you have but you believe it to be a valid header field, then presumably you got it off the wire and it's still in bytes and you just spit it out on the wire without trying to decode or encode it. But as I already said, I think that's a bad idea.

Otherwise, you should have a unicode, and you simply look at the range of the string. If it fits in ASCII, Bob's your uncle. If not, Bob's your aunt (and you use UTF-8).

 > > Where you care about what encoding is used:
 > >
 > > Message['Subject']=Header('Some text',encoding='utf-8')
 >
 > Yes.
 >
 > > If you have bytes, for whatever reason:
 > >
 > > Message['Subject']=b'some bytes'.decode('utf-8')
 > >
 > > ...because only you know what encoding those bytes use!
 >
 > So you're saying that __setitem__() should not accept raw bytes?

How do you distinguish "raw" bytes from "encoded bytes"? __setitem__() shouldn't accept bytes at all. There should be an API which sets a .formatted_for_the_wire member, and it should have a "validate" option (ie, when true the API attempts to parse the header and raises an exception if it fails to do so; when false, it assumes you know what you're doing and will send out the bytes verbatim).
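Editorial sketch: the proposed API that sets a `.formatted_for_the_wire` member with a "validate" option might be shaped like this. The class `WireHeader`, the method `set_wire_bytes`, and the helper `check_wire_bytes` are hypothetical names. The validation shown is only a stand-in (7-bit bytes, line breaks allowed only as folding); the message above asks for a real parse of the header, which this sketch does not attempt.

```python
class WireHeader:
    """Hypothetical holder for a header field value already in wire format."""

    def __init__(self, name):
        self.name = name
        self.formatted_for_the_wire = None

    def set_wire_bytes(self, data, validate=True):
        # With validate=False the caller is trusted and bytes go out verbatim.
        if validate:
            check_wire_bytes(data)
        self.formatted_for_the_wire = bytes(data)


def check_wire_bytes(data):
    """Stand-in validation: 7-bit only, and line breaks only as folding."""
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError as exc:
        raise ValueError("8-bit bytes in header value") from exc
    lines = text.split("\r\n")
    for line in lines:
        if "\r" in line or "\n" in line:
            raise ValueError("bare CR or LF in header value")
    for continuation in lines[1:]:
        if not continuation.startswith((" ", "\t")):
            raise ValueError("line break not followed by folding whitespace")
```

Keeping the raw setter separate from `__setitem__` means the mapping interface never has to decide whether bytes are "raw" or "encoded".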
API for Header objects [was: Dropping bytes "support" in json]

Tony Nelson writes:

 > OK, Header instances could have a .useful field that returned the useful
 > data in all instances. But in any case, the email package should guide
 > users in the correct usage, rather than leaving every choice seeming equal,
 > when only one choice is correct.

What do you mean by "only one choice is correct?" For example, a Destination field might be used for presentation (in which case the display names are needed), or to compose a list of recipients (when they should be discarded). Some applications might prefer to receive the combination as the original string (although that often is not valid RFC-any), others might prefer it parsed into a pair of display name and mailbox.

Quoth Barry Warsaw:

 > >A good API for getting the parsed and decoded header values needs to
 > >take into account that it won't always be a string. For unstructured
 > >headers like Subject, str(header) would work just fine. For an
 > >Originator or Destination address, what does str(header) return?

A string (not folded) of comma-separated addresses in "Display Name" <po@...> form.

 > >And what would be the API for getting the set of
 > >realname/addresses out of the header?

Does there need to be one? An AddressHeader object could support indexing: message['To'][0] returns the first displayname,mailbox pair. If you really want a list, what's wrong with list(header)?

(Yes, I recall that you (Barry) said you don't think subclassing worked very well, but I wonder if maybe we can't get it righter this time around.)

 > msg[<headername>] would be the preferred way.

This goes against the principle that this returns a Header object. For one thing, I really think that there need to be some common methods all Header objects support, like str() and to_wire_format(). Also, if this returns a list for 'To', then str(msg['To']) won't work right: it will return the list enclosed in square brackets and the mailbox portions will be quoted, which isn't useful.

 > msg.get_header(<headername>).useful would return the useful data form of
 > any header.

Er, shouldn't we just throw away the data that is never useful?<wink>

 > msg.get_header(<headername>).addresses would return the address list from
 > any address Header, and raise AttributeError with other Headers.

Yes, but a list of what? Strings? Bytes? Displayname/mailbox pairs?
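Editorial sketch: an address header object that supports indexing, `list(header)`, and a comma-separated `str(header)` could look like the following. The class is a hypothetical illustration of the AddressHeader idea above, not an existing email package class; it relies on the real `email.utils.getaddresses` and `email.utils.formataddr`.

```python
from email.utils import formataddr, getaddresses


class AddressHeader:
    """Hypothetical address header: a sequence of (display name, mailbox) pairs."""

    def __init__(self, raw):
        self._pairs = getaddresses([raw])

    def __getitem__(self, index):
        # header[0] returns the first (display name, mailbox) pair.
        return self._pairs[index]

    def __len__(self):
        return len(self._pairs)

    def __str__(self):
        # Unfolded, comma-separated presentation form.
        return ", ".join(formataddr(pair) for pair in self._pairs)
```

Because `__getitem__` and `__len__` are defined, `list(header)` works without a separate accessor, which is the point made above about not needing a dedicated API for the pairs.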
Re: API for Header objects [was: Dropping bytes "support" in json]

At 03:38 +0900 04/14/2009, Stephen J. Turnbull wrote:
>Tony Nelson writes:
>
> > OK, Header instances could have a .useful field that returned the useful
> > data in all instances. But in any case, the email package should guide
> > users in the correct usage, rather than leaving every choice seeming equal,
> > when only one choice is correct.
>
>What do you mean by "only one choice is correct?" For example, a
>Destination field might be used for presentation (in which case the
>display names are needed), or to compose a list of recipients (when
>they should be discarded). Some applications might prefer to receive
>the combination as the original string (although that often is not
>valid RFC-any), others might prefer it parsed into a pair of display
>name and mailbox.

Assuming that by "Destination" you mean a class of Address header fields (there is no Destination: header field), such header fields contain addresses, which can be considered to contain (as the email package does) a list of (name, email address) pairs, or, at a lower level, to also have Comments. There is indeed only one correct choice, which is the one the email package currently provides the diligent user. I wish it to be the one obvious choice, so that less study is needed to properly use the email package.

Any use that wishes to discard the email addresses in favor of the friendly names can do so most easily from the parsed [(name, address)], not from the bytes. Parsing Address header fields is hard. Note that Address headers are not Text, as only certain tokens -- not part of the email addresses -- can be RFC 2047-encoded.
--
____________________________________________________________________
TonyN.:' <mailto:tonynelson@...>
      ' <http://www.georgeanelson.com/>
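Editorial sketch: extracting the friendly names from the parsed (name, address) pairs, and decoding RFC 2047 encoded-words only in the display-name part, can be done with existing standard-library helpers. The function name `display_names` is hypothetical; falling back to the bare address when no name is present is an assumption of the sketch.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


def display_names(raw):
    """Return one presentation name per address in an address header value."""
    names = []
    for name, address in getaddresses([raw]):
        if name:
            # Only the display name may carry encoded-words; the address may not.
            names.append(str(make_header(decode_header(name))))
        else:
            names.append(address)
    return names
```

Working from the parsed pairs, rather than from the raw bytes, keeps the decoding step away from the mailbox portion entirely.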