Just give me the decoded header?

Gentlemen, please consider the following ipython session:

    In [98]: m = email.message_from_file(f)

    In [99]: print m["subject"]
    =?utf-8?b?W291aS5jb20uYnJdIENhcnTDo28gZGUgY3LDqWRpdG8gdGVyw6EgbGVn?=
     =?utf-8?b?aXNsYcOnw6NvIGVzcGVjw61maWNh?=

It gives me the raw subject header value. Now of course I just wanted the header in unicode. So I have to do:

    In [100]: from email.header import decode_header

    In [101]: decode_header(m["subject"])
    Out[101]:
    [('[oui.com.br] Cart\xc3\xa3o de cr\xc3\xa9dito ter\xc3\xa1 legisla\xc3\xa7\xc3\xa3o espec\xc3\xadfica',
      'utf-8')]

    In [102]: print decode_header(m["subject"])[0][0]
    [oui.com.br] Cartão de crédito terá legislação específica

My questions are:

1) Why does not it currently return the *decoded* header?
2) Would it break too many apps if we changed it?
2.1) If it would, can we add a function such as message.getheader("subject") for this?
2.1.1) Would you like me to propose a patch with the obvious implementation?

Sometimes, for things more or less like this, I just feel like *subclassing* Message. But I can't. The MIME parser is wired to create Messages. I don't think I can tell it to create a MyMessageSubclass. This also happens with the convenience function email.message_from_file(f). It creates a Message. I *think* I could make it into a class method of Message, then I would be able to call MyMessage.from_file(). Is this idea -- making things more object-oriented -- interesting for you?

For starters, isn't it high time Message became a new-style class by inheriting from object?

-- 
Nando Florestan
===============
[skype] nandoflorestan
[phone] + 55 (11) 3675-3038
[mobile] + 55 (11) 9820-5451
[internet] http://oui.com.br/
[À Capela] http://acapela.com.br/
[location] São Paulo - SP - Brasil
_______________________________________________
Email-SIG mailing list
Email-SIG@...
Your options: http://mail.python.org/mailman/options/email-sig/lists%40nabble.com
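For readers who just want the decoded value, here is a minimal sketch of the kind of helper the message asks about. It is written for Python 3 (the session above is Python 2), and `decoded_header` is a hypothetical name, not something the email package provides. It relies only on `decode_header` and `make_header` from `email.header`; the sample message is rebuilt from the subject shown above.

```python
from email import message_from_string
from email.header import decode_header, make_header

# Sample message reconstructed from the raw Subject shown in the session.
RAW = (
    "Subject: =?utf-8?b?W291aS5jb20uYnJdIENhcnTDo28gZGUgY3LDqWRpdG8gdGVyw6EgbGVn?=\n"
    " =?utf-8?b?aXNsYcOnw6NvIGVzcGVjw61maWNh?=\n"
    "\n"
    "body\n"
)


def decoded_header(msg, name):
    """Hypothetical helper: return header `name` as text, or None if absent."""
    raw = msg[name]
    if raw is None:
        return None
    # decode_header() splits the value into (chunk, charset) pairs;
    # make_header() reassembles them into a Header that str() renders as text.
    return str(make_header(decode_header(raw)))


msg = message_from_string(RAW)
subject = decoded_header(msg, "subject")
print(subject)
```

Note that `msg["subject"]` still returns the raw value, so code that depends on the undecoded header keeps working.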
Re: Just give me the decoded header?

Nando wrote:
>
> Sometimes, for things more or less like this, I just feel like
> *subclassing* Message. But I can't. The MIME parser is wired to create
> Messages. I don't think I can tell it to create a MyMessageSubclass.
> This also happens with the convenience function
> email.message_from_file(f). It creates a Message. I *think* I could make
> it into a class method of Message, then I would be able to call
> MyMessage.from_file(). Is this idea -- making things more
> object-oriented -- interesting for you?

You can do this now, albeit somewhat differently. See the _class argument at <http://docs.python.org/lib/node149.html> and the _factory argument at <http://docs.python.org/lib/node148.html>.

For example, if your mymessage module defines a MyMessage class as a subclass of email.message.Message, you can do

    import email
    import mymessage
    f = open('/path/to/message/file')
    msg = email.message_from_file(f, mymessage.MyMessage)

to create a MyMessage instance. You can also do

    import email.parser
    import mymessage
    p = email.parser.Parser(mymessage.MyMessage)

to create a parser which will create MyMessage instances.

-- 
Mark Sapiro <mark@...>              The highway is for gamblers,
San Francisco Bay Area, California  better use your sense - B. Dylan
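A self-contained variant of the above, as a sketch only: it is written for Python 3, defines the subclass inline instead of in a separate mymessage module, and parses from a string so it runs without a file. The `get_decoded` method is a hypothetical addition for illustration, not part of the email package.

```python
import email
import email.message
from email.header import decode_header, make_header


class MyMessage(email.message.Message):
    """Message subclass with one extra, hypothetical convenience method."""

    def get_decoded(self, name, default=None):
        raw = self.get(name)
        if raw is None:
            return default
        return str(make_header(decode_header(raw)))


text = "Subject: =?utf-8?q?Cart=C3=A3o?=\n\nbody\n"

# _class tells the parser which class to instantiate for each message.
msg = email.message_from_string(text, _class=MyMessage)

print(type(msg).__name__)
print(msg["subject"])             # raw value, unchanged
print(msg.get_decoded("subject"))  # decoded text
```

Because the subclass only adds a method, anything that expects a plain Message still sees the raw headers through the usual mapping interface.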
Just give me the decoded header?

Nando writes:

> My questions are:
> 1) Why does not it currently return the *decoded* header?

Because Message is an implementation of RFC 2822, which says nothing about decoding headers. It is very helpful to model your programs directly on the standards they claim to conform to.

Why restrict the base interface to such a low-level API? Well, Internet email is an ancient system going back to RFC 561 at least (published in 1973), and many things that seem unnecessary today with modern technology remain necessary because you cannot know what generation of technology you are communicating with (or even if the remote user is a dog, as the famous joke goes). Often optimizations in modern programs depend on assumptions about standard conformance.

> 2) Would it break too many apps if we changed it?

It probably would. Decoding a header more than once will probably result in passing non-ASCII to the ASCII codec, and boom! you're down. For example, Mailman is vulnerable to this.

> 2.1) If it would, can we add a function such as
> message.getheader("subject") for this?

You could, but why would you need that particular implementation?

> Sometimes, for things more or less like this, I just feel like
> *subclassing* Message.

Why do that? In my experience, you will eventually find a need to pass the original Message to some routine (or even the original message, in digital signing applications). If you want to work with a SmartMessage so that it contains the same data but returns the decoded headers, just include the original Message as an attribute:

    import email.header

    class SmartMessage(object):
        def __init__(self, email_message):
            # keep the original Message so it can still be passed around
            self.raw_message = email_message
        def __getitem__(self, key):
            # decode_header() returns a list of (string, charset) pairs
            return email.header.decode_header(self.raw_message[key])

    etc.

However, the problem you're going to run into is that this kind of behavior (whether implemented as a subclass or by enveloping the raw_message attribute) will make it impossible for apps to distinguish between Messages and SmartMessages in contexts where it matters.

> But I can't. The MIME parser is wired to create
> Messages. I don't think I can tell it to create a MyMessageSubclass.

Again, why do you want to? Everything you need to implement the behavior you want is in the Message already.

> For starters, isn't it high time Message became a new-style class by
> inheriting from object?

Sure, but code speaks louder than words. Nobody has been willing to speak up yet. :-(
Re: Just give me the decoded header?

On Thursday, 14 February 2008, Nando wrote:
> Gentlemen, please consider the following ipython session:
>
>
> In [98]: m = email.message_from_file(f)
>
> In [99]: print m["subject"]
> =?utf-8?b?W291aS5jb20uYnJdIENhcnTDo28gZGUgY3LDqWRpdG8gdGVyw6EgbGVn?=
>  =?utf-8?b?aXNsYcOnw6NvIGVzcGVjw61maWNh?=
>
>
> It gives me the raw subject header value. Now of course I just wanted
> the header in unicode. So I have to do:
>
>
> In [100]: from email.header import decode_header
>
> In [101]: decode_header(m["subject"])
> Out[101]:
> [('[oui.com.br] Cart\xc3\xa3o de cr\xc3\xa9dito ter\xc3\xa1
> legisla\xc3\xa7\xc3\xa3o espec\xc3\xadfica',
>   'utf-8')]

Nando, you're just a lucky camper in that case. How would you handle a mixture of, say, big5, euc_jp, koi8_r _and_ utf-8 encodings? Please don't claim that this is unlikely. Sure it is, but nevertheless it happens, and does your code get this pathological case right?

Wait, let's normalize them - but how do we handle encoding failures? Remember, there are way too many MUAs, mailing list managers, email gateways, autoresponders, etc. out there which get this wrong!

Next you ask for email.Message to reparse email addresses to conform to RFC 2822, and voila, you have created an unmanageable creature called Frankenstein.

If you think about the consequences, you will understand that Barry and friends will do _everything_ to keep this can o' worms closed in this context.

Pete
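To make the mixed-charset case concrete, here is a rough Python 3 sketch of one possible policy: decode each chunk with its own declared charset and fall back to a lossy decode when the charset is unknown or the bytes are invalid. The names `decode_lenient` and `encoded_word` and the choice of latin-1 as the fallback are assumptions made for illustration. Whether silently swallowing errors is acceptable depends on the application, which is very much the point being made above.

```python
import base64
from email.errors import HeaderParseError
from email.header import decode_header


def decode_lenient(raw, fallback="latin-1"):
    """Hypothetical helper: decode chunk by chunk, never raising."""
    try:
        chunks = decode_header(raw)
    except HeaderParseError:
        # Malformed encoded word (e.g. broken base64): give back the raw text.
        return raw
    pieces = []
    for chunk, charset in chunks:
        if isinstance(chunk, str):
            # A header with no encoded words comes back as plain str.
            pieces.append(chunk)
            continue
        try:
            pieces.append(chunk.decode(charset or "ascii"))
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name or invalid bytes: assumed fallback policy.
            pieces.append(chunk.decode(fallback, errors="replace"))
    return "".join(pieces)


def encoded_word(text, charset):
    """Build a base64 RFC 2047 encoded word, only to construct test input."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?%s?b?%s?=" % (charset, payload)


# A deliberately pathological subject mixing three charsets.
raw = "Re: " + " ".join([
    encoded_word("привет", "koi8-r"),
    encoded_word("日本語", "euc_jp"),
    encoded_word("terá", "utf-8"),
])
decoded = decode_lenient(raw)
print(decoded)
```

A stricter application could re-raise instead of falling back, or record the failure next to the raw value for later reporting.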
Re: Just give me the decoded header?

Nando wrote:
> Gentlemen, please consider the following ipython session:
>
>
> In [98]: m = email.message_from_file(f)
>
> In [99]: print m["subject"]
> =?utf-8?b?W291aS5jb20uYnJdIENhcnTDo28gZGUgY3LDqWRpdG8gdGVyw6EgbGVn?=
>  =?utf-8?b?aXNsYcOnw6NvIGVzcGVjw61maWNh?=
>
>
> It gives me the raw subject header value. Now of course I just wanted
> the header in unicode. So I have to do:
>
>
> In [100]: from email.header import decode_header
>
> In [101]: decode_header(m["subject"])
> Out[101]:
> [('[oui.com.br] Cart\xc3\xa3o de cr\xc3\xa9dito ter\xc3\xa1
> legisla\xc3\xa7\xc3\xa3o espec\xc3\xadfica',
>   'utf-8')]
>
> In [102]: print decode_header(m["subject"])[0][0]
> [oui.com.br] Cartão de crédito terá legislação específica
>
>
> My questions are:
> 1) Why does not it currently return the *decoded* header?

[Not all headers are] encoded the same. While what you have works for Subject:, it doesn't work for To:, Reply-To:, From: etc.

> 2) Would it break too many apps if we changed it?

Yes. Particularly apps that need to log or report broken email headers that cannot be decoded.

> 2.1) If it would, can we add a function such as
> message.getheader("subject") for this?
> 2.1.1) Would you like me to propose a patch with the obvious implementation?

I'd love to see things become more Unicode aware. Perhaps return an object implementing __str__() and __unicode__() (or decode()). The cast-to-unicode conversion would decode headers with known encodings and raise an exception on headers with unknown encodings. Similarly, setting headers using Unicode strings would use the known encodings to perform the reverse operation. And you still have access to the raw value if you want to round trip.

-- 
Stuart Bishop <stuart@...>
http://www.stuartbishop.net/
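On the point that the Subject: recipe does not carry over to To:, Reply-To: or From:, one way to handle address headers is to split the header into addresses first and decode only the display names afterwards, since a decoded name may itself contain commas or angle brackets. The following is a Python 3 sketch under that assumption. `decoded_addresses` is a hypothetical name and the example addresses are made up.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


def decoded_addresses(raw):
    """Hypothetical helper: return (display name, address) pairs as text."""
    result = []
    # Parse the address list while it is still in its encoded, ASCII form...
    for name, addr in getaddresses([raw]):
        # ...and only then decode each display name on its own.
        result.append((str(make_header(decode_header(name))), addr))
    return result


raw_to = "=?utf-8?q?Jos=C3=A9_Silva?= <jose@example.com>, Ann <ann@example.com>"
pairs = decoded_addresses(raw_to)
for name, addr in pairs:
    print(name, addr)
```

The raw header string is left untouched, so it remains available for logging or round-tripping as suggested above.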