Just give me the decoded header?

by Nando-9

Gentlemen, please consider the following ipython session:


In [98]: m = email.message_from_file(f)

In [99]: print m["subject"]
=?utf-8?b?W291aS5jb20uYnJdIENhcnTDo28gZGUgY3LDqWRpdG8gdGVyw6EgbGVn?=
        =?utf-8?b?aXNsYcOnw6NvIGVzcGVjw61maWNh?=


It gives me the raw subject header value. Now of course I just wanted
the header in unicode. So I have to do:


In [100]: from email.header import decode_header

In [101]: decode_header(m["subject"])
Out[101]:
[('[oui.com.br] Cart\xc3\xa3o de cr\xc3\xa9dito ter\xc3\xa1
legisla\xc3\xa7\xc3\xa3o espec\xc3\xadfica',
  'utf-8')]

In [102]: print decode_header(m["subject"])[0][0]
[oui.com.br] Cartão de crédito terá legislação específica


My questions are:
1) Why does not it currently return the *decoded* header?
2) Would it break too many apps if we changed it?
2.1) If it would, can we add a function such as
message.getheader("subject") for this?
2.1.1) Would you like me to propose a patch with the obvious implementation?
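
For illustration, the "obvious implementation" asked about in 2.1.1 might look roughly like the sketch below. It is written for Python 3 rather than the Python 2 session above, and the helper name get_decoded_header is an assumption, not an existing function in the email package.

```python
from email import message_from_string
from email.header import decode_header, make_header


def get_decoded_header(message, name):
    """Return the named header as decoded text, or None if it is absent.

    Hypothetical helper; it is not part of the email package.
    """
    raw = message[name]
    if raw is None:
        return None
    # decode_header() splits the value into (bytes, charset) chunks;
    # make_header() reassembles them and str() yields the decoded text.
    return str(make_header(decode_header(raw)))


# The folded Subject header from the session above, embedded in a tiny message.
RAW_MESSAGE = (
    "Subject: =?utf-8?b?W291aS5jb20uYnJdIENhcnTDo28gZGUgY3LDqWRpdG8gdGVyw6EgbGVn?=\n"
    "        =?utf-8?b?aXNsYcOnw6NvIGVzcGVjw61maWNh?=\n"
    "\n"
    "body\n"
)

m = message_from_string(RAW_MESSAGE)
print(get_decoded_header(m, "subject"))
```

Unlike decode_header(...)[0][0], this joins every chunk of the header, so it does not silently drop anything after the first one.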

Sometimes, for things more or less like this, I just feel like
*subclassing* Message. But I can't. The MIME parser is wired to create
Messages. I don't think I can tell it to create a MyMessageSubclass.
This also happens with the convenience function
email.message_from_file(f). It creates a Message. I *think* I could make
it into a class method of Message, then I would be able to call
MyMessage.from_file(). Is this idea -- making things more
object-oriented -- interesting for you?

For starters, isn't it high time Message became a new-style class by
inheriting from object?

--
Nando Florestan
===============
[skype]    nandoflorestan
[phone]  + 55 (11) 3675-3038
[mobile] + 55 (11) 9820-5451
[internet] http://oui.com.br/
[À Capela] http://acapela.com.br/
[location] São Paulo - SP - Brasil

_______________________________________________
Email-SIG mailing list
Email-SIG@...
Your options: http://mail.python.org/mailman/options/email-sig/lists%40nabble.com

Re: Just give me the decoded header?

by Mark Sapiro-3

Nando wrote:
>
> Sometimes, for things more or less like this, I just feel like
> *subclassing* Message. But I can't. The MIME parser is wired to create
> Messages. I don't think I can tell it to create a MyMessageSubclass.
> This also happens with the convenience function
> email.message_from_file(f). It creates a Message. I *think* I could make
> it into a class method of Message, then I would be able to call
> MyMessage.from_file(). Is this idea -- making things more
> object-oriented -- interesting for you?

You can do this now, albeit somewhat differently. See the _class
argument at <http://docs.python.org/lib/node149.html> and the _factory
argument at <http://docs.python.org/lib/node148.html>.

For example, if your mymessage module defines a MyMessage class as a
subclass of email.message.Message, you can do

import email
import mymessage

f = open('/path/to/message/file')

msg = email.message_from_file(f, mymessage.MyMessage)


to create a MyMessage instance. You can also do

import email.parser
import mymessage

p = email.parser.Parser(mymessage.MyMessage)

to create a parser which will create MyMessage instances.
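
A self-contained sketch of the same idea, assuming Python 3 and a hypothetical MyMessage subclass defined inline instead of in a separate mymessage module (the get_decoded method is likewise made up for illustration):

```python
import email
from email.header import decode_header, make_header
from email.message import Message
from email.parser import Parser


class MyMessage(Message):
    """Hypothetical subclass that adds a decoding accessor."""

    def get_decoded(self, name):
        raw = self[name]
        if raw is None:
            return None
        return str(make_header(decode_header(raw)))


SOURCE = "Subject: =?utf-8?q?caf=C3=A9?=\n\nbody\n"

# The second positional argument is the class the parser instantiates.
msg = email.message_from_string(SOURCE, MyMessage)

# The same thing with an explicit parser object.
msg2 = Parser(MyMessage).parsestr(SOURCE)

# msg["subject"] is still the raw value; get_decoded() is the extra accessor.
print(type(msg).__name__, msg["subject"], msg.get_decoded("subject"))
```

Item access keeps returning the raw header, so code that expects a plain Message continues to work with the subclass.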

--
Mark Sapiro <mark@...>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

Just give me the decoded header?

by Stephen J. Turnbull

Nando writes:

 > My questions are:
 > 1) Why does not it currently return the *decoded* header?

Because Message is an implementation of RFC 2822, which says nothing
about decoding headers.  It is very helpful to model your programs
directly on the standards they claim to conform to.

Why restrict the base interface to such a low-level API?  Well,
Internet email is an ancient system going back to RFC 561 at least
(published in 1973), and many things that seem unnecessary today with
modern technology remain necessary because you cannot know what
generation of technology you are communicating with (or even if the
remote user is a dog, as the famous joke goes).  Often optimizations
in modern programs depend on assumptions about standard conformance.

 > 2) Would it break too many apps if we changed it?

It probably would.  Decoding headers more than once will probably result
in passing non-ASCII to the ASCII codec, and boom! you're down.  For
example, Mailman is vulnerable to this.

 > 2.1) If it would, can we add a function such as
 > message.getheader("subject") for this?

You could, but why would you need that particular implementation?

 > Sometimes, for things more or less like this, I just feel like
 > *subclassing* Message.

Why do that?  In my experience, you will eventually find a need to
pass the original Message to some routine (or even the original
message, in digital signing applications).  If you want to work with a
SmartMessage so that it contains the same data but returns the decoded
headers, just include the original Message as an attribute:

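from email.header import decode_header

class SmartMessage(object):
    def __init__(self, email_message):
        # Keep the original Message so it can still be passed on untouched.
        self.raw_message = email_message
    def __getitem__(self, key):
        # decode_header() returns a list of (string, charset) pairs.
        return decode_header(self.raw_message[key])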

etc.

However, the problem you're going to run into is that this kind of
behavior (whether implemented as a subclass or by enveloping the
raw_message attribute) will make it impossible for apps to distinguish
between Messages and SmartMessages in contexts where it matters.

 > But I can't. The MIME parser is wired to create
 > Messages. I don't think I can tell it to create a MyMessageSubclass.

Again, why do you want to?  Everything you need to implement the
behavior you want is in the Message already.

 > For starters, isn't it high time Message became a new-style class by
 > inheriting from object?

Sure, but code speaks louder than words.  Nobody has been willing to
speak up yet. :-(


Re: Just give me the decoded header?

by Bugzilla from hpj@urpla.net

On Thursday, 14 February 2008, Nando wrote:

> Gentlemen, please consider the following ipython session:
>
>
> In [98]: m = email.message_from_file(f)
>
> In [99]: print m["subject"]
> =?utf-8?b?W291aS5jb20uYnJdIENhcnTDo28gZGUgY3LDqWRpdG8gdGVyw6EgbGVn?=
>         =?utf-8?b?aXNsYcOnw6NvIGVzcGVjw61maWNh?=
>
>
> It gives me the raw subject header value. Now of course I just wanted
> the header in unicode. So I have to do:
>
>
> In [100]: from email.header import decode_header
>
> In [101]: decode_header(m["subject"])
> Out[101]:
> [('[oui.com.br] Cart\xc3\xa3o de cr\xc3\xa9dito ter\xc3\xa1
> legisla\xc3\xa7\xc3\xa3o espec\xc3\xadfica',
>   'utf-8')]

Nando, you're just a lucky camper in that case. How would you handle a
mixture of, say, big5, euc_jp, koi8_r _and_ utf-8 encodings? Please don't
claim that this is unlikely. Sure it is, but nevertheless it happens,
and does your code get this pathological case right?

Wait, let's normalize them - but how do we handle encoding failures?
Remember, there are way too many MUAs, mailing list managers, email
gateways, autoresponders, etc. out there that get this wrong!
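
To make the mixed-charset and failure cases concrete, here is a rough Python 3 sketch. The function name decode_header_lenient and the fallback policy (replace anything undecodable with U+FFFD) are assumptions chosen for illustration; an application that must report broken headers would want to raise or log instead.

```python
from email.header import decode_header


def decode_header_lenient(raw):
    """Decode every chunk of a header, tolerating bad or unknown charsets.

    Hypothetical helper, not part of the email package.
    """
    pieces = []
    for data, charset in decode_header(raw):
        if isinstance(data, str):
            # A header without encoded words comes back as plain text.
            pieces.append(data)
            continue
        try:
            pieces.append(data.decode(charset or "ascii"))
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name, or bytes invalid in the declared charset:
            # keep the ASCII part and mark the rest with U+FFFD.
            pieces.append(data.decode("ascii", errors="replace"))
    return "".join(pieces)


# Two encoded words with different charsets in one header value.
mixed = "=?koi8-r?b?8NLJ18XU?= =?utf-8?q?caf=C3=A9?="
print(decode_header_lenient(mixed))

# A charset name that no codec is registered for.
print(decode_header_lenient("=?x-bogus?q?abc?="))
```

Each chunk is decoded with its own declared charset, which is why a single call to unicode(value, 'utf-8') on the whole header cannot handle the mixed case.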

Next you ask for email.Message to reparse email addresses to conform to RFC
2822, and voilà, you have created an unmanageable creature called Frankenstein.

If you think about the consequences, you will understand that Barry and
friends will do _everything_ to keep this can o' worms closed in this
context.

Pete
Re: Just give me the decoded header?

by Stuart Bishop

Nando wrote:

> Gentlemen, please consider the following ipython session:
>
>
> In [98]: m = email.message_from_file(f)
>
> In [99]: print m["subject"]
> =?utf-8?b?W291aS5jb20uYnJdIENhcnTDo28gZGUgY3LDqWRpdG8gdGVyw6EgbGVn?=
>         =?utf-8?b?aXNsYcOnw6NvIGVzcGVjw61maWNh?=
>
>
> It gives me the raw subject header value. Now of course I just wanted
> the header in unicode. So I have to do:
>
>
> In [100]: from email.header import decode_header
>
> In [101]: decode_header(m["subject"])
> Out[101]:
> [('[oui.com.br] Cart\xc3\xa3o de cr\xc3\xa9dito ter\xc3\xa1
> legisla\xc3\xa7\xc3\xa3o espec\xc3\xadfica',
>   'utf-8')]
>
> In [102]: print decode_header(m["subject"])[0][0]
> [oui.com.br] Cartão de crédito terá legislação específica
>
>
> My questions are:
> 1) Why does not it currently return the *decoded* header?

Because you often need access to the raw header. Also, not all headers are
encoded the same. While what you have works for Subject:, it doesn't work
for To:, Reply-To:, From: etc.
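
A short sketch of why address headers need different treatment, assuming Python 3; the function name decoded_addresses and the example addresses are made up. The point is the ordering: split the address list first, then decode only the display names, because a decoded name may itself contain commas or angle brackets.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


def decoded_addresses(raw_value):
    """Return (display_name, address) pairs with the names decoded.

    Hypothetical helper, not part of the email package.
    """
    result = []
    # Parse the address list while any special characters in display
    # names are still hidden inside the encoded words.
    for name, address in getaddresses([raw_value]):
        result.append((str(make_header(decode_header(name))), address))
    return result


raw_to = "=?utf-8?q?Jos=C3=A9_Silva?= <jose@example.com>, ann@example.com"
print(decoded_addresses(raw_to))
```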

> 2) Would it break too many apps if we changed it?

Yes. Particularly apps that need to log or report broken email headers that
cannot be decoded.

> 2.1) If it would, can we add a function such as
> message.getheader("subject") for this?
> 2.1.1) Would you like me to propose a patch with the obvious implementation?

I'd love to see things become more Unicode aware.

Perhaps return an object implementing __str__() and __unicode__() (or
decode()). The cast-to-unicode conversion would decode headers with known
encodings and raise an exception on headers with unknown encodings.
Similarly, setting headers using Unicode strings would use the known
encodings to perform the reverse operation. And you still have access to the
raw value if you want to round trip.
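
One way such an object could look, sketched for Python 3 (where str is already Unicode, so there is no separate __unicode__). The class name HeaderValue and its from_text constructor are hypothetical; only decode_header, make_header and Header come from the email package.

```python
from email.header import Header, decode_header, make_header


class HeaderValue:
    """Hypothetical value object: raw text plus on-demand decoding."""

    def __init__(self, raw):
        # Untouched wire form, kept for round-tripping.
        self.raw = raw

    def __str__(self):
        # Raises LookupError for an unknown charset and UnicodeDecodeError
        # for bytes that are invalid in the declared charset.
        return str(make_header(decode_header(self.raw)))

    @classmethod
    def from_text(cls, text, charset="utf-8"):
        # Reverse operation: encode text as RFC 2047 encoded words.
        return cls(Header(text, charset).encode())


value = HeaderValue("=?utf-8?q?caf=C3=A9?=")
print(value.raw)
print(str(value))
print(HeaderValue.from_text("café").raw)
```

Because decoding happens only when the value is converted to text, code that merely logs or forwards value.raw never trips over a broken header.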



--
Stuart Bishop <stuart@...>
http://www.stuartbishop.net/


