codec from a KMime::Message

by laurent Montel

Hi,
in KMail 4 we used KMMessage::codec() to get the codec.
Now in kmail-akonadi we use KMime::Message.
Which function returns the codec?

Thanks
Regards

--
Laurent Montel | laurent@... | KDE/Qt Senior Software Engineer
Klarälvdalens Datakonsult AB, a KDAB Group company
Tel. Sweden (HQ) +46-563-540090, USA +1-866-777-KDAB(5322)
KDAB - Qt Experts - Platform-independent software solutions
_______________________________________________
KDE PIM mailing list kde-pim@...
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/

Re: codec from a KMime::Message

by Thomas McGuire <mcguire@kde.org>

Hi Laurent,

On Wednesday 28 October 2009 10:49:00 laurent Montel wrote:
> in KMail 4 we used KMMessage::codec() to get the codec.
> Now in kmail-akonadi we use KMime::Message.
> Which function returns the codec?

Let me first explain a bit what codecs/charsets/encodings are about. Mails can
be written in different charsets, for example in ASCII, UTF-8, ISO-8859-15 and
many others.

For KMail to display the message correctly, it needs to know the charset the
mail was written in. If the wrong charset is used, the mail will be displayed
incorrectly, e.g. umlauts are displayed the wrong way. (There is even a word
for that: Mojibake).
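
A quick way to see Mojibake in action, sketched with Python's built-in codecs rather than with KMail or KMime: the UTF-8 bytes of a word containing an umlaut are decoded once with the wrong charset (ISO-8859-1) and once with the right one. The sample word is just an illustration.

```python
# Encode a word with an umlaut as UTF-8, then decode the bytes two ways.
original = "für"
raw = original.encode("utf-8")      # the umlaut becomes two bytes: 0xC3 0xBC

wrong = raw.decode("iso-8859-1")    # wrong charset: each byte turns into one character
right = raw.decode("utf-8")         # correct charset: round-trips cleanly

print(wrong)   # the umlaut shows up as two unrelated characters
print(right)
```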

So to display the mail correctly, the charset encoding needs to be known. To
solve this, the charset encoding is noted down in the mail itself, as part of
the content-type header. See the source of your mail, you'll see that it uses
the iso-8859-1 charset.
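
As a rough illustration of where the charset lives, here is a sketch that uses Python's standard email package (not KMime) to read the charset parameter from the content-type header. The sample mail and its addresses are made up.

```python
from email import message_from_string

# A made-up mail whose content-type header declares a charset.
raw_mail = (
    "From: sender@example.org\n"
    "Subject: codec test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    "\n"
    "Hi,\n"
)
declared = message_from_string(raw_mail).get_content_charset()

# The same mail without any content-type header: no charset is declared.
bare_mail = "From: sender@example.org\n\nHi,\n"
missing = message_from_string(bare_mail).get_content_charset()

print(declared)
print(missing)
```

The second case, where nothing is declared, is the one a fallback charset has to cover.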

Now, some mail clients unfortunately don't specify the charset the mail was
written in, or even specify the wrong charset. But to display the mail
correctly, we need to use the correct charset.
To solve this, KMail has a fallback character encoding. When KMail displays a
message which has no charset specified, it uses the fallback charset instead.
The fallback charset can be set by the user in the settings under
Appearance->Message Window. By default, the fallback charset is the local
system encoding, since the user will likely communicate most with users that
use the same language.
(According to RFC 2045, a missing charset means US-ASCII, but because of all
the incorrect mailers out there, that is often not true in practice, hence the
fallback encoding.)

Then, KMail also has the option to set an override charset. It is used even
when a message specifies a charset, and simply replaces the charset from the
message. The override charset can also be set in the settings under
Appearance->Message Window, and in the View menu under "Set Encoding". If the
override charset is set to "Auto", KMail does not override anything: it uses
the charset from the message, or the fallback charset if the message does not
specify one.
The override charset can be set on a per-message basis with
KMMessage::setOverrideCodec().
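
The priority between the three charsets can be sketched in a few lines. This is only an illustration of the rules described above, not KMail's actual code: the function effective_charset() and its parameters are hypothetical, and None stands in for the "Auto" setting.

```python
from typing import Optional

def effective_charset(declared: Optional[str],
                      fallback: str,
                      override: Optional[str] = None) -> str:
    """Pick the charset used for display (hypothetical helper).

    override=None models the "Auto" setting, i.e. no override.
    """
    if override is not None:
        return override      # override wins, even over a declared charset
    if declared:
        return declared      # trust the charset from the content-type header
    return fallback          # nothing declared: use the fallback

print(effective_charset("iso-8859-1", fallback="utf-8"))
print(effective_charset(None, fallback="utf-8"))
print(effective_charset("iso-8859-1", fallback="utf-8", override="koi8-r"))
```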

Mails can also consist of multiple MIME parts. Your mail was just a single
text/plain MIME part, but some mails have more parts (think attachments or
HTML mail).
The KMime class representing a single MIME part is KMime::Content. For those
messages only consisting of a single MIME part, KMime::Message is the main and
only MIME part (it inherits KMime::Content). For mails with multiple MIME
parts, the main MIME part/KMime::Content can have child parts/contents, see
KMime::Content::contents(). Therefore, the MIME parts form a tree. Each
part/content can have headers, see KMime::Content::head().
All MIME parts that should be displayed as text should have a content-type
header with a charset parameter to specify how they should be displayed.
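
To make the tree idea concrete, here is a small sketch with Python's standard email package (again not KMime; msg.walk() merely plays a role similar to descending through KMime::Content::contents()). It builds a made-up mail with one text part and one attachment and lists each part's content type and charset.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "report"
msg.set_content("Viele Grüße")                 # becomes the text/plain part
msg.add_attachment(b"\x00\x01\x02",            # turns the mail into multipart/mixed
                   maintype="application",
                   subtype="octet-stream",
                   filename="data.bin")

# walk() visits the top-level part first, then its children.
parts = [(part.get_content_type(), part.get_content_charset())
         for part in msg.walk()]
for content_type, charset in parts:
    print(content_type, charset)
```

Only the text part carries a charset; the multipart container and the binary attachment have none.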

Summary:
Fallback charset: Charset that is used when the message has no specified
charset
Override charset: Charset that is always used, even if the message specifies a
charset
MIME part: Mails can consist of multiple parts, which form a tree. Each part
has headers.

Now finally to your question:
-----------------------------

KMMessage::codec() takes the override charset and the fallback charset into
account; have a look at the source. Internally, codec() calls charset(), which
looks in the headers for a content-type header with a charset parameter.
KMime::Content does support fallback encoding, see
KMime::Content::setDefaultCharset().
KMime::Content also supports override encoding, see
KMime::Content::setForceDefaultCharset().

If you set the fallback/override charset, that should work automatically in
KMime, e.g. KMime::Content::decodedText() will take the fallback and override
charset into account.
If you really need the QTextCodec that is used to decode and encode the
charset, you'd need to write that method yourself.
(I'm not sure if KMime::Content::setDefaultCharset() and
setForceDefaultCharset() actually propagate the charset to the child
parts/contents.)
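
For what "write that method yourself" could look like in spirit, here is a sketch in Python, where codecs.lookup() stands in for a QTextCodec lookup. It is not KMime or Qt code; the helper codec_for() and its default fallback are assumptions made for the example.

```python
import codecs

def codec_for(declared, fallback="iso-8859-15"):
    """Return a codec for the declared charset, else for the fallback (hypothetical helper)."""
    for name in (declared, fallback):
        if not name:
            continue                 # nothing declared: try the next candidate
        try:
            return codecs.lookup(name)
        except LookupError:
            continue                 # unknown charset name: try the next candidate
    raise LookupError("no usable charset")

# decode() on the returned codec gives (text, number_of_bytes_consumed).
print(codec_for("utf-8").decode(b"\xc3\xbc")[0])
print(codec_for(None).decode(b"\xa4")[0])
print(codec_for("x-no-such-charset").decode(b"\xa4")[0])
```

Falling back on an unknown charset name, and not only on a missing one, is a design choice of this sketch.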

Side note 1: All this is completely unrelated to content-transfer-encoding,
which is something else.

Side note 2: I've been a bit sloppy with the terms charset, encoding and codec
in this mail, hope you still get the idea.

Phew, long mail.

Regards,
Thomas



Re: codec from a KMime::Message

by Thomas McGuire <mcguire@kde.org>

Hi,

On Wednesday 28 October 2009 15:37:30 Thomas McGuire wrote:

> For KMail to display the message correctly, it needs to know the charset
> the mail was written in. If the wrong charset is used, the mail will be
> displayed incorrectly, e.g. umlauts are displayed the wrong way. (There is
> even a word for that: Mojibake).
>
> So to display the mail correctly, the charset encoding needs to be known.
> To solve this, the charset encoding is noted down in the mail itself, as
> part of the content-type header.
>
> [..]
>
> Summary:
> Fallback charset: Charset that is used when the message has no specified
> charset
> Override charset: Charset that is always used, even if the message
> specifies a charset
> MIME part: Mails can consist of multiple parts, which form a tree. Each
> part has headers.

Ok, now let me explain charset vs. content-transfer-encoding, for the brave.

The charset is only used when displaying text MIME parts, e.g. text/plain.

But what is content-transfer-encoding?
The problem is that mails cannot use the full byte range from 0 to 255 when
being sent; this is disallowed.
However, most attachments like images, zip files and so on do use all 256 byte
values. Text encoded with some charsets also uses byte values above 127, for
example UTF-8 or ISO-8859-15.

Mail sending is constrained to only a part of the byte range, 0 to 127, since
classic SMTP is a 7-bit protocol. This is fine for ASCII text, since that is 0
to 127 only, but things like attachments or UTF-8 text which use byte values
above 127 cannot be sent as they are.
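
A small Python sketch of that constraint: the hypothetical helper fits_7bit() checks whether some data stays within byte values 0 to 127. The sample strings are made up, and the check ignores further 7-bit rules such as line length limits.

```python
def fits_7bit(data: bytes) -> bool:
    """True if every byte value is in the range 0 to 127 (hypothetical helper)."""
    return all(b < 128 for b in data)

samples = {
    "ASCII text": "Hello".encode("ascii"),
    "UTF-8 text": "Grüße".encode("utf-8"),
    "ISO-8859-15 text": "Grüße".encode("iso-8859-15"),
    "binary data": bytes([0x89, 0x50, 0x4E, 0x47]),
}
for label, data in samples.items():
    print(label, fits_7bit(data))
```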

To solve that, stuff that uses the full byte range needs to be
transformed/encoded to something that only uses the first 128 byte values. The
encoding that does this is called content-transfer-encoding. RFC 2045 defines
five of them; the four that matter most in practice are described here (the
fifth, "binary", is like 8bit but without any line length limit):

7bit: This does nothing. It assumes that the input is already in the 0 to 127
byte value range, so the input can be sent unencoded.

base64: Encodes every group of three bytes as four human-readable characters
from a 64-character alphabet (A-Z, a-z, 0-9, + and /), so the output is about
a third larger than the input.

quoted-printable: Encodes each non-ASCII byte as an equals sign followed by
two hexadecimal digits. The advantage over base64 is that nearly all ASCII
characters remain unchanged, so the result is much more human-readable. The
disadvantage is that this encoding scheme has more overhead when the input
contains many non-ASCII bytes, since each of them becomes three characters.

8bit: This is an exception. Some SMTP servers actually do support sending
mails which contain the full byte range from 0 to 255 (the 8BITMIME
extension). For those SMTP servers, input that uses the full byte range does
not need to be encoded at all, which is called the 8bit
content-transfer-encoding.
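
To compare base64 and quoted-printable on real bytes, here is a sketch using Python's standard base64 and quopri modules (not KMime's codecs). The two inputs are made-up examples: a mostly-ASCII German text and a block of bytes that are all above 127.

```python
import base64
import quopri

mostly_ascii = "Viele Grüße aus Wien".encode("iso-8859-15")   # 20 bytes, 2 non-ASCII
high_bytes = bytes(range(128, 256))                             # 128 bytes, all non-ASCII

# Print input size, base64 size and quoted-printable size for each input.
for label, data in (("mostly ASCII", mostly_ascii), ("all high bytes", high_bytes)):
    b64 = base64.b64encode(data)
    qp = quopri.encodestring(data)
    print(label, len(data), len(b64), len(qp))

# The quoted-printable form of the text stays readable.
print(quopri.encodestring(mostly_ascii))
```

For the mostly-ASCII text, quoted-printable is both shorter and readable; for the high-byte block, base64 is clearly smaller.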

The rule is: Use 7bit if possible; mail clients can deal with that best and
it has no space overhead.
If something doesn't fit into 7bit, use either base64 or quoted-printable.
Which of those is best for space usage depends on the input. quoted-printable
is much more human-readable, though.
KMime even has a nice class to detect which is the best
content-transfer-encoding for a given input, see KMime::CharFreq::type().
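
The rule of thumb can be turned into a rough size estimate. The sketch below is not how KMime::CharFreq works; suggest_cte() is a hypothetical helper that only counts bytes above 127. Quoted-printable needs about len + 2*high characters and base64 about len * 4/3, so quoted-printable is smaller as long as fewer than one in six bytes is non-ASCII.

```python
def suggest_cte(data: bytes) -> str:
    """Suggest a content-transfer-encoding by estimated size (hypothetical helper).

    Simplified: ignores line length limits, control characters and the
    fact that '=' itself must be escaped in quoted-printable.
    """
    high = sum(1 for b in data if b >= 128)
    if high == 0:
        return "7bit"
    # qp size ~ len + 2*high, base64 size ~ len * 4/3  =>  qp wins if high < len/6
    if high * 6 < len(data):
        return "quoted-printable"
    return "base64"

print(suggest_cte(b"Hello"))
print(suggest_cte("Viele Grüße aus Wien".encode("iso-8859-15")))
print(suggest_cte(bytes(range(256))))
```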

(BTW, it is even more fun when dealing with linebreaks, but even I don't know
the details there)

For text parts, the content-transfer-encoding is applied on top of the charset
encoding, e.g. first the text is charset-encoded with ISO-8859-15, then it is
content-transfer-encoded with quoted-printable.
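
The two layers can be sketched in Python with the standard quopri module (not KMime): the text is first charset-encoded, then content-transfer-encoded, and decoding undoes the steps in reverse order. The sample word is made up.

```python
import quopri

text = "Grüße"

# Sending: charset encoding first, then content-transfer-encoding on top.
charset_encoded = text.encode("iso-8859-15")               # bytes, some above 127
transfer_encoded = quopri.encodestring(charset_encoded)    # only ASCII bytes remain

# Receiving: undo the content-transfer-encoding, then decode the charset.
decoded = quopri.decodestring(transfer_encoded).decode("iso-8859-15")

print(transfer_encoded)
print(decoded)
```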

Non-text parts, like attachments, don't need a charset encoding, since there
is no text to display. Those parts are encoded with the content-transfer-
encoding only.

Content-transfer-encodings are specified in RFC 2045, section 6.

Regards,
Thomas



Re: codec from a KMime::Message

by Martin Koller <kollix@aon.at>

On Wednesday 28 October 2009 16:21:26 Thomas McGuire wrote:
> Hi,
>
> On Wednesday 28 October 2009 15:37:30 Thomas McGuire wrote:
> > For KMail to display the message correctly, it needs to know the charset
> > the mail was written in. If the wrong charset is used, the mail will be
> > displayed incorrectly, e.g. umlauts are displayed the wrong way. (There is
> > even a word for that: Mojibake).

<long and very detailed information snipped>

You should really put this explanation and the other mail somewhere on the wiki or so...

--
Best regards/Schöne Grüße

Martin    ()  ascii ribbon campaign - against html mail
          /\                        - against microsoft attachments

Gift ideas, accessories, soaps, culinary items: www.bibibest.at



Re: codec from a KMime::Message

by Ingo <kloecker@kde.org>

On Wednesday 28 October 2009, Martin Koller wrote:

> On Wednesday 28 October 2009 16:21:26 Thomas McGuire wrote:
> > Hi,
> >
> > On Wednesday 28 October 2009 15:37:30 Thomas McGuire wrote:
> > > For KMail to display the message correctly, it needs to know the
> > > charset the mail was written in. If the wrong charset is used,
> > > the mail will be displayed incorrectly, e.g. umlauts are
> > > displayed the wrong way. (There is even a word for that:
> > > Mojibake).
>
> <long and very detailed information snipped>
>
> You should really put this explanation and the other mail somewhere
> on the wiki or so...

I thought the same. This is really a good explanation of those important
topics.


Regards,
Ingo

