codec from a KMime::Message

Hi,

in kmail4 we used KMMessage::codec() to get codec.
Now in kmail-akonadi we use KMime::Message.
Which is the function to get codec ?

Thanks
Regards

--
Laurent Montel | laurent@... | KDE/Qt Senior Software Engineer
Klarälvdalens Datakonsult AB, a KDAB Group company
Tel. Sweden (HQ) +46-563-540090, USA +1-866-777-KDAB(5322)
KDAB - Qt Experts - Platform-independent software solutions
_______________________________________________
KDE PIM mailing list kde-pim@...
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/
Re: codec from a KMime::Message

Hi Laurent,

On Wednesday 28 October 2009 10:49:00 laurent Montel wrote:
> in kmail4 we used KMMessage::codec() to get codec.
> Now in kmail-akonadi we use KMime::Message.
> Which is the function to get codec ?

Let me first explain a bit what codecs/charsets/encodings are about.

Mails can be written in different charsets, for example ASCII, UTF-8, ISO-8859-15 and many others. For KMail to display a message correctly, it needs to know the charset the mail was written in. If the wrong charset is used, the mail is displayed incorrectly, e.g. umlauts come out garbled. (There is even a word for that: Mojibake.)

So to display the mail correctly, the charset needs to be known. To solve this, the charset is noted down in the mail itself, as a parameter of the content-type header. Look at the source of your mail and you'll see that it uses the iso-8859-1 charset.

Now, some mail clients unfortunately don't specify the charset the mail was written in, or even specify the wrong charset. But to display the mail correctly, we need to use the correct charset. To solve this, KMail has a fallback character encoding. When KMail displays a message which has no charset specified, it uses the fallback charset instead. The fallback charset can be set by the user in the settings under Appearance->Message Window. By default, the fallback charset is the local system encoding, since the user will most likely communicate with people who use the same language. (By RFC, a mail that does not specify a charset should be treated as ASCII, but because of all the incorrect mailers out there that does not hold in practice, hence the fallback encoding.)

KMail also has the option to set an override charset. This charset is used even when a message specifies one; it simply overrides the charset specified in the message. The override charset can also be set in the settings under Appearance->Message Window, and in the View menu under "Set Encoding".
If the override charset is set to "Auto", KMail will not override the charset which is specified in the message and will use the charset from the message instead, or the fallback charset if the message does not specify a charset. The override charset can be set on a per-message basis with KMMessage::setOverrideCodec().

Mails can also consist of multiple MIME parts. Your mail was just a single text/plain MIME part, but some mails have more parts (think attachments or HTML mail). The KMime class representing a single MIME part is KMime::Content. For messages consisting of only a single MIME part, KMime::Message is the main and only MIME part (it inherits KMime::Content). For mails with multiple MIME parts, the main MIME part/KMime::Content can have child parts/contents, see KMime::Content::contents(). Therefore, the MIME parts form a tree. Each part/content can have headers, see KMime::Content::head(). All MIME parts that should be displayed as text should have a content-type header with a charset parameter to specify how they should be displayed.

Summary:
Fallback charset: Charset that is used when the message has no specified charset
Override charset: Charset that is always used, even if the message specifies a charset
MIME part: Mails can consist of multiple parts, which form a tree. Each part has headers.

Now finally to your question:
-----------------------------

KMMessage::codec() takes into account the override charset and the fallback charset, have a look at the source. codec() internally calls charset(), which looks into the headers to see if there is a content-type header with a charset parameter.

KMime::Content does support fallback encoding, see KMime::Content::setDefaultCharset().
KMime::Content also supports override encoding, see KMime::Content::setForceDefaultCharset().

If you set the fallback/override charset, that should work automatically in KMime, e.g. KMime::Content::decodedText() will take the fallback and override charset into account.
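As an illustration of the lookup order described above (override first, then the charset declared in the message, then the fallback), here is a minimal Python sketch. It is not KMime or KMail code: `effective_charset` and `decode_text` are hypothetical helper names, and `None` stands in for the "Auto" override setting.

```python
def effective_charset(declared, fallback, override=None):
    """Return the charset to decode a text part with.

    declared: charset parameter from the content-type header, or None
    fallback: charset to use when the message declares none
    override: charset forced by the user, or None for "Auto"
    """
    if override is not None:
        return override      # override wins even over a declared charset
    if declared:
        return declared      # normal case: trust what the message says
    return fallback          # message did not say, use the fallback


def decode_text(body, declared, fallback, override=None):
    """Decode the raw bytes of a text part with the effective charset."""
    charset = effective_charset(declared, fallback, override)
    # errors="replace" keeps this sketch from raising on undecodable bytes
    return body.decode(charset, errors="replace")


body = "Grüße".encode("utf-8")   # bytes as they would arrive in the mail

# Charset declared correctly: the text decodes as written
good = decode_text(body, declared="utf-8", fallback="iso-8859-1")

# No declared charset and a fallback that does not match: Mojibake
garbled = decode_text(body, declared=None, fallback="iso-8859-1")

# A per-message override repairs the display
fixed = decode_text(body, declared=None, fallback="iso-8859-1",
                    override="utf-8")
```

With these inputs, `good` and `fixed` come out as the original text, while `garbled` does not.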
If you really need the QTextCodec that is used to decode and encode the charset, you'd need to write that method yourself.

(I'm not sure if KMime::Content::setDefaultCharset() and setForceDefaultCharset() actually propagate the charset to the child parts/contents.)

Side note 1: All this is completely unrelated to content-transfer-encoding, which is something else.
Side note 2: I've been a bit sloppy with the terms charset, encoding and codec in this mail; I hope you still get the idea.

Phew, long mail.

Regards,
Thomas
Re: codec from a KMime::Message

Hi,

On Wednesday 28 October 2009 15:37:30 Thomas McGuire wrote:
> For KMail to display the message correctly, it needs to know the charset
> the mail was written in. If the wrong charset is used, the mail will be
> displayed incorrectly, e.g. umlauts are displayed the wrong way. (There is
> even a word for that: Mojibake).
>
> So to display the mail correctly, the charset encoding needs to be known.
> To solve this, the charset encoding is noted down in the mail itself, as
> part of the content-type header.
>
> [..]
>
> Summary:
> Fallback charset: Charset that is used when the message has no specified
> charset
> Override charset: Charset that is always used, even if the message
> specifies a charset
> MIME part: Mails can consist of multiple parts, which form a tree. Each
> part has headers.

The charset is only used when displaying text MIME parts, e.g. text/plain.

But what is content-transfer-encoding?

The problem is that mails can not use the full byte range from 0 to 255 when being sent, this is disallowed. However, most attachments like images, zip files and so on do use all 256 byte values. Text encoded with some charsets, for example UTF-8, also uses byte values above 127.

Mail sending is constrained to only a part of the byte range, namely the 7-bit values 0 to 127. This is fine for ASCII text, since that is 0 to 127 only, but things like attachments or UTF-8 text, which use byte values above 127, can not be sent as they are. To solve that, such data needs to be transformed/encoded to something that only uses the first 128 byte values. The encoding that does this is called content-transfer-encoding.

There are 4 content-transfer-encodings in common use (RFC 2045 also defines a fifth, "binary"):

7-bit: This does nothing. It assumes that the input is already in the 0 to 127 byte value range, and therefore the input can be sent unencoded.
base64: Encodes every group of three bytes as four human-readable characters from a 64-character alphabet, so the output is about a third larger than the input.
quoted-printable: Encodes each non-ASCII byte as an equals sign followed by two hexadecimal digits. The advantage over base64 is that nearly all ASCII characters remain unchanged, so the result is much more human-readable. The disadvantage is that the overhead is higher when much of the input is non-ASCII, since each such byte becomes three characters.
8-bit: This is an exception. Some SMTP servers actually do support sending mails which contain the full byte range from 0 to 255. For those SMTP servers, input that uses the full byte range does not need to be encoded at all, which is called the 8-bit content-transfer-encoding.

The rule is: Use 7-bit if it is possible, mail clients can deal with that the best and it has no space overhead. If something doesn't fit into 7-bit, use either base64 or quoted-printable. It depends on the input which of those is best for space usage. quoted-printable is much more human-readable, though. KMime even has a nice class to detect which is the best content-transfer-encoding for a given input, see KMime::CharFreq::type().

(BTW, it is even more fun when dealing with linebreaks, but even I don't know the details there.)

For text parts, the content-transfer-encoding is applied on top of the charset encoding, e.g. first the text is charset-encoded with ISO-8859-15, then it is content-transfer-encoded with quoted-printable. Non-text parts, like attachments, don't need a charset encoding, since there is no text to display. Those parts are encoded with the content-transfer-encoding only.

Content-transfer-encodings are specified in RFC 2045, section 6.

Regards,
Thomas
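The two encoding layers for a text part, and the rule of thumb for picking a content-transfer-encoding, can be sketched in Python with the standard `base64` and `quopri` modules. This is only an illustration under stated assumptions: `is_7bit` and `choose_cte` are hypothetical helpers, the size comparison is not how KMime::CharFreq works, and the sketch ignores the line-length limits and line-break handling that real mail has to respect.

```python
import base64
import quopri


def is_7bit(data):
    """True if every byte value is in the range 0 to 127."""
    return all(b < 128 for b in data)


def choose_cte(data):
    """Return (encoding name, encoded bytes) for a payload.

    Hypothetical heuristic: send 7-bit input unchanged, otherwise
    take whichever of quoted-printable and base64 is shorter.
    """
    if is_7bit(data):
        return "7bit", data
    qp = quopri.encodestring(data)
    b64 = base64.b64encode(data)
    if len(qp) <= len(b64):
        return "quoted-printable", qp
    return "base64", b64


text = "Schöne Grüße aus Wien"

# Step 1: charset encoding turns the text into bytes
raw = text.encode("iso-8859-15")

# Step 2: content-transfer-encoding makes those bytes 7-bit safe
name, wire = choose_cte(raw)

# The receiver undoes both layers in reverse order; for this input
# the heuristic picks quoted-printable, so quopri decodes it
decoded = quopri.decodestring(wire).decode("iso-8859-15")
```

For input where most bytes are above 127, such as an image, the same comparison falls to base64, because quoted-printable spends three characters on each such byte while base64 spends four characters on every three bytes.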
Re: codec from a KMime::Message

On Wednesday 28 October 2009 16:21:26 Thomas McGuire wrote:
> Hi,
>
> On Wednesday 28 October 2009 15:37:30 Thomas McGuire wrote:
> > For KMail to display the message correctly, it needs to know the charset
> > the mail was written in. If the wrong charset is used, the mail will be
> > displayed incorrectly, e.g. umlauts are displayed the wrong way. (There is
> > even a word for that: Mojibake).

<long and very detailed information snipped>

You should really put this explanation and the other mail somewhere on the wiki or so...

--
Best regards/Schöne Grüße

Martin

()  ascii ribbon campaign - against html mail
/\                        - against microsoft attachments

Gift ideas, accessories, soaps, culinary specialities: www.bibibest.at
Re: codec from a KMime::Message

On Wednesday 28 October 2009, Martin Koller wrote:
> On Wednesday 28 October 2009 16:21:26 Thomas McGuire wrote:
> > Hi,
> >
> > On Wednesday 28 October 2009 15:37:30 Thomas McGuire wrote:
> > > For KMail to display the message correctly, it needs to know the
> > > charset the mail was written in. If the wrong charset is used,
> > > the mail will be displayed incorrectly, e.g. umlauts are
> > > displayed the wrong way. (There is even a word for that:
> > > Mojibake).
>
> <long and very detailed information snipped>
>
> You should really put this explanation and the other mail somewhere
> on the wiki or so...

topics.

Regards,
Ingo