Problems with UTF-8 over SMTP

Kat Tanaka Okopnik [kat at linuxgazette.net]


Tue, 22 Jan 2008 12:13:11 -0800

[[[ The originating thread for this discussion is http://linuxgazette.net/147/misc/lg/transliterating_arabic.html -- Kat ]]]

On Tue, Jan 22, 2008 at 02:39:21PM -0500, Benjamin A. Okopnik wrote:

> Latin character set (ISO-8859-1 and such) to Russian, yes.
> 
> Eh... I'll send this example, and hope the 8-bit stuff makes it through
> the mail. 
> ```
> ben@Tyr:~$ tsl2utf8 -h
> Mappings:
> 
> A|<90>     B|<91>     V|<92>     G|<93>     D|<94>     E|<95>     J|<96>     Z|<97>
> I|<98>     Y|<99>     K|<9a>     L|<9b>     M|<9c>     N|<9d>     O|<9e>     P|<9f>
> R|<a0>     S|<a1>     T|<a2>     U|<a3>     F|<a4>     H|<a5>     C|<a6>     X|<a7>
> 1|<a8>     2|<a9>     3|<aa>     4|<ab>     5|<ac>     6|<ad>     7|<ae>     8|<af>
> a|<b0>     b|<b1>     v|<b2>     g|<b3>     d|<b4>     e|<b5>     j|<b6>     z|<b7>
> i|<b8>     y|<b9>     k|<ba>     l|<bb>     m|<bc>     n|<bd>     o|<be>     p|<bf>
> r|<80>     s|<81>     t|<82>     u|<83>     f|<84>     h|<85>     c|<86>     x|<87>
> !|<88>     @|<89>     #|<8a>     $|<8b>     %|<8c>     ^|<8d>     &|<8e>     *|<8f>
> +|<91>
> 
> ben@Tyr:~$ tsl2utf8
> samovar
> <81><b0><bc><be><b2><b0><80>
> babu!ka
> <b1><b0><b1><83><88><ba><b0>
> 7jno-^fiopskiy grax uv+l m$!% za hobot na s#ezd *@eric.
> <ae><b6><bd><be>-<8d><84><b8><be><bf><81><ba><b8><b9> [...]
> '''

Alas, as you may note from the above, it came through as utter mojibake, even though my system is capable of reading (some) Russian.

http://people.debian.org/~kubota/mojibake/

http://en.wikipedia.org/wiki/Mojibake

Hmm. Wikipedia suggests that I call it krakozyabry (крокозя́бры). ;)

This looked like a useful gizmo: http://2cyr.com/decode/?lang=en but it failed to produce anything ungarbled this time.

-- 
Kat Tanaka Okopnik
Linux Gazette Mailbag Editor
kat@linuxgazette.net



Ben Okopnik [ben at linuxgazette.net]


Tue, 22 Jan 2008 17:12:59 -0500

On Tue, Jan 22, 2008 at 12:13:11PM -0800, Kat wrote:

> On Tue, Jan 22, 2008 at 02:39:21PM -0500, Benjamin A. Okopnik wrote:
> > 
> > Eh... I'll send this example, and hope the 8-bit stuff makes it through
> > the mail. 
> > ```
> > ben@Tyr:~$ tsl2utf8 -h
> > Mappings:
> > 
> > A|<90>     B|<91>     V|<92>     G|<93>     D|<94>     E|<95>     J|<96>     Z|<97>
> > I|<98>     Y|<99>     K|<9a>     L|<9b>     M|<9c>     N|<9d>     O|<9e>     P|<9f>
> > R|<a0>     S|<a1>     T|<a2>     U|<a3>     F|<a4>     H|<a5>     C|<a6>     X|<a7>
> > 1|<a8>     2|<a9>     3|<aa>     4|<ab>     5|<ac>     6|<ad>     7|<ae>     8|<af>
> > a|<b0>     b|<b1>     v|<b2>     g|<b3>     d|<b4>     e|<b5>     j|<b6>     z|<b7>
> > i|<b8>     y|<b9>     k|<ba>     l|<bb>     m|<bc>     n|<bd>     o|<be>     p|<bf>
> > r|<80>     s|<81>     t|<82>     u|<83>     f|<84>     h|<85>     c|<86>     x|<87>
> > !|<88>     @|<89>     #|<8a>     $|<8b>     %|<8c>     ^|<8d>     &|<8e>     *|<8f>
> > +|<91>
> > 
> > ben@Tyr:~$ tsl2utf8
> > samovar
> > <81><b0><bc><be><b2><b0><80>
> > babu!ka
> > <b1><b0><b1><83><88><ba><b0>
> > 7jno-^fiopskiy grax uv+l m$!% za hobot na s#ezd *@eric.
> > <ae><b6><bd><be>-<8d><84><b8><be><bf><81><ba><b8><b9> [...]
> > '''
> 
> Alas, as you may note from the above, it came through as utter
> mojibake, even though my system is capable of reading (some) Russian.

Bleh. As I'm responding to this, using 'vi', I can see exactly where the UTF-8 characters got turned back into the... other... stuff. I.e., the first letter pair looks like 'A|<83><90>' (the '<90>' part being the value of the second byte in hex, i.e. dec144/oct220) - which is actually what the UTF-8 two-byte pair for the character is supposed to be.
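
As a rough illustration of the "two-byte pair" mentioned above, here is a minimal Python sketch. It assumes that the first letter of the mapping table is the Cyrillic capital A (U+0410); the variable names are made up for the example.

```python
# Cyrillic capital A (U+0410), assumed to be the first letter in the table.
letter = "\u0410"

# Iterating over a bytes object yields integers, so this unpacks both bytes.
first, second = letter.encode("utf-8")

print(f"{first:#04x} {second:#04x}")           # the two-byte UTF-8 sequence
print(second == 144)                            # 0x90 expressed in decimal
print(all(b > 0x7F for b in (first, second)))   # both bytes need 8-bit transport
```

Both bytes are above 0x7F, which is why this text only survives a hop that is either 8-bit clean or re-encodes the body without losing the charset label.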

To the best of my troubleshooting ability so far, everything breaks somewhere between the time that it leaves my mail client and the time that it arrives at the LG mail server - but I've checked everything on my end, and I'm sending it out with 'utf-8' as the charset and 8 bits set for the SMTP transaction. I'm pretty much stuck at that point, and have been for a while.

> Hmm. Wikipedia suggests that I call it krakozyabry (крокозя́бры). ;)
^ ^

The translit version is fine; the so-called Russian isn't (it's 'kra', not 'kro'.) I also get somewhat annoyed when people put accent marks into plain Russian text anywhere outside a primer without denoting it: given that Russian uses a mark of that sort as part of a letter... well, there's no such letter as a "ya-tilde" in Russian, although anyone looking at the above cite would think so.
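
The mark in the Wikipedia spelling is a separate combining character rather than part of the letter, so it can be removed mechanically. A small Python sketch, assuming the mark is U+0301 (COMBINING ACUTE ACCENT); the helper name `strip_stress` is hypothetical.

```python
import unicodedata

STRESS_MARK = "\u0301"  # assumed: the combining acute accent used as a stress mark

def strip_stress(text: str) -> str:
    """Remove combining acute accents and leave everything else untouched."""
    # Letters such as 'й' and 'ё' are built with a breve or diaeresis, not
    # with U+0301, so they are not affected by this replacement.
    return text.replace(STRESS_MARK, "")

word = "крокозя" + STRESS_MARK + "бры"
print(unicodedata.name(STRESS_MARK))
print(strip_stress(word))
```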

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



René Pfeiffer [lynx at luchs.at]


Wed, 23 Jan 2008 00:45:37 +0100

On Jan 22, 2008 at 1712 -0500, Ben Okopnik appeared and said:

> On Tue, Jan 22, 2008 at 12:13:11PM -0800, Kat wrote:
> > On Tue, Jan 22, 2008 at 02:39:21PM -0500, Benjamin A. Okopnik wrote:
> > >
> > > Eh... I'll send this example, and hope the 8-bit stuff makes it through
> > > the mail.
> > [...]
> > Alas, as you may note from the above, it came through as utter
> > mojibake, even though my system is capable of reading (some) Russian.
>
> Bleh. As I'm responding to this, using 'vi', I can see exactly where the
> UTF-8 characters got turned back into the... other... stuff. I.e., the
> first letter pair looks like 'A|=C3=90<90>' (the '<90>' part being the value
> of the second byte in hex, i.e. dec144/oct220) - which is actually what
> the UTF-8 two-byte pair for the character is supposed to be.

It looks a bit unreadable to me (not that I could understand Arabic or Russian though).

> To the best of my troubleshooting ability so far, everything breaks
> somewhere between the time that it leaves my mail client and the time
> that it arrives at the LG mail server - but I've checked everything on
> my end, and I'm sending it out with 'utf-8' as the charset and 8 bits
> set for the SMTP transaction. I'm pretty much stuck at that point, and
> have been for a while.

It took me some time to convert all my workstations and my mutt mail environment to UTF-8. Basically I have the following configuration.

 - I use LANG=en_GB.UTF-8 as locale setting (don't like the German
   translations ;-).
 - I use UTF-8-capable xterms. "ps ax" says they were started with the
   following options:
   xterm -class UXTerm -title uxterm -u8 -bg black -fg green
 - My .muttrc offers mutt the following encodings when writing emails:
   set send_charset="us-ascii:iso-8859-15:utf-8"
 - Additionally I have the following two lines in my .muttrc:
   set charset="utf-8"
   set editor="vim +':set textwidth=72' +':set wrap' +':set encoding=utf-8' +'set si'"

With this combination the encoding is fairly sure to survive (even PGP/MIME and other manglings on the way out).

We now return to your regular scheduled programme.

Best, René.

P.S.: I wonder what happens to the &eacute; in Ben's mutt/xterm/window/thing.




Ben Okopnik [ben at linuxgazette.net]


Wed, 23 Jan 2008 00:09:59 -0500

On Wed, Jan 23, 2008 at 12:45:37AM +0100, Ren<a9> Pfeiffer wrote:

> On Jan 22, 2008 at 1712 -0500, Ben Okopnik appeared and said:
> 
> > To the best of my troubleshooting ability so far, everything breaks
> > somewhere between the time that it leaves my mail client and the time
> > that it arrives at the LG mail server - but I've checked everything on
> > my end, and I'm sending it out with 'utf-8' as the charset and 8 bits
> > set for the SMTP transaction. I'm pretty much stuck at that point, and
> > have been for a while.
> 
> It took me some time to convert all my workstations and my mutt mail
> environment to UTF-8. Basically I have the following configuration.
> 
>  - I use LANG=en_GB.UTF-8 as locale setting (don't like the German
>    translations ;-).

I've got LANG=en_US.UTF-8; did that pretty early on, since I often have a need for mixing different languages.

>  - I use UTF-8-capable xterms. "ps ax" says they were started with the
>    following options:
>    xterm -class UXTerm -title uxterm -u8 -bg black -fg green

I've got most of that, except I set it in my .Xresources:

xterm*utf8:1
xterm*background: black
xterm*foreground: gold

Not sure what the UXTerm class does beyond turning on '-u8' and providing a different class for font settings, etc., but I can display/read Unicode stuff just fine (':dig' in Vim is a pretty good test; so is Markus Kuhn's "UTF-8-demo.txt.gz".) I'll try adding it, just to see.

>  - My .muttrc offers mutt the following encodings when writing emails:
>    set send_charset="us-ascii:iso-8859-15:utf-8"

Hmm, I didn't have that one - I'll try adding it. Frankly, I doubt that it'll change anything, since the messages queued in my SMTP spool look fine.

>  - Additionally I have the following two lines in my .muttrc:
>    set charset="utf-8"

Got that one.

>    set editor="vim +':set textwidth=72' +':set wrap' +':set encoding=utf-8' +'set si'"

Set in my ~/.vimrc, except for "si" and "encoding". I'll add that too - although, again, the UTF-8 stuff that I write in my files saves and displays just fine.

I suspect that I'm just missing something in my understanding of how SMTP works - although I've studied everything I thought was relevant. Mutt is pretty smart, so my messages (both the one I sent to the list earlier and the test ones I've just sent on a round trip) went out with the following relevant headers:

MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

The method that I invoke on the Net::SMTP object that does the server interaction is:

$smtp{connection}->mail( $message{from}, Bits => 8 )

This is the belt and the suspenders and the no-wrinkle fabric with the double-stitched pockets with special coin holders on the sides, y'know what I mean?
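
For comparison, the same header set can be produced with Python's standard `email` package. This is only a sketch of the message-building side, not of 'bssmtp' or Net::SMTP; the subject line and body are invented for the example.

```python
from email.message import EmailMessage

# Hypothetical test message; the address is the one used in the thread.
msg = EmailMessage()
msg["From"] = "ben@linuxgazette.net"
msg["To"] = "ben@linuxgazette.net"
msg["Subject"] = "UTF-8 round trip"

# Ask for UTF-8 text carried as raw 8-bit octets (no quoted-printable, no base64).
msg.set_content("test\nтест\n", charset="utf-8", cte="8bit")

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])

wire = msg.as_bytes()   # the octets that would be handed to an SMTP client
print("тест".encode("utf-8") in wire)
```

When such a message is sent with `smtplib`, the closest counterpart to `Bits => 8` is probably passing `mail_options=["BODY=8BITMIME"]` to `SMTP.sendmail()`, but that part is untested here because it needs a live server.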

> With this combination the encoding is fairly sure to survive (even
> PGP/MIME and other manglings on the way out).

All I can say is :(((((( ...

> Ren<a9>.
> 
> P.S.: I wonder what happens to the &eacute; in Ben's
>       mutt/xterm/window/thing.

I can see it just fine right now - but it's going to break once I send it back to the list.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



René Pfeiffer [lynx at luchs.at]


Wed, 23 Jan 2008 14:52:49 +0100

On Jan 23, 2008 at 0009 -0500, Ben Okopnik appeared and said:

> On Wed, Jan 23, 2008 at 12:45:37AM +0100, René Pfeiffer wrote:
> > [...]
> >  - I use LANG=en_GB.UTF-8 as locale setting (don't like the German
> >    translations ;-).
>
> I've got LANG=en_US.UTF-8; did that pretty early on, since I often have
> a need for mixing different languages.

Ok.

> >  - I use UTF-8-capable xterms. "ps ax" says they were started with the
> >    following options:
> >    xterm -class UXTerm -title uxterm -u8 -bg black -fg green
>
> I've got most of that, except I set it in my .Xresources:
>
> ``
> xterm*utf8:1
> xterm*background: black
> xterm*foreground: gold
> ''

This looks good, too.

> Not sure what the UXTerm class does beyond turning on '-u8' and
> providing a different class for font settings, etc., but I can
> display/read Unicode stuff just fine (':dig' in Vim is a pretty good
> test; so is Markus Kuhn's "UTF-8-demo.txt.gz".) I'll try adding it, just
> to see.

I use xterm with "-u8" out of habit since it's in my xfce menu configuration. xterms with this option handle the UTF-8-demo.txt file just fine.

> >  - My .muttrc offers mutt the following encodings when writing emails:
> >    set send_charset="us-ascii:iso-8859-15:utf-8"
>
> Hmm, I didn't have that one - I'll try adding it. Frankly, I doubt that
> it'll change anything, since the messages queued in my SMTP spool look
> fine.

Yes, and your headers also look good. I use the above line mainly because then mutt can use an appropriate encoding. UTF-8 isn't always necessary.

> >  - Additionally I have the following two lines in my .muttrc:
> >    set charset="utf-8"
>
> Got that one.

Hm.

> >    set editor="vim +':set textwidth=72' +':set wrap' +':set encoding=utf-8' +'set si'"
>
> Set in my ~/.vimrc, except for "si" and "encoding". I'll add that too -
> although, again, the UTF-8 stuff that I write in my files saves and
> displays just fine.

Well, what can I say? :-) Looks good to me.

> I suspect that I'm just missing something in my understanding of how
> SMTP works - although I've studied everything I thought was relevant.

Most modern MTAs are 8-bit clean. From personal experience I know that Postfix, Sendmail, Exim and CommuniGate Pro deal with 8-bit mail bodies just fine.

> Mutt is pretty smart, so my messages (both the one I sent to the list
> earlier and the test ones I've just sent on a round trip) went out with
> the following relevant headers:
>
> ``
> MIME-Version: 1.0
> Content-Type: text/plain; charset=utf-8
> Content-Disposition: inline
> Content-Transfer-Encoding: 8bit
> ''

Yes, I saw that, and that's correct - apart from the content in your email. ;-)

> The method that I invoke on the Net::SMTP object that does the server
> interaction is:
>
> ``
> $smtp{connection}->mail( $message{from}, Bits => 8 )
> ''
>
> This is the belt and the suspenders and the no-wrinkle fabric with
> the double-stitched pockets with special coin holders on the sides,
> y'know what I mean?

Yes, basically the take away variant with everything and extra cheese. The only difference is that I carry a Postfix around all the time which handles the submitted emails.

> > With this combination the encoding is fairly sure to survive (even
> > PGP/MIME and other manglings on the way out).
>
> All I can say is :(((((( ...

Which leaves me with ?????????...

> > René.
> >
> > P.S.: I wonder what happens to the &eacute; in Ben's
> >       mutt/xterm/window/thing.
>
> I can see it just fine right now - but it's going to break once I send
> it back to the list.

Why not use screenshots for the quoted text then? :-) On my end of your email I see a "Ren =C3=A9" which looks like "René" encoded in UTF-8 and displayed as ISO-8859-1(5). Maybe wireshark can help us out.
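
The "=C3=A9" pattern can be reproduced in a few lines of Python. This is a sketch using only the standard `quopri` module; it says nothing about which hop actually did the conversion.

```python
import quopri

name = "René"
utf8_bytes = name.encode("utf-8")

# Quoted-printable form of the UTF-8 bytes: the "=C3=A9" seen in the mail.
qp = quopri.encodestring(utf8_bytes)
print(qp)

# Undo the quoted-printable step, then interpret the bytes under each label.
raw = quopri.decodestring(qp)
print(raw.decode("utf-8"))        # correct label: one character for the e-acute
print(raw.decode("iso-8859-1"))   # wrong label: two characters instead of one
```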

Best, René.




Ben Okopnik [ben at linuxgazette.net]


Wed, 23 Jan 2008 09:44:27 -0500

On Wed, Jan 23, 2008 at 02:52:49PM +0100, Ren<a9> Pfeiffer wrote:

> On Jan 23, 2008 at 0009 -0500, Ben Okopnik appeared and said:
> 
> > I suspect that I'm just missing something in my understanding of how
> > SMTP works - although I've studied everything I thought was relevant.
> 
> Most modern MTAs are 8-bit clean. From personal experience I know that
> Postfix, Sendmail, Exim and CommuniGate Pro deal with 8-bit mail bodies
> just fine.

I'm pretty sure that 8bit is the default method, but I wanted to nail it down just in case. I've even tried 'Bits => "binary"', with no better result.

> > Mutt is pretty smart, so my messages (both the one I sent to the list
> > earlier and the test ones I've just sent on a round trip) went out with
> > the following relevant headers:
> > 
> > ``
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=utf-8
> > Content-Disposition: inline
> > Content-Transfer-Encoding: 8bit
> > ''
> 
> Yes, I saw that, and that's correct - apart from the content in your
> email. ;-)

Heh.

> > > RenA<a9>.
250 2.1.0 ben@linuxgazette.net... Sender ok
250 2.1.5 ben@linuxgazette.net... Recipient ok
354 Enter mail, end with "." on a line by itself
test
тест



Ben Okopnik [ben at linuxgazette.net]


Wed, 23 Jan 2008 09:44:27 -0500

On Wed, Jan 23, 2008 at 02:52:49PM +0100, Ren<a9> Pfeiffer wrote:

> On Jan 23, 2008 at 0009 -0500, Ben Okopnik appeared and said:
> 
> > I suspect that I'm just missing something in my understanding of how
> > SMTP works - although I've studied everything I thought was relevant.
> 
> Most modern MTAs are 8-bit clean. From personal experience I know that
> Postfix, Sendmail, Exim and CommuniGate Pro deal with 8-bit mail bodies
> just fine.

I'm pretty sure that 8bit is the default method, but I wanted to nail it down just in case. I've even tried 'Bits => "binary"', with no better result.

> > Mutt is pretty smart, so my messages (both the one I sent to the list
> > earlier and the test ones I've just sent on a round trip) went out with
> > the following relevant headers:
> > 
> > ``
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=utf-8
> > Content-Disposition: inline
> > Content-Transfer-Encoding: 8bit
> > ''
> 
> Yes, I saw that, and that's correct - apart from the content in your
> email. ;-)

Heh.

> > > Ren?????^F<a9>.

Yep - that's what I see after I've sent it on a round trip.

> Maybe wireshark can help us
> out.

Oh, I'm pretty sure it's Net::SMTP at this point. Since the queued message is OK, and manually sending the text is OK as well, that's pretty much the only set of gears left in between.

(I'm going to try sending this email manually - launch 'bssmtp' with the '-odq' ("only queue") option and copy the queued result to the telnet session. We'll see what that looks like.)

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Ben Okopnik [ben at linuxgazette.net]


Wed, 23 Jan 2008 09:53:12 -0500

On Wed, Jan 23, 2008 at 09:44:27AM -0500, Benjamin Okopnik wrote:

> On Wed, Jan 23, 2008 at 02:52:49PM +0100, Ren? Pfeiffer wrote:
> > On Jan 23, 2008 at 0009 -0500, Ben Okopnik appeared and said:
> > 
> > > I suspect that I'm just missing something in my understanding of how
> > > SMTP works - although I've studied everything I thought was relevant.
> > 
> > Most modern MTAs are 8-bit clean. From personal experience I know that
> > Postfix, Sendmail, Exim and CommuniGate Pro deal with 8-bit mail bodies
> > just fine.
> 
> I'm pretty sure that 8bit is the default method, but I wanted to nail it
> down just in case. I've even tried 'Bits => "binary"', with no better
> result.
> 
> > > Mutt is pretty smart, so my messages (both the one I sent to the list
> > > earlier and the test ones I've just sent on a round trip) went out with
> > > the following relevant headers:
> > > 
> > > ``
> > > MIME-Version: 1.0
> > > Content-Type: text/plain; charset=utf-8
> > > Content-Disposition: inline
> > > Content-Transfer-Encoding: 8bit
> > > ''
> > 
> > Yes, I saw that, and that's correct - apart from the content in your
> > email. ;-)
> 
> Heh.
> 
> > > > Ren?????^F?.
> 250 2.1.0 ben@linuxgazette.net... Sender ok
> 250 2.1.5 ben@linuxgazette.net... Recipient ok
> 354 Enter mail, end with "." on a line by itself

[laugh] Whoops. I tried sending this manually, so the UTF-8 content would make it through; knowing that '.' by itself means "End of session", I added spaces after the period at this point, but I guess the upstream mail server I was using didn't take me seriously. I'll try resending the whole thing, this time "renaming" that period.
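
The portable way to "rename" that period is SMTP dot-stuffing: any body line that begins with "." gets one extra "." in front, and the receiver strips it again. A minimal Python sketch follows; the function name `dot_stuff` is made up, and it assumes CRLF line endings as used on the wire.

```python
def dot_stuff(body: str) -> str:
    """Escape a message body for the SMTP DATA phase (RFC 5321, section 4.5.2)."""
    stuffed = []
    for line in body.split("\r\n"):
        # A leading "." is doubled so that no body line can be mistaken for
        # the lone "." that terminates DATA.
        stuffed.append("." + line if line.startswith(".") else line)
    return "\r\n".join(stuffed)

body = "test\r\n.   \r\nafter the dot\r\n"
print(repr(dot_stuff(body)))
```

SMTP client libraries normally do this automatically; it only has to be done by hand in a raw telnet session like the one described here.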

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Ben Okopnik [ben at linuxgazette.net]


Wed, 23 Jan 2008 10:08:32 -0500

On Wed, Jan 23, 2008 at 09:44:27AM -0500, Benjamin Okopnik wrote:

> On Wed, Jan 23, 2008 at 02:52:49PM +0100, Ren<a9> Pfeiffer wrote:
> > On Jan 23, 2008 at 0009 -0500, Ben Okopnik appeared and said:
> > 
> > > I suspect that I'm just missing something in my understanding of how
> > > SMTP works - although I've studied everything I thought was relevant.
> > 
> > Most modern MTAs are 8-bit clean. From personal experience I know that
> > Postfix, Sendmail, Exim and CommuniGate Pro deal with 8-bit mail bodies
> > just fine.

[snip]

Bleh. Never mind the manual method; the interaction times out before I can glue it all in, and the console hoses some of the text. I'll just send it as is, and let the UTF-8 characters do their thing for now.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Ben Okopnik [ben at linuxgazette.net]


Wed, 23 Jan 2008 10:09:22 -0500

On Wed, Jan 23, 2008 at 02:52:49PM +0100, Ren<83><a9> Pfeiffer wrote:

> On Jan 23, 2008 at 0009 -0500, Ben Okopnik appeared and said:
> 
> > I suspect that I'm just missing something in my understanding of how
> > SMTP works - although I've studied everything I thought was relevant.
> 
> Most modern MTAs are 8-bit clean. From personal experience I know that
> Postfix, Sendmail, Exim and CommuniGate Pro deal with 8-bit mail bodies
> just fine.

I'm pretty sure that 8bit is the default method, but I wanted to nail it down just in case. I've even tried 'Bits => "binary"', with no better result.

> > Mutt is pretty smart, so my messages (both the one I sent to the list
> > earlier and the test ones I've just sent on a round trip) went out with
> > the following relevant headers:
> > 
> > ``
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=utf-8
> > Content-Disposition: inline
> > Content-Transfer-Encoding: 8bit
> > ''
> 
> Yes, I saw that, and that's correct - apart from the content in your
> email. ;-)

Heh.

> > > Ren<83><a9>.
> > > 
> > > P.S.: I wonder what happens to the &eacute; in Ben's
> > >       mutt/xterm/window/thing.
> > 
> > I can see it just fine right now - but it's going to break once I send
> > it back to the list. 
> 
> Why not use screenshots for the quoted text then? :-)

Well, yes, that would fix this specific instance of the problem - and it would still suck to have an SMTP server that doesn't do the right thing. I just tested a part of my SMTP chain with the following:

ben@Tyr:~$ telnet linuxgazette.net 25
Trying 64.246.26.120...
Connected to linuxgazette.net.
Escape character is '^]'.
220 genetikayos.com ESMTP Sendmail 8.12.11.20060308/8.12.11; Wed, 23 Jan 2008 06:34:44 -0800
HELO Tyr.Thor
MAIL FROM: ben@linuxgazette.net
RCPT TO: ben@linuxgazette.net
DATA
250 genetikayos.com Hello 72.sub-75-203-218.myvzw.com [75.203.218.72], pleased to meet you
250 2.1.0 ben@linuxgazette.net... Sender ok
250 2.1.5 ben@linuxgazette.net... Recipient ok
354 Enter mail, end with "." on a line by itself
test
<82><b5><81><82>
.                                                    
250 2.0.0 m0NEYiOR015906 Message accepted for delivery
QUIT
221 2.0.0 genetikayos.com closing connection
Connection closed by foreign host.

and that came through just fine: the UTF-8 content makes it across without any special headers (the only header I actually put in was 'Subject: ...'). This means that Net::SMTP is hosing my content while doing the transaction - which is not great news. On the one hand, I wrote/rewrote 'bssmtp' to be modular and easy to service; on the other hand, I really don't feel like reloading my brain with all the SMTP-relevant guck and sitting down to rewrite that part of it. [sigh] I'm going to have to do that, it seems. Maybe I'll just do the SMTP stuff manually instead of using a module; it's not that tough, and I won't have someone else's code handing me this kind of surprise anymore.

> On my end of your email I see a "Ren<83><a9>" which looks like "Ren<83><a9>" encoded
> in UTF-8 and displayed as ISO-8859-1(5). 

Yep - that's what I see after I've sent it on a round trip.

> Maybe wireshark can help us
> out.

Oh, I'm pretty sure it's Net::SMTP at this point. Since the queued message is OK, and manually sending the text is OK as well, that's pretty much the only set of gears left in between.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Breen Mullins [breen.mullins at gmail.com]


Wed, 23 Jan 2008 06:35:11 -0800

* Ben Okopnik <ben@linuxgazette.net> [2008-01-23 00:09 -0500]:

>On Wed, Jan 23, 2008 at 12:45:37AM +0100, Ren<a9> Pfeiffer wrote:
>> On Jan 22, 2008 at 1712 -0500, Ben Okopnik appeared and said:
>> 
>> > To the best of my troubleshooting ability so far, everything breaks
>> > somewhere between the time that it leaves my mail client and the time
>> > that it arrives at the LG mail server - but I've checked everything on
>> > my end, and I'm sending it out with 'utf-8' as the charset and 8 bits
>> > set for the SMTP transaction. I'm pretty much stuck at that point, and
>> > have been for a while.

Hmm. Your headers don't look good to me.

=========
X-MIME-Autoconverted: from 8bit to quoted-printable by genetikayos.com id
	m0N59WPL030073
 Subject: Re: [TAG] Problems with UTF-8 over SMTP
 Content-Type: text/plain; charset="iso-8859-1"
 Content-Transfer-Encoding: quoted-printable
 X-SA-Exim-Version: 4.2.1 (built Mon, 27 Mar 2006 13:42:28 +0200)
======

It definitely says that it's iso-8859-1 here. The Content-Type line is at the end of the headers, just before the Spamassassin stuff. The autoconverted line is several up from there. I suspect that the conversion is misflagging the message.
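
The effect of a wrong charset label on an otherwise intact quoted-printable body can be checked offline. The following Python sketch feeds the same body to the standard `email` parser twice, changing only the label; the sample message and the helper name `body_as_seen` are invented for the example.

```python
from email import message_from_bytes, policy

def body_as_seen(charset: str) -> str:
    """Parse a quoted-printable message whose charset label is `charset`."""
    raw = (
        b"MIME-Version: 1.0\r\n"
        b'Content-Type: text/plain; charset="' + charset.encode("ascii") + b'"\r\n'
        b"Content-Transfer-Encoding: quoted-printable\r\n"
        b"\r\n"
        b"Ren=C3=A9\r\n"
    )
    msg = message_from_bytes(raw, policy=policy.default)
    # get_content() undoes quoted-printable and decodes with the labelled charset.
    return msg.get_content().strip()

print(body_as_seen("utf-8"))        # label matches the bytes
print(body_as_seen("iso-8859-1"))   # label does not match: mojibake
```

The bytes are identical in both runs, so in this sketch the quoted-printable step loses nothing; only the label decides whether the reader sees the right text.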

Breen

-- 
Breen Mullins
Menlo Park, California



Ben Okopnik [ben at linuxgazette.net]


Wed, 23 Jan 2008 10:28:46 -0500

On Wed, Jan 23, 2008 at 06:35:11AM -0800, Breen Mullins wrote:

> * Ben Okopnik <ben@linuxgazette.net> [2008-01-23 00:09 -0500]:
> 
> >On Wed, Jan 23, 2008 at 12:45:37AM +0100, Ren<83><a9> Pfeiffer wrote:
> >> On Jan 22, 2008 at 1712 -0500, Ben Okopnik appeared and said:
> >> 
> >> > To the best of my troubleshooting ability so far, everything breaks
> >> > somewhere between the time that it leaves my mail client and the time
> >> > that it arrives at the LG mail server - but I've checked everything on
> >> > my end, and I'm sending it out with 'utf-8' as the charset and 8 bits
> >> > set for the SMTP transaction. I'm pretty much stuck at that point, and
> >> > have been for a while.
> 
> Hmm. Your headers don't look good to me.
> 
> =========
> X-MIME-Autoconverted: from 8bit to quoted-printable by genetikayos.com id
> 	m0N59WPL030073
> ======

[blink] How the hell did I miss *that*?

Wow. Thanks, Breen - that sounds like the place where it's getting hosed, all right (I'll test that in a moment by sending it through another server.) The question is, how do I stop it? Anybody familiar with that aspect of SMTP?

I'm definitely sending out a character set - again, this is even before it gets to 'bssmtp', Mutt sets it based on the content and it definitely does the right thing when I have UTF-8 in there. Why it would get converted is a mystery to me.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Neil Youngman [Neil.Youngman at youngman.org.uk]


Wed, 23 Jan 2008 15:45:52 +0000

On Wednesday 23 January 2008 15:28, Ben Okopnik wrote:

> On Wed, Jan 23, 2008 at 06:35:11AM -0800, Breen Mullins wrote:
> > * Ben Okopnik <ben@linuxgazette.net> [2008-01-23 00:09 -0500]:
> > Hmm. Your headers don't look good to me.
> >
> > =========
> > X-MIME-Autoconverted: from 8bit to quoted-printable by genetikayos.com id
> > 	m0N59WPL030073
> > ======
>
> [blink] How the hell did I miss *that*?
>
> Wow. Thanks, Breen - that sounds like the place where it's getting
> hosed, all right (I'll test that in a moment by sending it through
> another server.) The question is, how do I stop it? Anybody familiar
> with that aspect of SMTP?
>
> I'm definitely sending out a character set - again, this is even before
> it gets to 'bssmtp', Mutt sets it based on the content and it definitely
> does the right thing when I have UTF-8 in there. Why it would get
> converted is a mystery to me.

That sounds to me very much like a server receiving it as 8 bit and deciding that the host it is sending to doesn't accept 8 bit ESMTP. The correct thing to do in that circumstance is to encode it, so it can be accepted by a 7 bit only server.

RFC 1652 says

"If a server SMTP does not support the 8-bit MIME transport extension
   (either by not responding with code 250 to the EHLO command, or by
   not including the EHLO keyword value 8BITMIME in its response), then
   the client SMTP must not, under any circumstances, attempt to
   transfer a content which contains characters outside the US-ASCII
   octet range (hex 00-7F).
  
   A client SMTP has two options in this case: first, it may implement a
   gateway transformation to convert the message into valid 7bit MIME,
   or second, or may treat this as a permanent error and handle it in
   the usual manner for delivery failures.  The specifics of the
   transformation from 8bit MIME to 7bit MIME are not described by this
   RFC; the conversion is nevertheless constrained in the following
   ways:
 
      (1)  it must cause no loss of information; MIME transport
           encodings must be employed as needed to insure this is
           the case, and
 
      (2)  the resulting message must be valid 7bit MIME."

I assume that the headers should be altered to correctly reflect the encoding, as otherwise it wouldn't be "valid 7bit MIME".
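
A gateway transformation that meets both constraints could look roughly like the Python sketch below. It is an illustration under simplifying assumptions (headers held in a plain dict, a single text part), not a description of what Sendmail does; the function name `downgrade_to_7bit` is hypothetical.

```python
import quopri

def downgrade_to_7bit(headers: dict, body: bytes) -> tuple:
    """Re-encode an 8bit body as quoted-printable, keeping the charset label."""
    new_headers = dict(headers)
    # Only the transfer encoding changes. Content-Type, including its charset,
    # is carried over untouched so the receiver can still decode the text.
    new_headers["Content-Transfer-Encoding"] = "quoted-printable"
    return new_headers, quopri.encodestring(body)

headers = {
    "Content-Type": "text/plain; charset=utf-8",
    "Content-Transfer-Encoding": "8bit",
}
body = "тест\n".encode("utf-8")

new_headers, new_body = downgrade_to_7bit(headers, body)
print(new_headers)
print(new_body)
print(quopri.decodestring(new_body) == body)   # constraint (1): nothing lost
print(all(b < 0x80 for b in new_body))         # constraint (2): 7-bit clean
```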

Neil




Ben Okopnik [ben at linuxgazette.net]


Wed, 23 Jan 2008 11:15:06 -0500

On Wed, Jan 23, 2008 at 03:45:52PM +0000, Neil Youngman wrote:

> On Wednesday 23 January 2008 15:28, Ben Okopnik wrote:
> > On Wed, Jan 23, 2008 at 06:35:11AM -0800, Breen Mullins wrote:
> > > * Ben Okopnik <ben@linuxgazette.net> [2008-01-23 00:09 -0500]:
> > > Hmm. Your headers don't look good to me.
> > >
> > > =========
> > > X-MIME-Autoconverted: from 8bit to quoted-printable by genetikayos.com id
> > > 	m0N59WPL030073
> > > ======
> >
> > [blink] How the hell did I miss *that*?
> >
> > Wow. Thanks, Breen - that sounds like the place where it's getting
> > hosed, all right (I'll test that in a moment by sending it through
> > another server.) The question is, how do I stop it? Anybody familiar
> > with that aspect of SMTP?
> >
> > I'm definitely sending out a character set - again, this is even before
> > it gets to 'bssmtp', Mutt sets it based on the content and it definitely
> > does the right thing when I have UTF-8 in there. Why it would get
> > converted is a mystery to me.
> 
> That sounds to me very much like a server receiving it as 8 bit and deciding 
> that the host it is sending to doesn't accept 8 bit ESMTP. The correct thing 
> to do in that circumstance is to encode it, so it can be accepted by a 7 bit 
> only server. 
> 
> RFC 1652 says
> 
> "If a server SMTP does not support the 8-bit MIME transport extension
>    (either by not responding with code 250 to the EHLO command, or by
>    not including the EHLO keyword value 8BITMIME in its response)

Hmm. If I'm sending a message to myself, then the receiving host is 'linuxgazette.net'.

ben@Tyr:~$ telnet linuxgazette.net 25
Trying 64.246.26.120...
Connected to linuxgazette.net.
Escape character is '^]'.
EHLO Tyr.Thor
250-genetikayos.com Hello 72.sub-75-203-218.myvzw.com [75.203.218.72], pleased to meet you
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-8BITMIME
250-SIZE
250-DSN
250-ETRN
250-AUTH GSSAPI
250-STARTTLS
250-DELIVERBY
250 HELP

Doesn't seem like that would trigger it off. In fact, I don't know that anybody out there is still doing 7bit-only stuff.
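
The same check can be automated by parsing the EHLO reply for the 8BITMIME keyword. A small Python sketch, using the reply from the session above as sample input; the function name `ehlo_keywords` is made up.

```python
def ehlo_keywords(response: str) -> set:
    """Extract the extension keywords from a multi-line EHLO reply."""
    keywords = set()
    lines = response.strip().splitlines()
    for line in lines[1:]:   # the first line is the greeting, not a keyword
        code, sep, rest = line[:3], line[3:4], line[4:]
        if code == "250" and sep in ("-", " ") and rest:
            keywords.add(rest.split()[0].upper())
    return keywords

reply = """\
250-genetikayos.com Hello 72.sub-75-203-218.myvzw.com [75.203.218.72], pleased to meet you
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-8BITMIME
250-SIZE
250-DSN
250-ETRN
250-AUTH GSSAPI
250-STARTTLS
250-DELIVERBY
250 HELP
"""
print("8BITMIME" in ehlo_keywords(reply))
```

This only shows what the final server advertises; a relay earlier in the chain would have to be queried the same way before it could be ruled out.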

>    , then
>    the client SMTP must not, under any circumstances, attempt to
>    transfer a content which contains characters outside the US-ASCII
>    octet range (hex 00-7F).
> 
>    A client SMTP has two options in this case: first, it may implement a
>    gateway transformation to convert the message into valid 7bit MIME,
>    or second, or may treat this as a permanent error and handle it in
> 
>    the usual manner for delivery failures.  The specifics of the
>    transformation from 8bit MIME to 7bit MIME are not described by this
>    RFC; the conversion is nevertheless constrained in the following
>    ways:
> 
>       (1)  it must cause no loss of information; MIME transport
>            encodings must be employed as needed to insure this is
>            the case, and
> 
>       (2)  the resulting message must be valid 7bit MIME."
> 
> I assume that the headers should be altered to correctly reflect the encoding, 
> as otherwise it wouldn't be "valid 7bit MIME".

Right... I don't think that "quoted-printable" is exactly equivalent to "7bit MIME" (although it would indeed pass through that filter.) Even more to the point, the headers on this email still say "Content-Transfer-Encoding: 8bit".

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *

Top    Back


Neil Youngman [Neil.Youngman at youngman.org.uk]


Wed, 23 Jan 2008 16:27:22 +0000

On Wednesday 23 January 2008 16:15, Ben Okopnik wrote:

> Doesn't seem like that would trigger it off. In fact, I don't know that
> anybody out there is still doing 7bit-only stuff.

I don't have the headers to look at, so I can only guess. Is the last received header "genetikayos.com"?

> Right... I don't think that "quoted-printable" is exactly equivalent to
> "7bit MIME" (although it would indeed pass through that filter.) Even
> more to the point, the headers on this email still say
> "Content-Transfer-Encoding: 8bit".

I would say that "quoted-printable" is a subset of 7bit mime and in my (very limited) experience, it seems to be the default choice. It does sound as though the headers haven't been correctly updated.

Neil


Top    Back


Ben Okopnik [ben at linuxgazette.net]


Wed, 23 Jan 2008 14:17:44 -0500

On Wed, Jan 23, 2008 at 04:27:22PM +0000, Neil Youngman wrote:

> On Wednesday 23 January 2008 16:15, Ben Okopnik wrote:
> > Doesn't seem like that would trigger it off. In fact, I don't know that
> > anybody out there is still doing 7bit-only stuff.
> 
> I don't have the headers to look at, so I can only guess. Is the last received 
> header "genetikayos.com"? 

Here's a header from a successful one (i.e., the one I sent via a manual SMTP session):

 From ben  Wed Jan 23 09:37:32 2008
Return-Path: ben@linuxgazette.net
Received: from genetikayos.com [64.246.26.120]
        by Tyr with POP3 (fetchmail-6.3.2)
        for <ben@localhost> (single-drop); Wed, 23 Jan 2008 09:37:32 -0500 (EST)
Received: from Tyr.Thor (72.sub-75-203-218.myvzw.com [75.203.218.72])
        by genetikayos.com (8.12.11.20060308/8.12.11) with SMTP id m0NEYiOR015906
        for ben@linuxgazette.net; Wed, 23 Jan 2008 06:34:51 -0800
Date: Wed, 23 Jan 2008 06:34:44 -0800
From: Ben Okopnik <ben@linuxgazette.net>
Message-Id: <200801231434.m0NEYiOR015906@genetikayos.com>
X-Spam-Status: No, score=-0.7 required=5.0 tests=BAYES_00,MISSING_SUBJECT,
        RCVD_IN_PBL,TO_CC_NONE autolearn=no version=3.1.8
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on genetikayos.com
Status: RO
Content-Length: 14
Lines: 2

And here's one from an email with the same content (which came through broken), sent via Mutt and bssmtp:

 From ben  Wed Jan 23 14:08:26 2008
Return-Path: ben@linuxgazette.net
Received: from localhost [127.0.0.1]
        by Tyr with POP3 (fetchmail-6.3.2)
        for <ben@localhost> (single-drop); Wed, 23 Jan 2008 14:08:26 -0500 (EST)
Received: from localhost.localdomain (genetikayos.com [64.246.26.120])
        by genetikayos.com (8.12.11.20060308/8.12.11) with ESMTP id m0NJ7G4Q026459
        for <ben@linuxgazette.net>; Wed, 23 Jan 2008 11:07:28 -0800
Received: from localhost ([127.0.0.1]) by Fenrir (bssmtp 0.3) with SMTP id 70419937;
        Wed, 23 Jan 2008 14:07:40 -0500
From: Ben Okopnik <ben@linuxgazette.net>
Message-ID: <20080123190740.GC8843@linuxgazette.net>
Date: Wed, 23 Jan 2008 14:07:40 -0500
To: Ben Okopnik <ben@linuxgazette.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.5.11
X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,MISSING_SUBJECT autolearn=no version=3.1.8
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on genetikayos.com

I note that there's no conversion warning in it - and yet, it's broken.

> > Right... I don't think that "quoted-printable" is exactly equivalent to
> > "7bit MIME" (although it would indeed pass through that filter.) Even
> > more to the point, the headers on this email still say
> > "Content-Transfer-Encoding: 8bit".
> 
> I would say that "quoted-printable" is a subset of 7bit mime and in my (very 
> limited) experience, it seems to be the default choice. 

That sounds reasonable. Perhaps converting anything that's not 100% clear to 7bit is some servers' default policy.

> It does sound as 
> though the headers haven't been correctly updated.

Again, possible - but the UTF-8 does come through in a manual session, where I've used no headers from the sender side beyond the 'From:' (the 'RCPT TO:' just gets used to determine where to send it, and doesn't get added to the headers.)

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *

Top    Back


Neil Youngman [Neil.Youngman at youngman.org.uk]


Wed, 23 Jan 2008 20:06:08 +0000

On Wednesday 23 January 2008 19:17, Ben Okopnik wrote:

> On Wed, Jan 23, 2008 at 04:27:22PM +0000, Neil Youngman wrote:
> > On Wednesday 23 January 2008 16:15, Ben Okopnik wrote:
> > > Doesn't seem like that would trigger it off. In fact, I don't know that
> > > anybody out there is still doing 7bit-only stuff.
> >
> > I don't have the headers to look at, so I can only guess. Is the last
> > received header "genetikayos.com"?
>
> Here's a header from a successful one (i.e., the one I sent via a manual
> SMTP session):
>
> ``
> From ben  Wed Jan 23 09:37:32 2008
> Return-Path: ben@linuxgazette.net
> Received: from genetikayos.com [64.246.26.120]
>         by Tyr with POP3 (fetchmail-6.3.2)
>         for <ben@localhost> (single-drop); Wed, 23 Jan 2008 09:37:32 -0500 (EST)
> Received: from Tyr.Thor (72.sub-75-203-218.myvzw.com [75.203.218.72])
>         by genetikayos.com (8.12.11.20060308/8.12.11) with SMTP id m0NEYiOR015906
>         for ben@linuxgazette.net; Wed, 23 Jan 2008 06:34:51 -0800
> Date: Wed, 23 Jan 2008 06:34:44 -0800
> From: Ben Okopnik <ben@linuxgazette.net>
> Message-Id: <200801231434.m0NEYiOR015906@genetikayos.com>
> X-Spam-Status: No, score=-0.7 required=5.0 tests=BAYES_00,MISSING_SUBJECT,
>         RCVD_IN_PBL,TO_CC_NONE autolearn=no version=3.1.8
> X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on genetikayos.com
> Status: RO
> Content-Length: 14
> Lines: 2

No MIME headers, should be treated as plain US-ASCII for most purposes IIRC. Probably most 8 bit clean MTAs wouldn't check that it's seven bit clean, especially if they're handling ESMTP.
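
To illustrate the "no MIME headers means US-ASCII" default, here is a small Python sketch using the standard `email` package. The two messages are cut-down stand-ins for the headers quoted above, and `declared_charset` is a hypothetical helper; the fallback to "us-ascii" reflects the RFC 2045 default rather than anything a particular MTA is known to do.

```python
from email import message_from_string

# Cut-down version of the manually sent message: no MIME headers at all.
bare = message_from_string(
    "From: Ben Okopnik <ben@linuxgazette.net>\n"
    "Date: Wed, 23 Jan 2008 06:34:44 -0800\n"
    "\n"
    "test\n"
)

# Cut-down version of the Mutt/bssmtp message: fully labelled.
labelled = message_from_string(
    "From: Ben Okopnik <ben@linuxgazette.net>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "test\n"
)


def declared_charset(msg):
    """Return the declared charset, falling back to the RFC 2045 default."""
    return msg.get_content_charset("us-ascii")


for name, msg in (("bare", bare), ("labelled", labelled)):
    print(name, msg.get_content_type(), declared_charset(msg))
```

A reader that honours the default would treat the bare message as ASCII, so 8-bit bytes in it only display correctly when the receiving client guesses UTF-8 on its own.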

> And here's one from an email with the same content (which came through
> broken), sent via Mutt and bssmtp:
>
> ``
> From ben  Wed Jan 23 14:08:26 2008
> Return-Path: ben@linuxgazette.net
> Received: from localhost [127.0.0.1]
>         by Tyr with POP3 (fetchmail-6.3.2)
>         for <ben@localhost> (single-drop); Wed, 23 Jan 2008 14:08:26 -0500 (EST)
> Received: from localhost.localdomain (genetikayos.com [64.246.26.120])
>         by genetikayos.com (8.12.11.20060308/8.12.11) with ESMTP id m0NJ7G4Q026459
>         for <ben@linuxgazette.net>; Wed, 23 Jan 2008 11:07:28 -0800
> Received: from localhost ([127.0.0.1]) by Fenrir (bssmtp 0.3) with SMTP id 70419937;
>         Wed, 23 Jan 2008 14:07:40 -0500
> From: Ben Okopnik <ben@linuxgazette.net>
> Message-ID: <20080123190740.GC8843@linuxgazette.net>
> Date: Wed, 23 Jan 2008 14:07:40 -0500
> To: Ben Okopnik <ben@linuxgazette.net>
> MIME-Version: 1.0
> Content-Type: text/plain; charset=utf-8
> Content-Disposition: inline
> Content-Transfer-Encoding: 8bit
> User-Agent: Mutt/1.5.11
> X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,MISSING_SUBJECT autolearn=no version=3.1.8
> X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on genetikayos.com
> ''
>
> I note that there's no conversion warning in it - and yet, it's broken.

That's got all the proper MIME headers. While I wouldn't normally expect MTAs to rely on the MIME headers, if they've got to update the MIME headers when they convert, maybe they do require MIME headers to be present before doing the conversion?

I'm afraid the headers don't suggest much to me.

The 2 big differences are the MIME headers and the use of mutt/bssmtp. I don't have much info on bssmtp. I wonder if it offers 8bitmime? If not, maybe Mutt does the 8bit to 7bit conversion when handing off to bssmtp? I think that's clutching at straws, but you never know.

Neil


Top    Back


Neil Youngman [Neil.Youngman at youngman.org.uk]


Wed, 23 Jan 2008 20:19:59 +0000

On Wednesday 23 January 2008 20:06, Neil Youngman wrote:

> On Wednesday 23 January 2008 19:17, Ben Okopnik wrote:
> > From ben  Wed Jan 23 14:08:26 2008
> > Return-Path: ben@linuxgazette.net
> > Received: from localhost [127.0.0.1]
> >         by Tyr with POP3 (fetchmail-6.3.2)
> >         for <ben@localhost> (single-drop); Wed, 23 Jan 2008 14:08:26 -0500 (EST)
> > Received: from localhost.localdomain (genetikayos.com [64.246.26.120])
> >         by genetikayos.com (8.12.11.20060308/8.12.11) with ESMTP id m0NJ7G4Q026459
> >         for <ben@linuxgazette.net>; Wed, 23 Jan 2008 11:07:28 -0800
> > Received: from localhost ([127.0.0.1]) by Fenrir (bssmtp 0.3) with SMTP id 70419937;
> >         Wed, 23 Jan 2008 14:07:40 -0500
> > From: Ben Okopnik <ben@linuxgazette.net>
> > Message-ID: <20080123190740.GC8843@linuxgazette.net>
> > Date: Wed, 23 Jan 2008 14:07:40 -0500
> > To: Ben Okopnik <ben@linuxgazette.net>
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=utf-8
> > Content-Disposition: inline
> > Content-Transfer-Encoding: 8bit
> > User-Agent: Mutt/1.5.11
> > X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,MISSING_SUBJECT autolearn=no version=3.1.8
> > X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on genetikayos.com
> > ''
> >
> > I note that there's no conversion warning in it - and yet, it's broken.

I missed that there's no mention of "quoted-printable" at all, which suggests that conversion to quoted-printable is a red herring, absent any other indication of quoted-printable encoding.

Neil


Top    Back


René Pfeiffer [lynx at luchs.at]


Wed, 23 Jan 2008 21:32:22 +0100

On Jan 23, 2008 at 2019 +0000, Neil Youngman appeared and said:

> On Wednesday 23 January 2008 20:06, Neil Youngman wrote:
> > On Wednesday 23 January 2008 19:17, Ben Okopnik wrote:
> > > From ben  Wed Jan 23 14:08:26 2008
> > > Return-Path: ben@linuxgazette.net
> > > Received: from localhost [127.0.0.1]
> > >         by Tyr with POP3 (fetchmail-6.3.2)
> > >         for <ben@localhost> (single-drop); Wed, 23 Jan 2008 14:08:26 -0500 (EST)
> > > Received: from localhost.localdomain (genetikayos.com [64.246.26.120])
> > >         by genetikayos.com (8.12.11.20060308/8.12.11) with ESMTP id m0NJ7G4Q026459
> > >         for <ben@linuxgazette.net>; Wed, 23 Jan 2008 11:07:28 -0800
> > > Received: from localhost ([127.0.0.1]) by Fenrir (bssmtp 0.3) with SMTP id 70419937;
> > >         Wed, 23 Jan 2008 14:07:40 -0500
> > > From: Ben Okopnik <ben@linuxgazette.net>
> > > Message-ID: <20080123190740.GC8843@linuxgazette.net>
> > > Date: Wed, 23 Jan 2008 14:07:40 -0500
> > > To: Ben Okopnik <ben@linuxgazette.net>
> > > MIME-Version: 1.0
> > > Content-Type: text/plain; charset=utf-8
> > > Content-Disposition: inline
> > > Content-Transfer-Encoding: 8bit
> > > User-Agent: Mutt/1.5.11
> > > X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,MISSING_SUBJECT autolearn=no version=3.1.8
> > > X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on genetikayos.com
> > > ''
> > >
> > > I note that there's no conversion warning in it - and yet, it's broken.
>
> I missed that there's no mention of "quoted-printable" at all, which suggests
> that conversion to quoted-printable is a red herring, absent any other
> indication of quoted-printable encoding.

In this case the only thing I can think of is a filter plugin, be it anti-virus, anti-spam or even anti-UTF-8. :)

Best, René.


Top    Back


René Pfeiffer [lynx at luchs.at]


Wed, 23 Jan 2008 16:56:20 +0100

On Jan 23, 2008 at 1028 -0500, Ben Okopnik appeared and said:

> On Wed, Jan 23, 2008 at 06:35:11AM -0800, Breen Mullins wrote:
> > * Ben Okopnik <ben@linuxgazette.net> [2008-01-23 00:09 -0500]:
> >
> > >On Wed, Jan 23, 2008 at 12:45:37AM +0100, René Pfeiffer wrote:
> > >> On Jan 22, 2008 at 1712 -0500, Ben Okopnik appeared and said:
> > >>
> > >> > To the best of my troubleshooting ability so far, everything breaks
> > >> > somewhere between the time that it leaves my mail client and the time
> > >> > that it arrives at the LG mail server - but I've checked everything on
> > >> > my end, and I'm sending it out with 'utf-8' as the charset and 8 bits
> > >> > set for the SMTP transaction. I'm pretty much stuck at that point, and
> > >> > have been for a while.
> >
> > Hmm. Your headers don't look good to me.
> >
> > =========
> > X-MIME-Autoconverted: from 8bit to quoted-printable by genetikayos.com id
> > 	m0N59WPL030073
> > ======
>
> [blink] How the hell did I miss *that*?

I missed it as well.

> Wow. Thanks, Breen - that sounds like the place where it's getting
> hosed, all right (I'll test that in a moment by sending it through
> another server.) The question is, how do I stop it? Anybody familiar
> with that aspect of SMTP?

Yes, Sendmail does a conversion when it's not configured to pass 8-bit data. This can be changed in the mailer flags; AFAIK one has to use the smtp8 mailer for SMTP.

Now I know why I don't see this problem. Postfix passes 8-bit data and PGP/MIME uses quoted-printable encoding to be on the safe side.

> I'm definitely sending out a character set - again, this is even before
> it gets to 'bssmtp', Mutt sets it based on the content and it definitely
> does the right thing when I have UTF-8 in there. Why it would get
> converted is a mystery to me.

I think it's due to the MTA doing the conversion mentioned above.
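
As a rough picture of what such an 8bit-to-quoted-printable conversion involves, here is a Python sketch. It is an illustration only, not Sendmail's actual logic: the function name `autoconvert_to_qp`, the dict-based headers, and the host name `mail.example.org` are all assumptions made for the example.

```python
import quopri


def autoconvert_to_qp(headers, body, host):
    """Re-encode an 8-bit body as quoted-printable and relabel the headers."""
    if all(byte < 0x80 for byte in body):
        # Already 7-bit clean: nothing to convert.
        return headers, body
    converted = dict(headers)
    converted["Content-Transfer-Encoding"] = "quoted-printable"
    converted["X-MIME-Autoconverted"] = f"from 8bit to quoted-printable by {host}"
    return converted, quopri.encodestring(body)


headers = {
    "Content-Type": "text/plain; charset=utf-8",
    "Content-Transfer-Encoding": "8bit",
}
body = "самовар".encode("utf-8")

new_headers, new_body = autoconvert_to_qp(headers, body, "mail.example.org")
print(new_headers["Content-Transfer-Encoding"])
print(new_body)
```

The key point is that the body and the Content-Transfer-Encoding header have to change together. A message whose body was converted but whose header still says "8bit" (or the reverse) is the kind of mismatch that shows up as garbage in the reader.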

Best, René.


Top    Back


Ben Okopnik [ben at linuxgazette.net]


Wed, 23 Jan 2008 11:10:41 -0500

On Wed, Jan 23, 2008 at 04:56:20PM +0100, René Pfeiffer wrote:

> On Jan 23, 2008 at 1028 -0500, Ben Okopnik appeared and said:
> 
> > Wow. Thanks, Breen - that sounds like the place where it's getting
> > hosed, all right (I'll test that in a moment by sending it through
> > another server.) The question is, how do I stop it? Anybody familiar
> > with that aspect of SMTP?
> 
> Yes, Sendmail does a conversion when it's not configured to pass 8-bit
> data. This can be changed in the mailer flags; AFAIK one has to use the
> smtp8 mailer for SMTP. 

In theory, that's what the 'Bits => 8' was supposed to do - but somehow, it's not having any effect.

> Now I know why I don't see this problem. Postfix passes 8-bit data and
> PGP/MIME uses quoted-printable encoding to be on the safe side.

I've been coming around to thinking that I should wrap up my messages as MIME attachments rather than inlining them. Again, a bit of a pain, but better than this.

> > I'm definitely sending out a character set - again, this is even before
> > it gets to 'bssmtp', Mutt sets it based on the content and it definitely
> > does the right thing when I have UTF-8 in there. Why it would get
> > converted is a mystery to me.
> 
> I think it's due to the MTA doing the conversion mentioned above.

I'll poke at it in the next day or two and see how it goes.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *

Top    Back


Breen Mullins [breen.mullins at gmail.com]


Wed, 23 Jan 2008 21:19:13 -0800

* Ben Okopnik <ben@linuxgazette.net> [2008-01-23 10:28 -0500]:

>
>[blink] How the hell did I miss *that*?
>
>Wow. Thanks, Breen - that sounds like the place where it's getting
>hosed, all right (I'll test that in a moment by sending it through
>another server.) The question is, how do I stop it? Anybody familiar
>with that aspect of SMTP?

It's not SMTP. I think René's right - it's a filter of some sort.

Note that after it's changed your Content-Type declaration, it puts the new one right after your subject line and just before the spamassassin line - which makes me think that the filter on that server is getting called just before SA.

Looks like it's just 'smart' enough to get triggered only part of the time.

(FWIW, your message was in Quoted-Printable by the time I got it from the list.)

Breen

-- 
Breen Mullins
Menlo Park, California

Top    Back


Ben Okopnik [ben at linuxgazette.net]


Thu, 24 Jan 2008 10:53:54 -0500

On Wed, Jan 23, 2008 at 09:19:13PM -0800, Breen Mullins wrote:

> * Ben Okopnik <ben@linuxgazette.net> [2008-01-23 10:28 -0500]:
> 
> >
> >[blink] How the hell did I miss *that*?
> >
> >Wow. Thanks, Breen - that sounds like the place where it's getting
> >hosed, all right (I'll test that in a moment by sending it through
> >another server.) The question is, how do I stop it? Anybody familiar
> >with that aspect of SMTP?
> 
> It's not SMTP. I think René's right - it's a filter of some sort.

Actually, it turns out that the Net::SMTP module on my end is screwing it up. I was pretty sure that was it by this point - since doing a manual SMTP transaction with the LG mail server made the UTF-8 content come through without any problems - and then I found a smoking gun.

René's idea of using 'wireshark' got me started on troubleshooting the actual transaction nitty-gritty; I used 'tcpdump'... which, of course, showed nothing since I'm using SSH to port-forward LG:25 to my localhost:2025 (duh!) So then, I set 'Debug => 1' in Net::SMTP, and saw the following:

Net::SMTP>>> Net::SMTP(2.30)
Net::SMTP>>>   Net::Cmd(2.27)
Net::SMTP>>>     Exporter(5.58)
Net::SMTP>>>   IO::Socket::INET(1.31)
Net::SMTP>>>     IO::Socket(1.30)
Net::SMTP>>>       IO::Handle(1.27)
Net::SMTP=GLOB(0x1150240)<<< 220 genetikayos.com ESMTP Sendmail 8.12.11.20060308/8.12.11; Thu, 24 Jan 2008 07:38:36 -0800
Net::SMTP=GLOB(0x1150240)>>> EHLO localhost.localdomain
Net::SMTP=GLOB(0x1150240)<<< 250-genetikayos.com Hello genetikayos.com [64.246.26.120], pleased to meet you
Net::SMTP=GLOB(0x1150240)<<< 250-ENHANCEDSTATUSCODES
Net::SMTP=GLOB(0x1150240)<<< 250-PIPELINING
Net::SMTP=GLOB(0x1150240)<<< 250-8BITMIME
Net::SMTP=GLOB(0x1150240)<<< 250-SIZE
Net::SMTP=GLOB(0x1150240)<<< 250-DSN
Net::SMTP=GLOB(0x1150240)<<< 250-ETRN
Net::SMTP=GLOB(0x1150240)<<< 250-AUTH GSSAPI
Net::SMTP=GLOB(0x1150240)<<< 250-STARTTLS
Net::SMTP=GLOB(0x1150240)<<< 250-DELIVERBY
Net::SMTP=GLOB(0x1150240)<<< 250 HELP
Net::SMTP=GLOB(0x1150240)>>> MAIL FROM:<ben@linuxgazette.net> BODY=8BITMIME
Net::SMTP=GLOB(0x1150240)<<< 250 2.1.0 <ben@linuxgazette.net>... Sender ok
Net::SMTP=GLOB(0x1150240)>>> RCPT TO:<ben@linuxgazette.net>
Net::SMTP=GLOB(0x1150240)<<< 250 2.1.5 <ben@linuxgazette.net>... Recipient ok
Net::SMTP=GLOB(0x1150240)>>> DATA
Net::SMTP=GLOB(0x1150240)<<< 354 Enter mail, end with "." on a line by itself
Net::SMTP=GLOB(0x1150240)>>> Received: from localhost ([127.0.0.1]) by Fenrir (bssmtp 0.3) with SMTP id 4796104;
Net::SMTP=GLOB(0x1150240)>>>    Thu, 24 Jan 2008 10:38:10 -0500
Net::SMTP=GLOB(0x1150240)>>> From: Ben Okopnik <ben@linuxgazette.net>
Net::SMTP=GLOB(0x1150240)>>> Message-ID: <20080124153809.GC8283@linuxgazette.net>
Net::SMTP=GLOB(0x1150240)>>> Date: Thu, 24 Jan 2008 10:38:09 -0500
Net::SMTP=GLOB(0x1150240)>>> To: Ben Okopnik <ben@linuxgazette.net>
Net::SMTP=GLOB(0x1150240)>>> Subject: Ttt-ttt
Net::SMTP=GLOB(0x1150240)>>> MIME-Version: 1.0
Net::SMTP=GLOB(0x1150240)>>> Content-Type: text/plain; charset=utf-8
Net::SMTP=GLOB(0x1150240)>>> Content-Disposition: inline
Net::SMTP=GLOB(0x1150240)>>> Content-Transfer-Encoding: 8bit
Net::SMTP=GLOB(0x1150240)>>> User-Agent: Mutt/1.5.11
Net::SMTP=GLOB(0x1150240)>>> 
Net::SMTP=GLOB(0x1150240)>>> test
Net::SMTP=GLOB(0x1150240)>>> <b5><91>
Net::SMTP=GLOB(0x1150240)>>> .
Net::SMTP=GLOB(0x1150240)<<< 250 2.0.0 m0OFcaKj019698 Message accepted for delivery
Jan 24 10:39:25 bssmtp: Removing c-4796104 and m-4796104
Jan 24 10:39:25 bssmtp: Unlocking message 4796104
Net::SMTP=GLOB(0x1150240)>>> QUIT
Net::SMTP=GLOB(0x1150240)<<< 221 2.0.0 genetikayos.com closing connection

I don't know if the above characters are going to come through, but the content (lines 7 and 8 from the bottom) is already munged. Game, set, and match - it's on my end.
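
One quick way to confirm that bytes like these are already damaged is to test whether they still decode as UTF-8. The Python sketch below does that. It assumes, purely for illustration, that the second test line was meant to be the two Cyrillic letters "е" and "Б"; the debug output only shows the bytes b5 and 91, so the intended text is a guess, and `utf8_problem` is a hypothetical helper name.

```python
def utf8_problem(payload):
    """Return None if payload is valid UTF-8, else describe the first bad byte."""
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError as err:
        return f"byte 0x{payload[err.start]:02x} at offset {err.start}: {err.reason}"
    return None


# Assumed intended text: each Cyrillic letter is two bytes, a lead byte plus a trailer.
intended = "еБ".encode("utf-8")

# The two bytes shown in the debug output, with no lead bytes in front of them.
on_wire = bytes([0xB5, 0x91])

print(intended.hex(" "), "->", utf8_problem(intended))
print(on_wire.hex(" "), "->", utf8_problem(on_wire))
```

Bytes in the range 0x80-0xBF are only valid as continuation bytes, so a body that starts with one cannot be well-formed UTF-8 no matter what the headers claim.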

Net::SMTP does not appear to have any user-controllable handles for twiddling this kind of thing, so I'm going to a) see if I can fix the internal bits in it; b) if I can't do that in a reasonably short amount of time, give up and replicate the above manually, /sans/ conversion; and c) file a bug report.

Thanks very much for the help, everybody!

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *

Top    Back


Ben Okopnik [ben at linuxgazette.net]


Thu, 24 Jan 2008 11:38:46 -0500

Woo-HOO. Nailed it.

Markus Kuhn's UTF-8 test file:

[[[ Elided for publication. You can see it at: http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt ]]]

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *

Top    Back


Karl-Heinz Herrmann [kh1 at khherrmann.de]


Wed, 23 Jan 2008 20:28:33 +0100

Hi,

I see the same garbage in my mailer (sylpheed-claws) which does support utf-8 usually.

A look in the header says....

On Tue, 22 Jan 2008 14:39:21 -0500 Ben Okopnik <ben@linuxgazette.net> wrote:

> X-MIME-Autoconverted: from 8bit to base64 by genetikayos.com id m0MJcqEJ007364 
> translation Content-Type: text/plain; charset="utf-8"
> Content-Transfer-Encoding: base64

but if I force utf8 the characters look a bit different (fewer chars per group). If I force the base64 decode as well I get:

> [Error decoding BASE64]
> [Error decoding BASE64]
> [Error decoding BASE64]
> [Error decoding BASE64]
> [Error decoding BASE64]

So something must have decoded the base64 already (and it seems not to tell in the header) and messed it up.
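
That inference is easy to reproduce. In the Python sketch below (the sample word and the helper name `try_base64` are assumptions for illustration), decoding base64 once works, while trying to decode the already-decoded bytes a second time fails, which matches the "Error decoding BASE64" symptom.

```python
import base64
import binascii


def try_base64(data):
    """Strictly decode base64; return None if the input is not valid base64."""
    try:
        return base64.b64decode(data, validate=True)
    except binascii.Error:
        return None


original = "самовар".encode("utf-8")
encoded = base64.b64encode(original)

once = try_base64(encoded)   # proper decode of a base64 body
twice = try_base64(once)     # decoding bytes that were already decoded

print("first decode ok:", once == original)
print("second decode ok:", twice is not None)
```

So a body that arrives as raw 8-bit text under a "Content-Transfer-Encoding: base64" header would produce exactly this kind of error in a client that trusts the header.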

K.-H.


Top    Back