|Meet the Gang 1 2 3 4 5 6 7 8 9|
From Mark McGrath
Answered By Jay R Ashworth, Mike "Iron" Orr, Yann Vernier, Ben Okopnik, Andreas Daab
Here's an interesting question that I thought you might take a look at. It applies more to the Europeans among us, but the wider community might benefit if it were broadened to the general question of dealing with different fonts!
How is it possible to view the euro symbol in programs running on Linux machines, programs like Netscape, Emacs, mutt, etc.?
Thanks in advance.
[Jay] I quoted all of that for a reason.
This is a good one, and one that's cropping up on several of the mail lists in which I participate, as well. It's a multi-faceted problem.
Using non-ASCII characters (I was going to say "on a PC", but that's sort of obvious) requires several things:
Each of these is handled, in Linux, by different things, and you need to make sure that all the pieces are in place. Frankly, I wouldn't be surprised at all to see an amendment to X3.4, the ASCII standard (which I think is also ISO 646), to include the Euro character.
Actually, I'm surprised it's not there already.
[Iron] There aren't any empty slots in ASCII. You'd have to replace something like the backslash or the pipe symbol, and that would wreak havoc on situations that don't expect these glyphs to change, like ASCII art, shell scripts and Windows pathnames.
[Jay] Well, no, I think you could find some other character to replace. I should clarify that I really don't mean USASCII (the 7 bit character set), what I really mean is "the most common 8-bit extended version of ASCII" -- though admittedly I don't know what that is. ISO-8859-0?
[Yann] -1, for Western Europe. -2 is Eastern Europe. Euro variants are -15 and -16 respectively, I think. The replaced character is ¤, which is an ancient sun symbol that also means "currency". It has the high bit set. Swedish people got lucky in that some odd person decided to put that character on our keyboards long ago - it's at Shift+4, with $ at AltGr+4. However, this also means that our keys are now marked with both "sol" (sun in Swedish) and "Euro" (AltGr+E), but in different positions, and either one may or may not work.
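To make the -1 versus -15 difference concrete, here is a minimal Python sketch (not something from the thread) that decodes the same high-bit byte, 0xA4, under both charsets. The helper name `can_encode` is made up for this illustration; the codec names are the ones Python's standard library ships.

```python
# Byte 0xA4 is the slot where ISO-8859-15 swaps the generic
# currency sign for the euro sign.
raw = b"\xa4"

as_latin1 = raw.decode("iso8859_1")    # U+00A4 CURRENCY SIGN
as_latin9 = raw.decode("iso8859_15")   # U+20AC EURO SIGN

print(f"ISO-8859-1 : U+{ord(as_latin1):04X}")
print(f"ISO-8859-15: U+{ord(as_latin9):04X}")


def can_encode(text, charset):
    """Hypothetical helper: True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# The euro sign simply has no slot in ISO-8859-1.
print(can_encode("\u20ac", "iso8859_1"))
print(can_encode("\u20ac", "iso8859_15"))
```

The same byte on disk or on the wire is therefore a sun symbol or a euro depending only on which charset the reader assumes.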
[Iron] Although I think character 35 (# or number sign) shows up as £ (pound sterling) on British screens and other currency symbols in other places, no?
[Jay] Yeah, it tends to...
An answer to this question of the Euro has, probably not all that surprisingly, been written already; it's the Euro Character Support miniHOWTO:
...but it's sort of weak, and may be specific to Finnish.
[Iron] As it says, new charsets have been added to the LATIN-x series containing the euro symbol. Are these high-bit characters? If so, they'll have the usual problem with non-ASCII characters: they show up differently depending on which charset is loaded on the recipient's computer, and whether the program/ console can switch charsets according to the document or portion of the document.
[Jay] Yeah. But there's a pretty standard default 8-bit set these days, isn't there? Even if it's just "IBM Code Page 437/850".
[Yann] Code pages 437 and 850 are incompatible, which annoyed people here no end as letters in our alphabet are different in the two. The "pretty standard" set is Latin 1, or ISO 8859-1, which happens to coincide with codepage 850 quite a lot.
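As a rough illustration of that incompatibility, the Python sketch below decodes one byte, 0x9B, under both code pages and under Latin 1. The choice of byte is mine, picked because it is one of the positions where the two code pages disagree; it is not an example from the thread.

```python
# One high-bit byte, three interpretations.
raw = b"\x9b"

decoded = {name: raw.decode(name) for name in ("cp437", "cp850", "latin_1")}

for name, ch in decoded.items():
    # Print the code point rather than the glyph, since in Latin 1
    # this byte is an unprintable control character.
    print(f"{name:8s} -> U+{ord(ch):04X}")

# Going the other way: the letter o-with-stroke lives at a different
# byte value in Latin 1 than in code page 850.
o_slash_latin1 = "\u00f8".encode("latin_1")
o_slash_cp850 = "\u00f8".encode("cp850")
print(o_slash_latin1.hex(), o_slash_cp850.hex())
```

So even where two charsets contain the same letter, a file written under one and read under the other shows the wrong glyph.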
[Jay] There doesn't actually seem to be a general HOWTO on using non-ASCII character sets with Unix that I can find at the moment [spots opportunity], but some of the specific ones include those for Belarussian, Danish, Hebrew, and the Unicode one -- which is probably where we should all be headed anyway... though the idea of security holes in the character set worries me a touch...
Hope this helps at least a little bit; you're correct; it's a weak spot.
[Iron] "ASCII" means characters 0-127, which have been standardized since the 1960s. (See "man ascii", also a good idea if you need to look up a character, or to convert decimal/hex/octal/character.) Characters 0-31 are nonprintable control codes, 32-127 are adequate for English and programming languages. On older computers, the high-bit characters (128-255) weren't available because the OS used the bit for something else (e.g., Apple ][ used it to represent "inverse video character"). (Actually, I also remember reading something about the Apple ][ using the high bit as a strobe bit, meaning a character was received. It's been twenty years; my memory is faulty.)
To support other languages, various 8-bit charsets were introduced. The ISO-8859-x series ("man iso_8859_1") is the most common on UNIX. -1 (aka LATIN-1) covers Western Europe (Germanic/Romance languages), -2 (aka LATIN-2) covers Eastern Europe (Slavic languages), -3 (aka LATIN-3) covers miscellaneous Europe (and Esperanto).
[Ben] <narrowed eyes behind the dark glasses> You thought I'd miss that, didn't you? The Revolution Never Sleeps.
[Iron] No, I knew you'd never miss that. You had extensive training, comrade.
The higher-numbered series cover Cyrillic, Greek, Turkish, Celtic, etc. New series were added to address deficiencies in previous series for certain languages, and to add the Euro symbol.
LATIN-1 is the default charset for the Linux console and xterm, following widespread UNIX precedent, and because it was convenient for Linus and most of the original Linux users.
Codepage 850 and the like are from the DOS world, and do the same thing but in an incompatible way.
Russia is in an unusual situation because a native charset, KOI8-R, competes with ISO-8859-5 and Codepage ???. One advantage of KOI8 is that if the high bit gets lost, it degenerates cleanly into readable ASCII, and can easily be converted back by restoring the missing bit. Unfortunately, the makers of ISO-8859-5 and Codepage ??? didn't think about just adopting the KOI8 character positions. Blame it on the Cold War. Some Russian web sites have links to switch between the four most common charsets.
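The KOI8 trick is easy to try out. The following Python sketch is only an illustration under a simplifying assumption: the text consists purely of Cyrillic letters, so restoring the bit on every byte is safe. The function names are invented for this example, and real text with spaces or punctuation would need a smarter restore step.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit channel by clearing bit 7 of every byte."""
    return bytes(b & 0x7F for b in data)


def restore_high_bit(data: bytes) -> bytes:
    """Set bit 7 on every byte (assumes the input was all Cyrillic letters)."""
    return bytes(b | 0x80 for b in data)


original = "москва"
encoded = original.encode("koi8_r")

# What survives the 7-bit channel is a Latin transliteration
# with the letter case inverted.
stripped = strip_high_bit(encoded)
print(stripped.decode("ascii"))

# Putting the bit back recovers the Cyrillic text.
recovered = restore_high_bit(stripped).decode("koi8_r")
print(recovered == original)

# For comparison, the same treatment applied to ISO-8859-5 bytes
# does not produce a transliteration.
print(strip_high_bit(original.encode("iso8859_5")).decode("ascii"))
```

KOI8 gets this property by placing each Cyrillic letter 128 positions above a similar-sounding Latin letter, which is why its alphabet is not in Cyrillic alphabetical order.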
All 8-bit charsets have the disadvantage that they can display only one other language family + English. If you need to write in two other language families, you have to use ASCII for one, because the console, xterm and text documents cannot change charsets in mid-document.
Unicode, being a 16-bit charset (or more), solves all these problems, but on Linux it hasn't reached the stage of no-brainer setup or universal support by all applications.
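A small Python sketch of the point about 8-bit charsets versus Unicode: a sentence mixing German, Russian, Greek and the euro sign fits none of the legacy charsets tried here, but fits UTF-8. The sample string, the candidate list and the name `charsets_that_fit` are all assumptions made for this illustration.

```python
# Text drawing on three language families plus the euro sign.
mixed = "Grüße, Привет, Γειά, 100 €"

candidates = [
    "iso8859_1",   # Western Europe
    "iso8859_5",   # Cyrillic
    "iso8859_7",   # Greek
    "iso8859_15",  # Western Europe with euro
    "koi8_r",      # Russian
    "utf_8",       # Unicode
]


def charsets_that_fit(text, charsets):
    """Hypothetical helper: list the charsets able to encode all of text."""
    fitting = []
    for name in charsets:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue
        fitting.append(name)
    return fitting


fits = charsets_that_fit(mixed, candidates)
print(fits)
```

Each 8-bit charset fails on whichever letters belong to the other families, which is the "one other language family + English" limit in practice.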
[Andreas Daab] Hi!
No problem with Red Hat 7.2 and euro under console, KDE and konsole. I'm from Germany and have to put the following settings in /etc/sysconfig/i18n:
LANG="de_DE@euro"
SUPPORTED="de_DE@euro:de_DE:de"
SYSFONT="lat0-sun16"
SYSFONTACM="iso15"
Okay, this gives me the euro symbol on the console. For X and KDE, remember to use the iso8859-15 charset and your correct language and country settings. If the euro symbol works in X, set it as the currency symbol in KDE.
If you want the euro for konsole, use unicode as fontset. Mozilla shows the euro if you use the iso8859-15 charset with all fonts and as default character coding (Preferences/Navigator/Character Coding).
Hope this works for you.