uses the 32 positions in the range 128-159 for extra characters, with the following
Unicode values (Charset -enc Cp1252 -ctab); 65533 (REPLACEMENT CHARACTER) marks the five undefined positions:
8364, 65533, 8218, 402, 8222, 8230, 8224, 8225, 710, 8240, 352, 8249, 338, 65533, 381, 65533,
65533, 8216, 8217, 8220, 8221, 8226, 8211, 8212, 732, 8482, 353, 8250, 339, 65533, 382, 376,
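The values above can be approximated with the standard java.nio.charset API by decoding each byte on its own. This is a minimal sketch, not the source of the Charset tool; the class and method names are hypothetical. It relies on the JDK decoder substituting U+FFFD (65533) for bytes it cannot map, which is what the list above shows for the undefined positions.

```java
import java.nio.charset.Charset;

/** Hypothetical helper: decodes bytes 128-159 with Cp1252 and reports the code points. */
final class Cp1252HighRange {

    /** Decodes a single byte value (0-255) and returns the resulting code point. */
    static int decode(Charset cs, int b) {
        // new String(byte[], Charset) replaces unmappable input with U+FFFD
        return new String(new byte[] {(byte) b}, cs).codePointAt(0);
    }

    /** Returns the code points for the byte values from..to (inclusive). */
    static int[] codePoints(Charset cs, int from, int to) {
        int[] result = new int[to - from + 1];
        for (int b = from; b <= to; b++) {
            result[b - from] = decode(cs, b);
        }
        return result;
    }

    public static void main(String[] args) {
        Charset cs = Charset.forName("Cp1252");
        int undefined = 0;
        for (int b = 128; b <= 159; b++) {
            int cp = decode(cs, b);
            if (cp == 0xFFFD) {
                undefined++; // position has no character in Cp1252
            }
            System.out.printf("%d -> %d (U+%04X)%n", b, cp, cp);
        }
        System.out.println(undefined + " of 32 positions are undefined");
    }
}
```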
The names of these characters, as extracted from UnicodeData.txt:
8364 20AC;EURO SIGN;Sc;0;ET;;;;;N;;;;;
65533 FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;;
8218 201A;SINGLE LOW-9 QUOTATION MARK;Ps;0;ON;;;;;N;LOW SINGLE COMMA QUOTATION MARK;;;;
402 0192;LATIN SMALL LETTER F WITH HOOK;Ll;0;L;;;;;N;LATIN SMALL LETTER SCRIPT F;;0191;;0191
8222 201E;DOUBLE LOW-9 QUOTATION MARK;Ps;0;ON;;;;;N;LOW DOUBLE COMMA QUOTATION MARK;;;;
8230 2026;HORIZONTAL ELLIPSIS;Po;0;ON;<compat> 002E 002E 002E;;;;N;;;;;
8224 2020;DAGGER;Po;0;ON;;;;;N;;;;;
8225 2021;DOUBLE DAGGER;Po;0;ON;;;;;N;;;;;
710 02C6;MODIFIER LETTER CIRCUMFLEX ACCENT;Lm;0;ON;;;;;N;MODIFIER LETTER CIRCUMFLEX;;;;
8240 2030;PER MILLE SIGN;Po;0;ET;;;;;N;;;;;
352 0160;LATIN CAPITAL LETTER S WITH CARON;Lu;0;L;0053 030C;;;;N;LATIN CAPITAL LETTER S HACEK;;;0161;
8249 2039;SINGLE LEFT-POINTING ANGLE QUOTATION MARK;Pi;0;ON;;;;;Y;LEFT POINTING SINGLE GUILLEMET;;;;
338 0152;LATIN CAPITAL LIGATURE OE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER O E;;;0153;
381 017D;LATIN CAPITAL LETTER Z WITH CARON;Lu;0;L;005A 030C;;;;N;LATIN CAPITAL LETTER Z HACEK;;;017E;
8216 2018;LEFT SINGLE QUOTATION MARK;Pi;0;ON;;;;;N;SINGLE TURNED COMMA QUOTATION MARK;;;;
8217 2019;RIGHT SINGLE QUOTATION MARK;Pf;0;ON;;;;;N;SINGLE COMMA QUOTATION MARK;;;;
8220 201C;LEFT DOUBLE QUOTATION MARK;Pi;0;ON;;;;;N;DOUBLE TURNED COMMA QUOTATION MARK;;;;
8221 201D;RIGHT DOUBLE QUOTATION MARK;Pf;0;ON;;;;;N;DOUBLE COMMA QUOTATION MARK;;;;
8226 2022;BULLET;Po;0;ON;;;;;N;;;;;
8211 2013;EN DASH;Pd;0;ON;;;;;N;;;;;
8212 2014;EM DASH;Pd;0;ON;;;;;N;;;;;
732 02DC;SMALL TILDE;Sk;0;ON;<compat> 0020 0303;;;;N;SPACING TILDE;;;;
8482 2122;TRADE MARK SIGN;So;0;ON;<super> 0054 004D;;;;N;TRADEMARK;;;;
353 0161;LATIN SMALL LETTER S WITH CARON;Ll;0;L;0073 030C;;;;N;LATIN SMALL LETTER S HACEK;;0160;;0160
8250 203A;SINGLE RIGHT-POINTING ANGLE QUOTATION MARK;Pf;0;ON;;;;;Y;RIGHT POINTING SINGLE GUILLEMET;;;;
339 0153;LATIN SMALL LIGATURE OE;Ll;0;L;;;;;N;LATIN SMALL LETTER O E;;0152;;0152
382 017E;LATIN SMALL LETTER Z WITH CARON;Ll;0;L;007A 030C;;;;N;LATIN SMALL LETTER Z HACEK;;017D;;017D
376 0178;LATIN CAPITAL LETTER Y WITH DIAERESIS;Lu;0;L;0059 0308;;;;N;LATIN CAPITAL LETTER Y DIAERESIS;;;00FF;
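Each UnicodeData.txt record is one line of 15 semicolon-separated fields. The ones that matter here are field 0 (code point in hex), 1 (character name), 2 (general category), 10 (Unicode 1.0 name) and 12-13 (simple uppercase and lowercase mappings). A minimal parsing sketch, which also accepts the decimal prefix used in the list above; the class name is hypothetical.

```java
/** Hypothetical parser for one UnicodeData.txt record, optionally prefixed with its decimal value. */
final class UnicodeDataLine {
    final int codePoint;
    final String name;
    final String category;
    final String unicode1Name;
    final String upper; // simple uppercase mapping (hex), empty if none
    final String lower; // simple lowercase mapping (hex), empty if none

    UnicodeDataLine(String line) {
        String record = line;
        int space = line.indexOf(' ');
        int semi = line.indexOf(';');
        // Strip a decimal prefix such as "8364 " only if it comes before the first ';'
        if (space >= 0 && space < semi) {
            record = line.substring(space + 1);
        }
        // Limit -1 keeps trailing empty fields, so a complete record yields 15 fields
        String[] f = record.split(";", -1);
        if (f.length != 15) {
            throw new IllegalArgumentException("expected 15 fields, got " + f.length);
        }
        codePoint = Integer.parseInt(f[0], 16);
        name = f[1];
        category = f[2];
        unicode1Name = f[10];
        upper = f[12];
        lower = f[13];
    }

    public static void main(String[] args) {
        UnicodeDataLine y = new UnicodeDataLine(
            "376 0178;LATIN CAPITAL LETTER Y WITH DIAERESIS;Lu;0;L;0059 0308;;;;N;"
            + "LATIN CAPITAL LETTER Y DIAERESIS;;;00FF;");
        System.out.println(y.codePoint + " " + y.name + " [" + y.category + "] lower=" + y.lower);
    }
}
```

Since Java 7, Character.getName(int) returns the same name for a code point directly, for example "EURO SIGN" for 0x20AC.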
Representation of Cp850 in Latin1, created by Charset -tab -enc Cp850.
The blanks represent line-drawing and other characters that
cannot be represented in Latin1 (Java actually prints question marks).
Position 240 holds the soft hyphen (shy, Latin1 173), and
the last character (255) is the non-breaking space (nbsp, Latin1 160).
... 0 1 2 3 4 5 6 7 8 9 A B C D E F
128 Ç ü é â ä à å ç ê ë è ï î ì Ä Å
144 É æ Æ ô ö ò û ù ÿ Ö Ü ø £ Ø ×
160 á í ó ú ñ Ñ ª º ¿ ® ¬ ½ ¼ ¡ « »
176           Á Â À ©         ¢ ¥
192             ã Ã               ¤
208 ð Ð Ê Ë È   Í Î Ï         ¦ Ì
224 Ó ß Ô Ò õ Õ µ þ Þ Ú Û Ù ý Ý ¯ ´
240   ±   ¾ ¶ § ÷ ¸ ° ¨ · ¹ ³ ²
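A Latin1 grid like this one can be sketched by decoding each byte with Java's Cp850 charset and blanking every code point above 255. This is a hypothetical reconstruction, not the Charset tool's -tab implementation; unlike the tool, it writes blanks explicitly instead of leaving the substitution of question marks to the output encoding.

```java
import java.nio.charset.Charset;

/** Hypothetical helper: renders bytes 128-255 of a single-byte charset as a Latin1 grid. */
final class Latin1Grid {

    /** Decodes a single byte value (0-255) and returns the resulting code point. */
    static int decode(Charset cs, int b) {
        return new String(new byte[] {(byte) b}, cs).codePointAt(0);
    }

    /** One row: the start value, then 16 cells; characters outside Latin1 become blanks. */
    static String row(Charset cs, int start) {
        StringBuilder sb = new StringBuilder(Integer.toString(start));
        for (int b = start; b < start + 16; b++) {
            int cp = decode(cs, b);
            // Latin1 covers U+0000..U+00FF; line drawing, U+0131 etc. fall outside it
            boolean latin1 = cp <= 0xFF;
            sb.append(' ').append(latin1 ? (char) cp : ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cs = Charset.forName("Cp850");
        System.out.println("... 0 1 2 3 4 5 6 7 8 9 A B C D E F");
        for (int start = 128; start < 256; start += 16) {
            System.out.println(row(cs, start));
        }
    }
}
```

The soft hyphen (240) and nbsp (255) are Latin1 characters, so they are kept, but they still look like blanks in the printed grid.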
According to the default ISISAC.TAB, the ranges 128-154 and 160-165 are alphabetic.
This fits Cp437, which is identical to Cp850 in these ranges
but contains even more line-drawing characters in the higher positions.
Most confusingly, Cp437 has some Greek letters for technical use in row 224,
with the beta at position 225, where Cp850 has the similar-looking
German sz ligature (ß) instead.
Less surprisingly, the default ISISUC.TAB is also made for Cp437.
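The claim that Cp437 and Cp850 agree on the alphabetic ranges can be checked with Java's Cp437 and Cp850 decoders. This sketch hard-codes the two ranges quoted above rather than reading ISISAC.TAB, whose file format it does not assume; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;

/** Hypothetical check of the high-half alpha ranges against Cp437 and Cp850. */
final class AlphaRanges {

    /** The high-half ranges described as alphabetic: 128-154 and 160-165. */
    static boolean isHighAlpha(int b) {
        return (b >= 128 && b <= 154) || (b >= 160 && b <= 165);
    }

    /** Decodes a single byte value (0-255) and returns the resulting code point. */
    static int decode(Charset cs, int b) {
        return new String(new byte[] {(byte) b}, cs).codePointAt(0);
    }

    /** True if both charsets decode every high-alpha byte to the same code point. */
    static boolean sameOnAlphaRanges(Charset a, Charset b) {
        for (int i = 128; i < 256; i++) {
            if (isHighAlpha(i) && decode(a, i) != decode(b, i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Charset cp437 = Charset.forName("Cp437");
        Charset cp850 = Charset.forName("Cp850");
        System.out.println("alpha ranges identical: " + sameOnAlphaRanges(cp437, cp850));
        // Right after the first range the code pages diverge (cent sign vs o with stroke)
        System.out.printf("155: Cp437=U+%04X Cp850=U+%04X%n", decode(cp437, 155), decode(cp850, 155));
    }
}
```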
The Unicode mapping of Cp850, as created by Charset -ctab:
199, 252, 233, 226, 228, 224, 229, 231, 234, 235, 232, 239, 238, 236, 196, 197,
201, 230, 198, 244, 246, 242, 251, 249, 255, 214, 220, 248, 163, 216, 215, 402,
225, 237, 243, 250, 241, 209, 170, 186, 191, 174, 172, 189, 188, 161, 171, 187,
9617,9618,9619,9474,9508, 193, 194, 192, 169,9571,9553,9559,9565, 162, 165,9488,
9492,9524,9516,9500,9472,9532, 227, 195,9562,9556,9577,9574,9568,9552,9580, 164,
240, 208, 202, 203, 200, 305, 205, 206, 207,9496,9484,9608,9604, 166, 204,9600,
211, 223, 212, 210, 245, 213, 181, 254, 222, 218, 219, 217, 253, 221, 175, 180,
173, 177,8215, 190, 182, 167, 247, 184, 176, 168, 183, 185, 179, 178,9632, 160,
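The layout of this table (each value right-aligned to four characters and followed by a comma, 16 values per line) can be reproduced as follows. This is a sketch that infers the layout from the numbers above, not the Charset tool's code; CtabDump and its methods are hypothetical names.

```java
import java.nio.charset.Charset;
import java.util.Locale;

/** Hypothetical dump of bytes 128-255 as Unicode values in "%4d," layout. */
final class CtabDump {

    /** Decodes a single byte value (0-255) and returns the resulting code point. */
    static int decode(Charset cs, int b) {
        return new String(new byte[] {(byte) b}, cs).codePointAt(0);
    }

    /** Formats the 16 code points starting at byte 'start'. */
    static String line(Charset cs, int start) {
        StringBuilder sb = new StringBuilder();
        for (int b = start; b < start + 16; b++) {
            // Locale.ROOT keeps ASCII digits regardless of the default locale
            sb.append(String.format(Locale.ROOT, "%4d,", decode(cs, b)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cs = Charset.forName(args.length > 0 ? args[0] : "Cp850");
        for (int start = 128; start < 256; start += 16) {
            System.out.println(line(cs, start));
        }
    }
}
```

A line whose first value has fewer than four digits starts with a space (" 199, 252, ..."); the table above shows those lines without it.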
See also a list of encodings supported by Java and some
notes on the use of charsets and Unicode with ISIS.
Roman Czyborra compiled an illustrated overview
of the most commonly used codepages.
$Id: CsTables.txt,v 1.5 2004/06/10 11:10:06 kripke Exp $