BRAMA Computing and Software Thursday, February 15, 2007, 18:22 EST

RUSTEX Cyrillic FAQ V1.2
The following is a document made available on the RUSTEX-L listserv. It is provided here at BRAMA-Ukrainian Gateway for those who are curious about the intricacies of using the Cyrillic alphabet on their computing systems. It (probably) explains certain puzzling phenomena and generates others.

RUSTEX-L Cyrillic Encoding FAQ Contents

Q: What are the commonly used computer encodings for Cyrillic?
Q: What kind of transliteration schemes are there?
Q: What are the eight-bit schemes?
KOI-8 (GOST 19768-74) | Code Page 866 | Brjabrin's Alternativnyj Variant (AV) |
Apple Standard Cyrillic | GOSTSCI

Q: Is this a big mess or what? (Unicode)
Q: Is everything clear now? (Conversion Tables to Unicode)
Koi-8 to Unicode | CP866 to Unicode | CP1251 to Unicode | Mac to Unicode |
Alternativnyj Variant to Unicode | Osnovnoj Variant to Unicode | ISO8859-5 to Unicode

Acknowledgements:


Q: What are the commonly used computer encodings for Cyrillic?
A: Broadly speaking, there are three kinds of schemes in use: those that replace Cyrillic characters by 7-bit ascii values, those that use the full 8-bit range 0-255, and those using multi-byte codes. Presently only the first two types are in wide use, but for reference purposes I will also discuss the third type.


Q: What kind of transliteration schemes are there?
A: The most important one is called KOI-7: the Russian alphabet is given by the ASCII characters (note the exchange of upper and lower cases):
UPPER CASE:  abwgde$vzijklmnoprstufhc~{}"yx|`q
lower case:  ABWGDE#VZIJKLMNOPRSTUFHC^[]_YX\@Q
The following extensions to the official standard KOI-7 are supported in Glenn Thobe's conversion programs for invertibility: '"'=YER, '#'=yo, '$'=YO, '<'=left guillemet, '>'=right guillemet.
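For illustration, here is a minimal sketch of decoding KOI-7 into Unicode values from the two rows above, including Thobe's extensions. The function names koi7_to_unicode and russian_letter are hypothetical, and passing every other 7-bit character through unchanged as ASCII is an assumption of the sketch, not something KOI-7 or Thobe's programs are claimed to do.

```c
#include <string.h>

/* KOI-7 characters for the 33 Russian letters in alphabetical order
   (A, BE, ..., IE, IO, ZHE, ..., IA), copied from the two rows above.
   '$' (IO), '#' (io) and '"' (YER, hard sign) are Thobe's extensions. */
static const char koi7_upper[] = "abwgde$vzijklmnoprstufhc~{}\"yx|`q";
static const char koi7_lower[] = "ABWGDE#VZIJKLMNOPRSTUFHC^[]_YX\\@Q";

/* Unicode value of the i-th Russian letter (0 = A, 6 = IO, 32 = IA).
   Unicode keeps IO apart (0x0401/0x0451); the others run from 0x0410. */
static long russian_letter(int i, int lower)
{
    long cp;

    if (i == 6)
        return lower ? 0x0451 : 0x0401;
    cp = 0x0410 + (i < 6 ? i : i - 1);
    return lower ? cp + 0x20 : cp;
}

/* Hypothetical decoder for one 7-bit KOI-7 character.  Characters that
   are neither Cyrillic letters nor guillemets are returned unchanged;
   values outside 0-127 give -1. */
long koi7_to_unicode(int c)
{
    const char *p;

    if (c < 0 || c > 0x7f)
        return -1;
    if (c == 0)                     /* keep strchr() off the terminator */
        return 0;
    if (c == '<')
        return 0x00ab;              /* left guillemet (Thobe extension) */
    if (c == '>')
        return 0x00bb;              /* right guillemet (Thobe extension) */
    if ((p = strchr(koi7_upper, c)) != NULL)
        return russian_letter((int)(p - koi7_upper), 0);
    if ((p = strchr(koi7_lower, c)) != NULL)
        return russian_letter((int)(p - koi7_lower), 1);
    return c;
}
```

Searching the two strings keeps the code visibly tied to the published rows, at the cost of a linear scan per character. A 128-entry table built once from the same strings would be the obvious refinement.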

A slightly different (multicharacter) scheme is employed by Steve Gaardner's (gaarder@theory.tc.cornell.edu) conversion code from Old KOI-8, included below. This particular scheme provides easy readability but suffers from some transliteration weirdness, such as mapping short ii and yeri onto the same character ("y"). Since proper transliteration often requires context-sensitive rules, and differs from language to language within the same script, a fuller discussion is beyond the scope of the present document. For an overview of the major Cyrillic to Latin transliteration schemes used in the US, see pp 457-460 of the Style Manual of the US Government Printing Office, for sale by the Superintendent of Documents, USGPO, Washington DC 20402, Stock Number 021-000-00120-1 (paper) or 021-000-00120-0 (hardbound).

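#include <stdio.h>

/* Transliterations of Old KOI-8 positions 0xc0-0xff (equivalently, 7-bit
   KOI-7 positions 0x40-0x7f), in KOI-8 order: lower case, then upper case. */
static const char transtbl[64][5] =
        {"yu", "a", "b", "ts", "d" , "e", "f", "g", "kh", "i", "y" , "k", "l",
        "m", "n", "o", "p", "ya", "r" , "s", "t", "u", "zh", "v", "'",
        "y", "z", "sh", "e", "shch", "ch", "`",
        "YU", "A", "B", "TS", "D" , "E", "F", "G", "KH", "I", "Y" , "K", "L",
        "M", "N", "O", "P", "YA", "R" , "S", "T", "U", "ZH", "V", "'",
        "Y", "Z", "SH", "E", "SHCH", "CH", "`" };

int main(void)
{
        int c;

        while ((c = getchar()) != EOF)
        {       /* Strip the eighth bit so 0xc0-0xff fold onto 0x40-0x7f.
                   Everything in 0x40-0x7f, plain ASCII letters included,
                   is then transliterated (the input is treated as KOI-7). */
                if (c >= 0x80) c -= 0x80;
                if (c < 0x40) putchar(c);
                else fputs(transtbl[c - 0x40], stdout);
        }
        return 0;
}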


Q: What are the eight-bit schemes?
A: For the IBM mainframe world, which includes the ES (edinaja sistema) clones of 360-370 mainframes, the basic scheme, called DKOI-8, extends EBCDIC by putting the Cyrillic letters in the unused slots, mostly in the rectangle 0x8a to 0xff (first hex digit >=8, second digit >=a). The mysteries of EBCDIC/ASCII conversion go beyond the scope of this document. In the table that follows I will ignore 8-bit ascii values below 0xa0 and refer the reader to Dimitri Vulis' excellent document, which sheds some light on the IBM meaning of the characters 0x80-0x9f, which are reserved in both ISO 8859-1 (Latin-1) and 8859-5 (Cyrillic).

/* From 8859-5 to DKOI-8. ebcdic(isoval) = isotoibm[isoval-160] */

int isotoibm[96] = {
0x41,0xaa,0x4a,0xb1,0x9f,0xb2,0x6a,0xb5,
0xbd,0xb4,0x9a,0x8a,0x5f,0xca,0xaf,0xbc,
0x90,0x8f,0xea,0xfa,0xbe,0xa0,0xb6,0xb3,
0x9d,0xda,0x9b,0x8b,0xb7,0xb8,0xb9,0xab,
0x64,0x65,0x62,0x66,0x63,0x67,0x9e,0x68,
0x74,0x71,0x72,0x73,0x78,0x75,0x76,0x77,
0xac,0x69,0xed,0xee,0xeb,0xef,0xec,0xbf,
0x80,0xfd,0xfe,0xfb,0xfc,0xad,0xae,0x59,
0x44,0x45,0x42,0x46,0x43,0x47,0x9c,0x48,
0x54,0x51,0x52,0x53,0x58,0x55,0x56,0x57,
0x8c,0x49,0xcd,0xce,0xcb,0xcf,0xcc,0xe1,
0x70,0xdd,0xde,0xdb,0xdc,0x8d,0x8e,0xdf
};

There are minor variations to DKOI, called Cyrillic Extended Code Page 037 (most common on BITNET), CECP 500 (which is the definitive one), the "JNET" and the "FORTRAN" mappings. The differences between these are tabulated below. Notice that EBCDIC/DKOI, unlike ASCII, is not uniquely defined even on the 0-127 range:

8859-5  037   500   JNET  FORTRAN  character

0x21    0x5a  0x4f  0x5a  0x4f     exclamation point (bang)
0x5b    0xba  0x4a  0xad  0x4a     opening square bracket
0x5d    0xbb  0x5a  0xbd  0x5a     closing square bracket
0x5e    0xb0  0x5f  0x5f  0x5f     circumflex accent
0x7c    0x4f  0xbb  0x6a  0x4f     logical or (vertical bar)
[a2]    0x4a  0xb0  0x43  0x43     cent sign (in 037)/capital dje (in 500)
[ac]    0x5f  0xba  0x54  0x54     logical not (in 037)/capital kje (in 500)
0xd5    0xef  0xef  0xbb  0xad     small ie
0xe3    0x46  0x46  0x4a  0xbb     small u
0xe5    0x47  0x47  0xfc  0xbd     small kha
0xfc    0xdc  0xdc  0x6a  0xfc     small kje

For the Internet, the most important code seems to be Old KOI-8, widely used in the Relcom groups (but probably not a whole lot elsewhere). Old KOI-8 (GOST 19768-74) from 1974 more or less follows Latin transliteration order and does not include upper-case hard sign, or the letters specific to other Slavic Cyrillic alphabets (Bulgarian, Macedonian, Serbian, Ukrainian...). In the 0-127 range it is identical with ascii; for the 192-254 region see the transtbl array above. Some programs, including uunpack (also used in Sergej Ryzhkov's bml, aka Beauty Mail system for PCs), which is distributed by Relcom, force upper-case hard sign to 255; others (and the standard!) declare this incorrect, or perhaps reserve 255 for DEL. In Andrew Hume's (andrew@research.att.com) tcs this is called the "mystery DOS Cyrillic encoding", except that his sha and shcha seem to be interchanged. The semantics of 128-191 is unclear to me. If there is an official code page (it was suggested that Xenix users might have one), please post it.
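Old KOI-8 follows transliteration order, while ISO 8859-5 (and New KOI-8) follow alphabetical order, so converting between them is a permutation of the 32 letter positions. The sketch below assumes the 192-255 layout of the transtbl array above (lower case at 192-223, upper case at 224-255). Like uunpack, it maps 255 to upper-case hard sign. The function name oldkoi8_to_iso8859_5 is hypothetical.

```c
/* Alphabetical index (0 = A ... 31 = IA, IO excluded) of each letter in
   Old KOI-8 order: IU A BE TSE DE IE EF GE | KHA II SHORT-II KA EL EM EN O |
   PE IA ER ES TE U ZHE VE | SOFT-SIGN YERI ZE SHA REVERSED-E SHCHA CHE
   HARD-SIGN */
static const unsigned char koi8_order[32] = {
    30,  0,  1, 22,  4,  5, 20,  3,
    21,  8,  9, 10, 11, 12, 13, 14,
    15, 31, 16, 17, 18, 19,  6,  2,
    28, 27,  7, 24, 29, 25, 23, 26
};

/* Hypothetical converter for one Old KOI-8 byte to ISO 8859-5.
   Returns -1 for 128-191, whose meaning in Old KOI-8 is unclear. */
int oldkoi8_to_iso8859_5(unsigned char c)
{
    if (c < 0x80)
        return c;                              /* ASCII in both */
    if (c < 0xc0)
        return -1;                             /* 128-191: unknown */
    if (c < 0xe0)
        return 0xd0 + koi8_order[c - 0xc0];    /* lower case -> 0xd0-0xef */
    /* upper case -> 0xb0-0xcf; 255 as HARD SIGN is non-standard */
    return 0xb0 + koi8_order[c - 0xe0];
}
```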

For the PC community, Code Page 866 seems to be quite important. This is what Microsoft uses in its russified version of MS-DOS. In 0-31 the ascii control characters are replaced by a random selection of dingbats. In 32-126 it is identical to ascii, and in 127 it has something that looks like a little house (the interpretation of such positions seems to be subject to much uncertainty). The Russian part (128-255) is identical to Brjabrin's Alternativnyj Variant, except for 242-251, where some of the accents/symbols of AV are replaced by non-Russian Cyrillic characters and other symbols. Unfortunately, beyond Russian, CP 866 covers only Ukrainian and Belorussian, with the vague suggestion that e.g. Macedonian users could redefine the six non-Russian Cyrillic positions. This problem is largely resolved in Code Page 1251, the Microsoft Cyrillic Windows 3.1 character set (also endorsed by WordPerfect and Adobe), which contains all Cyrillic letters used by modern Slavic languages. CP 1251 is fully compatible with ascii on 0-127 (leaving control positions undefined), has the Russian alphabet (in order, but without io) in 192-255, and puts the non-Russian Cyrillic, Russian io, and a few symbols in 128-191.
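As a rough illustration of how the two Microsoft code pages differ, here is a sketch of a CP866 to CP1251 converter covering ascii, the Russian letters, and the six Ukrainian/Belorussian letters at 242-247 (plus io at 240-241). The positions are read off the CP866 and CP1251 conversion tables later in this FAQ. The function name is hypothetical, and box-drawing characters and symbols are simply reported as -1 rather than mapped.

```c
/* Hypothetical converter from CP866 to CP1251 for one byte.  Positions are
   taken from the CP866 and CP1251 to Unicode tables in this FAQ. */
int cp866_to_cp1251(unsigned char c)
{
    /* CP866 240-247: CAP IO, SMA IO, CAP E (Ukrainian ie), SMA E,
       CAP YI, SMA YI, CAP SHORT U, SMA SHORT U */
    static const unsigned char extra[8] = {
        0xa8, 0xb8, 0xaa, 0xba, 0xaf, 0xbf, 0xa1, 0xa2
    };

    if (c < 0x80)
        return c;                      /* ascii in both */
    if (c <= 0xaf)
        return c + 0x40;               /* CAP A..SMA PE: 128-175 -> 192-239 */
    if (c >= 0xe0 && c <= 0xef)
        return c + 0x10;               /* SMA ER..SMA IA: 224-239 -> 240-255 */
    if (c >= 0xf0 && c <= 0xf7)
        return extra[c - 0xf0];
    return -1;                         /* graphics (176-223), symbols (248-255) */
}
```

The Russian letters need only two fixed offsets, because both code pages keep the alphabet in order. Only io and the non-Russian letters need a lookup.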

Brjabrin's Alternativnyj Variant (AV) is also widely used on PCs. It has Russian in 128 to 175 in alphabetical order except for yo, graphics characters in 176 to 223, and again Russian in 224-241. The same set of graphics characters, but not in the same order, is used in Brjabrin's Osnovnoj Variant: they are similar to, but not identical with, the IBM Extended ASCII graphics characters (neither the set of shapes nor the code values are exactly the same). AV and OV have no non-Russian Cyrillic or accented characters, but four accent marks are provided: 242 (acute below the symbol), 243 (grave below the symbol), 244 (acute above the symbol), and 245 (grave above the symbol). These, as well as upper case and lower case yo (codes 240 and 241), are in the same positions in Osnovnoj Variant. Codes 246-249 are arrows, pointing right, left, down, up, in that order. Codes 250 and 251 are, in both sets described by Brjabrin, the division sign and the plus/minus sign (the latter becomes a radical sign in 866). 252 is the Number symbol, 253 is a sunburst, and 254 is "end of proof". 255 is in principle unused; in practice people put things there.

For the academic community, the lack of accents is remedied by the Academic version of AV developed at Cornell, which includes upper and lower case acute-accented vowels, and lower case grave-accented vowels. These replace all but six of the graphics characters (the six that were retained are those that are necessary for drawing a single-line box). The accented vowels in this set include a grave-accented lower case yo. Also included are the letters with diacritics used in French, German, and Spanish. The complete chart and DOS/Windows software may be requested from Exceller Software Corp. 800-426-0444. (This is NOT a product endorsement -- I haven't even seen the stuff!) Cornell also developed an Academic version of CP1251. In this, non-Russian Slavic languages are not supported: their letters have been replaced by Russian accented vowels. These include upper and lower case acute-accented vowels, and lower case grave-accented vowels. Also included are upper and lower case grave-accented yo. The AcademicFont Cyrillic character set was developed by University Microcomputers, who pioneered the use of Slavic languages on IBM-compatible computers in the US in the mid-eighties. This set is included among the 11 sets in Exceller's product. It supports Slavic and some non-Slavic languages, but not accented vowels.

For the Macintosh community, there is a separate code page. It is ascii below 128, has the Russian capital letters in 128-159 in alphabetical order (as usual, io is treated separately) and the Russian lowercase letters in 224-255, but lower case ja is moved to 223, its place (255) being taken by the sunburst symbol. In the 160-222 range we find the same set of (ISO 8859-5) non-Russian Cyrillic characters as in CP 1251. The symbols that appear here are also largely the same as in 1251, but the orderings are completely different and a few symbols are unique to one or the other, e.g. permille in 1251, capital delta in the Mac encoding.
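Along the same lines, here is a sketch of converting the Russian letters of the Mac code page to CP1251. The positions are read off the Mac to Unicode table later in this FAQ: capitals at 128-159, small a through iu at 224-254, small ia at 223, and capital and small io at 221 and 222. The function name is hypothetical, and non-Russian letters and symbols are reported as -1 rather than mapped.

```c
/* Hypothetical converter for the Russian letters of the Mac Cyrillic
   code page to CP1251; other characters >= 128 give -1. */
int mac_to_cp1251(unsigned char c)
{
    if (c < 0x80)
        return c;                 /* ascii in both */
    if (c <= 0x9f)
        return c + 0x40;          /* CAP A..CAP IA: 128-159 -> 192-223 */
    if (c >= 0xe0 && c <= 0xfe)
        return c;                 /* SMA A..SMA IU: same positions in both */
    switch (c) {
    case 0xdf: return 0xff;       /* SMA IA, moved to 223 on the Mac */
    case 0xdd: return 0xa8;       /* CAP IO */
    case 0xde: return 0xb8;       /* SMA IO */
    default:   return -1;         /* non-Russian letters and symbols */
    }
}
```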

Finally, the most broadly accepted standard outside these communities seems to be GOSTSCI, a term used colloquially to refer to Brjabrin's Osnovnoj Variant or to ISO 8859-5 (which is also ECMA-113), although these two are not identical when it comes to non-Russian Cyrillic. The term "New KOI-8" means the 1987 revision of KOI-8 (GOST 19768-87). All these use the same (alphabetical, except for yo) order as 8859-5, starting with A at 176. However, the non-Russian Cyrillic characters (160-175 and 240-255 in New KOI-8) are not part of OV; their space is taken up by some of the graphics characters described for AV above. ISO 8859-5 provides for the Cyrillic characters required for writing all major Slavic Cyrillic alphabets (Belorussian, Bulgarian, Macedonian, Serbian, Ukrainian...), but not for those alphabets that were devised for non-Slavic languages in the Soviet Union (Abkhazian, Bashkir, Chukchee, Khanty, Tajik, ...), or for archaic letters.
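Because ISO 8859-5 keeps its Cyrillic letters in the same relative order as the Unicode Cyrillic block, the conversion can be computed rather than looked up. Every Cyrillic position in 160-255 sits exactly 0x360 below its Unicode value, and only four positions there are not letters. Here is a minimal sketch, assuming the standard ISO 8859-5 layout. The function name is hypothetical, and 128-159 are treated as unused.

```c
/* Hypothetical ISO 8859-5 to Unicode conversion for one byte. */
long iso8859_5_to_unicode(unsigned char b)
{
    if (b < 0x80)
        return b;                   /* ascii */
    if (b < 0xa0)
        return -1;                  /* 128-159: no graphic characters */
    switch (b) {
    case 0xa0: return 0x00a0;       /* no-break space */
    case 0xad: return 0x00ad;       /* soft hyphen */
    case 0xf0: return 0x2116;       /* numero sign */
    case 0xfd: return 0x00a7;       /* section sign */
    default:   return b + 0x360L;   /* Cyrillic: same order as Unicode */
    }
}
```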


Q: Is this a big mess or what? (Unicode)
A: To straighten this out, it seems necessary to adopt a fixed point of reference, which I take to be Unicode V1.1 = ISO 10646-1.2. While in principle 10646 is a four-byte standard and Unicode uses 16-bit integers, the "Basic Multilingual Plane" of 10646 is by definition identical to the values assigned in Unicode 1.1, both being two-byte quantities (called UCS-2 by ISO). The following list gives the essential part of the names of the Cyrillic characters and the last two hex digits of their Unicode/10646 encoding.

For reasons of space, the official Unicode/10646 names have been abbreviated. For a full list of names, anon ftp to unicode.org, cd to pub/MappingTables, and get namesall.lst (which is slightly over 200k). To recover the full official name from an abbreviation, always add the prefix CYRILLIC, unless the position is UNUSED. Further, expand CAP (SMA) to CAPITAL (SMALL). Finally, add the word LETTER after CAP/SMA, unless the next word is THOUSANDS, LIGATURE, or COMBINING. The numerical code values have also been abbreviated to their last two hex digits, since the preceding two hex digits (really signifying "Cyrillic") are always 04 in Unicode/10646.

UNUSED                          00
CAP IO                          01
CAP DJE                         02
CAP GJE                         03
CAP E                           04
CAP DZE                         05
CAP I                           06
CAP YI                          07
CAP JE                          08
CAP LJE                         09
CAP NJE                         0A
CAP TSHE                        0B
CAP KJE                         0C
UNUSED                          0D
CAP SHORT U                     0E
CAP DZHE                        0F
CAP A                           10
CAP BE                          11
CAP VE                          12
CAP GE                          13
CAP DE                          14
CAP IE                          15
CAP ZHE                         16
CAP ZE                          17
CAP II                          18
CAP SHORT II                    19
CAP KA                          1A
CAP EL                          1B
CAP EM                          1C
CAP EN                          1D
CAP O                           1E
CAP PE                          1F
CAP ER                          20
CAP ES                          21
CAP TE                          22
CAP U                           23
CAP EF                          24
CAP KHA                         25
CAP TSE                         26
CAP CHE                         27
CAP SHA                         28
CAP SHCHA                       29
CAP HARD SIGN                   2A
CAP YERI                        2B
CAP SOFT SIGN                   2C
CAP REVERSED E                  2D
CAP IU                          2E
CAP IA                          2F
SMA A                           30
SMA BE                          31
SMA VE                          32
SMA GE                          33
SMA DE                          34
SMA IE                          35
SMA ZHE                         36
SMA ZE                          37
SMA II                          38
SMA SHORT II                    39
SMA KA                          3A
SMA EL                          3B
SMA EM                          3C
SMA EN                          3D
SMA O                           3E
SMA PE                          3F
SMA ER                          40
SMA ES                          41
SMA TE                          42
SMA U                           43
SMA EF                          44
SMA KHA                         45
SMA TSE                         46
SMA CHE                         47
SMA SHA                         48
SMA SHCHA                       49
SMA HARD SIGN                   4A
SMA YERI                        4B
SMA SOFT SIGN                   4C
SMA REVERSED E                  4D
SMA IU                          4E
SMA IA                          4F
UNUSED                          50
SMA IO                          51
SMA DJE                         52
SMA GJE                         53
SMA E                           54
SMA DZE                         55
SMA I                           56
SMA YI                          57
SMA JE                          58
SMA LJE                         59
SMA NJE                         5A
SMA TSHE                        5B
SMA KJE                         5C
UNUSED                          5D
SMA SHORT U                     5E
SMA DZHE                        5F
CAP OMEGA                       60
SMA OMEGA                       61
CAP YAT                         62
SMA YAT                         63
CAP IOTIFIED E                  64
SMA IOTIFIED E                  65
CAP LITTLE YUS                  66
SMA LITTLE YUS                  67
CAP IOTIFIED LITTLE YUS         68
SMA IOTIFIED LITTLE YUS         69
CAP BIG YUS                     6A
SMA BIG YUS                     6B
CAP IOTIFIED BIG YUS            6C
SMA IOTIFIED BIG YUS            6D
CAP KSI                         6E
SMA KSI                         6F
CAP PSI                         70
SMA PSI                         71
CAP FITA                        72
SMA FITA                        73
CAP IZHITSA                     74
SMA IZHITSA                     75
CAP IZHITSA DOUBLE GRAVE        76
SMA IZHITSA DOUBLE GRAVE        77
CAP UK DIGRAPH                  78
SMA UK DIGRAPH                  79
CAP ROUND OMEGA                 7A
SMA ROUND OMEGA                 7B
CAP OMEGA TITLO                 7C
SMA OMEGA TITLO                 7D
CAP OT                          7E
SMA OT                          7F
CAP KOPPA                       80
SMA KOPPA                       81
THOUSANDS SIGN                  82
NON-SPACING TITLO               83
NON-SPACING PALATALIZATION      84
NON-SPACING DASIA PNEUMATA      85
NON-SPACING PSILI PNEUMATA      86
UNUSED                          87
UNUSED                          88
UNUSED                          89
UNUSED                          8A
UNUSED                          8B
UNUSED                          8C
UNUSED                          8D
UNUSED                          8E
UNUSED                          8F
CAP GE WITH UPTURN              90
SMA GE WITH UPTURN              91
CAP GE BAR                      92
SMA GE BAR                      93
CAP GE HOOK                     94
SMA GE HOOK                     95
CAP ZHE WITH RIGHT DESCENDER    96
SMA ZHE WITH RIGHT DESCENDER    97
CAP ZE CEDILLA                  98
SMA ZE CEDILLA                  99
CAP KA WITH RIGHT DESCENDER     9A
SMA KA WITH RIGHT DESCENDER     9B
CAP KA VERTICAL BAR             9C
SMA KA VERTICAL BAR             9D
CAP KA BAR                      9E
SMA KA BAR                      9F
CAP REVERSED GE KA              A0
SMA REVERSED GE KA              A1
CAP EN WITH RIGHT DESCENDER     A2
SMA EN WITH RIGHT DESCENDER     A3
CAP EN GE                       A4
SMA EN GE                       A5
CAP PE HOOK                     A6
SMA PE HOOK                     A7
CAP O HOOK                      A8
SMA O HOOK                      A9
CAP ES CEDILLA                  AA
SMA ES CEDILLA                  AB
CAP TE WITH RIGHT DESCENDER     AC
SMA TE WITH RIGHT DESCENDER     AD
CAP STRAIGHT U                  AE
SMA STRAIGHT U                  AF
CAP STRAIGHT U BAR              B0
SMA STRAIGHT U BAR              B1
CAP KHA WITH RIGHT DESCENDER    B2
SMA KHA WITH RIGHT DESCENDER    B3
CAP TE TSE                      B4
SMA TE TSE                      B5
CAP CHE WITH RIGHT DESCENDER    B6
SMA CHE WITH RIGHT DESCENDER    B7
CAP CHE VERTICAL BAR            B8
SMA CHE VERTICAL BAR            B9
CAP H                           BA
SMA H                           BB
CAP IE HOOK                     BC
SMA IE HOOK                     BD
CAP IE HOOK OGONEK              BE
SMA IE HOOK OGONEK              BF
PALOCHKA                        C0
CAP SHORT ZHE                   C1
SMA SHORT ZHE                   C2
CAP KA HOOK                     C3
SMA KA HOOK                     C4
UNUSED                          C5
UNUSED                          C6
CAP EN HOOK                     C7
SMA EN HOOK                     C8
UNUSED                          C9
UNUSED                          CA
CAP CHE WITH LEFT DESCENDER     CB
SMA CHE WITH LEFT DESCENDER     CC
UNUSED                          CD
UNUSED                          CE
UNUSED                          CF
CAP A WITH BREVE                D0
SMA A WITH BREVE                D1
CAP A WITH DIAERESIS            D2
SMA A WITH DIAERESIS            D3
CAP LIGATURE A IE               D4
SMA LIGATURE A IE               D5
CAP IE WITH BREVE               D6
SMA IE WITH BREVE               D7
CAP SCHWA                       D8
SMA SCHWA                       D9
CAP SCHWA WITH DIAERESIS        DA
SMA SCHWA WITH DIAERESIS        DB
CAP ZHE WITH DIAERESIS          DC
SMA ZHE WITH DIAERESIS          DD
CAP ZE WITH DIAERESIS           DE
SMA ZE WITH DIAERESIS           DF
CAP ABKHASIAN DZE               E0
SMA ABKHASIAN DZE               E1
CAP I WITH MACRON               E2
SMA I WITH MACRON               E3
CAP I WITH DIAERESIS            E4
SMA I WITH DIAERESIS            E5
CAP O WITH DIAERESIS            E6
SMA O WITH DIAERESIS            E7
CAP BARRED O                    E8
SMA BARRED O                    E9
CAP BARRED O WITH DIAERESIS     EA
SMA BARRED O WITH DIAERESIS     EB
CAP U WITH ACUTE                EC
SMA U WITH ACUTE                ED
CAP U WITH MACRON               EE
SMA U WITH MACRON               EF
CAP U WITH DIAERESIS            F0
SMA U WITH DIAERESIS            F1
CAP U WITH DOUBLE ACUTE         F2
SMA U WITH DOUBLE ACUTE         F3
CAP CHE WITH DIAERESIS          F4
SMA CHE WITH DIAERESIS          F5
CAP DJE WITH ACUTE              F6
SMA DJE WITH ACUTE              F7
CAP YERU WITH DIAERESIS         F8
SMA YERU WITH DIAERESIS         F9
UNUSED                          FA
UNUSED                          FB
UNUSED                          FC
UNUSED                          FD
UNUSED                          FE
UNUSED                          FF
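The abbreviation rules stated before the list can be applied mechanically. Here is a minimal sketch that follows those rules literally (prefix CYRILLIC, expand CAP/SMA, insert LETTER unless the next word is THOUSANDS, LIGATURE or COMBINING) and rebuilds the full code value from the last two hex digits. The function names are hypothetical. The output has not been checked against namesall.lst, and entries such as PALOCHKA or the NON-SPACING marks may well come out differently from the official names.

```c
#include <stdio.h>
#include <string.h>

/* Full Unicode/10646 value of a list entry: the list gives only the last
   two hex digits, and the first two are always 04. */
long cyrillic_code(unsigned low)
{
    return 0x0400L + (long)(low & 0xffu);
}

/* Expand an abbreviated name from the list into `out` (at most `size`
   bytes, always NUL-terminated when size > 0). */
void expand_cyrillic_name(const char *abbrev, char *out, size_t size)
{
    const char *rest = abbrev;
    const char *casew = NULL;

    if (strcmp(abbrev, "UNUSED") == 0) {      /* no prefix for UNUSED */
        snprintf(out, size, "%s", abbrev);
        return;
    }
    if (strncmp(abbrev, "CAP ", 4) == 0) {
        casew = "CAPITAL";
        rest = abbrev + 4;
    } else if (strncmp(abbrev, "SMA ", 4) == 0) {
        casew = "SMALL";
        rest = abbrev + 4;
    }

    if (casew == NULL)                        /* e.g. THOUSANDS SIGN */
        snprintf(out, size, "CYRILLIC %s", rest);
    else if (strncmp(rest, "THOUSANDS", 9) == 0 ||
             strncmp(rest, "LIGATURE", 8) == 0 ||
             strncmp(rest, "COMBINING", 9) == 0)
        snprintf(out, size, "CYRILLIC %s %s", casew, rest);
    else
        snprintf(out, size, "CYRILLIC %s LETTER %s", casew, rest);
}
```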

Q: Is everything clear now? (Conversion Tables to Unicode)
A: Probably not. To ease the pain, here follow some tentative conversion tables *from* the 8-bit schemes described above *to* Unicode. Since the Unicode/10646 character set is much larger, no tables are provided in the other direction.

In the 0-127 range everything is ASCII (except for the CP866 dingbats in the range 0-31, which I have no sympathy for, and for EBCDIC/DKOI-8, for which see above), so tables are only provided for 128-255. Notice that values not starting with 0x04 are often given. This means that the Unicode equivalent is outside the Unicode Cyrillic range 0x0400-0x04ff but included at some other place, typically among the arrows (0x2190-0x21ff) or other semigraphic material (0x2500-0x25ff). If a particular encoding leaves some code unused (by official definition, not necessarily in practical usage), this is designated by "-1" in the conversion table. For some positions the tables show "-2", meaning that I have no information on the intended meaning. (This is not the same as there being no Unicode codepoint for the character in question, a situation we potentially encounter with AV and OV 242-245; see the note there.)
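To make these conventions concrete, here is a sketch of applying any one of the 128-entry tables to a buffer of 8-bit text. The function name decode_8bit is hypothetical. So is the idea of substituting a caller-chosen replacement value for the -1 and -2 entries (for instance 0xfffd, REPLACEMENT CHARACTER).

```c
#include <stddef.h>

/* Hypothetical helper: decode n bytes of 8-bit text into Unicode values
   using a 128-entry table for the 128-255 range.  Bytes 0-127 are taken
   as ASCII, which does not hold for EBCDIC/DKOI-8 or for the CP866
   dingbats in 0-31.  Table entries -1 (unused) and -2 (meaning unknown)
   are replaced by `replacement`; the return value counts such bytes. */
size_t decode_8bit(const long hightab[128], const unsigned char *in,
                   size_t n, long *out, long replacement)
{
    size_t i, bad = 0;

    for (i = 0; i < n; i++) {
        long u = in[i] < 0x80 ? (long)in[i] : hightab[in[i] - 0x80];

        if (u < 0) {            /* -1 or -2 in the table */
            u = replacement;
            bad++;
        }
        out[i] = u;
    }
    return bad;
}
```

For CP866 text, for example, one could call decode_8bit(cp866tou, buf, len, out, 0xfffd) with the cp866tou table below. A non-zero result flags bytes that the table leaves unused or unexplained.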

/* From old Koi-8 to Unicode */

long oldkoi8tou[128] = {
-2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, -2, -2, -2,
0x044e,0x0430,0x0431,0x0446,0x0434,0x0435,0x0444,0x0433,
0x0445,0x0438,0x0439,0x043a,0x043b,0x043c,0x043d,0x043e,
0x043f,0x044f,0x0440,0x0441,0x0442,0x0443,0x0436,0x0432,
0x044c,0x044b,0x0437,0x0448,0x044d,0x0449,0x0447,0x044a,
0x042e,0x0410,0x0411,0x0426,0x0414,0x0415,0x0424,0x0413,
0x0425,0x0418,0x0419,0x041a,0x041b,0x041c,0x041d,0x041e,
0x041f,0x042f,0x0420,0x0421,0x0422,0x0423,0x0416,0x0412,
0x042c,0x042b,0x0417,0x0428,0x042d,0x0429,0x0427,0x042a
};


/* From CP866 to Unicode */

long cp866tou[128] = {
0x0410,0x0411,0x0412,0x0413,0x0414,0x0415,0x0416,0x0417,
0x0418,0x0419,0x041a,0x041b,0x041c,0x041d,0x041e,0x041f,
0x0420,0x0421,0x0422,0x0423,0x0424,0x0425,0x0426,0x0427,
0x0428,0x0429,0x042a,0x042b,0x042c,0x042d,0x042e,0x042f,
0x0430,0x0431,0x0432,0x0433,0x0434,0x0435,0x0436,0x0437,
0x0438,0x0439,0x043a,0x043b,0x043c,0x043d,0x043e,0x043f,
0x2591,0x2592,0x2593,0x2502,0x2524,0x2561,0x2562,0x2556,
0x2555,0x2563,0x2551,0x2557,0x255d,0x255c,0x255b,0x2510,
0x2514,0x2534,0x252c,0x251c,0x2500,0x253c,0x255e,0x255f,
0x255a,0x2554,0x2569,0x2566,0x2560,0x2550,0x256c,0x2567,
0x2568,0x2564,0x2565,0x2559,0x2558,0x2552,0x2553,0x256b,
0x256a,0x2518,0x250c,0x2588,0x2584,0x258c,0x2590,0x2580,
0x0440,0x0441,0x0442,0x0443,0x0444,0x0445,0x0446,0x0447,
0x0448,0x0449,0x044a,0x044b,0x044c,0x044d,0x044e,0x044f,
0x0401,0x0451,0x0404,0x0454,0x0407,0x0457,0x040e,0x045e,
0x00b0,0x2022,0x00b7,0x221a,0x2116,0x00a4,0x25a0,   -1
};


/* From CP1251 to Unicode */

long cp1251tou[128] = {
0x0402,0x0403,0x201a,0x0453,0x201e,0x2026,0x2020,0x2021,
    -1,0x2030,0x0409,0x2039,0x040a,0x040c,0x040b,0x040f,
0x0452,0x2018,0x2019,0x201c,0x201d,0x2022,0x2013,0x2014,
    -1,0x2122,0x0459,0x203a,0x045a,0x045c,0x045b,0x045f,
0x00a0,0x040e,0x045e,0x0408,0x00a4,0x0490,0x00a6,0x00a7,
0x0401,0x00a9,0x0404,0x00ab,0x00ac,0x00ad,0x00ae,0x0407,
0x00b0,0x00b1,0x0406,0x0456,0x0491,0x00b5,0x00b6,0x00b7,
0x0451,0x2116,0x0454,0x00bb,0x0458,0x0405,0x0455,0x0457,
0x0410,0x0411,0x0412,0x0413,0x0414,0x0415,0x0416,0x0417,
0x0418,0x0419,0x041a,0x041b,0x041c,0x041d,0x041e,0x041f,
0x0420,0x0421,0x0422,0x0423,0x0424,0x0425,0x0426,0x0427,
0x0428,0x0429,0x042a,0x042b,0x042c,0x042d,0x042e,0x042f,
0x0430,0x0431,0x0432,0x0433,0x0434,0x0435,0x0436,0x0437,
0x0438,0x0439,0x043a,0x043b,0x043c,0x043d,0x043e,0x043f,
0x0440,0x0441,0x0442,0x0443,0x0444,0x0445,0x0446,0x0447,
0x0448,0x0449,0x044a,0x044b,0x044c,0x044d,0x044e,0x044f,
};


/* From Mac to Unicode */

long mactou[128] = {
0x0410,0x0411,0x0412,0x0413,0x0414,0x0415,0x0416,0x0417,
0x0418,0x0419,0x041a,0x041b,0x041c,0x041d,0x041e,0x041f,
0x0420,0x0421,0x0422,0x0423,0x0424,0x0425,0x0426,0x0427,
0x0428,0x0429,0x042a,0x042b,0x042c,0x042d,0x042e,0x042f,
0x2020,0x00b0,0x0490,0x00a3,0x00a7,0x2022,0x00b6,0x0406,
0x00ae,0x00a9,0x2122,0x0402,0x0452,0x2260,0x0403,0x0453,
0x221e,0x00b1,0x2264,0x2265,0x0456,0x03bc,0x0491,0x0408,
0x0404,0x0454,0x0407,0x0457,0x0409,0x0459,0x040a,0x045a,
0x0458,0x0405,0x00ac,0x221a,0x0192,0x2248,0x0394,0x00ab,
0x00bb,0x2026,0x0020,0x040b,0x045b,0x040c,0x045c,0x0455,
0x2013,0x2014,0x201c,0x201d,0x2018,0x2019,0x00f7,0x201e,
0x040e,0x045e,0x040f,0x045f,0x2116,0x0401,0x0451,0x044f,
0x0430,0x0431,0x0432,0x0433,0x0434,0x0435,0x0436,0x0437,
0x0438,0x0439,0x043a,0x043b,0x043c,0x043d,0x043e,0x043f,
0x0440,0x0441,0x0442,0x0443,0x0444,0x0445,0x0446,0x0447,
0x0448,0x0449,0x044a,0x044b,0x044c,0x044d,0x044e,0x00a4,
};



/* From Alternativnyj Variant to Unicode */

long avtou[128] = {
0x0410,0x0411,0x0412,0x0413,0x0414,0x0415,0x0416,0x0417,
0x0418,0x0419,0x041a,0x041b,0x041c,0x041d,0x041e,0x041f,
0x0420,0x0421,0x0422,0x0423,0x0424,0x0425,0x0426,0x0427,
0x0428,0x0429,0x042a,0x042b,0x042c,0x042d,0x042e,0x042f,
0x0430,0x0431,0x0432,0x0433,0x0434,0x0435,0x0436,0x0437,
0x0438,0x0439,0x043a,0x043b,0x043c,0x043d,0x043e,0x043f,
0x2591,0x2592,0x2593,0x2502,0x2524,0x2561,0x2562,0x2556,
0x2555,0x2563,0x2551,0x2557,0x255d,0x255c,0x255b,0x2510,
0x2514,0x2534,0x252c,0x251c,0x2500,0x253c,0x255e,0x255f,
0x255a,0x2554,0x2569,0x2566,0x2560,0x2550,0x256c,0x2567,
0x2568,0x2564,0x2565,0x2559,0x2558,0x2552,0x2553,0x256b,
0x256a,0x2518,0x250c,0x2588,0x2584,0x258c,0x2590,0x2580,
0x0440,0x0441,0x0442,0x0443,0x0444,0x0445,0x0446,0x0447,
0x0448,0x0449,0x044a,0x044b,0x044c,0x044d,0x044e,0x044f,
0x0401,0x0451,0x0317,0x0316,0x0301,0x0300,0x2192,0x2190,
0x2193,0x2191,0x00f7,0x00b1,0x2116,0x00a4,0x25a0,   -1
};
/* The interpretation of the four symbols following the second alphabetic block in AV remains unclear. One suggestion was to treat these as (non-spacing) grave and acute, as appearing above upper- or lowercase letters, but the graphical rendering in Brjabrin's original article makes clear that the distinction is between acute and grave, above or below the letter: this is what the table now has.

But the preponderance of graphical symbols in AV suggests that the intention was to provide facilities for character graphics, in which case the interpretation is simply straight lines connecting two adjacent midpoints of the bounding box. If the box is the unit square, these would run from (.5,0) to (0,.5) and to (1,.5), and from (.5,1) to (0,.5) and to (1,.5), in this order. (The line segments are of course directionless.) Such symbols are not present in Unicode; the closest things are 0x25de 0x25df 0x25dc 0x25dd (in this order), but these are curved, not straight.

Which usage, graphics or accents, is more prevalent in practice only those plugged into the Russian PC community can tell. If the graphics usage turns out to be prevalent, these four symbols would be reasonable candidates for incorporation into Unicode, perhaps at positions 0x25ef to 0x25f3. */

/* From Osnovnoj Variant to Unicode */

long ovtou[128] = {
-2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, -2, -2, -2,
0x0410,0x0411,0x0412,0x0413,0x0414,0x0415,0x0416,0x0417,
0x0418,0x0419,0x041a,0x041b,0x041c,0x041d,0x041e,0x041f,
0x0420,0x0421,0x0422,0x0423,0x0424,0x0425,0x0426,0x0427,
0x0428,0x0429,0x042a,0x042b,0x042c,0x042d,0x042e,0x042f,
0x0430,0x0431,0x0432,0x0433,0x0434,0x0435,0x0436,0x0437,
0x0438,0x0439,0x043a,0x043b,0x043c,0x043d,0x043e,0x043f,
0x0440,0x0441,0x0442,0x0443,0x0444,0x0445,0x0446,0x0447,
0x0448,0x0449,0x044a,0x044b,0x044c,0x044d,0x044e,0x044f,
0x0401,0x0451,0x0317,0x0316,0x0301,0x0300,0x2192,0x2190,
0x2193,0x2191,0x00f7,0x00b1,0x2116,0x00a4,0x25a0,   -1
};

/* The same problem with the interpretation of 242-245 as in AV (these
rows are definitely identical). The low positions of OV are probably
identical to 176-223 in AV... */


/* From ISO8859-5 to Unicode */

long newkoi8tou[128] = {
-1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1,
0x00a0,0x0401,0x0402,0x0403,0x0404,0x0405,0x0406,0x0407,
0x0408,0x0409,0x040a,0x040b,0x040c,0x00ad,0x040e,0x040f,
0x0410,0x0411,0x0412,0x0413,0x0414,0x0415,0x0416,0x0417,
0x0418,0x0419,0x041a,0x041b,0x041c,0x041d,0x041e,0x041f,
0x0420,0x0421,0x0422,0x0423,0x0424,0x0425,0x0426,0x0427,
0x0428,0x0429,0x042a,0x042b,0x042c,0x042d,0x042e,0x042f,
0x0430,0x0431,0x0432,0x0433,0x0434,0x0435,0x0436,0x0437,
0x0438,0x0439,0x043a,0x043b,0x043c,0x043d,0x043e,0x043f,
0x0440,0x0441,0x0442,0x0443,0x0444,0x0445,0x0446,0x0447,
0x0448,0x0449,0x044a,0x044b,0x044c,0x044d,0x044e,0x044f,
0x2116,0x0451,0x0452,0x0453,0x0454,0x0455,0x0456,0x0457,
0x0458,0x0459,0x045a,0x045b,0x045c,0x00a7,0x045e,0x045f
};

/* Use newkoi8tou in combination with isotoibm to derive the unicode
meaning of the Cyrillic range in the DKOI extension of EBCDIC. If
someone has DKOI-8 text available, I'd love to actually try... */
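Following that suggestion, here is a sketch of the process. It searches isotoibm for the DKOI-8 byte, then maps the matching ISO 8859-5 code to Unicode. Rather than repeat the newkoi8tou table, the second step is computed (ISO 8859-5 Cyrillic letters sit 0x360 below their Unicode values), which assumes the standard ISO 8859-5 layout. The isotoibm table gives 0x4a for [a2] and 0x5f for [ac], matching the 037 column of the variant table above, so the results are 037 ones where the variants differ. The function names are hypothetical, and like the table itself this has not been tried on real DKOI-8 text.

```c
/* From 8859-5 to DKOI-8, copied from above: ebcdic = isotoibm[isoval-160] */
static const int isotoibm[96] = {
0x41,0xaa,0x4a,0xb1,0x9f,0xb2,0x6a,0xb5,
0xbd,0xb4,0x9a,0x8a,0x5f,0xca,0xaf,0xbc,
0x90,0x8f,0xea,0xfa,0xbe,0xa0,0xb6,0xb3,
0x9d,0xda,0x9b,0x8b,0xb7,0xb8,0xb9,0xab,
0x64,0x65,0x62,0x66,0x63,0x67,0x9e,0x68,
0x74,0x71,0x72,0x73,0x78,0x75,0x76,0x77,
0xac,0x69,0xed,0xee,0xeb,0xef,0xec,0xbf,
0x80,0xfd,0xfe,0xfb,0xfc,0xad,0xae,0x59,
0x44,0x45,0x42,0x46,0x43,0x47,0x9c,0x48,
0x54,0x51,0x52,0x53,0x58,0x55,0x56,0x57,
0x8c,0x49,0xcd,0xce,0xcb,0xcf,0xcc,0xe1,
0x70,0xdd,0xde,0xdb,0xdc,0x8d,0x8e,0xdf
};

/* ISO 8859-5 code 160-255 to Unicode (standard ISO 8859-5 layout). */
static long iso_hi_to_unicode(int iso)
{
    switch (iso) {
    case 0xa0: return 0x00a0;   /* no-break space */
    case 0xad: return 0x00ad;   /* soft hyphen */
    case 0xf0: return 0x2116;   /* numero sign */
    case 0xfd: return 0x00a7;   /* section sign */
    default:   return iso + 0x360L;
    }
}

/* Hypothetical: Unicode value of a DKOI-8 byte, or -1 if isotoibm never
   produces that byte (the plain EBCDIC range, not covered here). */
long dkoi8_to_unicode(int ebcdic)
{
    int i;

    for (i = 0; i < 96; i++)
        if (isotoibm[i] == ebcdic)
            return iso_hi_to_unicode(i + 160);
    return -1;
}
```

All 96 values in isotoibm are distinct, so the reverse search is unambiguous. For bulk conversion one would invert the table once into a 256-entry array instead of scanning it per byte.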

ACKNOWLEDGEMENTS
Most of the information was provided by the following:
David J. Birnbaum at djbpitt+@pitt.edu
Bur Davis at bdavis@adobe.com
George Fowler at gfowler@ucs.indiana.edu
Richard B. Paine at RPAINE@CCNODE.Colorado.EDU
Slava Paperno at PAPY@CORNELLA.cit.cornell.edu
Glenn E. Thobe at thobe@getunx.info.com
Dimitri Vulis at DLV@CUNYVMS1.BITNET
Johan W. van Wingen (acknowledged in Dimitri Vulis' posting, but no net address)

Thanks to all who contributed; I am responsible for the errors that still remain.
Andras Kornai (andras@calera.com, kornai@csli.stanford.edu)


Copyright © 1997-2007 BRAMA, Inc. All Rights Reserved.