There is also a Russian version of this document.
Note that the Russian and English versions are not identical; they are updated separately.
Written by Grigory Naumovets
(E-mail: g_n@online.com.ua).
This manual is based on my personal experience and is of course not intended to be comprehensive (the 'References' section has some links to other manuals and resources available on the Internet). Originally, it was prepared in 1997 for the Learning Resource Center Project of AIHA and aimed mainly at people working under Windows95 and using a specific set of software.
I have updated it several times since then, and now it is quite different from the initial version.
Your comments, additions, or corrections are welcome.
Last updated on: August 26, 1999
Send me a message using an online feedback form
NOTE: When I talk here about 'Cyrillic', I mean mainly Russian or Ukrainian. There is nothing here about any Serbian-, Bulgarian-, or Macedonian-specific issues. As far as I know, people in these countries (at least Bulgarians) mainly use the Windows-1251 code page on the Internet.
CONTENTS
- Introduction
- ASCII and most common Cyrillic code pages
- 7-bit ASCII code page
- Most common 8-bit Cyrillic code pages
- UNICODE: a 'unified' code page
- 7-bit encodings of 8-bit texts: 'Quoted-printable' and 'base64'
- Tune your Web browser to read Web pages in Russian
- Netscape Navigator (v.2,3)
- Microsoft Internet Explorer (v.3, 4, 5)
- Netscape Communicator (v.4.*)
- HTML documents with explicitly indicated charsets
- Tune your mailer for correspondence in Russian
- How to choose format for sending Cyrillic messages
- Choose code page
- 8-bit or 7-bit format?
- Sending text files as attachments
- How to send messages in KOI-8: possible approaches
- Typing KOI-8 texts under Windows
- Native Windows95 keyboard switcher
- CyrWin 95
- ParaWin 95/98
- WinKey
- Recoding Cyrillic texts by mailers
- Handling of Cyrillic texts by some mailers
- Mailers Comparison Table
- Eudora Pro (v2.2 & 3.*)
- Netscape Mail (Mozilla) v.2
- Netscape Mail (Mozilla) v.3
- Microsoft Internet Mail
- Netscape Mail (Mozilla) v.4 (aka Netscape Messenger)
- Outlook Express
- Pegasus Mail for Windows
- The Bat!
- Bmail/UUPC for DOS
- Recoding Cyrillic texts by proxy servers
- Recoding Cyrillic texts with external programs
- Some recoders for DOS
- CP Tuner 95
- CP Tuner 97
- Total Recode II and some other decoders
- Online recoders
- Some comments on the problem of Ukrainian KOI8
- References
- Acknowledgements
Introduction
When a text message is sent via e-mail (as distinct from fax), it contains digital codes of the characters, but not their images. If the sender and the addressee use the same codes for the same characters on their computers (in other words, use the same coding system), the addressee will be able to read the original message correctly. If, however, they use different coding systems, the message should be recoded at some point(s) of its route in such a way that the codes received by the addressee are different from the original codes but correspond to the same characters.
With plain Latin characters (without accents, diacritical marks, etc.), the issue of code compatibility does not normally arise, since the standard used for their coding is commonly accepted. However, people often encounter this problem when sending and receiving Cyrillic messages, because there are several different standards used to encode Cyrillic characters. To understand and solve this kind of problem, one should learn about the most commonly used Cyrillic coding systems and how Cyrillic texts can be transformed by mailers and servers processing incoming and outgoing mail.
ASCII and most common Cyrillic code pages
Seven-bit ASCII code page
Most common 8-bit Cyrillic code pages
Under Windows, it is common to encode Cyrillic texts using the so-called Code Page 1251 (for this reason nicknamed the 'Windows code page' by Russian speakers). For example, if one types a text in MS Word for Windows 6.0 or 7.0 using 'usual' Cyrillic fonts for Windows and saves the document as 'Text Only', the text file will be saved in this encoding.
Under DOS, Russian characters are most often encoded using Code Page 866 (also known as the 'modified alternative code page' -- or, for brevity, 'Alternative' or 'DOS'). This code page is native for Russian also under the OS/2 operating system. In case of the Russian regional settings of Windows, saving a Cyrillic text from MS Word for Windows as 'MS-DOS Text' will write the file in this encoding. To read such files in MS Word for Windows, one should enable the option 'Confirm conversion at open' (via the menu Tools/ Options/ General) and then select file conversion from 'MS-DOS Text' when prompted.
With operating systems of the UNIX family, Russian texts are usually encoded using the KOI-8 Code Page (this Russian abbreviation means an '8-bit code for information interchange'). Actually, 'KOI-8' is a generic name for a family of code pages, such as Russian KOI8-R (RFC 1489) and Ukrainian KOI8-U (RFC 2319).
KOI8-R is accepted as standard encoding for Russian-language USENET newsgroups (e.g., relcom.*) and is commonly used for transfer of Russian messages by e-mail. A lot of useful information about this code page and 'koization' of various software can be found on Andrei Chernov's Web page <http://www.nagual.pp.ru/~ache/koi8.html>. Under Windows, texts in KOI-8 encoding can be read and printed using special 'KOI-8' fonts for Windows. However, to type texts in this encoding, one should install a special keyboard layout.
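As a rough illustration of what 'different codes for the same characters' means in practice, here is a minimal Python sketch. Python is not mentioned in this manual; it is used here only because its standard codecs happen to include `cp1251`, `cp866`, and `koi8_r`. The sample word and the helper name `recode` are arbitrary choices made for the example.

```python
# Encode the same Russian word under three code pages and compare the bytes.
text = "Привет"

as_win = text.encode("cp1251")   # Windows code page 1251
as_dos = text.encode("cp866")    # DOS 'alternative' code page 866
as_koi = text.encode("koi8_r")   # KOI8-R (RFC 1489)

for name, raw in (("cp1251", as_win), ("cp866", as_dos), ("koi8-r", as_koi)):
    print(f"{name:7s}", raw.hex(" "))


def recode(raw: bytes, src: str, dst: str) -> bytes:
    """Recoding = decode with the source code page, encode with the target one."""
    return raw.decode(src).encode(dst)


# Reading KOI8-R bytes as if they were Windows-1251 yields unreadable text.
garbled = as_koi.decode("cp1251")
print(garbled)
```

The same six letters give three different byte sequences, which is why a message typed in one code page looks like garbage when displayed with fonts for another.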
For Cyrillic texts, Macintosh computers use their own code page known as Code Page 10007 (or just 'Macintosh'). Helpful information and references on Russification of Macintosh-compatible computers can be found, for example, on www.relcom.ru/Russification/MacKoi8-r, http://www.friends-partners.org/partners/rusmac/, http://www.hf.uib.no/smi/files/eudtab.html.
Cyrillic code page ISO-8859-5 (sometimes called for brevity just 'ISO') is similar to a code page known as the 'main' DOS, or 'GOST' code page (though in fact the 'alternative' code page 866 has long become 'main' for DOS). An advantage of ISO-8859-5 is the strictly alphabetical order of Russian characters, which is most convenient for data sorting in databases. It is native for the SunOS operating system.
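The sorting remark can be checked with a small Python sketch (the word list is made up for the example, and the letter 'ё' is deliberately left out, since it sits outside the main alphabetical run in ISO-8859-5). Sorting by raw byte values gives alphabetical order in ISO-8859-5 but not in KOI8-R, where the letters follow the order of their Latin 'sound-alikes'.

```python
words = ["яблоко", "вишня", "арбуз", "банан"]

# Sort by the raw byte values each code page assigns to the letters.
by_iso = sorted(words, key=lambda w: w.encode("iso8859_5"))
by_koi = sorted(words, key=lambda w: w.encode("koi8_r"))

print(by_iso)   # alphabetical: арбуз, банан, вишня, яблоко
print(by_koi)   # 'я' sorts before 'в': арбуз, банан, яблоко, вишня
```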
'Translit' is NOT a code page -- it means just transliteration, i.e. substitution of Cyrillic letters with 'phonetically similar' Latin letters and their combinations. It is used as a last resort by (or for) people who do not have on their computers Cyrillic fonts, keyboard layouts, etc., or do not feel comfortable with the variety of Cyrillic code pages.
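To make the idea concrete, here is a minimal transliteration sketch in Python. The table below is hypothetical and simplified: there is no single agreed transliteration scheme, and real-life 'translit' varies from person to person.

```python
# Hypothetical, simplified transliteration table (Russian letters only).
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "'", "э": "e", "ю": "yu", "я": "ya",
}


def translit(text: str) -> str:
    """Replace Cyrillic letters with similar-sounding Latin letters."""
    out = []
    for ch in text:
        latin = TRANSLIT.get(ch.lower())
        if latin is None:
            out.append(ch)                    # not a Russian letter: keep as is
        elif ch.isupper():
            out.append(latin.capitalize())    # keep the initial capital
        else:
            out.append(latin)
    return "".join(out)


print(translit("Привет, мир!"))
```

The result is plain 7-bit text, so it survives any mail server, at the cost of readability.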
UNICODE: a 'unified' code page
Unicode in Windows 95/98 and MS Office 97
Unicode has not yet become popular on the Internet, but it is used by some programs working under Windows95. For example, Microsoft Word 97 has an option of saving text files in the Unicode encoding (Save As / Unicode text), and non-ASCII characters in Word 97 documents are also encoded according to the Unicode table (and thus take twice as much space in the doc file). Such documents cannot be read by previous versions of MS Word. Generally, Microsoft Word 97 and other MS Office 97 applications work correctly only with Unicode-type fonts for Windows (i.e. those fonts that include several 'scripts' -- Western, Cyrillic, Greek, etc.). If you use Word 6.0 or 7.0, it certainly makes sense to download a special plug-in converter allowing Word 6.0 and 7.0 to open documents saved in the Word97 format <ftp://ftp.microsoft.com/Softlib/MSLFILES/wrd97cnv.exe>. (However, some Word97 documents with complex formatting cannot be opened in Word 6.0 and 7.0 even with this converter).
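The 'twice as much space' remark can be illustrated with a short Python sketch. It assumes that the Unicode text in question is stored as 16-bit code units (UTF-16); the sample word is arbitrary.

```python
text = "Кириллица"

single_byte = text.encode("cp1251")     # one byte per Cyrillic letter
two_byte = text.encode("utf-16-le")     # 16-bit Unicode, no byte-order mark

print(len(text), len(single_byte), len(two_byte))
```

For this nine-letter word the Windows-1251 form takes 9 bytes and the 16-bit Unicode form takes 18.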
Another helpful utility is a TTF Converter designed for converting non-Unicode fonts to the Unicode format. After conversion, they look like Unicode fonts having two scripts -- e.g., Western and Cyrillic, and can be used by MS Office 97 applications. However, even if you use only Unicode fonts, you can have problems with printing Cyrillic from Office 97 to some types of printers. This is a well-known Microsoft bug; to fix it, go, for example, to Paul Gorodyansky's Web page.
If you wish to learn in more detail about Cyrillic code pages and fonts and the way they are handled by Windows, go to Konstantin Kazarnovskii's Web Page.
Seven-bit encodings of 8-bit texts: 'Quoted-printable' and 'base64'
Eight-bit texts can be sent by e-mail in their original 8-bit format; this is usually indicated in the message header as: Content-Transfer-Encoding: 8bit. However, one has to keep in mind that some e-mail servers (especially, in the 'seven-bit' English-speaking world) do NOT handle 8-bit texts properly. Some of them just 'cut off' the eighth bit, thus reducing all 8-bit codes by 128. Therefore, most mail agents allow conversion of 8-bit messages into a 7-bit form (assuming that the addressee's mailer will automatically perform the reverse conversion and thus restore the original text format). For example, in Eudora Pro v.2.2 this option can be enabled through the following menu: Tools/ Options/ Sending Mail/ May Use Quoted-Printable.
In the 'quoted-printable' presentation, standard 7-bit characters are left unchanged, whereas 8-bit characters are replaced by a sequence of the '=' symbol and a pair of Latin letters and/or figures representing the relevant hexadecimal code. So, the 7-bit encoded text sent by e-mail may look as follows:
- =DD=EB=E5=EA=F2=F0=EE=ED=ED=FB=E5 =F2=E0=E1=EB=E8=F6=FB (Microsoft Excel,=
 Lotus 1-2-3 =E8 =E4=F0.);
and the header of this message will indicate: Content-Transfer-Encoding: quoted-printable.
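The quoted-printable sample above can be decoded mechanically. The following Python sketch uses the standard `quopri` module; the space between the two words is restored by assumption (this encoding normally leaves spaces as they are), and the choice of Windows-1251 for the final step is a guess that happens to give readable Russian.

```python
import quopri

# First two words of the sample, with the word space restored (assumption).
sample = b"=DD=EB=E5=EA=F2=F0=EE=ED=ED=FB=E5 =F2=E0=E1=EB=E8=F6=FB"

raw = quopri.decodestring(sample)   # back to the original 8-bit bytes
text = raw.decode("cp1251")         # interpret the bytes as Windows-1251
print(text)

# Going the other way: every byte of the encoded form fits into 7 bits.
encoded = quopri.encodestring(text.encode("cp1251"))
print(encoded)
```

Each Cyrillic letter grows from one byte to three, which is why quoted-printable messages in Russian are so much larger than the 8-bit originals.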
'base64' is another commonly used method for seven-bit presentation of 8-bit texts (and for encoding binary attachments, too). Employment of this method is indicated in the message header as: Content-transfer-encoding: base64.
For example, one can select this mode through the Mail/ Options/ Send/ Plain Text/ Settings/ MIME/ Encode text using/ base64 menu in the Microsoft Internet Mail (included in the Microsoft Internet Explorer v.3.0).
When receiving a message where a text is encoded as base64 or quoted-printable, modern (MIME-compliant) mailers automatically recover the original 8-bit text. However, those recipients whose mailers do not support this conversion will have to decode such messages by running an external decoding program.
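What such an external decoding program has to do can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `decode_body` is made up, and the caller is expected to have read the Content-Transfer-Encoding and charset values from the message header already.

```python
import base64
import quopri


def decode_body(body: bytes, transfer_encoding: str, charset: str) -> str:
    """Undo the 7-bit transfer encoding, then decode the code page."""
    cte = transfer_encoding.strip().lower()
    if cte == "base64":
        raw = base64.b64decode(body)
    elif cte == "quoted-printable":
        raw = quopri.decodestring(body)
    else:                       # 7bit / 8bit: nothing to undo
        raw = body
    return raw.decode(charset)


koi = "Привет".encode("koi8_r")
print(decode_body(base64.b64encode(koi), "base64", "koi8-r"))
print(decode_body(quopri.encodestring(koi), "Quoted-Printable", "koi8-r"))
```

Note the two separate steps: the transfer encoding (base64 or quoted-printable) and the code page (here KOI8-R) are independent of each other.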
Tune your Web browser to read Web pages in Russian
Generally, when looking at Russian Web sites, you can see several approaches to the problem of different Russian code pages used in different operating systems.
One approach is to keep on the Web server several copies of the document converted to a variety of code pages, so that users could choose the code page that is most convenient for them. Actually, instead of several copies, in many cases there is only one copy accessible via several 'ports' which convert the document to the encoding selected by the user. Anyway, users can select code pages: sometimes only win-1251 or koi8-r, and sometimes also dos-866, iso-8859-5, Macintosh, and 'transliterated'.
Another approach is used by the so-called Russian Apache Web server, which tries to detect the 'native' code page of the browser and then converts the original document requested by the user to the appropriate encoding. In this case, the Web page usually does not have a menu for code page selection; if you work under Windows, you will most probably see the document in Win-1251 encoding.
The third approach is to make the document available to all users only in one encoding, assuming that they should be able to tune their browsers to read it. There are 'KOI8 ONLY' and 'ANTI-KOI' campaigns, and quite a lot of Cyrillic Web pages available either only in KOI8-R or only in Win-1251.
If you want to create your own Russian Web page, keep in mind that it is usually easier to tune Windows browsers to KOI8-R than Unix browsers to Win-1251. Therefore, if you want your Web page to be easily readable not only for Windows users, it makes sense to use the KOI8-R encoding.
If you want to read Russian Web pages, it is sufficient to tune your browser to understand KOI-8 and Windows-1251; other code pages never appear as the ONLY Web page encoding available. (However, you may find on the Internet a lot of Russian text files in DOS encoding; they are intended for downloading, not for online viewing by your browser).
Netscape Navigator (v.2,3)
For Netscape Navigator (v.3), one has to open the menu Options/ General Preferences/ Fonts, and set Windows-1251 Cyrillic fonts (proportional and fixed-width) for the 'Cyrillic' encoding and KOI-8 fonts for the 'Cyrillic (KOI8-R)' encoding. Sometimes it happens that none of the KOI-8 fonts already installed is recognized by the system as a fixed-width font, and thus no KOI-8 fonts are visible in the fixed-font selection menu. If this problem arises, one can download a set of 'appropriate' fonts from <http://www.relcom.ru/Russification/WinNetscape/ForWWW.zip>.
Then, one has to choose from the menu Options/ Document Encoding 'Cyrillic' for viewing Web pages in Windows-1251 and 'Cyrillic (KOI-8)' for viewing Web pages in KOI-8.
If you still use Netscape Navigator (v.2), it's high time to replace it with a more recent version. If for any reason you cannot do it, you can set KOI-8 fonts for the 'Latin 2' encoding and Windows-1251 fonts for the 'Korean' encoding (this version of Netscape Navigator does not offer any Cyrillic code pages in the 'Document Encoding' menu). Then, you can select from the menu Options/ Document Encoding 'Latin 2' to view texts in KOI-8 and 'Korean' to view texts in Windows-1251.
Microsoft Internet Explorer (v.3,4,5)
Internet Explorer v.4 and 5 have a lot of new capabilities, bells, and whistles, but they handle Cyrillic in essentially the same way as IE v.3.*; selection of languages and encodings here is done via the View/ Fonts, View/ Encoding, or View/ Language menu. However, their mailer and newsreader, Outlook Express, is much more powerful than the MS Mail and News included in IE v.3.*; it will be discussed later.
Also, IE v.4 and 5 can handle Unicode-type 7-bit and 8-bit encodings called UTF-7 and UTF-8.
Netscape Communicator (v.4.*)
However, Netscape Communicator's treatment of Cyrillic is generally buggier than that of MS Internet Explorer. Sometimes Cyrillic text on the screen is unreadable even though everything is tuned right, and there is no evident reason why. If this happens, try the following:
- reload the document
- switch encoding to a different one, and then back
- view the document source and try to understand what could cause the problem (maybe the charset was indicated incorrectly)
- if you still do not understand it, try Internet Explorer instead of Netscape.
Browser settings for viewing a document in a specific encoding work properly only if the header of the HTML document does not explicitly indicate a code page. If, however, the header contains a line like this:
<META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=windows-1251'>
the browser will be automatically set to the code page specified by this tag -- Netscape Navigator v.3.* displays the text with the fonts set for this code page in the Options/ General Preferences/ Fonts menu, while Internet Explorer and Netscape Communicator perform the appropriate recoding. In this case, attempts to set the code page manually via the browser menu do not work. If the charset specified in the header of the HTML document is different from that actually used in the document, the text may be unreadable. For Netscape Navigator v.3.*, this problem can be solved by assigning to the encoding specified in the document a font corresponding to the actual code page. Generally, if the text on a Web page is unreadable, try viewing the HTML source via the View/ Document Source menu. It often helps either to read the text or to identify the cause of the trouble.
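Checking which charset a page declares does not have to be done by eye. Here is a minimal Python sketch that looks for a charset inside a META tag; the regular expression and the function name `declared_charset` are illustrative assumptions, not a full HTML parser.

```python
import re

# Find 'charset=...' inside a <META ...> tag (case-insensitive, bytes input).
META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([\w.:-]+)""",
                          re.IGNORECASE)


def declared_charset(html: bytes):
    """Return the charset named in a META tag, or None if there is none."""
    match = META_CHARSET.search(html)
    return match.group(1).decode("ascii").lower() if match else None


page = ("<html><head><META HTTP-EQUIV='Content-Type' "
        "CONTENT='text/html; charset=windows-1251'></head>"
        "<body>Привет</body></html>").encode("cp1251")

charset = declared_charset(page)
print(charset)
print(page.decode(charset))
```

If the decoded text is unreadable although a charset is declared, the declaration most likely does not match the code page actually used in the document.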
Generally, explicit indication of the charset in the header of an HTML document is a trick that should be used with some caution. In most cases, it works fine, but remember several simple things:
1. No charset is better than a wrong charset.
2. Explicit indication of a charset may make the document unreadable for some old versions of browsers that do not recognize the name of this code page.
3. If you convert an HTML document with an explicitly indicated charset to a different code page, don't forget to correct the charset name as well.
4. Don't indicate charset explicitly if you are placing your Web page on a server that may recode it.
5. If your page is in Win-1251 encoding, even with the charset=windows-1251 tag, don't think that it will be readable for everybody. Many browsers made for UNIX do NOT support this charset, and many people working under UNIX generally believe that reading a Web page available only in Win-1251 is a complete waste of their time.
Also, if you want your Cyrillic Web page to be readable for everybody, never use the FONT FACE tag. It is regarded as bad HTML style because it may cause problems for people who do not have the specified font or who use a different operating system.
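Rule 3 of the list above (correct the charset name when converting the document) is easy to forget when recoding by hand. The Python sketch below does both steps together; the function name `recode_html` is made up, and the code simply assumes that the first 'charset=' in the file is the META declaration.

```python
import re

CHARSET_ATTR = re.compile(r"(charset\s*=\s*)[\w.:-]+", re.IGNORECASE)


def recode_html(raw: bytes, src: str, dst: str) -> bytes:
    """Convert an HTML page between code pages and rename its declared charset."""
    text = raw.decode(src)
    # Only the first occurrence is rewritten (assumed to be the META tag).
    text = CHARSET_ATTR.sub(lambda m: m.group(1) + dst, text, count=1)
    return text.encode(dst)


win_page = ("<META HTTP-EQUIV='Content-Type' "
            "CONTENT='text/html; charset=windows-1251'>Привет").encode("cp1251")

koi_page = recode_html(win_page, "windows-1251", "koi8-r")
print(koi_page.decode("koi8-r"))
```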
Tune your mailer for correspondence in Russian
How to choose format for sending Cyrillic messages
Choose code page
1. messages should be sent in KOI8-R encoding
2. message header should correctly indicate the code page ('charset=koi8-r')
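As an illustration of these two rules, the following Python sketch builds a message with the standard `email` package. The message text is made up. Note that this library chooses a 7-bit transfer encoding for the body by itself, which is a property of the library, not a recommendation of this manual.

```python
from email.mime.text import MIMEText

body = "Привет из Киева"
msg = MIMEText(body, "plain", "koi8-r")   # charset goes into Content-Type
msg["Subject"] = "test"

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])

# The receiving side reads the charset from the header and decodes with it.
charset = msg.get_content_charset()
restored = msg.get_payload(decode=True).decode(charset)
print(restored)
```

The point is that the charset named in the header and the code page actually used for the body agree, so the recipient's mailer can decode the text without guessing.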
8-bit or 7-bit format?
On the one hand, as already noted, there are email servers that do not transmit 8-bit texts. Some of them automatically convert 8-bit texts to a 7-bit format (e.g. base64), while some of them just 'cut off' the eighth bit. (By the way, KOI-8 code page has an interesting feature allowing the text to remain readable even after the loss of the eighth bit, because the Cyrillic characters in this code page are shifted exactly by 128 from the 'phonetically similar' Latin letters). Furthermore, use of 8-bit texts in the Subject field and in other fields of the header is actually not 'legal' since Internet standards require any 8-bit records in the header to be encoded to a 7-bit format.
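The KOI-8 feature mentioned in parentheses above is easy to demonstrate. This Python sketch simulates a server that clears the eighth bit of every byte (the function name `strip_eighth_bit` is made up for the example) and compares what is left of the same word in KOI8-R and in Windows-1251.

```python
def strip_eighth_bit(raw: bytes) -> str:
    """Simulate a 7-bit mail server that clears the top bit of every byte."""
    return bytes(b & 0x7F for b in raw).decode("ascii")


word = "привет"
print(strip_eighth_bit(word.encode("koi8_r")))   # PRIWET
print(strip_eighth_bit(word.encode("cp1251")))   # ophber
```

The KOI8-R text degrades into a kind of transliteration (with upper and lower case swapped), while the Windows-1251 text becomes meaningless.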
On the other hand, it has long been common in the ex-USSR to send e-mail messages in the 8-bit format (including the text in the Subject field), and the vast majority of email servers transmit such messages without any problems. Moreover, many people in the ex-USSR still use mailers that cannot automatically recover the original 8-bit form of messages received in the 7-bit quoted-printable or base64 encoding. Finally, messages in their original 8-bit form are smaller than messages encoded to base64 and, especially, to quoted-printable. Therefore, our third piece of advice is as follows:
- 3. Send messages in the 8-bit format. If you find that your (or your addressee's) mail server corrupts 8-bit messages, try to enable the mode of encoding 8-bit texts to quoted-printable or base64. If your addressee cannot decode these encodings, try to attach your text as a separate UUENCODEd file.
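For reference, this is what the 'legal' 7-bit form of a Cyrillic Subject looks like. The Python sketch uses the standard `email.header` module; the subject text is arbitrary, and the exact choice between the base64-like and quoted-printable-like variants is made by the library.

```python
from email.header import Header, decode_header

subject = "Привет"
encoded = Header(subject, "koi8-r").encode()   # 7-bit 'encoded-word' form
print(encoded)

# A MIME-compliant mailer reverses the process using the charset in the word.
raw, charset = decode_header(encoded)[0]
print(raw.decode(charset))
```

Mailers that do not understand this form show the encoded word as it is, which is exactly why many people in the ex-USSR prefer plain 8-bit subjects.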
Sending text files as attachments
How to send messages in KOI-8: possible approaches
In fact, one can either type a letter in the KOI-8 encoding using appropriate fonts and keyboard layouts, or type it first in any convenient encoding (e.g., Windows-1251 or DOS) and then convert it to KOI-8. Generally, it is possible to use several approaches:
- 'Koification' of Windows (i.e. installation of KOI-8 fonts and keyboard layouts for Windows);
- Use of mailers able to recode incoming and outgoing messages;
- Use of special 'proxy' servers able to recode incoming and outgoing messages;
- Use of external recoding programs.
Typing KOI-8 texts under Windows
Native Windows95 keyboard switcher
Unfortunately, the 'native' Windows95 keyboard switcher is not trouble-free. For example, it may not work in some windows, e.g. in the message composition window of Eudora (for more info about this bug, see the section about Eudora). Moreover, the native switcher does not include any tools to edit keyboard layouts. (An editor of Windows95 keyboard layouts can be found, for example, on <http://www.kiarchive.ru/pub/cyrillic/windows/jkbd9542.zip> -- but it may not work with layouts created by a different tool). Finally, if one has to use more than two or three keyboard layouts (e.g., English, Russian, Ukrainian, and KOI-8), the native Windows95 keyboard switcher becomes quite inconvenient, since it does not offer different hot keys to access different layouts, and you have to scroll through all of them one by one.
If you are not comfortable using the native Windows95 keyboard switcher, it makes sense to install special software for switching and editing keyboard layouts.
CyrWin 95
ParaWin 95/98
WinKey
However, it may not work properly with Unicode-type fonts, and so is not suitable for applications like MS Office 97.
Recoding Cyrillic texts by mailers
Naturally, to correctly recode incoming messages, one needs to know their code page. In principle, there can be two basic approaches to this problem:
- assume that all messages are coming in the same 'external' encoding (most likely, KOI8-R). In this case, messages coming in different encodings will be recoded incorrectly.
- try to identify the code page of the incoming message by its header (header field charset=). In this case, messages with incorrect charsets indicated in their headers will be recoded incorrectly.
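The two approaches can be combined: trust the charset named in the header when there is one, and fall back to a fixed 'external' encoding otherwise. The Python sketch below is only an illustration; the function name `decode_incoming` and the KOI8-R default are assumptions made for the example, not a description of how any particular mailer works.

```python
import re

CHARSET_PARAM = re.compile(r"charset\s*=\s*\"?([\w.:-]+)", re.IGNORECASE)


def decode_incoming(body: bytes, content_type: str = "",
                    default: str = "koi8-r") -> str:
    """Decode a message body using the header charset, else a fixed default."""
    match = CHARSET_PARAM.search(content_type)
    charset = match.group(1) if match else default
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or unusable charset name: fall back to the default.
        return body.decode(default, errors="replace")


koi = "Привет".encode("koi8_r")
win = "Привет".encode("cp1251")

print(decode_incoming(koi))                                       # no header
print(decode_incoming(win, 'text/plain; charset="windows-1251"'))
print(decode_incoming(win, "text/plain; charset=koi8-r"))         # mislabeled
```

The last call shows the weak point of the second approach: a message labeled with the wrong charset is recoded incorrectly, and nothing in the bytes themselves reveals the mistake.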
Handling of Cyrillic texts by some mailers
Mailers Comparison Table
Mailer | Message text recoding | Specifies Cyrillic charset as | Problems and comments |
Eudora Pro v.2.2 | None; external recoding or koi8 fonts needed | incorrect: iso-8859-1 (can be hacked to koi8-r) | Very convenient interface; problems with Unicode fonts, Win95 keyboard switcher, msgs forwarded by Netscape, split attachments; generally not convenient for Cyrillic correspondence |
Eudora Pro v.3.*, 4.* | None; plugin for win<->koi recoding available | incorrect: iso-8859-1 (with plugin: koi8-r or windows-1251) | Same as Eudora v.2.*; v.3.* supports multiple accounts and plugins, including the koi8 plugin; v.4.* has Cyrillic bugs that should be patched |
Netscape Mail (Mozilla) v.3.* | Incoming: none; Outgoing: win to koi, koi to koi; Replies: koi to koi; koi8 fonts needed; no external recoding | koi8-r (may differ from actual charset) | Single-level mail folders; forwarded msgs may cause problems for Eudora 2.2 users; generally not convenient for Cyrillic correspondence |
Netscape Mail (Mozilla) v.4.* | Incoming: any to win; Outgoing: win to koi (version-dependent); koi8 fonts or external recoding not needed | koi8-r (may differ from actual charset) | Multilevel mail folders; multiple profiles; counter-intuitive choice of encoding for outgoing msgs in v.4.0* |
MS Internet Mail | Incoming: koi to win, win to win; Outgoing: win to koi, win to win; koi8 fonts or external recoding not needed | correct: koi8-r or Windows-1251 | Single-level mail folders; no choice of encoding when replying; messages with incorrectly specified charset may be unreadable |
Outlook Express v.4.*, 5.* | Incoming: any to Unicode; Outgoing: Unicode to any; koi8 fonts or external recoding not needed | correct: koi8-r, Windows-1251 and other | Multilevel mail folders; integrated mail and news; multiple accounts; probably most powerful handling of Cyrillic; optional support of koi8-u; most convenient for Cyrillic correspondence |
The Bat! | Incoming: any to win; Outgoing: win to any; koi8 fonts or external recoding not needed | correct: koi8-r, koi8-u, Windows-1251 and other | Small but quite powerful; multilevel mail folders; multiple accounts; handling of non-standard charsets; supports koi8-u; embedded support of PGP encryption; limited support of HTML-formatted messages; most convenient for Cyrillic correspondence |
Pegasus Mail | None; after tuning: Incoming: koi to win; Outgoing: win to koi; koi8 fonts or external recoding not needed if tuned | tunable to koi8-r | Quite powerful, but interface not too convenient; needs tuning; multilevel mail folders; problems with 8-bit headers; generally not convenient for Cyrillic correspondence |
Forte Agent (not Free!) | After tuning: Incoming: koi to win; Outgoing: win to koi and other; tunable for external or internal recoding | tunable to koi8-r, koi8-u and other | One of the best newsreaders but may be also used as mailer; single-level mail folders; for tuning, see www.glasnet.ru/~kazarn/soft.htm and references therein |
Eudora Pro (v.2.2 & 3.*)
Problems.
What I like most of all in Eudora is its very convenient user interface -- probably the best of all mailers I know. However, Eudora also has some drawbacks:
- In the header of outgoing Cyrillic messages, Eudora indicates an incorrect charset ('charset=iso-8859-1' instead of 'koi8-r' or 'Windows-1251'). In some cases, this results in improper processing of such messages by the recipient's mailer. This can be corrected by editing the Eudora.exe file with a binary editor. (You can fix Eudora Pro v.2.2 with this crack; cracks for other versions can be found on Andrei Chernov's Web page). However, do not change the charset to koi8-r if you send your messages in Windows-1251!!
- Sometimes Eudora does not want to display Cyrillic Unicode fonts: if you set the font script in the Tools/ Options/ Fonts & Display menu to Cyrillic, it switches it 'by itself' back to Western. In this case, try using different screen fonts, especially non-truetype and non-Unicode fonts.
- Also, Eudora may block the 'native' Win95 keyboard switcher in the message editing window. To switch the keyboard layout to Russian, one has to go to another Eudora window, switch to Russian there, and then go back to the message editing window. This problem can often be fixed by changing Eudora's screen font to a non-Unicode or a non-truetype Cyrillic font. However, a more radical solution is to use a different program (such as ParaWin 95 Pro, CyrWin 95, or WinKey) instead of the 'native' Win95 keyboard switcher. These programs are much more convenient, especially if you have to use more than two keyboard layouts.
Eudora's automatic recoding of binary attachments is also inconvenient when you receive a uuencoded file sliced into several pieces. In this case, the first piece is automatically uudecoded and saved as a binary file. To recover the original file, you have to uuencode this first piece again, append the other pieces to it in the appropriate order, and then uudecode the resulting file.
An advantage of Eudora Pro v.3.* over the previous versions of Eudora is its support for multiple 'personalities' with different identities, pop, smtp, and other settings. It handles Cyrillic in essentially the same way as Eudora Pro v.2.2, including the iso-8859-1 charset erroneously indicated for outgoing Cyrillic messages. Cracks are available on Andrei Chernov's KOI8 page.
Another advantage of Eudora Pro v.3.* is that its capabilities can be extended with the help of various plug-ins -- for example, the koi8 plugin designed by Eugene Surovegin. This plugin adds to the menu commands for recoding messages between koi8-r and windows-1251; also, it specifies the charset of outgoing Cyrillic messages as koi8-r instead of iso-8859-1.
Eudora can be (and often is) used with pop/smtp proxies that recode incoming and outgoing Cyrillic messages from koi8-r to Windows-1251 and back. If you use Eudora v.3.* with the koi8 plugin, it may be convenient to receive incoming mail unrecoded and to send outgoing mail via a recoding smtp proxy. Thus, you will be able to recode incoming messages with the koi8 plugin whenever necessary, while your outgoing mail will be automatically recoded by the proxy to koi8-r.
Netscape Mail (Mozilla) - v.2
Netscape Mail (Mozilla) - v.3
Netscape Mail v.3 should NOT be used with pop/smtp proxies that recode incoming and outgoing Cyrillic messages from koi8-r to Windows-1251 and back.
The mode of sending messages in 7-bit or 8-bit form is controlled in Netscape Mail via the menu Options/ Mail and News Preferences/ Composition/ Allow 8-bit.
When messages are forwarded with Netscape Mail, the original text by default is sent as an attachment, with the attachment file name defined by the message subject. It is very inconvenient for recipients who use Eudora (and maybe some other mailers). If you want to make their life happier, forward messages as quoted text via the Message/ Forward Quoted menu, instead of just pressing the Forward button.
Microsoft Internet Mail
If your mailer is Microsoft Internet Mail, you will not need to 'koify' Windows or to work via recoding proxy servers. Make sure that you connect directly to the pop and smtp server of your provider, without any recoding proxies. Otherwise, Microsoft Internet Mail may not work properly.
Netscape Mail (Mozilla) v.4 (aka Netscape Messenger)
Netscape Mail v.4, as distinct from the previous versions, supports multilevel mail folders. It should NOT be used with pop/smtp proxies that recode incoming and outgoing Cyrillic messages from koi8-r to Windows-1251 and back.
To make Netscape Messenger send messages in the 8-bit format, without encoding to quoted-printable, one should go to the menu Edit/ Preferences/ Mail & Newsgroups/ Messages/ and set the switch Send messages that use 8-bit characters: to As is.
Some problems and solutions.
Problem: Netscape Messenger v.4.5*, 4.6* is sending the Subject field encoded to 7-bit. How to make it send it in the plain 8-bit format?
Solution: This setting cannot be controlled from the menus. To fix it, one should open the prefs.js file (it should be in the directory ...\Netscape\Users\<username>) and add the following line:
user_pref('mail.strictly_mime_headers', false);
Another line also worth adding to this file is
user_pref('mailnews.start_page.enabled', false);
-- It prevents Messenger from automatically connecting to Netscape NetCenter to display the Welcome message.
For more details, see Paul Gorodyansky's instructions on
http://www.relcom.ru/Russification/WinNetscape/.
Outlook Express
You can tune Outlook Express to Cyrillic via Tools/ Options/ Read/ Fonts. Choose there Cyrillic as the default charset, and set Mime Encoding to KOI8 if you plan to write mainly in KOI8. When you are reading or sending messages (including replies), you can easily select the proper encoding via the View/ Language menu. In the Tools/ Options/ Send menu, set Mail sending format to Plain Text, and then in the Settings set Message format to MIME and check Allow 8-bit characters in headers (if you don't want your 8-bit subjects to be converted to 7-bit). If you want to send your binary attachments uuencoded, choose Uuencode instead of MIME. However, in this case there will be no MIME fields in your message's header, and the charset will not be specified.
An interesting feature of Outlook Express is charset remapping: when you view messages with certain charsets in different encodings, you are prompted to set this remapping as default for messages with this charset. You can see and modify the list of remappings via the Tools/ Options/ Read/ International settings menu.
Of course, if you use Outlook Express, you should NOT connect to the pop and smtp server of your provider via recoding proxies to avoid multiple recoding.
Some problems and solutions
Problem: Sometimes message text is unreadable even if the Language is set to the proper charset.
Solution: Try switching to a different charset (say, Win-1251 instead of koi8-r) and then back to the correct charset.
Problem: Some or all Cyrillic characters, especially in the Subject field, are replaced with question marks when printed, though they are normally seen on the screen.
Solution: This problem (and other problems with Cyrillic printing) may occur for some beta versions of OE4 and OE5. Upgrade to a newer version. You may also try different fonts, though it does not always work. It may be helpful to go to the Internet Explorer (menu View/Internet Options/General/Accessibility/Formatting) and set Ignore font styles specified on Web Pages.
Problem: Sometimes OE5 shows Cyrillic headers in the list of newsgroup postings in a wrong encoding (not recoded from koi8 to win-1251). Switching the encoding has no effect.
Solution: This is a well-known bug of OE5, but techniques of fighting it are not always effective and sometimes look shamanistic (see the following list of actions):
- before downloading any new messages, go to the Inbox or Outbox and view any messages kept there (you may check the box 'When starting, go directly to my Inbox folder' in the Tools/ Options/ General menu);
- go to the Tools/ Options/ Read/ Fonts menu and set Cyrillic (Windows) as the default encoding;
- go to the Tools/ Options/ Read/ International Settings menu, set Cyrillic (KOI8-R) as the default encoding, and check the box 'Use default encoding for all incoming messages'.
- go to the Tools/ Options/ Send/ International Settings menu and set Cyrillic (KOI8-R) as the default encoding.
- this all may (or may not) take effect only after you reload the list of messages.
Problem: When quoting a message you are replying to, OE5 sometimes marks the quoted text with '>', but sometimes does not.
Solution: This happens when you receive a Rich-Text or HTML-formatted message. It cannot be fixed. Ask your correspondent to send you messages in the plain text format (with Outlook Express, one should go to the Tools/ Options/ Send/ Mail sending format and News sending format menus and set them to 'Plain text').
Pegasus Mail for Windows
Pegasus Mail does NOT allow sending Cyrillic texts in the Subject field in the 8-bit format. To send the body of a message in the 8-bit form, one should enable the option 'Allow 8-bit MIME message encoding' in the Tools/ Options/ Advanced Settings/ menu (after that, Pegasus Mail will warn you that sending texts in the 8-bit format is formally illegal). In the 'MIME character set' window, one should type 'koi8-r'.
The Bat!
A specific feature of The Bat! is that in the 8-bit mode it sends 8-bit subjects in the following form:
Subject: =?koi8-r?Q?[8-bit text is here]?= .
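For comparison, the form that RFC 1522 actually describes keeps the whole header in 7-bit ASCII: every 8-bit byte inside the =?charset?Q?...?= wrapper is written as =XX. The following minimal Python sketch (an illustration only, not a description of how The Bat! or any other mailer is implemented) produces such a header and decodes it back. The sample subject is an assumption made for the example.

```python
from email.charset import QP, Charset
from email.header import Header, decode_header

koi8 = Charset("koi8-r")
# Force the "Q" (quoted-printable) form regardless of the library default
koi8.header_encoding = QP

subject = "Привет"
encoded = Header(subject, koi8).encode()
print("Subject:", encoded)  # pure 7-bit ASCII, safe for any mail gateway

# A receiving mailer reverses the process: unwrap, then decode with the charset
raw_bytes, charset = decode_header(encoded)[0]
decoded = raw_bytes.decode(charset)
print(decoded)
```

A mailer that understands only the strict 7-bit form may fail to show a subject that has raw 8-bit characters inside the wrapper, so this difference is worth keeping in mind when your addressees complain about unreadable subjects.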
One more advantage of The Bat! is that it supports the Ukrainian koi8-u encoding (since v.1.18). If you use an earlier version, it is quite easy to add support for koi8-u -- you just have to install a koi8-u conversion table. (I made it myself -- please let me know if you find any errors in it.) To install it, you need to copy it to the directory where The Bat! is installed (usually it is Program Files/ The Bat!), and then go to Options/ XLAT Tables/ Add.
Actually, I think The Bat! is one of the best mailers for correspondence in Cyrillic -- even though it does not include a newsreader, has limited support for the HTML format of mail messages, and does not support the UTF-8 and UTF-7 formats available, for example, in Outlook Express.
To tune The Bat! for 8-bit correspondence in Cyrillic, go to the following menus:
Account/ Properties:
Transport -- 8-bit characters are treated: Without changes.
Options -- Allow 8-bit characters in message headers.
Templates/ New message -- Use character set: Cyrillic (KOI-8).
Bmail/UUPC for DOS
If you want to send an email message to a recipient who uses Bmail/UUPC or a similar program, you should keep in mind that such mailers cannot automatically decode 7-bit representations of 8-bit texts in the quoted-printable or base64 form, so you should send your messages in the 8-bit form. Also, these programs automatically decode only UUENCODEd attachments.
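If a message in the quoted-printable or base64 form has already arrived to such a mailbox, it can still be decoded by hand. Here is a minimal Python sketch of what that decoding amounts to; the function name decode_body and the sample text are assumptions for this illustration, and the charset is taken to be koi8-r.

```python
import base64
import quopri


def decode_body(payload, transfer_encoding, charset):
    """Undo the 7-bit transfer encoding, then interpret the bytes as text."""
    encoding = transfer_encoding.lower()
    if encoding == "quoted-printable":
        raw = quopri.decodestring(payload)
    elif encoding == "base64":
        raw = base64.b64decode(payload)
    else:
        # "7bit" or "8bit": the body was sent as is, nothing to undo
        raw = payload
    return raw.decode(charset)


original = "Привет".encode("koi8-r")
as_qp = quopri.encodestring(original)   # every 8-bit byte becomes =XX
as_b64 = base64.b64encode(original)     # unreadable letters and digits

print(as_qp, "->", decode_body(as_qp, "quoted-printable", "koi8-r"))
print(as_b64, "->", decode_body(as_b64, "base64", "koi8-r"))
```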
In early versions of Bmail/UUPC (up to 5.09g), all outgoing messages were recoded from DOS-866 to KOI8-R, and all incoming messages from KOI8-R to DOS-866. More recent versions allow one to choose the recoding table. However, the program does not apply the selected table to all incoming messages, but only to those with certain charsets indicated in the header. As a result, messages with incorrectly specified charsets (or 'multipart-mixed' messages) often come to the mailbox without recoding. Practice shows that it is often more convenient to use a version that recodes all messages regardless of the charsets indicated in their headers. As far as I know, among the recent versions of Bmail/UUPC, the most convenient in this respect is version 6.18m (the only messages it does not recode are 'multipart-mixed' ones).
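The difference between the two behaviours can be sketched in a few lines of Python. This is only a model under my own assumptions, not the actual Bmail/UUPC logic: the set RECODABLE, the function deliver, and the charset name 'x-unknown' are all hypothetical.

```python
# Hypothetical list of declared charsets that trigger recoding
RECODABLE = {"koi8-r"}


def deliver(body, declared_charset, force=False):
    """Return the bytes that end up in a DOS (cp866) mailbox."""
    declared = (declared_charset or "").lower()
    if force or declared in RECODABLE:
        # KOI8-R -> DOS-866, as done for every message by the early versions
        return body.decode("koi8-r").encode("cp866", errors="replace")
    # Charset not recognized: the body is stored untouched
    return body


body = "Привет".encode("koi8-r")

print(deliver(body, "koi8-r").decode("cp866"))                 # readable
print(deliver(body, "x-unknown").decode("cp866"))              # garbage
print(deliver(body, "x-unknown", force=True).decode("cp866"))  # readable
```

The second call shows why a wrong or missing charset in the header leaves a perfectly good KOI8-R message unreadable under DOS, and the third one shows why unconditional recoding is often more practical.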
Generally, to minimize your troubles with Bmail, I believe it makes sense to switch to the most recent version of Demos Mail for DOS you are able to find -- it will have some capabilities not available in Bmail.
If, for technical or financial reasons, you connect to your provider in the UUPC mode, but would like to use such POP/SMTP mailers as Eudora, Internet Mail, Netscape Mail, Pegasus, etc., you can try using a POP,SMTP/UUPC gate for Windows95 called Mailserver. Mailserver v.2.12 was free; there was a time when it could be received automatically by email in reply to an empty message sent to <mailserver@karst.kiev.ua>, but it is no longer supported by the developer, and I do not know a place on the Internet where you can find it (though I have a copy of it somewhere). If you install Mailserver, you can completely disable recoding of Cyrillic text in the UUPC program and rely instead on the recoding capabilities of a POP/SMTP mailer (e.g. The Bat!, MS Internet Mail, Outlook Express, Eudora with a koi8 plugin, Pegasus Mail, etc.). Versions 3.* are commercial, but much more advanced -- they also support the NNTP protocol and can completely take care of converting code pages and replacing charsets. A demo version of Mailserver 3.* can be downloaded from <http://www.kiarchive.ru/pub/windows/internet/mail/MailsrvD.exe>.
Recoding Cyrillic texts by proxy servers
If you work via a proxy server, 'koification' of Windows becomes unnecessary. However, if somebody sends you a message in an encoding other than the one implied by the proxy server (e.g. Windows-1251 instead of koi8-r), or with an incorrectly specified charset (if your proxy takes the charset field into account), it will be recoded incorrectly. So, if you work via a recoding proxy, you will most probably have to learn how to use special recoding programs that can recover incorrectly recoded texts.
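What such a recovery program does can be shown with a minimal Python sketch. It assumes the simplest case: the sender wrote in Windows-1251, and the proxy, expecting koi8-r, recoded the text from koi8-r to Windows-1251 once. The function names simulate_proxy and recover are made up for this illustration; the recovery is exact for ordinary Russian letters, while characters that have no counterpart in the target code page are lost by the proxy and cannot be restored.

```python
def simulate_proxy(data):
    """Model of a proxy that blindly recodes koi8-r -> Windows-1251."""
    return data.decode("koi8-r").encode("cp1251", errors="replace")


def recover(garbled):
    """Apply the reverse recoding to get back the bytes as they were sent."""
    return garbled.decode("cp1251", errors="replace").encode(
        "koi8-r", errors="replace"
    )


sent = "Привет".encode("cp1251")   # the sender used Windows-1251, not koi8-r
garbled = simulate_proxy(sent)     # what arrives in your mailbox

print(garbled.decode("cp1251"))    # an unreadable mix of Cyrillic letters
restored = recover(garbled).decode("cp1251")
print(restored)                    # the original text
```

If a message has been recoded several times, the same idea applies, but the reverse steps have to be found and applied in the opposite order.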
Recoding Cyrillic texts with external programs
Some recoders for DOS
CP Tuner 95 (obsolete)
CP Tuner 97
Tot-Recode II and some other decoders
It can also decode quoted-printable to 8-bit (but you should do it in the manual mode; it is also convenient to create a new scheme which includes decoding from QP to 8-bit and from KOI to Win).
Like CP Tuner 97, it can automatically 'transliterate' Cyrillic texts (i.e. replace Cyrillic characters with 'phonetically similar' Latin letters or their combinations).
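The idea of transliteration is easy to show with a minimal Python sketch. The table below is only one of many possible schemes, chosen here for illustration; it is NOT the table used by CP Tuner, Tot-Recode, or any other program mentioned in this manual, and the names TRANSLIT and transliterate are assumptions.

```python
# One possible Russian-to-Latin table (hypothetical, for illustration only)
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "'", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text):
    """Replace Cyrillic letters with Latin ones; leave everything else as is."""
    out = []
    for ch in text:
        latin = TRANSLIT.get(ch.lower())
        if latin is None:
            out.append(ch)                  # not a Russian letter
        elif ch.isupper():
            out.append(latin.capitalize())  # keep the capital letter
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("Привет, мир"))
```

Transliterated text survives any mail gateway because it is pure 7-bit ASCII, but the conversion is not reversible in general (for example, 'й' and 'ы' both become 'y' in this table).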
ECoder is a very small time-limited shareware program (<100 Kb), which nevertheless also has pretty advanced decoding capabilities.
Agama Mail Reader is another commercial decoder.
Online decoders
Some comments on the problem of Ukrainian KOI8
One problem with koi-8 (which does not arise with Win-1251) is that the standard koi8-r code page does not include any Ukrainian-specific letters. The Ukrainian version of koi8, known as koi8-u, has been used for several years, but it was officially registered only in April 1998, by RFC 2319. Thus far, recoding to/from this charset is NOT yet supported by many mailers, browsers, proxies, etc. that do support koi8-r. So, when you send a message or make a Web page in Ukrainian using the koi8-u encoding, it is very likely that some letters in it will not be read properly by some of your addressees (especially in case of any Win1251<->KOI-8 recoding by a mailer, provider's proxy, or Web browser). This situation has resulted in many Ukrainian Web pages appearing on the Internet ONLY in the Windows-1251 encoding.
Up to now, the choice of browsers, mailers, and newsreaders for Windows 95 that support koi8-u is very limited. To read koi8-u Web pages under Windows with Netscape v.3.*, you should install koi8-u fonts. They are available, for example, on the official KOI8-U page. To tune MS Internet Explorer 4.* and Outlook Express to koi8-u, you should install the Pan-European language support add-on from the Microsoft Web site. The Ukrainian koi8-u charset is indicated there as koi8-ru; if you want the correct charset name to be indicated in the header of your messages or postings, you should edit the System Registry and change 'koi8-ru' to 'koi8-u' wherever you find this string.
Step-by-step instructions for installing the koi8-u support in IE4 and OE4 can be found here (in Ukrainian, Win-1251 encoding).
However, it is possible to do the same much faster, without having to download and install the full Pan-European support (I believe it is about 1 Mb). You actually need only one small file which can be found here (the instructions are inside).
Also, it is possible to find on the Internet some koi8-u hacks for specific versions of Netscape v.4.*.
A mailer called The Bat! has supported koi8-u since v.1.18; if you use an earlier version, it is quite easy to add a koi8-u conversion table. (I made it myself -- please let me know if you find any errors in it.) To install it, you need to copy it to the directory where The Bat! is installed (usually it is Program Files/ The Bat!), and then go to Options/ XLAT Tables/ Add.
Information about KOI8-U is available also on http://cad.ntu-kpi.kiev.ua/multiling/koi8-u/index.html.
Look at the KOI8-U character map copied from this page.
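The difference between the two code pages can also be checked with a minimal Python sketch (Python includes both the koi8-r and koi8-u tables; the function name missing_from_koi8_r is an assumption for this illustration). It lists the Ukrainian letters that cannot be written in koi8-r at all and shows what a koi8-r-only program displays in their place.

```python
UKRAINIAN_LETTERS = "ҐґЄєІіЇї"


def missing_from_koi8_r(letters):
    """Return the letters that have no code in the koi8-r code page."""
    missing = []
    for letter in letters:
        try:
            letter.encode("koi8-r")
        except UnicodeEncodeError:
            missing.append(letter)
    return missing


print("Not in koi8-r:", "".join(missing_from_koi8_r(UKRAINIAN_LETTERS)))

for letter in UKRAINIAN_LETTERS:
    code = letter.encode("koi8-u")[0]
    # The same byte read by a program that knows only koi8-r
    misread = bytes([code]).decode("koi8-r")
    print(f"{letter}: koi8-u code 0x{code:02X}, shown by a koi8-r reader as {misread}")

# Russian text is encoded identically in both code pages
same = "Привет".encode("koi8-u") == "Привет".encode("koi8-r")
print("Russian text identical in both:", same)
```

This is why a Ukrainian koi8-u text passed through a koi8-r-only recoder stays mostly readable, with only a few letters replaced by wrong symbols.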
References
Andrei Chernov's Web page: a Bible on 'koification' of various operating systems, mailers, browsers, and newsreaders <http://www.nagual.pp.ru/~ache/koi8.html>
How to Russify Netscape Navigator for Windows <http://www.relcom.ru/Russification/WinNetscape> -- WinNetscape Russification Bible by Paul Gorodyansky
Paul Gorodyansky's Web page <http://ourworld.compuserve.com/homepages/PaulGor/> -- Russification of Netscape, Office 97 and other helpful stuff
Konstantin Kazarnovskii about fonts and languages <http://www.glasnet.ru/~kazarn/fonts.htm> -- very educational and well written, definitely worth reading
Lots of helpful information on Russification of various programs and operating systems <http://www.siber.com/sib/russify/>
Win95 Russification FAQ - <http://www.hackzone.ru/rtw95>
WinNT Russification - <http://rwntug.quarta.msk.ru/questions.htm>
Russification of Macintosh computers <http://www.relcom.ru/Russification/MacKoi8-r>
How to tune Forte Agent to work with Cyrillic <http://blue.iris.mipt.ru/timur/agent.htm>
How mailers should handle Cyrillic messages (and how they actually do it) <http://blue.iris.mipt.ru/timur/trueway.htm>
A lot of useful programs for working with Cyrillic texts <http://www.kiarchive.ru/pub/cyrillic/>
Some frequently asked questions about Russification of DOS and Windows <http://www.maths.monash.edu.au/~kig/ruscom/general/rusfaq.html>
Installation of various keyboard layouts for native Windows95 keyboard switcher <http://www.netsight.net/~ryba/cyrillic/index.html>
Internet standards (RFC -- requests for comments) <http://www.rfc-editor.org>
RFC 1489: description of KOI8-R code page
RFC 1521: MIME format for messages sent over the Internet
RFC 1522: MIME format for non-ASCII characters in the message headers
RFC 2319: Ukrainian Character Set KOI8-U
RFC 2152: UTF-7 -- A Mail-Safe Transformation Format of Unicode
RFC 2279: UTF-8, a transformation format of ISO 10646
Acknowledgements
The author is grateful to Paul Gorodyansky, Konstantin Kazarnovskii, Igor Manyuk, Anthon Lobastoff, and Andrew Tooziak for their helpful explanations.