How to read Cyrillic on the Internet and send by e-mail: some hints and explanations


There is also a Russian version of this document.
Note that the Russian and English versions are not identical; they are updated separately.

Written by Grigory Naumovets
(E-mail: g_n@online.com.ua).

This manual is based on my personal experience and of course is not intended to be comprehensive (in the 'References' section, there are some links to other manuals and resources available on the Internet). Originally, it was prepared in 1997 for the Learning Resource Center Project of AIHA and aimed mainly at people working under Windows95 and using a specific set of software.
I have updated it several times since then, and now it is quite different from the initial version.
Your comments, additions, or corrections are welcome.

Last updated on: August 26, 1999 




NOTE: When I talk here about 'Cyrillic', I mean mainly Russian or Ukrainian. There is nothing here about any Serbian-, Bulgarian-, or Macedonian-specific issues. As far as I know, users in these countries (at least in Bulgaria) mainly use the Windows-1251 code page on the Internet.

CONTENTS

  • Introduction
  • ASCII and most common Cyrillic code pages
    • 7-bit ASCII code page
    • Most common 8-bit Cyrillic code pages
    • UNICODE: a 'unified' code page
  • 7-bit encodings of 8-bit texts: 'Quoted-printable' and 'base64'
  • Tune your Web browser to read Web pages in Russian
    • Netscape Navigator (v.2,3)
    • Microsoft Internet Explorer (v.3, 4, 5)
    • Netscape Communicator (v.4.*)
    • HTML documents with explicitly indicated charsets
  • Tune your mailer for correspondence in Russian
    • How to choose format for sending Cyrillic messages
      • Choose code page
      • 8-bit or 7-bit format?
      • Sending text files as attachments
    • How to send messages in KOI-8: possible approaches
    • Typing KOI-8 texts under Windows
      • Native Windows95 keyboard switcher
      • CyrWin 95
      • ParaWin 95/98
      • WinKey
  • Recoding Cyrillic texts by mailers
  • Handling of Cyrillic texts by some mailers
    • Mailers Comparison Table
    • Eudora Pro (v2.2 & 3.*)
    • Netscape Mail (Mozilla) v.2
    • Netscape Mail (Mozilla) v.3
    • Microsoft Internet Mail
    • Netscape Mail (Mozilla) v.4 (aka Netscape Messenger)
    • Outlook Express
    • Pegasus Mail for Windows
    • The Bat!
    • Bmail/UUPC for DOS
  • Recoding Cyrillic texts by proxy servers
  • Recoding Cyrillic texts with external programs
    • Some recoders for DOS
    • CP Tuner 95
    • CP Tuner 97
    • Total Recode II and some other decoders
    • Online recoders
  • Some comments on the problem of Ukrainian KOI8
  • References
  • Acknowledgements

Introduction

When a text message is sent via e-mail (as distinct from fax), it contains digital codes of the characters, but not their images. If the sender and the addressee use the same codes for the same characters on their computers (in other words, use the same coding system), the addressee will be able to read the original message correctly. If, however, they use different coding systems, the message should be recoded at some point(s) of its route in such a way that the codes received by the addressee are different from the original codes, but correspond to the same characters.

With plain Latin characters (without accents, diacritical marks, etc.), the issue of code compatibility does not normally arise, since the standard used for their coding is commonly accepted. However, people often encounter this problem when sending and receiving Cyrillic messages, because there are several different standards used to encode Cyrillic characters. To understand and solve problems of this kind, one should learn about the most commonly used Cyrillic coding systems and how Cyrillic texts can be transformed by the mailers and servers processing incoming and outgoing mail.


ASCII and most common Cyrillic code pages

    Seven-bit ASCII code page

The code table commonly accepted for Latin characters is called ASCII (American Standard Code for Information Interchange). In this code table, Latin characters as well as digits, punctuation marks, and some basic symbols are represented by numeric codes ranging from 32 to 127 (codes below 32 are control codes). 'Latin' texts are sometimes called 'seven-bit texts,' because seven binary digits (bits) are sufficient to encode them. (Two raised to the power of seven equals 128, so seven-bit codes allow up to 128 characters to be encoded). If such a text is sent via e-mail, this is usually indicated in the header of the message by the following line: Content-Type: text/plain; charset='us-ascii'
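
As a small illustration of the seven-bit range, here is a minimal Python sketch (Python is used only for illustration; the helper name is_seven_bit is hypothetical and not part of any mailer) that checks whether every code in a text fits into seven bits:

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte fits into 7 bits (code below 128)."""
    return all(byte < 128 for byte in data)

latin = "Hello, world!".encode("ascii")
print(2 ** 7)                             # number of codes available with 7 bits
print(is_seven_bit(latin))                # plain Latin text
print(is_seven_bit(bytes([0xF0, 0xD2])))  # codes above 127 need the eighth bit
```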

    Most common 8-bit Cyrillic code pages

The codes ranging from 128 to 255 (they are called '8-bit codes,' because they require eight binary digits) are used to represent additional or 'nation-specific' characters. There are lots of different code tables (often called 'code pages'), which vary depending on the operating system and language (or group of languages). To identify correctly the character corresponding to a particular code, one should know the code page used to encode the text.

Under Windows, it is common to encode Cyrillic texts using the so-called Code Page 1251 (for this reason nicknamed the 'Windows code page' by Russian-speaking people). For example, if one types a text in MS Word for Windows 6.0 or 7.0 using 'usual' Cyrillic fonts for Windows and saves the document as 'Text Only', the text file will be saved in this encoding.

Under DOS, Russian characters are most often encoded using Code Page 866 (also known as the 'modified alternative code page' -- or, for brevity, 'Alternative' or 'DOS'). This code page is native for Russian also under the OS/2 operating system. In case of the Russian regional settings of Windows, saving a Cyrillic text from MS Word for Windows as 'MS-DOS Text' will write the file in this encoding. To read such files in MS Word for Windows, one should enable the option 'Confirm conversion at open' (via the menu Tools/ Options/ General) and then select file conversion from 'MS-DOS Text' when prompted.

With operating systems of the UNIX family, Russian texts are usually encoded using the KOI-8 Code Page (this Russian abbreviation means an '8-bit code for information interchange'). Actually, 'KOI-8' is a generic name for a family of code pages, such as Russian KOI8-R (RFC 1489) and Ukrainian KOI8-U (RFC 2319).
KOI8-R is accepted as standard encoding for Russian-language USENET newsgroups (e.g., relcom.*) and is commonly used for transfer of Russian messages by e-mail. A lot of useful information about this code page and 'koization' of various software can be found on Andrei Chernov's Web page <http://www.nagual.pp.ru/~ache/koi8.html>. Under Windows, texts in KOI-8 encoding can be read and printed using special 'KOI-8' fonts for Windows. However, to type texts in this encoding, one should install a special keyboard layout.

Macintosh computers use their own code page for Cyrillic texts, known as Code Page 10007 (or just 'Macintosh'). Helpful information and references on Russification of Macintosh-compatible computers can be found, for example, on www.relcom.ru/Russification/MacKoi8-r, http://www.friends-partners.org/partners/rusmac/, and http://www.hf.uib.no/smi/files/eudtab.html.

Cyrillic code page ISO-8859-5 (sometimes called for brevity just 'ISO') is similar to a code page known as the 'main' DOS, or 'GOST' code page (though in fact the 'alternative' code page 866 has long become 'main' for DOS). An advantage of ISO-8859-5 is the strictly alphabetical order of Russian characters, which is most convenient for data sorting in databases. It is native for the SunOS operating system.

'Translit' is NOT a code page -- it means just transliteration, i.e. substitution of Cyrillic letters with 'phonetically similar' Latin letters and their combinations. It is used as a last resort by (or for) people who do not have on their computers Cyrillic fonts, keyboard layouts, etc., or do not feel comfortable with the variety of Cyrillic code pages.
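
To show what transliteration amounts to, here is a rough Python sketch. The mapping table is only an assumption made for this example: many transliteration schemes are in use, and none of them is prescribed by this manual.

```python
# Hypothetical mapping; real-life transliteration habits vary from person to person.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ж": "zh",
    "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f",
    "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch", "ы": "y",
    "э": "e", "ю": "yu", "я": "ya", "ь": "'", "ъ": '"', "ё": "yo",
}

def transliterate(text: str) -> str:
    """Replace Cyrillic letters by Latin substitutes; leave other characters as is."""
    result = []
    for char in text:
        latin = TRANSLIT.get(char.lower(), char)
        result.append(latin.capitalize() if char.isupper() else latin)
    return "".join(result)

print(transliterate("Привет, мир!"))
```

The result contains only 7-bit characters, which is why 'translit' survives any mail route, at the price of readability.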

Compare different Cyrillic code pages
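
The difference between the code pages can also be seen directly. The following Python sketch (an illustration only; the codec names are those of Python's standard library, not names used by any mailer) prints the codes of one Russian word in each of the code pages described above, and then shows what happens when KOI8-R codes are read as Windows-1251:

```python
word = "Привет"

# Python codec names for the code pages discussed above.
code_pages = ["cp1251", "cp866", "koi8_r", "mac_cyrillic", "iso8859_5"]

for name in code_pages:
    encoded = word.encode(name)
    print(f"{name:13} {encoded.hex(' ')}")

# Reading KOI8-R codes as if they were Windows-1251 gives unreadable text.
garbled = word.encode("koi8_r").decode("cp1251")
print(garbled)
```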


    UNICODE: a 'unified' code page

As one could guess from its very name, 'Unicode' is an attempt to create a unified code table that would have enough room for letters and characters of all human languages. Obviously, this table should be quite big, and the codes consist of 16 bits (i.e. two bytes) instead of 8 bits, allowing as many as 65536 elements to be encoded. Thus, the 'unification' of character encoding is achieved at the expense of a twofold increase in the text file size. The Unicode standard complies with the ISO/IEC 10646-1 standard defining a so-called Universal Character Set (UCS) and its two-byte per character representation called UCS-2; so, the Unicode encoding may also be named UCS-2. However, Internet and email protocols do not support transfer of any 16-bit characters; for sending them over the Internet, one should temporarily convert them to special 7-bit or 8-bit formats called UTF-7 and UTF-8, respectively (UTF is for UCS Transformation Format). New versions of Web browsers (and their mailers), such as Netscape Communicator 4.* and Microsoft Internet Explorer 4.* or 5.*, support UTF-7 and UTF-8.
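
The size difference and the two transformation formats can be illustrated with a short Python sketch. It assumes that Python's 'utf-16-le' codec is an acceptable stand-in for UCS-2, which holds for Cyrillic letters because they all fit into a single 16-bit code:

```python
text = "Привет"  # six Cyrillic letters

cp1251 = text.encode("cp1251")    # one byte per character
ucs2 = text.encode("utf-16-le")   # two bytes per character, as in UCS-2
utf8 = text.encode("utf-8")       # 8-bit transformation format
utf7 = text.encode("utf-7")       # 7-bit transformation format

print(len(cp1251), len(ucs2), len(utf8), len(utf7))
print(utf7)  # contains only 7-bit codes
```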

Unicode in Windows 95/98 and MS Office 97

Unicode has not yet become popular on the Internet, but it is used by some programs working under Windows95. For example, Microsoft Word 97 has an option of saving text files in the Unicode encoding (Save As / Unicode text), and non-ASCII characters in Word 97 documents are also encoded according to the Unicode table (and thus take twice as much space in the doc file). Such documents cannot be read by previous versions of MS Word. Generally, Microsoft Word 97 and other MS Office 97 applications work correctly only with Unicode-type fonts for Windows (i.e. those fonts that include several 'scripts' -- Western, Cyrillic, Greek, etc.). If you use Word 6.0 or 7.0, it certainly makes sense to download a special plug-in converter allowing Word 6.0 and 7.0 to open documents saved in the Word97 format <ftp://ftp.microsoft.com/Softlib/MSLFILES/wrd97cnv.exe>. (However, some Word97 documents with complex formatting cannot be opened in Word 6.0 and 7.0 even with this converter).
Another helpful utility is a TTF Converter designed for converting non-Unicode fonts to the Unicode format. After conversion, they look like Unicode fonts having two scripts -- e.g., Western and Cyrillic, and can be used by MS Office 97 applications. However, even if you use only Unicode fonts, you can have problems with printing Cyrillic from Office 97 to some types of printers. This is a well-known Microsoft bug; to fix it, go, for example, to Paul Gorodyansky's Web page.

If you wish to learn in more detail about Cyrillic code pages and fonts and the way they are handled by Windows, go to Konstantin Kazarnovskii's Web Page.


Seven-bit encodings of 8-bit texts: 'Quoted-printable' and 'base64'

Eight-bit texts can be sent by e-mail in their original 8-bit format; this is usually indicated in the message header as: Content-Transfer-Encoding: 8bit. However, one has to keep in mind that some e-mail servers (especially, in the 'seven-bit' English-speaking world) do NOT handle 8-bit texts properly. Some of them just 'cut off' the eighth bit, thus reducing all 8-bit codes by 128. Therefore, most mail agents allow conversion of 8-bit messages into a 7-bit form (assuming that the addressee's mailer will automatically perform the reverse conversion and thus restore the original text format). For example, in Eudora Pro v.2.2 this option can be enabled through the following menu: Tools/ Options/ Sending Mail/ May Use Quoted-Printable. In the 'quoted-printable' presentation, standard 7-bit characters are left unchanged, whereas 8-bit characters are replaced by a sequence consisting of the '=' symbol and a pair of Latin letters and/or figures representing the relevant hexadecimal code. So, the 7-bit encoded text sent by e-mail may look as follows:

    - =DD=EB=E5=EA=F2=F0=EE=ED=ED=FB=E5 =F2=E0=E1=EB=E8=F6=FB (Microsoft Excel,=
    Lotus 1-2-3 =E8 =E4=F0.);

and the header of this message will indicate: Content-Transfer-Encoding: quoted-printable.
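
The reverse conversion performed by the addressee's mailer can be reproduced with Python's standard quopri module. This is only a sketch; it assumes that the Cyrillic words of the sample above were typed in Windows-1251, which is what their codes suggest:

```python
import quopri

# The two Cyrillic words from the sample, as they travel in the message body.
sample = b"=DD=EB=E5=EA=F2=F0=EE=ED=ED=FB=E5 =F2=E0=E1=EB=E8=F6=FB"

raw = quopri.decodestring(sample)   # back to the original 8-bit codes
print(raw.decode("cp1251"))         # read the codes as Windows-1251 letters

# Encoding works the other way round: 8-bit codes become '=XX' triplets.
encoded = quopri.encodestring("и др.".encode("cp1251"))
print(encoded)
```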

'base64' is another commonly used method for seven-bit presentation of 8-bit texts (and for encoding binary attachments, too). Employment of this method is indicated in the message header as: Content-transfer-encoding: base64.
For example, one can select this mode through the Mail/ Options/ Send/ Plain Text/ Settings/ MIME/ Encode text using/ base64 menu in the Microsoft Internet Mail (included in the Microsoft Internet Explorer v.3.0).
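
A base64 round trip can be sketched in the same way with Python's standard base64 module (the sample word and the choice of KOI8-R are assumptions made for this example):

```python
import base64

original = "Привет".encode("koi8_r")   # 8-bit text in KOI8-R
encoded = base64.b64encode(original)   # only Latin letters, digits, '+', '/' and '='
decoded = base64.b64decode(encoded)    # what a MIME-compliant mailer recovers

print(encoded)
print(decoded.decode("koi8_r"))
```

Unlike quoted-printable, base64 also changes the 7-bit characters, so the encoded text is not readable at all without decoding.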

When receiving a message where a text is encoded as base64 or quoted-printable, modern (MIME-compliant) mailers automatically recover the original 8-bit text. However, those recipients whose mailers do not support this conversion will have to decode such messages by running an external decoding program.


Tune your Web browser to read Web pages in Russian

Generally, when looking at Russian Web sites, you can see several approaches to the problem of different Russian code pages used in different operating systems.
One approach is to keep on the Web server several copies of the document converted to a variety of code pages, so that users could choose the code page that is most convenient for them. Actually, instead of several copies, in many cases there is only one copy accessible via several 'ports' which convert the document to the encoding selected by the user. Anyway, users can select code pages: sometimes only win-1251 or koi8-r, and sometimes also dos-866, iso-8859-5, Macintosh, and 'transliterated'.

Another approach is used by the so-called Russian Apache Web server, which tries to detect the 'native' code page of the browser and then converts the original document requested by the user to the appropriate encoding. In this case, the Web page usually does not have a menu for code page selection; if you work under Windows, you will most probably see the document in Win-1251 encoding.

The third approach is to make the document available to all users only in one encoding, assuming that they should be able to tune their browsers to read it. There are 'KOI8 ONLY' and 'ANTI-KOI' campaigns, and quite a lot of Cyrillic Web pages available either only in KOI8-R or only in Win-1251.

If you want to create your own Russian Web page, keep in mind that it is usually easier to tune Windows browsers to KOI8-R than Unix browsers to Win-1251. Therefore, if you want your Web page to be easily readable not only for Windows users, it makes sense to use the KOI8-R encoding.

If you want to read Russian Web pages, it is sufficient to tune your browser to understand KOI-8 and Windows-1251; other code pages never appear as the ONLY Web page encoding available. (However, you may find on the Internet a lot of Russian text files in DOS encoding; they are intended for downloading, not for online viewing by your browser).

    Netscape Navigator (v.2,3)

Essentially, tuning of Netscape Navigator v.3.* to read Russian Web pages consists in installation of appropriate fonts that can be used to display KOI-8 and Windows-1251 texts. The procedure is described in much detail by Paul Gorodyansky on <http://www.relcom.ru/Russification/WinNetscape>; here we will only explain the general idea.

For Netscape Navigator (v.3), one has to open the menu Options/ General Preferences/ Fonts, and set Windows-1251 Cyrillic fonts (proportional and fixed-width) for the 'Cyrillic' encoding and KOI-8 fonts for the 'Cyrillic (KOI8-R)' encoding. Sometimes it happens that none of the KOI-8 fonts already installed is recognized by the system as a fixed-width font, and thus no KOI-8 fonts are visible in the fixed-font selection menu. If this problem arises, one can download a set of 'appropriate' fonts from <http://www.relcom.ru/Russification/WinNetscape/ForWWW.zip>.
Then, one has to choose from the menu Options/ Document Encoding 'Cyrillic' for viewing Web pages in Windows-1251 and 'Cyrillic (KOI-8)' for viewing Web pages in KOI-8.

If you still use Netscape Navigator (v.2), it's high time to replace it with a more recent version. If for any reason you cannot do it, you can set KOI-8 fonts for the 'Latin 2' encoding and Windows-1251 fonts for the 'Korean' encoding (this version of Netscape Navigator does not offer any Cyrillic code pages in the 'Document Encoding' menu). Then, you can select from the menu Options/ Document Encoding 'Latin 2' to view texts in KOI-8 and 'Korean' to view texts in Windows-1251.


     Microsoft Internet Explorer (v.3,4,5)

Unlike Netscape Navigator v.2&3, Microsoft Internet Explorer (v.3) does not require you to install any special fonts to view koi-8 and Windows-1251 Web pages. It is sufficient to choose the Cyrillic set of characters and default language from the View/ Options/ General/ Font Settings menu, and set MIME-Encoding to Windows-1251 or KOI8-R. Also, when viewing a Web page with Internet Explorer, one can just click the left mouse button on the icon located in the bottom right corner of the IE window and select the required code page from the pop-up menu. (If you don't see this icon, it probably means that Multilanguage Support is not installed). Instead of substituting the fonts, Internet Explorer performs the appropriate recoding of the original document.
Internet Explorer v.4 and 5 have a lot of new capabilities, bells, and whistles, but Cyrillic is handled in essentially the same way as in IE v.3.*; selection of languages and encodings here is included in the View/ Fonts, View/ Encoding, or View/ Language menu. However, their mailer and newsreader, called Outlook Express, is much more powerful than the MS Mail and News included in IE v.3.*; it will be discussed later.
Also, IE v.4 and 5 can handle Unicode-type 7-bit and 8-bit encodings called UTF-7 and UTF-8.

     Netscape Communicator (v.4.*)

A similar approach is used in Netscape Navigator v.4 (aka Netscape Communicator) -- it does not need any koi8 fonts, and can recode between koi8 and Win-1251 'on the fly'. You can choose the appropriate code page via the View/ Character Set menu. Its tuning to Cyrillic is also described in much detail by Paul Gorodyansky on <http://www.relcom.ru/Russification/WinNetscape>.
However, treatment of Cyrillic by Netscape Communicator is generally buggier than in MS Internet Explorer. Sometimes Cyrillic text on the screen is unreadable even though everything is tuned right, and there is no evident reason why. If this happens, try the following:
  • reload the document
  • switch encoding to a different one, and then back
  • view the document source and try to understand what could cause the problem (maybe the charset was indicated incorrectly)
  • if you still do not understand it, try Internet Explorer instead of Netscape.

    HTML documents with explicitly indicated charsets

Browser settings for viewing a document in a specific encoding work properly only if the header of the HTML document does not explicitly indicate a code page. If, however, the header contains a line like this:
<META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=windows-1251'>
the browser will be automatically set to the code page specified by this tag -- Netscape Navigator v.3.* displays the text with the fonts set for this code page in the Options/ General Preferences/ Fonts menu, while Internet Explorer and Netscape Communicator perform the appropriate recoding. In this case, attempts to set the code page manually via the browser menu do not work. If the charset specified in the header of the HTML document is different from that actually used in the document, the text may be unreadable. For Netscape Navigator v.3.*, this problem can be solved by setting for the encoding specified in the document a font corresponding to the actual code page. Generally, if the text on a Web page is unreadable, try viewing the HTML source via the View/ Document Source menu. It often helps either to read the text or to identify the reason of trouble.
Generally, explicit indication of the charset in the header of an HTML document is a trick that should be used with some caution. In most cases, it works fine, but remember several simple things:
1. No charset is better than a wrong charset.
2. Explicit indication of a charset may make the document unreadable for some old versions of browsers that do not recognize the name of this code page.
3. If you convert an HTML document with an explicitly indicated charset to a different code page, don't forget to correct the charset name as well.
4. Don't indicate charset explicitly if you are placing your Web page on a server that may recode it.
5. If your page is in Win-1251 encoding, even with the charset=windows-1251 tag, don't think that it will be readable for everybody. Many browsers made for UNIX do NOT support this charset, and many people working under UNIX generally believe that reading a Web page available only in Win-1251 is a complete waste of their time.
Also, if you want your Cyrillic Web page to be readable for everybody, never use FONT FACE tags. This tag is regarded as bad HTML style because it may cause problems for people who do not have the specified font or who use a different operating system.
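
What a recoding browser does with an explicitly indicated charset can be sketched in Python as follows. This is a simplified illustration, not the algorithm of any real browser; the helper decode_html, its regular expression, and the KOI8-R fallback are all assumptions made for this example:

```python
import re

# Hypothetical helper: look for charset=... inside a META tag in the raw bytes.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset=[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)

def decode_html(raw: bytes, fallback: str = "koi8_r") -> str:
    """Decode a page using its declared charset, or the fallback if none is declared."""
    match = META_CHARSET.search(raw)
    charset = match.group(1).decode("ascii") if match else fallback
    return raw.decode(charset)

page = ('<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1251">'
        '<P>Привет</P>').encode("cp1251")
print(decode_html(page))
```

If the declared charset does not match the actual codes of the page, the same function returns unreadable text, which is why a wrong charset is worse than none.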


Tune your mailer for correspondence in Russian

    How to choose format for sending Cyrillic messages

      Choose code page

When two individuals are communicating with one another via e-mail, of course they are free to use any encoding acceptable for both of them (and for the mailing system delivering their messages). However, if you wish to send a message to a person you don't know, or to a list of many recipients, you should follow the standards and traditions that have become common on the Internet. The KOI8-R Code Page has become a de-facto standard for sending messages in Russian, and is officially recommended by the charters of Russian-language USENET newsgroups. It may be more convenient to use a different code page for correspondence with your particular addressee, but the general recommendations are as follows:

1. messages should be sent in KOI8-R encoding
2. message header should correctly indicate the code page ('charset=koi8-r')
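
For readers who compose messages with a script rather than with a mailer, the two recommendations can be sketched with Python's standard email package. The addresses and the text are placeholders invented for this example:

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.org"     # placeholder addresses
msg["To"] = "recipient@example.org"
msg["Subject"] = "Test"
# Encode the body in KOI8-R and label it so in the Content-Type header.
msg.set_content("Привет из Киева\n", charset="koi8-r", cte="8bit")

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
```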


       8-bit or 7-bit format?

Let us consider separately the problem of choosing an 8-bit or 7-bit format (see the section on 7-bit encodings of 8-bit texts) for sending Cyrillic messages.

On the one hand, as already noted, there are email servers that do not transmit 8-bit texts. Some of them automatically convert 8-bit texts to a 7-bit format (e.g. base64), while some of them just 'cut off' the eighth bit. (By the way, KOI-8 code page has an interesting feature allowing the text to remain readable even after the loss of the eighth bit, because the Cyrillic characters in this code page are shifted exactly by 128 from the 'phonetically similar' Latin letters). Furthermore, use of 8-bit texts in the Subject field and in other fields of the header is actually not 'legal' since Internet standards require any 8-bit records in the header to be encoded to a 7-bit format.
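
The KOI-8 feature mentioned in parentheses is easy to check. The following Python sketch simulates a server that cuts off the eighth bit and applies the same damage to a Windows-1251 text for comparison (the sample word is chosen arbitrarily):

```python
text = "Привет"
koi8 = text.encode("koi8_r")

# Simulate a server that cuts off the eighth bit of every byte.
stripped = bytes(byte & 0x7F for byte in koi8)
print(stripped.decode("ascii"))   # Latin letters with the case inverted

# The same damage applied to Windows-1251 leaves nothing recognizable.
print(bytes(byte & 0x7F for byte in text.encode("cp1251")).decode("ascii"))
```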

On the other hand, it has long been common in the ex-USSR to send e-mail messages in the 8-bit format (including the text in the Subject field), and the vast majority of email servers transmit such messages without any problems. Moreover, many people in the ex-USSR still use mailers that cannot automatically recover the original 8-bit form of messages received in the 7-bit quoted-printable or base64 encoding. Finally, messages in their original 8-bit form are smaller than messages encoded to base64 and, especially, to quoted-printable. Therefore, our third piece of advice is as follows:

    3. send messages in the 8-bit format. If you find that your (or your addressee's) mail server corrupts 8-bit messages, try to enable the mode of encoding 8-bit texts to quoted-printable or base64. If your addressee cannot decode these encodings, try to attach your text as a separate UUENCODEd file.
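
The size argument can be checked with a few lines of Python. This is only a rough sketch with an arbitrary sample text; the exact ratio depends on the share of Cyrillic letters in the message:

```python
import base64
import quopri

body = "Привет из Киева. Как дела?".encode("koi8_r")

sizes = {
    "8bit": len(body),
    "base64": len(base64.encodebytes(body)),
    "quoted-printable": len(quopri.encodestring(body)),
}
for name, size in sizes.items():
    print(f"{name:17} {size} bytes")
```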

      Sending text files as attachments

The recommendation to send Cyrillic texts in KOI-8 encoding does NOT apply to the case when the text is attached to the message as a separate file rather than included in the message body. (In Eudora Pro v2.* and 3.*, attached 8-bit texts are inserted in the message body if the option 'Put text attachments in body of message' in the Tools/ Options/ Attachments menu is enabled). The body of a message may be recoded while passing through various mail servers and mailers, which does not happen to attached files. It makes sense to choose the code page for the attached files depending on the operating system used by your addressee (i.e. Windows-1251 for Windows, 866 for DOS, and KOI-8 for UNIX). Keep in mind that messages with the text attached in separate files are less convenient to read and reply to than 'usual' messages.


    How to send messages in KOI-8: possible approaches

Thus, it is generally recommended to send Russian messages in KOI8-R encoding. But how to do it if you use Windows?

In fact, one can either type a letter in the KOI-8 encoding using appropriate fonts and keyboard layouts, or type it first in any convenient encoding (e.g., Windows-1251 or DOS) and then convert it to KOI-8. Generally, it is possible to use several approaches:

  • 'Koification' of Windows (i.e. installation of KOI-8 fonts and keyboard layouts for Windows);
  • Use of mailers able to recode incoming and outgoing messages;
  • Use of special 'proxy' servers able to recode incoming and outgoing messages;
  • Use of external recoding programs.

    Typing KOI-8 texts under Windows

To type any text in KOI-8 under Windows, one should have KOI-8 fonts for Windows and a keyboard switcher with appropriate layouts installed. You can use either the native Windows95 keyboard switcher or special software.

      Native Windows95 keyboard switcher

The procedure of installing a KOI-8 keyboard layout for Windows95 is described on Andrei Chernov's Web page <http://www.nagual.pp.ru/~ache/koi8.html> and (in more detail) on <http://www.netsight.net/~ryba/cyrillic/index.html>. After installing such a layout, one can add one more language (e.g. Polish or Afrikaans) to the keyboard switching menu and set a KOI-8 keyboard layout for it. Andrei Chernov proposes even a radical procedure of Windows95 'koification' based on a modification of some system files. As a result, KOI8-R code page could replace one of the character sets supported by Unicode-type fonts (e.g., Central European). After that, new KOI8-R fonts, such as Arial KOI8-R and Times New Roman KOI8-R, become available along with such Win-1251 fonts as Arial Cyrillic and Times New Roman Cyrillic. (However, it probably makes much more sense to install a browser, mailer and newsreader able to understand KOI-8 than to 'koify' the entire operating system).

Unfortunately, the 'native' Windows95 keyboard switcher is not trouble-free. For example, it may not work in some windows, e.g. in the message composition window of Eudora (for more info about this bug, see the section about Eudora). Moreover, the native switcher does not include any tools to edit keyboard layouts. (An editor of Windows95 keyboard layouts can be found, for example, on <http://www.kiarchive.ru/pub/cyrillic/windows/jkbd9542.zip> -- but it may not work with layouts created by a different tool). Finally, if one has to use more than two or three keyboard layouts (e.g., English, Russian, Ukrainian, and KOI-8), the native Windows95 keyboard switcher becomes quite inconvenient, since it does not offer different hot keys to access different layouts, and you have to scroll through all of them one by one.

If you are not comfortable using the native Windows95 keyboard switcher, it makes sense to install special software for switching and editing keyboard layouts.


      CyrWin 95

This is a commercial program. It offers a variety of hot keys to access various keyboard layouts, and has a keyboard editor. In addition, it has one more important advantage. If you use the US version of Windows95, you may run into problems when installing programs with a Russian-language interface (for instance, Russian characters in the menus may be unreadable). To solve these problems, it is not always sufficient to install the Multilanguage Support Add-On for Windows95 and Cyrillic system fonts. In some cases, it is necessary to modify some system files of Windows95 (you can read about it in more detail here). Installation of CyrWin'95 automatically does this job and fixes the problems arising with Russified software. 

       ParaWin 95/98

This program is also commercial. In contrast to the old (single-diskette) version of ParaWin 95, it is capable of switching and editing various keyboard layouts. ParaWin 95 indicator replaces the native keyboard switcher on the taskbar and looks very similar, but is much more convenient, with a variety of hot key combinations, and can control the system font character set and Windows<->DOS text conversions. ParaWin 95 also fixes most problems with Russified software under the US version of Windows95 (though probably to a lesser extent than CyrWin 95). For Windows 98, use new versions of Parawin 95 able to work both under Win95 and Win98 (old versions of Parawin95 designed only for Win95 may conflict with Win98).

       WinKey

This program is freeware and quite convenient, and can be used both under Windows95 and under Windows 3.*. It can be downloaded, for example, from <http://www.kiarchive.ru/pub/cyrillic/windows/>. Unlike ParaWin 95 Pro and CyrWin 95, it does not 'Cyrillize' the system files of Windows95, though it includes a number of useful system fonts (e.g. DOS-866 fonts for Windows).
However, it may not work properly with Unicode-type fonts, and so is not suitable for applications like MS Office 97.

    Recoding Cyrillic texts by mailers

Instead of 'koifying' Windows, one can use mailers capable of recoding incoming and outgoing messages. These programs may have the 'internal' and 'external' text encodings differing from one another. It makes sense to choose a code page convenient for text typing and editing under the user's operating system (for example, 1251 under Windows or 866 under DOS) as the 'internal' encoding, and the code page generally accepted for email messages as the 'external' encoding (KOI8-R for messages in Russian). When sending a message, the mailer must convert the code page from 'internal' to the 'external' one (e.g. 1251 into KOI8-R), while upon receiving a message it has to perform the reverse conversion (accordingly, KOI8-R into 1251).
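
The idea of 'internal' and 'external' encodings can be sketched in Python as follows. The function names are hypothetical and do not correspond to any particular mailer:

```python
INTERNAL = "cp1251"   # convenient for typing under Windows
EXTERNAL = "koi8_r"   # commonly accepted for e-mail in Russian

def to_external(data: bytes) -> bytes:
    """Recode an outgoing message from the internal to the external code page."""
    return data.decode(INTERNAL).encode(EXTERNAL)

def to_internal(data: bytes) -> bytes:
    """Recode an incoming message from the external to the internal code page."""
    return data.decode(EXTERNAL).encode(INTERNAL)

typed = "Привет".encode(INTERNAL)
sent = to_external(typed)
print(sent.hex(" "))
print(to_internal(sent) == typed)
```

Note that such a conversion fails for characters that exist in one code page but not in the other; for example, the Ukrainian letters of Windows-1251 have no codes in KOI8-R, and in this sketch Python raises UnicodeEncodeError for them.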

Naturally, to correctly recode incoming messages, one needs to know their code page. In principle, there can be two basic approaches to this problem:

  • assume that all messages are coming in the same 'external' encoding (most likely, KOI8-R). In this case, messages coming in different encodings will be recoded incorrectly.
  • try to identify the code page of the incoming message by its header (header field charset=). In this case, messages with incorrect charsets indicated in their headers will be recoded incorrectly.
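
The two approaches can be combined: use the charset from the header when it is present, and fall back to an assumed 'external' encoding otherwise. Here is a minimal Python sketch built on the standard email package; the helper body_text and the KOI8-R default are assumptions made for this example:

```python
from email import message_from_bytes

def body_text(raw_message: bytes, assumed: str = "koi8-r") -> str:
    """Decode the body using charset= from the header, else the assumed code page."""
    msg = message_from_bytes(raw_message)
    charset = msg.get_content_charset() or assumed
    payload = msg.get_payload(decode=True)   # also undoes quoted-printable or base64
    return payload.decode(charset, errors="replace")

labeled = (b"Content-Type: text/plain; charset=windows-1251\n"
           b"Content-Transfer-Encoding: 8bit\n\n"
           + "Привет".encode("cp1251"))
unlabeled = b"Subject: test\n\n" + "Привет".encode("koi8_r")

print(body_text(labeled))
print(body_text(unlabeled))
```

As with real mailers, a message whose header names the wrong charset is still decoded incorrectly by this sketch.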

Timur Kadyshev's Web page at <http://blue.iris.mipt.ru/timur/trueway.htm> presents an attempt to formulate the requirements that should be met by mailers to eliminate the necessity for 'koization' of Windows. Unfortunately, most mailers do not comply with most of these requirements. Among the best in this respect are Outlook Express and The Bat!

     Handling of Cyrillic texts by some mailers

Mailers Comparison Table

Mailer: Eudora Pro v.2.2
  Message text recoding: None. External recoding or koi8 fonts needed.
  Specifies Cyrillic charset as: incorrect: iso-8859-1 (can be hacked to koi8-r)
  Problems and comments: Very convenient interface. Problems with: Unicode fonts, Win95 keyboard switcher, messages forwarded by Netscape, split attachments. Generally not convenient for Cyrillic correspondence.

Mailer: Eudora Pro v.3.*, 4.*
  Message text recoding: None. Plugin for win<->koi recoding available.
  Specifies Cyrillic charset as: incorrect: iso-8859-1 (with plugin: koi8-r or windows-1251)
  Problems and comments: Same as Eudora v.2.*; v.3.* supports multiple accounts and plugins, including the koi8 plugin; v.4.* has Cyrillic bugs that should be patched.

Mailer: Netscape Mail (Mozilla) v.3.*
  Message text recoding: Incoming: none. Outgoing: win to koi, koi to koi. Replies: koi to koi. koi8 fonts needed; no external recoding.
  Specifies Cyrillic charset as: koi8-r (may differ from actual charset)
  Problems and comments: Single-level mail folders. Forwarded messages may cause problems for Eudora 2.2 users. Generally not convenient for Cyrillic correspondence.

Mailer: Netscape Mail (Mozilla) v.4.*
  Message text recoding: Incoming: any to win. Outgoing: win to koi (version-dependent). koi8 fonts or external recoding not needed.
  Specifies Cyrillic charset as: koi8-r (may differ from actual charset)
  Problems and comments: Multilevel mail folders. Multiple profiles. Counter-intuitive choice of encoding for outgoing messages in v.4.0*.

Mailer: MS Internet Mail
  Message text recoding: Incoming: koi to win, win to win. Outgoing: win to koi, win to win. koi8 fonts or external recoding not needed.
  Specifies Cyrillic charset as: correct koi8-r or Windows-1251
  Problems and comments: Single-level mail folders. No choice of encoding when replying. Messages with incorrectly specified charset may be unreadable.

Mailer: Outlook Express v.4.*, 5.*
  Message text recoding: Incoming: any to Unicode. Outgoing: Unicode to any. koi8 fonts or external recoding not needed.
  Specifies Cyrillic charset as: correct koi8-r, Windows-1251 and other
  Problems and comments: Multilevel mail folders. Integrated mail and news. Multiple accounts. Probably most powerful handling of Cyrillic. Optional support of koi8-u. Most convenient for Cyrillic correspondence.

Mailer: The Bat!
  Message text recoding: Incoming: any to win. Outgoing: win to any. koi8 fonts or external recoding not needed.
  Specifies Cyrillic charset as: correct koi8-r, koi8-u, Windows-1251 and other
  Problems and comments: Small but quite powerful. Multilevel mail folders. Multiple accounts. Handling of non-standard charsets. Supports koi8-u. Embedded support of PGP encryption. Limited support of HTML-formatted messages. Most convenient for Cyrillic correspondence.

Mailer: Pegasus Mail
  Message text recoding: None. After tuning: Incoming: koi to win; Outgoing: win to koi. koi8 fonts or external recoding not needed if tuned.
  Specifies Cyrillic charset as: Tunable to koi8-r
  Problems and comments: Quite powerful, but interface not too convenient. Needs tuning. Multilevel mail folders. Problems with 8-bit headers. Generally not convenient for Cyrillic correspondence.

Mailer: Forte Agent (not Free!)
  Message text recoding: After tuning: Incoming: koi to win; Outgoing: win to koi and other. Tunable for external or internal recoding.
  Specifies Cyrillic charset as: tunable to koi8-r, koi8-u and other
  Problems and comments: One of the best newsreaders, but may also be used as a mailer. Single-level mail folders. For tuning, see www.glasnet.ru/~kazarn/soft.htm and references therein.

    Eudora Pro (v.2.2 & 3.*)

Eudora does NOT perform any recoding of Cyrillic texts. To set up Eudora for Cyrillic correspondence, it is sufficient to choose Cyrillic fonts corresponding to the code page you are going to use (usually, koi-8 or Windows-1251) in the menu Tools/ Options/ Fonts & Display. The mode of conversion of outgoing 8-bit messages to a 7-bit form is controlled via the menu Options/ Sending Mail/ May Use Quoted-Printable. If you are attaching a Cyrillic text file, look at the option Put text attachments in body of message in the Tools/ Options/ Attachments menu: the text will be incorporated into the message body if this option is enabled, and attached as a separate file if it is disabled.
Problems.
What I like most of all in Eudora is its very convenient user interface -- probably the best of all mailers I know. However, Eudora also has some drawbacks:
  • In the header of outgoing Cyrillic messages, Eudora indicates an incorrect charset ('charset=iso-8859-1' instead of 'koi8-r' or 'Windows-1251'). In some cases, this results in improper processing of such messages by the recipient's mailer. This can be corrected by editing the Eudora.exe file with a binary editor. (You can fix Eudora Pro v.2.2 with this crack; cracks for other versions can be found on Andrei Chernov's Web page). However, do not change the charset to koi8-r if you send your messages in Windows-1251!
  • Sometimes Eudora does not want to display Cyrillic Unicode fonts: if you set the font script in the Tools/ Options/ Fonts & Display menu to Cyrillic, it switches it 'by itself' back to Western. In this case, try using different screen fonts, especially non-truetype and non-Unicode fonts.
  • Also, Eudora may block the 'native' Win95 keyboard switcher in the message editing window. To switch the keyboard layout to Russian, one has to go to another Eudora window, switch to Russian there, and then go back to the message editing window. This problem can often be fixed by changing Eudora's screen font to a non-Unicode or a non-truetype Cyrillic font. However, a more radical solution is to use a different program (such as ParaWin 95 Pro, CyrWin 95, or WinKey) instead of the 'native' Win95 keyboard switcher. These programs are much more convenient, especially if you have to use more than two keyboard layouts.
I would also like to mention here another drawback of Eudora not related to the topic of Cyrillic texts. Eudora automatically recodes attachments to incoming messages and saves them in separate files (usually in the /Eudora/Attach directory). However, if you receive a message forwarded by Netscape Mail, Eudora does not show its text in the message body, but saves it in a separate file with a long name defined by the message subject, and in most cases you cannot open it just by double-clicking it in the message window. Of course, it is very inconvenient, and I do not know a simple way to solve this problem.
Eudora's automatic recoding of binary attachments is also inconvenient when you receive a uuencoded file sliced into several pieces. In this case, the first piece is automatically uudecoded and saved as a binary file. To recover the original file, you have to uuencode this first piece back, to append to it the other pieces in the appropriate order, and then to uudecode the obtained file.
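
If all pieces of a sliced uuencoded file are available as text, they can also be joined and decoded by a short script. The Python sketch below is only an illustration: it assumes the mail headers have already been stripped from each piece, that the file was cut at line boundaries, and that the pieces are passed in the correct order. The helper names are hypothetical.

```python
import binascii

def uu_encode(data: bytes, name: str = "file.bin") -> str:
    """Produce uuencoded text (used here only to build a demonstration input)."""
    lines = ["begin 644 " + name]
    for i in range(0, len(data), 45):          # 45 bytes per uuencoded line
        lines.append(binascii.b2a_uu(data[i:i + 45]).decode("ascii").rstrip("\n"))
    lines += ["`", "end"]
    return "\n".join(lines) + "\n"

def uu_decode_pieces(pieces) -> bytes:
    """Join the pieces in order and decode everything between 'begin' and 'end'."""
    out = bytearray()
    inside = False
    for line in "".join(pieces).splitlines():
        if not inside:
            inside = line.startswith("begin ")
            continue
        if line == "end":
            break
        if line in ("", "`"):                  # terminator line carrying no data
            continue
        out += binascii.a2b_uu(line)
    return bytes(out)

original = bytes(range(256)) * 2
lines = uu_encode(original).splitlines(keepends=True)
pieces = ["".join(lines[:4]), "".join(lines[4:9]), "".join(lines[9:])]
print(uu_decode_pieces(pieces) == original)
```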

An advantage of Eudora Pro v.3.* over the previous versions of Eudora is its support for multiple 'personalities' with different identities, pop, smtp, and other settings. It handles Cyrillic in essentially the same way as Eudora Pro v.2.2, including the iso-8859-1 charset erroneously indicated for outgoing Cyrillic messages. Cracks are available on Andrei Chernov's KOI8 page.
Another advantage of Eudora Pro v.3.* is that its capabilities can be extended with the help of various plug-ins -- for example, the koi8 plugin designed by Eugene Surovegin. This plugin adds to the menu commands for recoding messages between koi8-r and windows-1251; also, it specifies the charset of outgoing Cyrillic messages as koi8-r instead of iso-8859-1.
Eudora can be (and often is) used with pop/smtp proxies that recode incoming and outgoing Cyrillic messages from koi8-r to Windows-1251 and back. If you use Eudora v.3.* with the koi8 plugin, it may be convenient to receive incoming mail unrecoded and to send outgoing mail via a recoding smtp proxy. Thus, you will be able to recode incoming messages with the koi8 plugin whenever necessary, while your outgoing mail will be automatically recoded by the proxy to koi8-r.


      Netscape Mail (Mozilla) - v.2

By now, Netscape Mail (v.2) is quite obsolete. It does NOT recode Cyrillic texts, either. Its setup for Cyrillic correspondence is the same as that used for Netscape Navigator (v.2). The Cyrillic code page (charset) in the message headers is indicated incorrectly.

       Netscape Mail (Mozilla) - v.3

Its setup for Cyrillic correspondence is similar to that used for Netscape Navigator (v.3) browser. If Document Encoding in the Options menu is set to Cyrillic (Win1251), outgoing messages can be typed in Windows-1251, and upon sending they will be recoded to koi8-r, and their headers will indicate 'charset=koi8-r'. If you set Document Encoding to Cyrillic (KOI8-R), you will have to type your message using a KOI8 keyboard layout, and it will be sent in koi8-r without recoding. Incoming messages are NOT recoded.
Netscape Mail v.3 should NOT be used with pop/smtp proxies that recode incoming and outgoing Cyrillic messages from koi8-r to Windows-1251 and back.

The mode of sending messages in 7-bit or 8-bit form is controlled in Netscape Mail via the menu Options/ Mail and News Preferences/ Composition/ Allow 8-bit.
When messages are forwarded with Netscape Mail, the original text by default is sent as an attachment, with the attachment file name defined by the message subject. It is very inconvenient for recipients who use Eudora (and maybe some other mailers). If you want to make their life happier, forward messages as quoted text via the Message/ Forward Quoted menu, instead of just pressing the Forward button.


       Microsoft Internet Mail

This mailer, along with Microsoft Internet Explorer (v.3), is included in Windows95 OSR2, but can be also downloaded and installed separately from the Microsoft Web site <http://www.microsoft.com/ie/>, and may work without IE. This mailer allows one to use Windows-1251 as an 'internal' encoding and send messages in Windows-1251 or KOI8-R, according to one's choice, with the correct charset indicated in the header. The code page for reading incoming and sending outgoing messages is selected via the View/ Language menu. (If you reply to a message you received, this choice is not available -- the message will be sent in the same encoding as the one you received). To send Cyrillic texts (including those in the 'Subject' field) in the 8-bit form, 'Mail Sending Format' in the Mail/ Options/ Send menu should be set to 'Plain text', and in 'Settings' one should set 'Message Format'='MIME' and 'Encode text using'='None', and enable the option 'Allow 8-bit characters in headers'.

If your mailer is Microsoft Internet Mail, you will not need to 'koify' Windows or to work via recoding proxy servers. Make sure that you connect directly to the pop and smtp server of your provider, without any recoding proxies. Otherwise, Microsoft Internet Mail may not work properly. 


      Netscape Mail (Mozilla) v.4 (aka Netscape Messenger)

Similarly to MS Internet Mail & News (and unlike Netscape Mail v.3), Netscape Messenger v.4.* does NOT need any koi8 fonts and keyboard layouts. It recodes messages between koi8-r and Win-1251 'on the fly.' You can select a code page via the View/ Encoding or View/ Character Set menu. However, there is one thing that may seem confusing: if you want to send a message in koi8-r, you should select the Win-1251 encoding in the composition window; then you will be able to type it normally using the usual Win-1251 fonts and keyboard layouts, and it will be automatically converted to koi8-r when you send it. What happens when you select a different Cyrillic encoding in the composition window depends on the version of Netscape. For versions 4.0*, when you select the koi8-r (or Western) encoding and type your text in Win-1251, the mailer 'thinks' it is already in the proper encoding, and does not recode it to koi8-r. Therefore, to send a message in koi8-r, you should select the Win-1251 encoding; if you select koi8-r (or Western), the message is sent in Win-1251. It is quite confusing for many people, but this is the way it works. For versions 4.5*, 4.6*, it does not matter what Cyrillic encoding you choose when you compose a message -- it is always sent in koi8-r.
Netscape Mail v.4, as distinct from the previous versions, supports multilevel mail folders. It should NOT be used with pop/smtp proxies that recode incoming and outgoing Cyrillic messages from koi8-r to Windows-1251 and back.
To make Netscape Messenger send messages in the 8-bit format, without encoding to quoted-printable, one should go to the menu Edit/ Preferences/ Mail & Newsgroups/ Messages/ and set the switch Send messages that use 8-bit characters: to As is.

Some problems and solutions.

Problem: Netscape Messenger v.4.5*, 4.6* is sending the Subject field encoded to 7-bit. How to make it send it in the plain 8-bit format?
Solution: This setting cannot be controlled from the menus. To fix it, one should open the prefs.js file (it should be in the directory ...\Netscape\Users\<username>) and add the following line:
user_pref('mail.strictly_mime_headers', false);
Another line also worth adding to this file is
user_pref('mailnews.start_page.enabled', false);
-- It prevents Messenger from automatically connecting to Netscape NetCenter to display the Welcome message.
For more details, see Paul Gorodyansky's instructions on
http://www.relcom.ru/Russification/WinNetscape/.


      Outlook Express

Outlook Express is included in Microsoft Internet Explorer v.4.* and 5.*, and it is much better than MS Internet Mail & News coming with IE v.3.*. Its layout is tunable and can be made quite convenient (though, in my opinion, not as convenient as Eudora's). It has a system of multilevel folders for organizing your mail archive and, which is even more convenient, can import the existing mailboxes, address books, and account settings from Eudora, Netscape, and MS Internet Mail.
You can tune Outlook Express to Cyrillic via Tools/ Options/ Read/ Fonts. Choose there Cyrillic as the default charset, and set Mime Encoding to KOI8 if you plan to write mainly in KOI8. When you are reading or sending messages (including replies), you can easily select the proper encoding via the View/ Language menu. In the Tools/ Options/ Send menu, set Mail sending format to Plain Text, and then in the Settings set Message format to MIME and check Allow 8-bit characters in headers (if you don't want your 8-bit subjects to be converted to 7-bit). If you want to send your binary attachments uuencoded, choose Uuencode instead of MIME. However, in this case there will be no MIME fields in your message's header, and the charset will not be specified.
An interesting feature of Outlook Express is charset remapping: when you view messages with certain charsets in different encodings, you are prompted to set this remapping as default for messages with this charset. You can see and modify the list of remappings via the Tools/ Options/ Read/ International settings menu.
Of course, if you use Outlook Express, you should NOT connect to the pop and smtp server of your provider via recoding proxies to avoid multiple recoding.

Some problems and solutions

Problem: Sometimes message text is unreadable even if the Language is set to the proper charset.
Solution: Try switching to a different charset (say, Win-1251 instead of koi8-r) and then back to the correct charset.

Problem: Some or all Cyrillic characters, especially in the Subject field, are replaced with question marks when printed, though they are normally seen on the screen.
Solution: This problem (and other problems with Cyrillic printing) may occur for some beta versions of OE4 and OE5. Upgrade to a newer version. You may also try different fonts, though it does not always work. It may be helpful to go to Internet Explorer (menu View/Internet Options/General/Accessibility/Formatting) and set Ignore font styles specified on Web Pages.

Problem: Sometimes OE5 shows Cyrillic headers in the list of newsgroup postings in a wrong encoding (not recoded from koi8 to win-1251). Switching the encoding has no effect.
Solution: This is a well-known bug of OE5, but techniques of fighting it are not always effective and sometimes look shamanistic (see the following list of actions):
- before downloading any new messages, go to the Inbox or Outbox and view any messages kept there (you may check the box 'When starting, go directly to my Inbox folder' in the Tools/ Options/ General menu);
- go to the Tools/ Options/ Read/ Fonts menu and set Cyrillic (Windows) as the default encoding;
- go to the Tools/ Options/ Read/ International Settings menu, set Cyrillic (KOI8-R) as the default encoding, and check the box 'Use default encoding for all incoming messages'.
- go to the Tools/ Options/ Send/ International Settings menu and set Cyrillic (KOI8-R) as the default encoding.
- this all may (or may not) take effect only after you reload the list of messages.

Problem: When quoting a message you are replying to, OE5 sometimes marks the quoted text with '>', but sometimes does not.
Solution: This happens when you receive a Rich-Text or HTML-formatted message. It cannot be fixed. Ask your correspondent to send you messages in the plain text format (with Outlook Express, one should go to the Tools/ Options/ Send/ Mail sending format and News sending format menus and set them to 'Plain text').


      Pegasus Mail for Windows

Despite being freeware and small, Pegasus Mail is quite powerful. It has a lot of various options and controls, which offer some interesting opportunities for experienced users but can sometimes be confusing for beginners. Pegasus Mail can use Windows-1251 as its 'internal' encoding and KOI8-R as the 'external' code page. However, to recode incoming and outgoing messages correctly from Windows-1251 to KOI8-R and back, one should modify the configuration files specifying the recoding tables (initially, the code page name is incorrectly indicated as 'koi-8r' instead of 'koi8-r'). This procedure is briefly described on Andrei Chernov's Web page. Software for Pegasus Russification can be found on http://severov.atom.ru/guests/pegasus/.

Pegasus Mail does NOT allow sending Cyrillic texts in the Subject field in the 8-bit format. To send the body of message in the 8-bit form, one should enable the option 'Allow 8-bit MIME message encoding' in the Tools/ Options/ Advanced Settings/ menu (after that, Pegasus Mail will warn you that sending texts in the 8-bit format is formally illegal). In the 'MIME character set' window one should type 'koi8-r'.


      The Bat!

The Bat! is a small (the size of its installation file varies from version to version and often fits into a single 1.44 Mb diskette) but quite powerful and Cyrillic-friendly shareware mailer. It has menus in several languages (including Russian), supports multiple accounts (that can be protected with different passwords), multilevel nesting of mail folders, customizable templates for different kinds of messages, and a variety of Cyrillic code pages (koi8-r, Win-1251, DOS-866, ISO-8859-5, and Mac). It can also handle non-standard encodings, such as '1251 autoconverted from KOI8' and 'KOI8 autoconverted from 1251'.
A specific feature of The Bat! is that in the 8-bit mode it sends 8-bit subjects in the following form:
Subject: =?koi8-r?Q?[8-bit text is here]?= .
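
For comparison, RFC 1522 (see References) defines the =?charset?encoding?text?= form as a 7-bit representation, in which the text part is encoded with the 'Q' or 'B' method. The Python sketch below shows this standard form using the email.header module from the Python standard library; the function names are hypothetical, and the sketch does not reproduce the 8-bit variant sent by The Bat!.

```python
from email.header import Header, decode_header, make_header

def encode_subject(text: str, charset: str = "koi8-r") -> str:
    """Build a 7-bit encoded-word for a non-ASCII subject."""
    return Header(text, charset).encode()

def decode_subject(value: str) -> str:
    """Turn an encoded-word back into readable text."""
    return str(make_header(decode_header(value)))

encoded = encode_subject("Привет")
print(encoded)                                             # ASCII-only header value
print(decode_subject("=?koi8-r?Q?=F0=D2=C9=D7=C5=D4?="))   # 'Q' method, KOI8-R bytes
```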
One more advantage of The Bat! is that it supports the Ukrainian koi8-u encoding (since v.1.18). If you use an earlier version, it is quite easy to add a support for koi8-u -- you just have to install a koi8-u conversion table. (I made it myself -- please let me know if you find any errors in it). To install it, you need to copy it to the directory where The Bat! is installed (usually it is Program Files/ The Bat!), and then go to Options/ XLAT Tables/ Add.
Actually, I think The Bat! is one of the best mailers for correspondence in Cyrillic - even though it does not include a newsreader, has a limited support for the HTML format of mail messages, and does not support UTF-8 and UTF-7 formats available, for example, in Outlook Express.
To tune The Bat! for 8-bit correspondence in Cyrillic, go to the following menus:
Account/ Properties:
Transport -- 8-bit characters are treated: Without changes.
Options -- Allow 8-bit characters in message headers.
Templates/ New message -- Use character set: Cyrillic (KOI-8).

       Bmail/UUPC for DOS

Many people in the ex-USSR are still sending and receiving e-mail with such mailers as Bmail/UUPC (or Demos Mail/UUPC) for DOS. These mailers automatically dial up and connect to the server, send outgoing messages, receive incoming messages, and then immediately disconnect. If the connection is broken during file transfer, they can reconnect and resume transferring the file from the position where the connection was lost. For this reason, the 'UUPC' mode of sending and receiving email can work satisfactorily even for telephone lines of a very poor quality (and, furthermore, it is usually much less expensive in comparison to on-line Internet access).

If you want to send an email message to a recipient who uses Bmail/UUPC or a similar program, you should keep in mind that such mailers cannot automatically decode 7-bit representations of 8-bit texts in the quoted-printable or base64 form, so you should send your messages in the 8-bit form. Also, these programs automatically decode only UUENCODEd attachments.
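
As an illustration of what 'sending in the 8-bit form' means at the message level, here is a minimal Python sketch that builds a KOI8-R message body with no quoted-printable or base64 encoding. It uses the standard email package; the function name is hypothetical, and the subject is kept in ASCII because the library would otherwise convert it to a 7-bit form.

```python
from email.message import EmailMessage

def build_8bit_message(text: str, subject: str) -> EmailMessage:
    """Create a message whose KOI8-R body is carried as raw 8-bit bytes."""
    msg = EmailMessage()
    msg["Subject"] = subject  # ASCII only in this sketch
    msg.set_content(text, charset="koi8-r", cte="8bit")
    return msg

msg = build_8bit_message("Привет\n", "test")
print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
```

The serialized message (msg.as_bytes()) then contains the KOI8-R bytes of the text unchanged, so a program that cannot decode quoted-printable or base64 can still display it.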

In early versions of Bmail/UUPC (up to 5.09g), all outgoing messages were recoded from DOS-866 to KOI8-R, and all incoming messages -- from KOI8-R to DOS-866. More recent versions allow one to choose the recoding table. However, the program does not apply the selected table to all incoming messages, but only to those with certain charsets indicated in the header. As a result, messages with incorrectly specified charsets (or 'multipart-mixed' messages) often come to the mailbox without recoding. Practice shows that it is often more convenient to use a version that would recode all messages regardless of the charsets indicated in their headers. As far as I know, among the recent versions of Bmail/UUPC, the most convenient in this respect is version 6.18m (the only messages it does not recode are 'multipart-mixed' ones).

Generally, to minimize your troubles with Bmail, I believe it makes sense to switch to the most recent version of Demos Mail for DOS you are able to find -- it will have some capabilities not available in Bmail.

If, for technical or financial reasons, you connect to your provider in the UUPC mode, but would like to use such POP/SMTP mailers as Eudora, Internet Mail, Netscape Mail, Pegasus, etc., you can try using a POP,SMTP/UUPC gate for Windows95 called Mailserver. Mailserver v.2.12 was free; there was a time when it could be automatically received by email in reply to an empty message sent to <mailserver@karst.kiev.ua>, but it is no longer supported by the developer, and I do not know a place on the Internet where you can find it (though I have a copy of it somewhere). If you install Mailserver, you can completely disable recoding of the Cyrillic text in the UUPC program and rely instead on the recoding capabilities of a POP/SMTP mailer (e.g. The Bat!, MS Internet Mail, Outlook Express, Eudora with a koi8 plugin, Pegasus Mail, etc.). Versions 3.* are commercial, but much more advanced -- they also support the NNTP protocol and can completely take care of converting code pages and replacing charsets. A demo version of Mailserver 3.* can be downloaded from <http://www.kiarchive.ru/pub/windows/internet/mail/MailsrvD.exe>.


     Recoding Cyrillic texts by proxy servers

To help their clients solve problems with recoding of Cyrillic texts, some providers install special recoding servers (also called recoding proxies). If you specify these proxies as POP- and SMTP-servers in the settings of your mailer, all incoming and outgoing mail passing through these servers will be automatically recoded as programmed by your provider. The function of such servers is often evident from their name: for example, pop-1251.gu.net recodes incoming messages from koi8-r to Windows-1251, smtp-1251.gu.net recodes outgoing messages from Windows-1251 to koi8-r, while a server called news-1251.gu.net recodes incoming and outgoing USENET postings.

If you work via a proxy server, 'koification' of Windows becomes unnecessary. However, if somebody sends you a message in a different encoding than the one implied by the proxy server (e.g. Windows-1251 instead of koi8-r), or with an incorrectly specified charset (if your proxy takes the charset field into account), it will be recoded in a wrong way. So, if you work via a recoding proxy, you will most probably have to learn how to use special recoding programs that can recover incorrectly recoded texts. 


     Recoding Cyrillic texts with external programs

      Some recoders for DOS

LOTS of various programs are available for recoding Cyrillic texts under DOS. For example, a small program called Recode can be downloaded from <http://www.dubna.ru/demo/src/recode.zip>. In addition to the usual function of converting the text between various Cyrillic code pages, it can also recover the initial 8-bit form of the texts encoded to Quoted-printable or 'HTML-style' (in the latter case, the text can look approximately like this: '&Acirc;&egrave;&ntilde;&icirc;&ecirc;&icirc;&icirc;&aacute;&eth' etc.). This program is especially helpful for people using old-style mailers like Bmail or Dmail/UUPC, which do not understand Quoted-printable and HTML. However, Recode does not work with 'HTML-number-style' (I mean texts that look like this: '& #132;& #135; & #134; & #131;', but without spaces between & and #). Evgeny Kotsuba's decoder called DC (http://g23.relcom.ru/g23/9749/dc) can handle 'HTML-number-style', convert to/from Translit, and has some capabilities for automatic code page recognition. Note that DC needs Dos4gw.exe (DOS protected-mode utility), which is not included in the DC package, but comes with many DOS-based games.
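
The recovery of 'HTML-style' texts can be sketched in a few lines of Python. This is not how Recode itself works (its internals are not described here); it is only an illustration that assumes the original text was in Windows-1251 and that each byte was written as a named Latin-1 entity. The function name is hypothetical.

```python
import html

def recover_from_entities(text: str, codepage: str = "cp1251") -> str:
    """Turn '&Iuml;&eth;...' back into Cyrillic, assuming the given source code page."""
    latin = html.unescape(text)           # entities -> Latin-1 characters
    raw = latin.encode("latin-1")         # Latin-1 characters -> original bytes
    return raw.decode(codepage)           # original bytes -> Cyrillic text

print(recover_from_entities("&Iuml;&eth;&egrave;&acirc;&aring;&ograve;"))
```

The 'HTML-number-style' form is deliberately not handled in this sketch.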

      CP Tuner 95 (obsolete)

CP Tuner 95 is a commercial program designed as a plug-in for Microsoft Word 7.0 (it does not work with the Russian version of Word 7.0 and with Word 97). Its installation creates in Word 7.0 an additional menu item named 'CP Tuner'. If you select from the 'CP Tuner' menu 'Suggestions', the program will try to detect the code page used in the current window of Word 7.0. The Source Code Page and Destination Code Page are set in the menu CP Tuner/ Set Codes, after which the conversion is performed upon the command Smart Convert. If in the Source Code Page window several (or all) code pages are checked, the program will try to detect the source code page used in the document, and convert the text from this code page to the required one. It is very convenient, but in some cases this 'automatic conversion' does not work. For example, if your mailer or proxy server automatically recodes incoming messages using the table 'KOI-8 -> Win1251' (implying that all messages come in the KOI-8 encoding), Win1251 messages will be recoded in the same way, thus yielding a text not corresponding to any standard Cyrillic code page. In such cases, CP Tuner 95 is unable to automatically detect the encoding and recover the original text. In this particular case, one would need to set conversion 'from Win1251 to KOI-8', which would restore the initial Win1251 encoding.
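
The wrong recoding described in this example, and its reversal, can be reproduced with a small Python sketch. The function names are hypothetical, and the sketch assumes a text consisting only of the basic Russian letters (other characters may not survive the round trip).

```python
def proxy_koi_to_win(raw: bytes) -> bytes:
    """What a recoding mailer or proxy does when it assumes every message is KOI8-R."""
    return raw.decode("koi8-r").encode("cp1251")

def undo_wrong_recoding(raw: bytes) -> bytes:
    """The reverse conversion 'from Win1251 to KOI-8' that restores the original bytes."""
    return raw.decode("cp1251").encode("koi8-r")

original = "Привет".encode("cp1251")      # the message was really in Win1251
mangled = proxy_koi_to_win(original)      # ...but was recoded as if it were KOI8-R
print(mangled.decode("cp1251"))           # what the recipient sees
print(undo_wrong_recoding(mangled).decode("cp1251"))
```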

      CP Tuner 97

CP Tuner 97 is a significantly improved version of CP Tuner 95. CP Tuner 97 is capable of working not only under Word 7.0, but also under Word 97, and as a stand-alone program. It has a variety of predefined text recoding schemes (including multiple recoding). If in the CP Tuner 97 menu Tools/ Options/ you choose the mode 'On Decode command: Autodetect the best named scheme and use it', then upon the Tools/ Decode command the program will try to automatically convert the text to Windows 1251. In most cases (including the one described in the previous paragraph), the automatic mode of recoding works successfully. Otherwise, you can supplement the predefined recoding schemes with your own ones.
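
How CP Tuner 97 detects encodings is not documented here, but the general idea of automatic detection can be shown with a toy heuristic in Python: decode the bytes with each candidate code page and keep the one that yields the most of the commonest Russian lowercase letters. The candidate list, the letter set, and the function names are all assumptions made for this sketch, and it can easily fail on short or unusual texts.

```python
CANDIDATES = ("koi8-r", "cp1251", "cp866")
COMMON = set("оеаинтсрвл")  # frequent Russian lowercase letters (assumed set)

def score(raw: bytes, codec: str) -> int:
    """Count how many frequent Russian letters appear when decoding with this codec."""
    text = raw.decode(codec, errors="replace")
    return sum(ch in COMMON for ch in text)

def guess_codepage(raw: bytes) -> str:
    """Pick the candidate code page with the highest score."""
    return max(CANDIDATES, key=lambda codec: score(raw, codec))

def decode_to_win1251(raw: bytes) -> bytes:
    """Convert text in a guessed code page to Windows-1251."""
    text = raw.decode(guess_codepage(raw), errors="replace")
    return text.encode("cp1251", errors="replace")

sample = "Привет, как дела?"
for codec in CANDIDATES:
    print(codec, "->", guess_codepage(sample.encode(codec)))
```

The heuristic works because a text decoded with the wrong table consists mostly of capital letters or pseudographic symbols, while a correctly decoded text is mostly lowercase.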

      Tot-Recode II and some other decoders

The idea and capabilities of Tot-Recode II seem to be quite similar to those of CP Tuner 97 (though the user interface looks somewhat different). However, the main advantage of Tot-Recode II is that it is FREE.
It can also decode quoted-printable to 8-bit (but you should do it in the manual mode; it is also convenient to create a new scheme which includes decoding from QP to 8-bit and from KOI to Win).
Like CP Tuner 97, it can automatically 'transliterate' Cyrillic texts (i.e. replace Cyrillic characters with 'phonetically similar' Latin letters or their combinations).
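
A two-step scheme of the kind mentioned above (quoted-printable to 8-bit, then KOI to Win) can be sketched in Python as follows. This only illustrates the idea and says nothing about how Tot-Recode II implements its schemes; the function name is hypothetical.

```python
import quopri

def qp_koi_to_win(qp_text: bytes) -> bytes:
    """Decode quoted-printable to 8-bit KOI8-R, then recode to Windows-1251."""
    eight_bit = quopri.decodestring(qp_text)           # step 1: QP -> 8-bit
    return eight_bit.decode("koi8-r").encode("cp1251") # step 2: KOI -> Win

result = qp_koi_to_win(b"=F0=D2=C9=D7=C5=D4")
print(result.decode("cp1251"))
```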


ECoder is a very small time-limited shareware program (<100 Kb), which nevertheless also has pretty advanced decoding capabilities.
Agama Mail Reader is another commercial decoder.

      Online decoders

Some Cyrillic recoders are available online, so that you can access them via the form-based WWW interface -- for example, Automatic Online Decoder (www.design.ru/free/decoder) or Ilya Sandler's Universal Cyrillic Converter (www.friends-partners.org/~isandler/cyrconv/cyrconv.html).

Some comments on the problem of Ukrainian KOI8

One problem with koi-8 (which does not arise with Win-1251) is that the standard koi8-r code page does not include any Ukrainian-specific letters. The Ukrainian version of koi8 known as koi8-u has been used for several years, but it was officially registered only in April 1998, as RFC 2319. Thus far, recoding to/from this charset is NOT yet supported by many mailers, browsers, proxies, etc. which do support koi8-r. So, when you send a message or make a Web page in Ukrainian using the koi8-u encoding, it is very likely that some letters in it will not be read properly by some of your addressees (especially in case of any Win1251<->KOI-8 recoding by a mailer, provider's proxy, or Web browser). This situation has resulted in many Ukrainian Web pages appearing on the Internet ONLY in the Windows-1251 encoding.
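
The difference between the code pages is easy to check with Python's built-in codecs. The sketch below (with a hypothetical helper name) tests whether a Ukrainian word can be represented in each code page, and shows that koi8-u bytes read as koi8-r no longer give the original text.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of the text exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

word = "Київ"  # contains the Ukrainian-specific letters 'и' is common, 'ї' is not
for codec in ("koi8-r", "koi8-u", "cp1251"):
    print(codec, can_encode(word, codec))

# The same bytes interpreted as koi8-r: the Ukrainian letter is lost.
print(word.encode("koi8-u").decode("koi8-r"))
```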
Up to now, the choice of browsers, mailers, and newsreaders for Windows 95 that support koi8-u is very limited. To read koi8-u Web pages under Windows with Netscape v.3.*, you should install koi8-u fonts. They are available, for example, on the official KOI8-U page. For tuning MS Internet Explorer 4.* and Outlook Express to koi8-u, you should install from the Microsoft Web site the Pan-European language support add-on. The Ukrainian koi8-u charset is indicated there as koi8-ru; if you want the correct charset name to be indicated in the header of your message or postings, you should edit the System Registry and change 'koi8-ru' wherever you find this string to 'koi8-u'.
An instruction for dummies on installation of the koi8-u support in IE4 and OE4 can be found here (in Ukrainian, Win-1251 encoding).
However, it is possible to do the same much faster, without having to download and install the full Pan-European support (I believe it is about 1 Mb). You actually need only one small file which can be found here (the instruction is inside).

Also, it is possible to find on the Internet some koi8-u hacks for specific versions of Netscape v.4.*.
A mailer called The Bat! supports koi8-u since v.1.18; if you use an earlier version, it is quite easy to add a koi8-u conversion table. (I made it myself -- please let me know if you find any errors in it). To install it, you need to copy it to the directory where The Bat! is installed (usually it is Program Files/ The Bat!), and then go to Options/ XLAT Tables/ Add.
Information about KOI8-U is available also on http://cad.ntu-kpi.kiev.ua/multiling/koi8-u/index.html.
Look at the KOI8-U character map copied from this page.


References

Andrei Chernov's Web page: a Bible on 'koification' of various operating systems, mailers, browsers, and newsreaders. <http://www.nagual.pp.ru/~ache/koi8.html>

How to Russify Netscape Navigator for Windows <http://www.relcom.ru/Russification/WinNetscape> -- WinNetscape Russification Bible by Paul Gorodyansky

Paul Gorodyansky's Web page http://ourworld.compuserve.com/homepages/PaulGor/ - Russification of Netscape, Office 97 and other helpful stuff

Konstantin Kazarnovskii about fonts and languages <http://www.glasnet.ru/~kazarn/fonts.htm> -- very educational and well written, definitely worth reading

Lots of helpful information on Russification of various programs and operating systems <http://www.siber.com/sib/russify/>

Win95 Russification FAQ - <http://www.hackzone.ru/rtw95>

WinNT Russification - <http://rwntug.quarta.msk.ru/questions.htm>

Russification of Macintosh computers <http://www.relcom.ru/Russification/MacKoi8-r>

How to tune Forte Agent to work with Cyrillic <http://blue.iris.mipt.ru/timur/agent.htm>

How mailers should handle Cyrillic messages (and how they actually do it) <http://blue.iris.mipt.ru/timur/trueway.htm>

A lot of useful programs for working with Cyrillic texts <http://www.kiarchive.ru/pub/cyrillic/>

Some frequently asked questions about Russification of DOS and Windows <http://www.maths.monash.edu.au/~kig/ruscom/general/rusfaq.html>

Installation of various keyboard layouts for native Windows95 keyboard switcher <http://www.netsight.net/~ryba/cyrillic/index.html>

Internet standards (RFC -- requests for comments) <http://www.rfc-editor.org>
RFC 1489: description of KOI8-R code page
RFC 1521: MIME format for messages sent over the Internet
RFC 1522: MIME format for non-ASCII characters in the message headers
RFC 2319: Ukrainian Character Set KOI8-U
RFC 2152: UTF-7 -- A Mail-Safe Transformation Format of Unicode
RFC 2279: UTF-8, a transformation format of ISO 10646


Acknowledgements

The author is grateful to Paul Gorodyansky, Konstantin Kazarnovskii, Igor Manyuk, Anthon Lobastoff, and Andrew Tooziak for their helpful explanations.
