How to make Pidgin Unicode-compatible
Pidgin is an open-source, cross-platform instant messaging client that offers a straightforward way to access most popular chat networks from within one programme. Unfortunately, getting non-Latin scripts to display correctly in Pidgin can be a challenge.
If you have configured Pidgin to use a Latin font (Tools –> Preferences –> Conversations –> Font) and you receive a message in Chinese, for example, Pidgin does not automatically fall back on a font that can display Chinese characters. To see this, paste the following into Pidgin's chat window:
汉语 (Simplified Chinese), 漢語 (Traditional Chinese), 日本語 (Japanese), カタカナ (Japanese Katakana), ひらがな (Japanese Hiragana), 한국어 (Korean)
Depending on the font you are using, some or all of these characters will not display correctly. To get around this, you need to set Pidgin to use a font that covers both Latin text and the characters of the language you need. My recommendations are “SimSun” for Simplified Chinese, “MingLiU” for Traditional Chinese, “MS Mincho” for Japanese, and “Batang” or “Gulim” for Korean. All are Microsoft TrueType fonts that come with one version or another of Windows, Office or Microsoft's Office Proofing Tools.
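To work out which scripts a message actually contains, and therefore what the chosen font has to cover, the characters can be classified by their Unicode names. The following is a minimal Python sketch, not part of Pidgin; the function names `script_of` and `scripts_in` are made up for illustration, and the prefix matching is only a rough heuristic.

```python
import unicodedata

# The test string from above, without the Latin annotations.
SAMPLE = "汉语 漢語 日本語 カタカナ ひらがな 한국어"

def script_of(ch):
    """Return a rough script label based on the Unicode character name."""
    name = unicodedata.name(ch, "")
    for prefix, label in (
        ("CJK UNIFIED IDEOGRAPH", "Han"),
        ("HIRAGANA", "Hiragana"),
        ("KATAKANA", "Katakana"),
        ("HANGUL", "Hangul"),
        ("LATIN", "Latin"),
    ):
        if name.startswith(prefix):
            return label
    return "Other"

def scripts_in(text):
    """Return the sorted set of script labels found in text, ignoring whitespace."""
    return sorted({script_of(ch) for ch in text if not ch.isspace()})

print(scripts_in(SAMPLE))  # ['Han', 'Hangul', 'Hiragana', 'Katakana']
```

A font that is missing any of the listed scripts will leave part of the test string unreadable.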
But what if you need a font for more than one language? For Chinese, Japanese and Korean (CJK), there are two fonts that I know of that can handle most East Asian character sets: “Batang” and “Arial Unicode MS”. As far as I remember, Batang doesn't cover all Japanese and all Chinese characters, but it is more widely available and probably okay for occasional use. “Arial Unicode MS”, on the other hand, looks more modern and blends in better on a Western system. For this to work, you will also need to tell Pidgin to ignore the font formatting sent by your conversation partners (Tools –> Preferences –> Conversations, then uncheck “Show formatting on incoming messages”).
A more elegant alternative was presented on a blog called cappella.blogspot.com. The basic idea is to edit the configuration files of the GTK framework that Pidgin uses and to change the entries for the fallback fonts used for Unicode characters.
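The blog post's exact edits are not reproduced here. As a rough sketch only, assuming the GTK+ runtime used by Pidgin on Windows reads Pango font aliases from a file such as etc\pango\pango.aliases (the location, the choice of the “sans” family and the font list are all assumptions to check against your own installation), a fallback chain might look like this:

```
# pango.aliases (assumed location: <GTK runtime>\etc\pango\pango.aliases)
# Assumed format: family = "comma-separated fonts, tried in order"
sans = "Arial,SimSun,MingLiU,MS Gothic,Gulim"
```

The order matters: for characters that several of the listed fonts cover, the first matching font wins, so put the font for your most important language first.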
Note that even though pan-Unicode fonts cover a wide range of Unicode code points, that does not make them suitable for users who need to work with, e.g., Simplified and Traditional Chinese, or Chinese and Japanese, at the same time. Unicode encodes characters and not glyphs, so East Asian Han characters (Hanzi 汉字, or Kanji 漢字) that share the same origin also share the same Unicode code point, even though the actual character in everyday use may be written differently in China, Japan and Korea. Although many characters are identical, the differences between writing styles range from a single stroke to a completely different composition. The example above shows the Chinese (left) and Japanese (right) representation of a character at the same Unicode code point. Pan-Unicode fonts thus have to default to one standard representation; both Arial Unicode MS and Ascender's Droid font, for example, use Simplified Chinese forms for the shared code points. Take this into account when choosing a font. For more information on Han character unification, see:
- Wikipedia: Unihan
- Unihan.com.cn: CJK Ideographs Comparison
- Unicode.org.cn: Unihan Database Lookup Tool
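The effect of Han unification can be checked from the code points themselves. The following is a small Python sketch; the character 直 is a commonly cited example of a unified character whose Chinese and Japanese forms differ, and is not necessarily the one pictured above. The helper name `code_point` is made up for illustration.

```python
import unicodedata

def code_point(ch):
    """Format a character's code point in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"

# Simplified and Traditional forms of "Han" are separate characters,
# so each has its own code point and a font can carry both.
print(code_point("汉"), code_point("漢"))  # U+6C49 U+6F22

# A unified character: one code point, no language information attached.
# Whether it appears in its Chinese or Japanese form is decided by the font.
ch = "直"
print(code_point(ch), unicodedata.name(ch))  # U+76F4 CJK UNIFIED IDEOGRAPH-76F4
```

Because the text carries no hint of the intended language, a chat client that uses a single font has no way to pick the regional form on its own.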