Andy's Pali Page

Unicode and Pali


updated July 24, 2000

Unicode is a computer standard that aims to assign one unique code (a "computer representation") to every character used in the world's writing systems. This will bring big benefits to software companies that develop software for international markets, and allow people who work in several languages to share documents more easily.

Unicode does NOT make it much easier to create multi-language documents.  A person will still need to set the language for every section of their text, remember keyboard layouts, and remember special keystroke combinations. The key issue: there simply aren't enough keys on the keyboard to make typing easier when a person uses more than one language. There are several approaches that you can take when using Unicode characters to create Pali documents.

For completeness, here is a link to the Unicode home site: http://www.unicode.org/

An excellent site that makes it easier to find the characters for any language and their codes is the link "Character Lists and Test Pages for Unicode Ranges" at: Alan Wood's Unicode Resources

Get a Unicode Font

The first thing you will need is a font that can display the Unicode characters in your word processor.

Note, however, that not all Unicode fonts can display all of the Unicode characters! The huge Unicode standard is organized into blocks (Word calls them "subsets"), so that font creators can supply displayable characters for only certain parts of Unicode if they wish.

You may already have a Unicode font on your computer! Check to see if the "Lucida Sans Unicode" font is on your computer. However, this font does not have all of the subsets necessary to display all of the Pali characters.

Here is an example of a Unicode font that can display all of the characters needed for Pali (and many thousands more): Microsoft offers a font called "Arial Unicode MS" that you can use with Word 97 or later (that is, if you have Word 97 or later on a computer running Windows 98 or later, you can use the font).

The download is on the MS Publisher web site, but it is a general-purpose Unicode font, containing 40,000 Unicode characters. The font is a 13 MB download (about 1 hour at 28.8 kbps), and needs 28 MB (yes, that's megabytes!) on your disk after you install it.

If you wish, click on this link to download the Arial Unicode MS font.

Installing the font

You install the font like any other font. Instructions on how to install a font are at the bottom of my web page Pali Fonts.

Using the Unicode font with Microsoft Word (I don't know about WordPerfect)

I'll start by saying that the "Arial Unicode MS" font is only one Unicode font. The techniques below will work with ANY Unicode font that implements at least the "Latin-1", "Latin Extended-A" and "Latin Extended-Additional" subsets of Unicode.

Secondly, the techniques below will allow you to generate the Pali characters independent of which "natural languages" you are working in (as long as you have a Unicode font selected for that part of your text).
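As a quick way to see which subset a given character number falls in, here is a minimal Python sketch. It is not part of the Word setup itself; the function name subset_of is made up for illustration, and the ranges are the standard Unicode block boundaries for the three subsets named above (the Unicode standard calls the "Latin-1" subset "Latin-1 Supplement").

```python
# Inclusive character-number ranges for the three subsets needed for Pali.
# The names follow the subset names used on this page.
SUBSETS = {
    "Latin-1": (0x0080, 0x00FF),
    "Latin Extended-A": (0x0100, 0x017F),
    "Latin Extended-Additional": (0x1E00, 0x1EFF),
}


def subset_of(character_number):
    """Return the name of the subset containing the number, or None."""
    for name, (first, last) in SUBSETS.items():
        if first <= character_number <= last:
            return name
    return None


# Three sample character numbers, one from each subset.
for number in (241, 257, 7747):
    print(number, hex(number), subset_of(number))
```

A font that lacks any one of these three ranges will leave some of the Pali characters undisplayable.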

You have several choices:

1. Customize your keyboard to insert the symbols
2. Create macros that generate the symbols
3. Get someone to create a Pali keyboard driver (.nls) file if most of your typing is in Pali.

Customizing Your Keyboard to Insert the Symbols

1. Select the font "Arial Unicode MS" in your document (like any other font)
2. "Insert"->"Symbol..." will show you the symbol dialog. On the right there is a drop-down list called 'Subset'. You will need to find the characters (see table below), select them, and then use the button at the bottom of the dialog called 'Shortcut Key...' to assign a unique keyboard key sequence to generate the character. Make sure you write this down!
3. After this, you will be able to get the Pali Unicode characters from the keyboard.

Creating Macros That Generate the Pali Characters

1. Create macros ("Tools"->"Macro"->"Macros" or just Alt-F8).
2. To assign a keystroke to your macro, choose "Tools"->"Customize...", press the "Keyboard..." button at the bottom, then in the 'Categories' list box select 'Macros'. This produces a list of your macros. Select the macro for the character and assign a unique keystroke sequence. Be sure to write this down!

Here is the code that you will need for your macros:

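Sub Pali_m_dotunderneath()
'
' Pali_m_dotunderneath Macro
' Macro created 06/09/00 by Andy Shaw
' Inserts the Pali character .m (character number 7747, m with dot below)
'
    With Selection
        ' Collapse the selection to an insertion point at its start
        .Collapse Direction:=wdCollapseStart
        ' Unicode:=True tells Word that CharacterNumber is a Unicode number
        .InsertSymbol CharacterNumber:=7747, Font:="Arial Unicode MS", Unicode:=True
    End With
End Sub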
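Since every character needs its own macro and only the name and the character number change, the macro text can be generated rather than typed. The following is a rough Python sketch under that assumption, not something from the original setup: make_macro and the macro name Pali_n_dotabove are invented for illustration, and the output simply follows the pattern of the macro above. The generated text still has to be pasted into Word's macro editor by hand.

```python
# Template that mirrors the structure of the hand-written Word macro.
MACRO_TEMPLATE = """Sub {name}()
    ' Inserts Unicode character number {number}
    With Selection
        .Collapse Direction:=wdCollapseStart
        .InsertSymbol CharacterNumber:={number}, Font:="{font}", Unicode:=True
    End With
End Sub
"""


def make_macro(name, number, font="Arial Unicode MS"):
    """Return the text of one Word macro that inserts the given character."""
    return MACRO_TEMPLATE.format(name=name, number=number, font=font)


# Hypothetical macro names mapped to character numbers from the table.
CHARACTERS = {
    "Pali_m_dotunderneath": 7747,
    "Pali_n_dotabove": 7749,
}

for macro_name, number in CHARACTERS.items():
    print(make_macro(macro_name, number))
```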

Creating a Pali keyboard Driver

It would be sensible for anyone using "mostly Pali" in their documents to have a Pali keyboard driver. If you are using two languages (e.g. Pali and English), a program like palitrans is much easier to use. palitrans is not yet available for Unicode. If you wish to have the source code, just let me know. palitrans is "open source" under the terms of the GNU General Public License.

A Handy Little Program for Multi-lingual people

Microsoft has a very useful little program that will show you how the keys are mapped/assigned when you change keyboards (English, French, etc). You can use the "visual keyboard" and your mouse to type in characters. It works with Office 2000. You can download it at:

Microsoft Visual Keyboard

Understanding the Suggested Keyboard Assignments in the Table below

These keyboard assignments have been designed to be "easy to remember", and as easy as possible to type. They have been tested on a US English Windows 98 system using Word 97.

The capital letters are used because that is what you will see when you do the keyboard assignments in Word 97. All that a capital letter means is the key. For instance, "A" means the "a" key, not capital "A" (Shift-A).

The assignments for the letters ~n, ~N, "n, and "N are a bit different from the rest of the keys: the first key names the accent instead of repeating the letter. I have used "T" for tilde, and "O" for overdot.
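The naming pattern behind the suggested assignments can be written down in a few lines. This is only an illustrative Python sketch; the function shortcut and its parameters are assumptions made for this example, and the strings it builds are meant to match what Word 97 shows in the "Press new shortcut key" box.

```python
def shortcut(letter, accent_key=None, capital=False):
    """Build the suggested key sequence for one Pali character.

    letter      the base letter key, e.g. "n"
    accent_key  "t" for tilde or "o" for overdot; None repeats the letter
    capital     True for the capital form (Shift held for both keys)
    """
    first = (accent_key or letter).upper()
    second = letter.upper()
    if capital:
        return f"Ctrl-Shift-{first},Shift-{second}"
    return f"Ctrl-{first},{second}"


print(shortcut("a"))                                # Ctrl-A,A
print(shortcut("n", accent_key="o"))                # Ctrl-O,N
print(shortcut("n", accent_key="t", capital=True))  # Ctrl-Shift-T,Shift-N
```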

Note: do not use text formatting keystrokes (like Ctrl-B for bold) when typing in the Unicode font you have chosen for the keystroke assignments. Enter the text, then use the mouse and toolbar buttons for text formatting (bold, underline, indentation, etc.)

Once you've done a couple of keys, the whole setup should take about 15 minutes to create and test.

Now, let's set up and test two keys ("n and "N).

1. From the main menu "Insert"
2. In that drop-down menu "Symbol..."
3. In the upper left corner of the dialog box that just appeared you will see "Font:" Choose the Unicode font you prefer (for instance, "Arial Unicode MS").
4. In the upper right corner, you will see "Subset:" Most of the subsets are at the top of the list, but "Latin Extended-Additional" is about 1/3 of the way down the list and a bit hard to find. Locate it, and select it.
5. Now we must find the character "n
6. Once we find the character, we select it with the mouse.
7. At the bottom there is a button called: "Shortcut Key...". Press the button.
8. The cursor should be automatically positioned in the data entry area called "Press new shortcut key". If the cursor is not there, click there.
9. Hold down the Ctrl key, and WITHOUT releasing it, press the "O" key, press the "N" key, and release the Ctrl key. You should see exactly the text shown in the table below ("Ctrl-O,N").
10. Press the button "Assign"
11. Press the button "Close"

Now, find the "N character and repeat steps 5 to 8.

9.b. Hold down the Shift key AND the Ctrl key at the same time and DO NOT release them, press "O", press "N", and release the Shift and Ctrl keys.
You should see exactly this: "Ctrl-Shift-O,Shift-N"
10.b Press the button "Assign"
11.b Press the button "Close".

Now, press the button "Close" to close the "Symbol" dialog, and select the font "Arial Unicode MS" in your document.

Press the Ctrl key and KEEP it held down, press "O", press "N", release the Ctrl key. You should see the Pali character for "n

Press the Shift key AND Ctrl key and DO NOT release them, press "O", press "N" and release the Shift and Ctrl keys. You should see the "N character.

Repeat for the rest of the keys.

Character              Character   Suggested Keystroke     Unicode Subset
(as transliteration)   Number      Assignment
--------------------   ---------   ---------------------   -------------------------
Aa                     256         Ctrl-Shift-A,Shift-A    Latin Extended-A
aa                     257         Ctrl-A,A                Latin Extended-A
Ii                     298         Ctrl-Shift-I,Shift-I    Latin Extended-A
ii                     299         Ctrl-I,I                Latin Extended-A
Uu                     362         Ctrl-Shift-U,Shift-U    Latin Extended-A
uu                     363         Ctrl-U,U                Latin Extended-A
.D                     7692        Ctrl-Shift-D,Shift-D    Latin Extended-Additional
.d                     7693        Ctrl-D,D                Latin Extended-Additional
.L                     7734        Ctrl-Shift-L,Shift-L    Latin Extended-Additional
.l                     7735        Ctrl-L,L                Latin Extended-Additional
.M                     7746        Ctrl-Shift-M,Shift-M    Latin Extended-Additional
.m                     7747        Ctrl-M,M                Latin Extended-Additional
.N                     7750        Ctrl-Shift-N,Shift-N    Latin Extended-Additional
.n                     7751        Ctrl-N,N                Latin Extended-Additional
~N                     209         Ctrl-Shift-T,Shift-N    Latin-1
~n                     241         Ctrl-T,N                Latin-1
"N                     7748        Ctrl-Shift-O,Shift-N    Latin Extended-Additional
"n                     7749        Ctrl-O,N                Latin Extended-Additional
.T                     7788        Ctrl-Shift-T,Shift-T    Latin Extended-Additional
.t                     7789        Ctrl-T,T                Latin Extended-Additional
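If you want to double-check the character numbers before assigning keys, Python's standard unicodedata module can report the official Unicode name for each number. This is just a verification sketch, not part of the Word setup; the dictionary name PALI_CHARACTERS and the function describe are made up for the example, and the numbers are copied from the table.

```python
import unicodedata

# Transliteration -> decimal character number, copied from the table.
PALI_CHARACTERS = {
    "Aa": 256, "aa": 257, "Ii": 298, "ii": 299, "Uu": 362, "uu": 363,
    ".D": 7692, ".d": 7693, ".L": 7734, ".l": 7735,
    ".M": 7746, ".m": 7747, ".N": 7750, ".n": 7751,
    "~N": 209, "~n": 241, '"N': 7748, '"n': 7749,
    ".T": 7788, ".t": 7789,
}


def describe(number):
    """Return the U+XXXX form and official Unicode name for a number."""
    return f"U+{number:04X} {unicodedata.name(chr(number))}"


for translit, number in PALI_CHARACTERS.items():
    print(f"{translit:>2}  {number:>4}  {describe(number)}")
```

For example, describe(7747) gives "U+1E43 LATIN SMALL LETTER M WITH DOT BELOW", which matches the .m row. The hexadecimal U+ form is what most Unicode charts use, while Word's macro code uses the decimal number.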



(c) 2000 Andy Shaw
