tipitaka network ... his life, his acts, his words                 sabbe satta bhavantu sukhi-tatta

Unicode for Romanised Pali Scripts

by Ong Yong Peng, MEngSc, BE, BSc, MIEEE, MIEAust
28 January, 2005

Under the Unicode system, characters are placed in code charts [1]. Each character (or symbol) is assigned a unique number, known as its code point. The Pali Roman script uses several characters with diacritics. These characters are located in three separate code charts in the Unicode system, as follows (code points are given in hexadecimal):

Latin-1 Supplement

  • 00D1 Latin capital letter N with tilde
  • 00F1 Latin small letter n with tilde

Latin Extended-A

  • 0100 Latin capital letter A with macron
  • 0101 Latin small letter a with macron
  • 012A Latin capital letter I with macron
  • 012B Latin small letter i with macron
  • 014A Latin capital letter ENG
  • 014B Latin small letter eng
  • 016A Latin capital letter U with macron
  • 016B Latin small letter u with macron

Latin Extended Additional

  • 1E0C Latin capital letter D with dot below
  • 1E0D Latin small letter d with dot below
  • 1E36 Latin capital letter L with dot below
  • 1E37 Latin small letter l with dot below
  • 1E40 Latin capital letter M with dot above
  • 1E41 Latin small letter m with dot above
  • 1E42 Latin capital letter M with dot below
  • 1E43 Latin small letter m with dot below
  • 1E44 Latin capital letter N with dot above
  • 1E45 Latin small letter n with dot above
  • 1E46 Latin capital letter N with dot below
  • 1E47 Latin small letter n with dot below
  • 1E6C Latin capital letter T with dot below
  • 1E6D Latin small letter t with dot below
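
The lists above can be checked mechanically. The following is a minimal Python sketch, not part of the original article, that turns each code point into its character and looks up its official Unicode name with the standard `unicodedata` module. The names `PALI_CODE_POINTS` and `describe` are made up for this illustration.

```python
import unicodedata

# Code points copied from the three lists above (hexadecimal).
PALI_CODE_POINTS = [
    # Latin-1 Supplement
    0x00D1, 0x00F1,
    # Latin Extended-A
    0x0100, 0x0101, 0x012A, 0x012B, 0x014A, 0x014B, 0x016A, 0x016B,
    # Latin Extended Additional
    0x1E0C, 0x1E0D, 0x1E36, 0x1E37, 0x1E40, 0x1E41, 0x1E42, 0x1E43,
    0x1E44, 0x1E45, 0x1E46, 0x1E47, 0x1E6C, 0x1E6D,
]


def describe(code_point):
    """Return 'HEX character UNICODE NAME' for one code point."""
    ch = chr(code_point)
    return f"{code_point:04X} {ch} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for cp in PALI_CODE_POINTS:
        print(describe(cp))
```

Each printed line should match the corresponding entry in the lists above, apart from the upper-case spelling of the official names.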

The following table shows these characters, the character codes used to display them in HTML documents, and the resulting glyphs for comparison. If your web browser does not correctly display the Unicode characters shown in square brackets [ ... ], you should consider installing Unicode fonts or using a browser which supports Unicode [4].

HTML character entity escape codes, each followed by its glyph in square brackets:

Name                                       Symbolic         Numeric         Hexadecimal

Latin-1 Supplement
Latin capital letter N with tilde          &Ntilde; [ Ñ ]   &#209; [ Ñ ]    &#x00D1; [ Ñ ]
Latin small letter n with tilde            &ntilde; [ ñ ]   &#241; [ ñ ]    &#x00F1; [ ñ ]

Latin Extended-A
Latin capital letter A with macron         -                &#256; [ Ā ]    &#x0100; [ Ā ]
Latin small letter a with macron           -                &#257; [ ā ]    &#x0101; [ ā ]
Latin capital letter I with macron         -                &#298; [ Ī ]    &#x012A; [ Ī ]
Latin small letter i with macron           -                &#299; [ ī ]    &#x012B; [ ī ]
Latin capital letter ENG [2]               -                &#330; [ Ŋ ]    &#x014A; [ Ŋ ]
Latin small letter eng [3]                 -                &#331; [ ŋ ]    &#x014B; [ ŋ ]
Latin capital letter U with macron         -                &#362; [ Ū ]    &#x016A; [ Ū ]
Latin small letter u with macron           -                &#363; [ ū ]    &#x016B; [ ū ]

Latin Extended Additional
Latin capital letter D with dot below      -                &#7692; [ Ḍ ]   &#x1E0C; [ Ḍ ]
Latin small letter d with dot below        -                &#7693; [ ḍ ]   &#x1E0D; [ ḍ ]
Latin capital letter L with dot below      -                &#7734; [ Ḷ ]   &#x1E36; [ Ḷ ]
Latin small letter l with dot below        -                &#7735; [ ḷ ]   &#x1E37; [ ḷ ]
Latin capital letter M with dot above [2]  -                &#7744; [ Ṁ ]   &#x1E40; [ Ṁ ]
Latin small letter m with dot above [3]    -                &#7745; [ ṁ ]   &#x1E41; [ ṁ ]
Latin capital letter M with dot below [2]  -                &#7746; [ Ṃ ]   &#x1E42; [ Ṃ ]
Latin small letter m with dot below [3]    -                &#7747; [ ṃ ]   &#x1E43; [ ṃ ]
Latin capital letter N with dot above      -                &#7748; [ Ṅ ]   &#x1E44; [ Ṅ ]
Latin small letter n with dot above        -                &#7749; [ ṅ ]   &#x1E45; [ ṅ ]
Latin capital letter N with dot below      -                &#7750; [ Ṇ ]   &#x1E46; [ Ṇ ]
Latin small letter n with dot below        -                &#7751; [ ṇ ]   &#x1E47; [ ṇ ]
Latin capital letter T with dot below      -                &#7788; [ Ṭ ]   &#x1E6C; [ Ṭ ]
Latin small letter t with dot below        -                &#7789; [ ṭ ]   &#x1E6D; [ ṭ ]
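
The numeric and hexadecimal escape codes follow directly from each character's code point, so they can be generated rather than typed by hand. The sketch below is an illustration only. The helper names `decimal_ref`, `hex_ref` and `escape_pali` are hypothetical, and treating every non-ASCII character as one to escape is an assumption made for simplicity. It uses only Python's standard library.

```python
import html


def decimal_ref(ch):
    """Numeric (decimal) character reference, e.g. '&#257;' for a-macron."""
    return f"&#{ord(ch)};"


def hex_ref(ch):
    """Hexadecimal character reference, zero-padded to four digits."""
    return f"&#x{ord(ch):04X};"


def escape_pali(text):
    """Replace every non-ASCII character with its decimal reference.

    Assumption: ASCII characters are left untouched; nothing else in the
    text (such as '&' or '<') is escaped by this helper.
    """
    return "".join(c if ord(c) < 128 else decimal_ref(c) for c in text)


if __name__ == "__main__":
    word = "Samm\u0101sambuddhassa"
    escaped = escape_pali(word)
    print(escaped)
    # html.unescape reverses symbolic, decimal and hexadecimal references.
    print(html.unescape(escaped) == word)
```

An ASCII-only page produced this way does not depend on the character encoding declared for the document, although the reader still needs a font that contains the glyphs.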

Notes:

[1] Complete Unicode code charts are available here: http://www.unicode.org/charts

[2] These capital letters (ENG, M with dot above and M with dot below) are interchangeable.

[3] These small letters (eng, m with dot above and m with dot below) are interchangeable.

[4] Information on Unicode fonts and browsers is available from Alan Wood's site: http://www.alanwood.net/unicode
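
Because notes [2] and [3] describe three letters as interchangeable, text gathered from different sources may mix them, which complicates searching and comparison. A minimal sketch of one way to unify them follows. Choosing M with dot below as the target form is an arbitrary assumption for this example, and the names `TO_DOT_BELOW` and `unify_interchangeable` are hypothetical.

```python
# Map ENG and M-with-dot-above onto M-with-dot-below, keeping letter case.
# The choice of target letter is an assumption made for this illustration.
TO_DOT_BELOW = str.maketrans({
    "\u014A": "\u1E42",  # capital ENG             -> capital M with dot below
    "\u014B": "\u1E43",  # small eng               -> small m with dot below
    "\u1E40": "\u1E42",  # capital M with dot above -> capital M with dot below
    "\u1E41": "\u1E43",  # small m with dot above   -> small m with dot below
})


def unify_interchangeable(text):
    """Return text with the interchangeable letters replaced by one form."""
    return text.translate(TO_DOT_BELOW)


if __name__ == "__main__":
    # Both spellings become identical after unification.
    a = unify_interchangeable("buddha\u1E41")
    b = unify_interchangeable("buddha\u014B")
    print(a == b)
```

Other letters, including N with dot above, are left unchanged by this mapping.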


Namo Tassa Bhagavato Arahato Sammāsambuddhassa.
Buddha sāsanaṃ ciraṃ tiṭṭhatu.