How the Telugu shaping engine works

The character reordering rules of the Uniscribe Telugu shaping engine are described below. None of the rules need to be encoded in an OpenType font, as long as the font is to be used with Uniscribe (or another client that follows the Unicode Standard for character reordering). In fact, if a font developer attempted to encode such reordering information in an OpenType font, they would need to add a huge number of many-to-many glyph mappings to cover the very simple algorithms that Uniscribe uses.

Uniscribe always performs reordering operations in a specified order, as described below.

Starting with a syllable of one of the following forms:

{C + [Nukta] + H} + C + [M] + [VM] + [SM]

…or a syllable without vowels

{C + [Nukta] + H} + C + H

…or a syllable without consonants

VO + [VM] + [SM]
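The three syllable forms above can be sketched as patterns over character classes. This is only an illustration, not Uniscribe's implementation: the class ranges are assumptions read off the Unicode Telugu block, the single-letter class codes and the names `char_class` and `syllable_form` are hypothetical, and the optional Nukta and SM classes are left out for brevity.

```python
import re

def char_class(ch):
    """Map a character to a one-letter class code (assumed ranges)."""
    cp = ord(ch)
    if 0x0C15 <= cp <= 0x0C39:
        return "C"  # consonant
    if cp == 0x0C4D:
        return "H"  # halant (virama)
    if 0x0C3E <= cp <= 0x0C4C:
        return "M"  # matra (dependent vowel sign)
    if 0x0C01 <= cp <= 0x0C03:
        return "V"  # vowel modifier (VM)
    if 0x0C05 <= cp <= 0x0C14:
        return "O"  # independent vowel (VO)
    return "X"      # anything else

# One pattern per syllable form listed above (Nukta and SM omitted).
SYLLABLE_FORMS = [
    ("consonant syllable", re.compile(r"(?:CH)*CM?V?")),
    ("syllable without vowels", re.compile(r"(?:CH)*CH")),
    ("syllable without consonants", re.compile(r"OV?")),
]

def syllable_form(text):
    """Return the name of the first form that matches, or None."""
    classes = "".join(char_class(ch) for ch in text)
    for name, pattern in SYLLABLE_FORMS:
        if pattern.fullmatch(classes):
            return name
    return None
```

For instance, `syllable_form("\u0c15\u0c4d\u0c37")` (KA + halant + SSA) yields `"consonant syllable"`, while `syllable_form("\u0c15\u0c4d")` yields `"syllable without vowels"`.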

 

  1. The shaping engine finds the base consonant of the syllable, using the following algorithm: starting from the end of the syllable, move backwards until a consonant is found that does not have a below-base or post-base form (post-base forms have to follow below-base forms), or until the first consonant is reached. The consonant stopped at will be the base.

     

  2. If the base consonant is not the last one, Uniscribe moves the halant from the base consonant to the last one.

     

  3. If the syllable starts with Ra + H, Uniscribe moves this combination so that it follows the base consonant.

     

  4. Uniscribe splits two- or three-part matras into their parts. (This splitting is a character-to-character operation.) Then Uniscribe moves the left ‘matra’ part to the beginning of the syllable.

     

  5. Uniscribe classifies consonants and ‘matra’ parts as pre-base, above-base (Reph), below-base or post-base. This classification exists on the character code level and is language-dependent, not font-dependent.

     

  6. Uniscribe then groups elements of the syllable (consonants and ‘matras’) according to this classification. Pre-base elements will precede the base consonant. The above-base, below-base and post-base components will follow the base glyph.

     

    • ‘Halants’ are moved with the consonants they affect.
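Steps 1, 2 and 4 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Uniscribe's code: the function names are hypothetical, the consonant range is taken from the Unicode Telugu block, `has_subjoined_form` is a caller-supplied predicate standing in for "has a below-base or post-base form", and the only split matra listed is the AI sign (U+0C48), which Unicode decomposes into U+0C46 + U+0C56.

```python
HALANT = "\u0c4d"

# Assumed table of multi-part matras and their parts.
SPLIT_MATRAS = {"\u0c48": "\u0c46\u0c56"}

def is_consonant(ch):
    # Assumed range: Telugu letters KA..HA.
    return 0x0C15 <= ord(ch) <= 0x0C39

def find_base(syllable, has_subjoined_form):
    """Step 1: scan consonants from the end; stop at the first one
    without a below-base or post-base form, else at the first consonant."""
    consonants = [i for i, ch in enumerate(syllable) if is_consonant(ch)]
    if not consonants:
        return None
    for i in reversed(consonants):
        if not has_subjoined_form(syllable[i]):
            return i
    return consonants[0]

def move_halant_to_last(syllable, base):
    """Step 2: if the base is not the last consonant, move the halant
    that follows the base so that it follows the last consonant."""
    chars = list(syllable)
    last = max(i for i, ch in enumerate(chars) if is_consonant(ch))
    if base == last or base + 1 >= len(chars) or chars[base + 1] != HALANT:
        return syllable
    del chars[base + 1]
    # The last consonant shifted left by one, so index `last`
    # is now the slot directly after it.
    chars.insert(last, HALANT)
    return "".join(chars)

def split_matras(syllable):
    """Step 4 (first half): replace each multi-part matra by its parts."""
    return "".join(SPLIT_MATRAS.get(ch, ch) for ch in syllable)
```

With a predicate that reports a subjoined form for every consonant, `find_base("\u0c15\u0c4d\u0c37", lambda ch: True)` stops at the first consonant (index 0), and `move_halant_to_last("\u0c15\u0c4d\u0c37", 0)` then gives KA + SSA + halant. Passing the predicate as a parameter keeps the sketch independent of any particular table of below-base and post-base forms.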

After performing the character reordering steps, the sequence of characters will have one of the following forms:

For Telugu:

{Cpre + H} + Cbase + [Mabove] + [Mbelow] + [Mpost] + {Cbelow + H} + {Cpost + H} + [VMpost]

(Any combination of Mabove, Mbelow, Mpost and VMpost may be present.)

In the absence of a vowel, we’ll have

{Cpre + [Nukta] + H + [Ra + H]vattu} + Cbase + [Ra + H]vattu + H

Finally, a syllable with an independent vowel will look like

VO + [VM1] + [VM2]

Shaping with OTLS

The first step Uniscribe takes in shaping the character string is to map all characters to their nominal form glyphs. Then, Uniscribe applies contextual shape features to the glyph string. 
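The two-stage process described here (nominal glyph mapping, then contextual substitution) can be illustrated with a toy model. Everything in it is hypothetical: the glyph names, the `CMAP` table and the single substitution rule are invented for the example and do not come from any real font or from Uniscribe.

```python
# Hypothetical character-to-glyph map (stands in for a font's cmap table).
CMAP = {"\u0c15": "ka", "\u0c37": "ssa", "\u0c4d": "halant"}

# Hypothetical substitution rules: glyph sequence -> replacement glyphs.
RULES = [
    (("ssa", "halant"), ("ssa.below",)),
]

def to_nominal_glyphs(text, cmap):
    """Stage 1: map each character to its nominal glyph."""
    return [cmap.get(ch, ".notdef") for ch in text]

def apply_rules(glyphs, rules):
    """Stage 2: scan left to right, applying the first rule that matches."""
    out = []
    i = 0
    while i < len(glyphs):
        for pattern, replacement in rules:
            if tuple(glyphs[i:i + len(pattern)]) == pattern:
                out.extend(replacement)
                i += len(pattern)
                break
        else:
            # No rule matched at this position; keep the glyph as-is.
            out.append(glyphs[i])
            i += 1
    return out
```

Given the reordered string KA + SSA + halant, `to_nominal_glyphs` produces `["ka", "ssa", "halant"]`, and `apply_rules` with the rule above produces `["ka", "ssa.below"]`.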

 

– From http://www.microsoft.com/typography/otfntdev/teluguot/features.htm