PAN Localization project releases its research and outputs on the 11th Mother Language Day


A Giant Leap for Multilingual Cyberspace 

“In the field of IT, all language communities are entitled to have at their disposal equipment adapted to their linguistic system and tools and products in their language, so as to derive full advantage from the potential offered by such technologies for self-expression, education, communication, publication, translation and information processing and the dissemination of culture in general” [1].

The PAN Localization project is a regional initiative that addresses these challenges and promotes the use of language technology across Asia. The project, initiated in 2003, has developed and disseminated computing solutions for Bahasa Indonesia, Bangla, Dzongkha, Khmer, Lao, Mongolian, Nepali, Pashto, Sinhala, Tamil, Tibetan and Urdu. These languages represent a population of nearly one billion people across developing Asia.

On the occasion of the eleventh International Mother Language Day, 21st February 2010, the PAN Localization project is pleased to release its research, technology and resources through its website.

This project has been carried out in collaboration with the Pan Asia Networking (PAN) program of IDRC, Canada, the Center for Research in Urdu Language Processing at the National University of Computer and Emerging Sciences, Pakistan, and the following partner organizations:

- Afghan Computer Science Association (ACSA), Afghanistan

- BRAC University (CRBLP), Bangladesh

- Development Research Network (D.NET), Bangladesh

- Department of IT (DIT), Bhutan

- Ministry of Education, Youth and Sports, Cambodia

- Institute of Technology (ITC), Cambodia

- National ICT Development Authority (NIDA), Cambodia

- Tibet University (TU), China

- Institute of Science and Technology, TAR China

- Tibet Academy of Agricultural and Animal Husbandry Sciences, China

- University of Indonesia (UI), Indonesia

- Agency for the Assessment and Application of Technology (BPPT), Indonesia

- National Authority for Science and Technology (NAST), Laos

- InfoCon Co. Ltd., Mongolia

- Mongolian University of Science and Technology (MUST), Mongolia

- National University of Mongolia (NUM), Mongolia

- Madan Puraskar Pustakalaya (MPP), Nepal

- E-Network Research and Development (ENRD), Nepal

- University of Colombo School of Computing (LTRL, UCSC), Sri Lanka



[1] Universal Declaration on Linguistic Rights, UNESCO, 1996.


Salient Research Outputs


Bahasa Indonesia

Statistical Machine Translation, English-Bahasa Parallel Corpus (1 Million words), POS Tagged Bahasa Corpus (500,000 words), Part of Speech Tagset and Tagger,…[details] 


Bangla

Text to Speech System (Awarded), Optical Character Recognition System (Short listed for Award), Bangla Pad, Spell Checker, Lexicon, Language Table for IDNs, Part of Speech Tagset and Tagger, Wordnet (1000 words), Tagged Corpus (5 Million words), English-Bangla Parallel Corpus, Training on Content Development using infomediaries, Online Legal Content for Farmers in Bangla,…[details]


Dzongkha

DzongkhaLinux, Optical Character Recognition System, Language Table for IDNs, Part of Speech Tagset, Corpus (600,000 words), Lexicon (23,000 words), Text to Speech System (prototype), Dzongkha Terminology, Collation, Locale, Fonts and Keyboard, Training on DzongkhaLinux,…[details]


Khmer

Optical Character Recognition System, Java Applications and Plug-ins for Collation, Encoding Conversion, Word Segmentation, Locale, Mobile SMS, Language Table for IDNs, Part of Speech Tagset and Tagger, Lexicon, Text to Speech System (prototype), Tagged Corpus (150,000 words), Online Khmer Content, Training of Govt. officials on Khmer Open Source Software,…[details]


Lao

Optical Character Recognition System, MS Office Plug-in for Word Segmentation, Collation, Spell Checker, Lao Pad, Fonts, Keyboard, Language Table for IDNs, Part of Speech Tagset, POS Tagged Corpus, Parallel Corpus (37,000 words), Online Lao Content,…[details]


Mongolian

Part of Speech Tagset and Tagger, Spell Checker, Corpus (1,000,000 words), Tagged Corpus (100,000 words), Lexicon (10,000 words), Automatic Speech Recognition, Localization of Pidgin and SeaMonkey,…[details]


Nepali

NepaLinux (Awarded), Spell Checker, Grammar Checker, Parallel Corpus (100,000 words), Tagged Corpus (80,000 words), Lexicon (37,000 words), Optical Character Recognition System (prototype), Language Table for IDNs, Training Material on NepaLinux, Training of Rural Centers on Nepali Open Source Software,…[details]


Pashto

Localized SeaMonkey (Awarded), Keyboard, Fonts, Language Table for IDNs,…[details]

Sinhala & Tamil

Sinhala Optical Character Recognition System, Sinhala Text to Speech System (Awarded), Sinhala Screen Reader for the Blind, Language Learning Tool for Tamil in Sinhala and English, Sinhala Wordnet, Localized OpenTM, Language Table for IDNs, Collation Standard, Encoding Conversion Tool,…[details]


Tibetan

Collation Standard, Online Tibetan Content, Farmer Training on using Online Tibetan Content,…[details]


Urdu

Parallel Corpus (100,000 words), Stemmer, Collation, Optical Character Recognition, Localization of SeaMonkey, Web Composer and Psi, Terminology Glossary, Gendered Outcome Mapping Tool (Awarded), Part of Speech Tagset and Tagger, Tagged Corpus (200,000 words), Language Table for IDNs, Training Material on Localized Applications, Training on Localized Software to Rural School Children, Content Generated by Rural School Children and Teachers,…[details]


