PAN Localization Cambodia (PLC) of IDRC


PAN Localization Cambodia of IDRC is a component of the PAN Localization project, which is funded by the International Development Research Center of Canada (IDRC) through its PAN Asia Network programs and the National University of Computer and Emerging Sciences (NUCES), Lahore, Pakistan. The project started in 2004 and completed its first phase in April 2007.


The project worked closely with the Committee for Standardization of Khmer Script in Computer (CSKSC) and the Royal Academy of Cambodia (RAC) to make its outputs as reliable as possible.





Team's Achievements


By the end of Phase I, PLC had completed its planned work, which resulted in local language standards and localized applications. The main outputs include:

  • Khmer Collation Sequence: approved by the Royal Academy of Cambodia for public use

  • Khmer locale: will be fully supported in the next version of Microsoft Windows Vista.

  • Encoding conversion utility: converts text in non-Unicode formats into Unicode

  • Collation and sorting utility: supports both Chuon Nath Dictionary rule and alphabetic rule
  • Khmer Line breaking utility: with 51,363 dictionary words
  • Khmer Spell checker utility: with 51,363 dictionary words
  • Khmer Lexicon: with around 30,000 head words and descriptions
  • Text corpus of 673,295 word tokens
  • An English-Khmer website with information about PLC, a forum, and project outputs for download
  • Khmer Optical Character Recognition (in progress)
  • Khmer Open source software (in progress)


  • Khmer Line Breaking

  • Khmer Lexicon Utility

  • Khmer Collation Software Utility

  • Khmer Spell Checker

  • Khmer Conversion Utility


Research Reports:

  • Khmer Unicode Line Breaking Software

  • Khmer Lexicon

  • Khmer Collation Development

  • Khmer Conversion Utility User Manual

  • Khmer Conversion Utility

  • Khmer Spell Checker


Objectives during Second Phase

  • Localized URL
  • Mobile SMS in Khmer
  • Content Development
  • Find and Replace
  • Conducting Training

CPI Team Members and their Designations

  • Chea, Sok Hour, Country Project Manager (CPM)
  • Seng, Vannara, Assistant to CPM
MoEYS Partners
  • Ms. Ros, Pich Hemy, ITC Senior Developer
  • Chin, Chanthirith, Lead Developer
  • Kron, Visal, Developer
  • Tith, Sakal, Developer
  • Ms. Vann, Sophea, Developer


  Three main workshops were held to share the project's work with the public:
  • On 18th November 2004, a workshop on “Localization Technologies” was held at the Council of Ministers of Cambodia, with the participation of IT specialists from government, NGOs and universities. Project team members delivered presentations on Khmer standardization, Unicode, OpenType fonts, encoding conversion, Khmer collation, and Khmer lexical development.

Participants in Localization Technologies
  • On 23rd December 2005, a workshop on “Khmer Unicode and Applications” was held at the Sunway Hotel, with invited participants from government, NGOs, universities and the media. Project team members presented their work. The workshop concluded with the free distribution of CDs containing the research outputs of the project: Khmer Smart Typing, Encoding Conversion utilities, Collation and Sorting utilities, Line Breaking utilities and Spell Checker utilities.

Participants in Khmer Unicode and Applications
  • Workshop on “Khmer Language Processing and Web Page Development”, from 16th May to 16th June 2007, Institute of Technology of Cambodia

Computer Science students from three universities in Cambodia (the Institute of Technology of Cambodia, Norton University, and Build Bright University) were invited to join the one-month workshop. They were introduced to the conceptual and technical knowledge needed to develop local language applications and web pages in Khmer. The content included an introduction to writing systems, the Unicode standard and Khmer Unicode, font concepts and development, Khmer Unicode applications, and some concepts in Khmer web page development with PHP and MySQL.

Khmer Language Processing and Web Development
  • The CPM gave a series of presentations at various NGOs and universities.
  • The CPM appeared on 2 local TV shows with live questions and answers.
  • The CPM conducted 1 regular local TV show.



Considerable effort has gone into strengthening our human resource capacity through various training courses and a summer school:

  • Training on “Localization and Khmer Language Processing”, from 20th June to 27th December 2004, PAN Localization Cambodia

The training was conducted by Mr. Atif Gulzar to prepare the Cambodian team to develop localization applications and to help them meet the planned deliverables. The team received training in basic to advanced programming and basic to advanced language processing techniques. The training broadly covered OpenType font development, Unicode language processing, lexicon development, the project life cycle from design to execution and testing, and advanced programming in C++ and Visual Basic .NET.

  • Summer School on “Asian Language Processing”, from 1st June to 15th August 2006, FAST-NU, Lahore, Pakistan

Two PLC development team members and one lecturer from the Institute of Technology of Cambodia were sent to participate in the summer school, which was organized by the PAN Localization project to advance team members' knowledge of speech, script and language processing. Our team was introduced to various topics including phonetics and phonology, script processing, syntax and morphology, speech processing fundamentals, and computational linguistics.

  • Training on “Open Source Software”, from 2nd January to 19th January 2007, PAN Localization Cambodia

An expert from the Nepal component was invited to train team members from Cambodia, Laos, Mongolia and Pakistan in Open Source Software development. By the end of the training, the trainees were able to work on OSS localization for their respective languages.

OSS Training participants
  • Training on “Optical Character Recognition”, from 4th June to 29th June 2007, NECTEC, Thailand

One development team was sent to a one-month training course at NECTEC on Optical Character Recognition. With assistance from NECTEC, our team is expected to develop a Khmer OCR prototype.

OCR Participants