Identification of Adopted Pali Words in Myanmar Text
The Myanmar language has been significantly influenced by Pali through the practice of Buddhism and the study of Buddhist literature in Myanmar. As a result, many Pali words have been adopted into Myanmar and are in wide use. This study presents an algorithm for identifying Myanmar-adopted Pali words in Myanmar text. The system combines rule-based syllable segmentation with a dictionary-based longest matching method. A program was developed and trained on a corpus of 8,895 sentences, in which it recognized 579 unique Pali words. The system was then evaluated on a separate corpus of 3,641 sentences, where it correctly identified 279 unique Pali words, achieving a precision of 97.59%, a recall of 99.04% and an F-measure of 98.31%. Because Pali words are unavoidable in Myanmar text, the results of this study can benefit many Myanmar NLP tasks, such as spell checking, text categorization and text-to-speech synthesis.
Keywords: Myanmar Pali Words, Pali Words Identification, Syllable Segmentation, Longest Matching, Conjunct Consonants
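The dictionary-based longest matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the text has already been split into syllables by a separate rule-based segmenter, the function name `longest_match` is hypothetical, and the syllables and lexicon entries in the usage lines are romanized toy placeholders rather than entries from the study's dictionary.

```python
from typing import List, Sequence, Set, Tuple


def longest_match(
    syllables: Sequence[str],
    lexicon: Set[Tuple[str, ...]],
) -> List[Tuple[int, int]]:
    """Return (start, end) syllable spans of lexicon entries.

    Greedy longest matching: at each position, try the longest
    candidate first and shrink until an entry is found.
    """
    # Longest entry in the lexicon, measured in syllables.
    max_len = max((len(entry) for entry in lexicon), default=0)
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(syllables):
        matched = 0
        longest_possible = min(max_len, len(syllables) - i)
        for n in range(longest_possible, 0, -1):
            if tuple(syllables[i:i + n]) in lexicon:
                matched = n
                break
        if matched:
            spans.append((i, i + matched))
            i += matched  # continue after the matched word
        else:
            i += 1  # no entry starts here; move one syllable on
    return spans


# Toy data (assumption): romanized placeholder syllables, not real corpus text.
toy_lexicon = {
    ("dham", "ma"),
    ("dham", "ma", "sa", "ri", "ya"),
    ("san", "gha"),
}
toy_syllables = ["thu", "dham", "ma", "sa", "ri", "ya", "san", "gha"]
toy_spans = longest_match(toy_syllables, toy_lexicon)
print(toy_spans)
```

In this toy run the five-syllable entry is chosen over its two-syllable prefix, which is the behaviour that gives longest matching its name. A real system would operate on Myanmar-script syllables and would need the segmenter to handle conjunct consonants before lookup.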
ABOUT THE AUTHOR
Zin Maung Maung
University of Computer Studies, Mandalay, Myanmar