Automatic Named Entity Identification and Classification using Heuristic Based Approach for Telugu
Named Entity Recognition (NER) and Classification is becoming increasingly important in many natural language processing applications. It helps machines recognize named entities in text and assign them to appropriate categories. NER for Telugu is challenging because Telugu is morphologically very rich. Recent systems rely on machine learning approaches, but their performance depends heavily on the size and quality of the training data. In this paper we propose a rule-based Named Entity Recognition and Classification system for the Telugu language. We describe the identification and classification of named entities using word-level features, word lookup features, and contextual features. Further classification of the identified named entities and ambiguity resolution are done through contextual rules and syntactic information. The system is tested on different data sets drawn from newspaper and Teluguwiki corpora.
Keywords: Heuristics, Named Entity, Gazetteers, Morphology.
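To make the three feature types named in the abstract concrete, the following is a minimal sketch of how a heuristic tagger might combine a lookup feature (gazetteer match), a word-level feature (digit pattern, case-marker suffix) and a contextual feature (a title preceding a name). It is not the authors' system: the gazetteer entries, the title list, the suffix list, the label names and the function names are all hypothetical, and tokens are romanized for readability rather than written in Telugu script.

```python
import re

# Hypothetical gazetteers (lookup features); entries are illustrative only.
GAZETTEERS = {
    "LOCATION": {"hyderabad", "secunderabad"},
    "ORGANIZATION": {"jntuh"},
}

# Hypothetical contextual cue: a title directly before a token suggests a person.
PERSON_TITLES = {"sri", "dr"}

# Assumed romanized case markers ("lo" = in, "ki" = to) that attach to nouns.
CASE_SUFFIXES = ("lo", "ki")


def strip_case_suffix(word):
    """Return the word without a trailing case marker, if one is attached."""
    for suffix in CASE_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def classify(tokens):
    """Assign one label per token using lookup, word-level and contextual rules."""
    labels = []
    for i, token in enumerate(tokens):
        word = token.lower()
        stem = strip_case_suffix(word)
        label = "O"  # "O" marks a token that is not a named entity

        # Lookup feature: match the surface form or its stem against gazetteers.
        for category, entries in GAZETTEERS.items():
            if word in entries or stem in entries:
                label = category
                break

        # Word-level feature: a token made only of digits.
        if label == "O" and re.fullmatch(r"\d+", word):
            label = "NUMBER"

        # Contextual feature: the previous token is a known title.
        if label == "O" and i > 0 and tokens[i - 1].lower() in PERSON_TITLES:
            label = "PERSON"

        labels.append(label)
    return labels


if __name__ == "__main__":
    print(classify(["sri", "govardhan", "hyderabadlo", "2012"]))
```

In this sketch the rules are applied in a fixed order (lookup, then word-level, then context), so a gazetteer match always takes priority. A real system for a morphologically rich language would need a far more careful treatment of suffixes and of ambiguous entries.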
ABOUT THE AUTHORS
P. M. Yohan
Associate Professor, Dept. of MCA, Wesley P.G. College, Secunderabad, Andhra Pradesh, India.
B. Sasidhar
Professor, Dept. of CSE, Mahaveer Institute of Science and Technology, Hyderabad, Andhra Pradesh, India.
Sk. Althaf Hussain Basha
Professor, Dept. of School of Computing, GRIET, Hyderabad, Andhra Pradesh, India.
A. Govardhan
Professor of CSE, SIT, JNTUH, Kukatpally, Hyderabad, Andhra Pradesh, India.