Improving Rare Case Prediction with Replication Technique


Nittaya Kerdprasop, Fonthip Koongaew, Zagon Budsabong, Phaichayon Kongchai and Kittisak Kerdprasop

The ability to correctly predict rarely occurring cases is important to the success of applying data mining methods to many real-life applications. In the context of data mining, rare cases are labeled data instances that occur infrequently in the database. Discovering infrequent patterns is of interest in specific domains such as genetic mutant identification, credit card fraud detection, and network intrusion prevention. However, most learning algorithms are biased toward the majority cases: minority cases are treated as noise and are ignored during the model induction steps. As a result, the learning algorithm generates a model that cannot classify or predict a minority case. We therefore study a replication technique based on the over-sampling method to solve this problem. A straightforward application of over-sampling, however, may lead to over-fitting, in which the generated model is too specific to the manipulated data. We thus apply a cluster-based technique to selectively filter the training dataset. Experimental results on the primary tumor, arrhythmia, and communities-and-crime datasets show significant improvement in the prediction accuracy, specificity, and sensitivity of the induced models. The results on the multiple features correlation dataset, however, show no significant improvement; this case requires further investigation.
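
As a rough illustration of the replication idea described in the abstract, the sketch below duplicates randomly chosen minority-class rows until every class is as large as the largest one. It is a minimal sketch of plain random over-sampling only, not the authors' procedure: the cluster-based filtering step is not shown, and the function name `replicate_minority`, the toy data, and the `seed` parameter are assumptions introduced here for illustration.

```python
import random
from collections import Counter


def replicate_minority(rows, labels, seed=0):
    """Plain random over-sampling (an illustrative assumption, not the paper's method).

    Rows of every class smaller than the largest class are duplicated at
    random until all classes have the same number of instances.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Original instances of this class, used as the replication pool.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: five "common" instances and a single "rare" one.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [9.0]]
labels = ["common"] * 5 + ["rare"]

balanced_rows, balanced_labels = replicate_minority(rows, labels)
print(Counter(balanced_labels))
```

Because the replicated rows are exact copies, a model trained on the balanced data can memorize them, which is the over-fitting risk the abstract raises as the motivation for filtering the training set.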

Keywords: Rare Case Prediction, Classification Model, Sample Replication, Data Mining, Over-sampling Technique.


ABOUT THE AUTHORS

Nittaya Kerdprasop
Nittaya Kerdprasop is an associate professor with the School of Computer Engineering, Suranaree University of Technology, Thailand. She received her B.S. from Mahidol University, Thailand, in 1985, her M.S. in computer science from the Prince of Songkla University, Thailand, in 1991, and her Ph.D. in computer science from Nova Southeastern University, U.S.A., in 1999. She is a member of ACM and IEEE Computer Society. Her research interests include knowledge discovery in databases, artificial intelligence, logic programming, and deductive and active databases.

Fonthip Koongaew
Fonthip Koongaew was a computer engineer with the data engineering research unit. She received her bachelor's and master's degrees in computer engineering from the School of Computer Engineering, Suranaree University of Technology, in 2010 and 2012, respectively. Her research interests are data mining, constraint logic programming, and decision tree induction.

Zagon Budsabong
Zagon Budsabong is a master's student with the School of Computer Engineering, Suranaree University of Technology, Thailand. He received his bachelor's degree in computer science from Rajamangala University of Technology Tawan-Ok, Chakrabongse Bhuvanarth Campus, Thailand, in 2010. His research interests are software engineering and intelligent data analysis.

Phaichayon Kongchai
Phaichayon Kongchai is currently a doctoral student with the School of Computer Engineering, Suranaree University of Technology, Thailand. He received his bachelor's degree in computer engineering from Suranaree University of Technology, Thailand, in 2010, and his master's degree in computer engineering from the same institution in 2012. His research topics are data mining, constraint logic programming, and artificial intelligence.

Kittisak Kerdprasop
Kittisak Kerdprasop is an associate professor and chair of the School of Computer Engineering, Suranaree University of Technology, Thailand. He received his bachelor's degree in mathematics from Srinakarinwirot University, Thailand, in 1986, his master's degree in computer science from the Prince of Songkla University, Thailand, in 1991, and his doctoral degree in computer science from Nova Southeastern University, U.S.A., in 1999. His current research includes data mining, artificial intelligence, functional and logic programming, and computational statistics.


About IJCSI

IJCSI is a refereed open access international journal for scientific papers dealing in all areas of computer science research...
