Visual data mining techniques for classification of diabetic patients

1 February 2013

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 1070-1075
https://doi.org/10.1109/iadcc.2013.6514375

Abstract

Clustering is a data mining technique for finding important patterns in large, unorganized data collections. The likelihood approach to clustering is often used by researchers for classification because it is simple and easy to implement. It uses the Expectation-Maximization (EM) algorithm for sampling. The main focus of this research was the classification of diabetic patients. Diabetic patients were classified by applying data mining techniques to medical data obtained from the Pima Indian Diabetes (PID) data set. This research was based on three techniques: the EM algorithm, h-means+ clustering, and the Genetic Algorithm (GA). These techniques were employed to form clusters of patients with similar symptoms. Analysis of the results showed that the h-means+ and double-crossover genetic process based techniques performed better in the comparison. The simulation tests were performed in the WEKA software tool for the three models used to test classification. The hypothesis that diabetes cases follow similar patterns in the PID data and local hospital data was tested and supported, with a correlation coefficient of 0.96 between the two data sets. About 35% of the 768 test samples were found to indicate diabetes.
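To make the likelihood-based clustering idea concrete, the following is a minimal sketch of EM for a two-component, one-dimensional Gaussian mixture. It is not the authors' implementation (the abstract states that the tests were run in WEKA), and it does not use the PID data: the function names, the initialization, the variance floor, and the synthetic sample are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances, responsibilities).
    """
    # Crude initialization (an assumption): start the means at the extremes
    # and give both components the overall sample variance.
    means = [min(data), max(data)]
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = []

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])

        # M-step: re-estimate weights, means and variances from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid a collapsed component

    return weights, means, variances, resp


# Synthetic stand-in for a single numeric attribute (NOT the PID data).
rng = random.Random(0)
data = [rng.gauss(100.0, 10.0) for _ in range(200)] + [rng.gauss(160.0, 15.0) for _ in range(100)]

weights, means, variances, resp = em_two_gaussians(data)
# Hard cluster assignment: pick the component with the larger responsibility.
labels = [0 if r[0] >= r[1] else 1 for r in resp]
```

The E-step computes soft memberships and the M-step updates the mixture parameters from them; taking the larger responsibility for each point turns the fitted mixture into cluster labels. A real analysis of multi-attribute patient records would need a multivariate mixture and a more careful choice of starting values.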

Cited by 20 articles