Ensembles of Classifiers for Morphological Galaxy Classification

Abstract
We compare three algorithms for automated morphological galaxy classification using a sample of 800 galaxies. Classifiers are created using a single training set as well as bootstrap replicates of the training set, producing an ensemble of classifiers. We use a Naive Bayes classifier, a neural network trained with backpropagation, and a decision-tree induction algorithm with pruning. Previous work in the field has emphasized backpropagation networks and decision trees. The Naive Bayes classifier is easy to understand and implement and often works remarkably well on real-world data. For each of these algorithms, we examine the classification accuracy of individual classifiers using 10-fold cross-validation and of ensembles of classifiers trained using 25 bootstrap data sets and tested on the same cross-validation test sets. Our results show that (1) the neural network produced the best individual classifiers (lowest classification error) for the majority of cases, (2) the ensemble approach significantly reduced the classification error for the neural network and the decision-tree classifiers but not for the Naive Bayes classifier, (3) the ensemble approach worked better for decision trees (typical error reduction of 12%-23%) than for the neural network (typical error reduction of 7%-12%), and (4) the relative improvement when using ensembles decreases as the number of output classes increases. While more extensive comparisons are needed (e.g., a variety of data and classifiers), our work is the first demonstration that the ensemble approach can significantly increase the performance of certain automated classification methods when applied to the domain of morphological galaxy classification.
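
As a rough illustration of the procedure summarized above, the sketch below bags each of the three base learners over 25 bootstrap replicates and scores both the single classifier and the ensemble with 10-fold cross-validation. The scikit-learn estimators, the feature dimensionality, and the synthetic data are stand-ins chosen for illustration only; they are assumptions, not the authors' implementation or data.

# Minimal sketch (not the authors' code): bagging each base learner on 25
# bootstrap replicates and comparing single vs. ensemble error with 10-fold
# cross-validation, using scikit-learn stand-ins for the three classifier types.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Placeholder data: 800 "galaxies", 10 hypothetical morphological features,
# 3 output classes; replace with real measurements and labels.
X = rng.normal(size=(800, 10))
y = rng.integers(0, 3, size=800)

base_learners = {
    "naive_bayes": GaussianNB(),
    "neural_net": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000),
    "decision_tree": DecisionTreeClassifier(ccp_alpha=0.01),  # pruned tree
}

for name, clf in base_learners.items():
    # Single classifier, evaluated with 10-fold cross-validation.
    single_acc = cross_val_score(clf, X, y, cv=10).mean()
    # Ensemble of 25 classifiers, each trained on a bootstrap replicate.
    bagged = BaggingClassifier(clf, n_estimators=25, bootstrap=True)
    ensemble_acc = cross_val_score(bagged, X, y, cv=10).mean()
    print(f"{name}: single error={1 - single_acc:.3f}, "
          f"ensemble error={1 - ensemble_acc:.3f}")

On synthetic noise the two error rates will be similar; the point of the sketch is only the experimental layout, in which the same cross-validation folds are used to compare an individual classifier against its bootstrap ensemble.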
