Fast integration of heterogeneous data sources for predicting gene function with limited annotation
Open Access
- 27 May 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (14), 1759-1765
- https://doi.org/10.1093/bioinformatics/btq262
Abstract
Motivation: Many algorithms that integrate multiple functional association networks for predicting gene function construct a composite network as a weighted sum of the individual networks and then use the composite network to predict gene function. The weight assigned to an individual network represents the usefulness of that network in predicting a given gene function. However, because many categories of gene function have a small number of annotations, the process of assigning these network weights is prone to overfitting.
Results: Here, we address this problem by proposing a novel approach to combining multiple functional association networks. In particular, we present a method where network weights are simultaneously optimized on sets of related function categories. The method is simpler and faster than existing approaches. Further, we show that it produces composite networks with improved function prediction accuracy using five example species (yeast, mouse, fly, Escherichia coli and human).
Availability: Networks and code are available from: http://morrislab.med.utoronto.ca/~sara/SW
Contact: smostafavi@cs.toronto.edu; quaid.morris@utoronto.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
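The two ideas in the abstract, a composite network formed as a weighted sum and a single set of weights fitted on several related categories at once, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the +1/-1 label encoding, the co-annotation target and the use of ordinary least squares followed by clipping negative weights to zero are all assumptions made for illustration.

```python
import numpy as np


def composite_network(networks, weights):
    """Return the weighted sum of equally sized (n x n) association networks."""
    stacked = np.stack(networks)                    # shape (d, n, n)
    return np.tensordot(weights, stacked, axes=1)   # shape (n, n)


def fit_shared_weights(networks, labels):
    """Fit one weight per network using a whole set of categories at once.

    labels is an (n_genes, n_categories) array with entries +1 or -1
    (hypothetical encoding). Regressing on the mean co-annotation target
    gives the same weights as minimising the summed squared error over
    all categories, because every category shares the same design matrix.
    """
    n_genes, n_categories = labels.shape
    off_diag = ~np.eye(n_genes, dtype=bool)         # ignore self-pairs
    target = (labels @ labels.T) / n_categories     # mean pairwise agreement
    # One row per ordered gene pair, one column per network.
    design = np.column_stack([net[off_diag] for net in networks])
    coef, *_ = np.linalg.lstsq(design, target[off_diag], rcond=None)
    # Assumed simplification: drop networks that receive negative weight.
    return np.clip(coef, 0.0, None)
```

Pooling the categories means each weight is estimated from every annotation in the set rather than from the few positives of a single category, which is the intuition behind the reduced overfitting described above. A call such as `composite_network(networks, fit_shared_weights(networks, labels))` then yields one composite network that can be reused for every category in the set.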