Determining the subcellular location of new proteins from microscope images using local features
Open Access
- 8 July 2013
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 29 (18), 2343-2349
- https://doi.org/10.1093/bioinformatics/btt392
Abstract
Motivation: Evaluation of previous systems for automated determination of subcellular location from microscope images has been done using datasets in which each location class consisted of multiple images of the same representative protein. Here, we frame a more challenging and useful problem where previously unseen proteins are to be classified.
Results: Using CD-tagging, we generated two new image datasets for evaluation of this problem, which contain several different proteins for each location class. Evaluation of previous methods on these new datasets showed that it is much harder to train a classifier that generalizes across different proteins than one that simply recognizes a protein it was trained on. We therefore developed and evaluated additional approaches, incorporating novel modifications of local features techniques. These extended the notion of local features to exploit both the protein image and any reference markers that were imaged in parallel. With these, we obtained a large accuracy improvement in our new datasets over existing methods. Additionally, these features help achieve classification improvements for other previously studied datasets.
Availability: The datasets are available for download at http://murphylab.web.cmu.edu/data/. The software was written in Python and C++ and is available under an open-source license at http://murphylab.web.cmu.edu/software/. The code is split into a library, which can be easily reused for other data, and a small driver script for reproducing all results presented here. A step-by-step tutorial on applying the methods to new datasets is also available at that address.
Contact: murphy@cmu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
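The distinction drawn in the abstract, between recognizing a protein seen in training and classifying a previously unseen protein, comes down to how images are split for evaluation. The following is a minimal sketch of a protein-level hold-out split and is not the authors' code: the function name `protein_holdout_folds`, the round-robin fold assignment, and the toy labels are all assumptions made for illustration. It also ignores location classes, whereas a real evaluation would need every class to keep at least one protein in the training set.

```python
def protein_holdout_folds(proteins, n_folds):
    """Yield (train_idx, test_idx) pairs in which no protein occurs on both sides.

    `proteins` holds one protein identifier per image (hypothetical input format).
    """
    unique = sorted(set(proteins))
    if n_folds < 2 or n_folds > len(unique):
        raise ValueError("n_folds must be between 2 and the number of proteins")
    # Assign whole proteins (not individual images) to folds, round-robin.
    fold_of = {p: i % n_folds for i, p in enumerate(unique)}
    for fold in range(n_folds):
        test = [i for i, p in enumerate(proteins) if fold_of[p] == fold]
        train = [i for i, p in enumerate(proteins) if fold_of[p] != fold]
        yield train, test


# Toy example: eight images of four hypothetical proteins.
proteins = ["A", "A", "B", "B", "C", "C", "D", "D"]
folds = list(protein_holdout_folds(proteins, n_folds=2))
for train, test in folds:
    held_out = sorted({proteins[i] for i in test})
    print("held-out proteins:", held_out, "test images:", test)
```

Splitting at the image level instead would let images of the same protein fall on both sides of the split, which is the easier setting the abstract attributes to earlier evaluations.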