Presence‐Only Data and the EM Algorithm
- 28 May 2009
- journal article
- Published by Oxford University Press (OUP) in Biometrics
- Vol. 65 (2), 554-563
- https://doi.org/10.1111/j.1541-0420.2008.01116.x
Abstract
In ecological modeling of the habitat of a species, it can be prohibitively expensive to determine species absence. Presence-only data consist of a sample of locations with observed presences and a separate group of locations sampled from the full landscape, with unknown presences. We propose an expectation–maximization algorithm to estimate the underlying presence–absence logistic model for presence-only data. This algorithm can be used with any off-the-shelf logistic model. For models with stepwise fitting procedures, such as boosted trees, the fitting process can be accelerated by interleaving expectation steps within the procedure. Preliminary analyses based on sampling from presence–absence records of fish in New Zealand rivers illustrate that this new procedure can reduce both deviance and the shrinkage of marginal effect estimates that occur in the naive model often used in practice. Finally, it is shown that the population prevalence of a species is only identifiable when there is some unrealistic constraint on the structure of the logistic model. In practice, it is strongly recommended that an estimate of population prevalence be provided.
This publication has 16 references indexed in Scilit:
- Variation in demersal fish species richness in the oceans surrounding New Zealand: an analysis using boosted regression trees. Marine Ecology Progress Series, 2006
- Novel methods improve prediction of species’ distributions from occurrence data. Ecography, 2006
- Using multivariate adaptive regression splines to predict the distributions of New Zealand's freshwater diadromous fish. Freshwater Biology, 2005
- Use and interpretation of logistic regression in habitat-selection studies. The Journal of Wildlife Management, 2004
- Removing GPS collar bias in habitat selection studies. Journal of Applied Ecology, 2004
- An improved approach for predicting the distribution of rare and endangered species from occurrence and pseudo‐absence data. Journal of Applied Ecology, 2004
- Extended statistical approaches to modelling spatial pattern in biodiversity in northeast New South Wales. II. Community-level modelling. Biodiversity and Conservation, 2002
- Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 2001
- Relationships among grizzly bears, roads and habitat in the Swan Mountains, Montana. Journal of Applied Ecology, 1996
- Case-control studies with contaminated controls. Journal of Econometrics, 1996
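The abstract describes an expectation–maximization procedure for presence-only data but gives no formulas, so the following is only a rough sketch of how such a procedure might look, not the authors' implementation. It assumes the population prevalence is supplied by the user (as the abstract recommends), uses a plain NumPy Newton solver in place of an off-the-shelf logistic model, and applies a standard case-control intercept correction; the function names `fit_logistic` and `em_presence_only`, the correction term, and the simulated data are all assumptions of this sketch.

```python
import numpy as np

def sigmoid(eta):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Newton fit of a logistic model; y may hold fractional values in [0, 1]."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ beta)
        grad = Xb.T @ (y - p) - ridge * beta
        hess = (Xb * (p * (1.0 - p))[:, None]).T @ Xb + ridge * np.eye(Xb.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def em_presence_only(X, z, prevalence, n_em=100):
    """z = 1 for observed presences, z = 0 for background points (presence unknown).

    Returns population-scale coefficients and the imputed presence probabilities.
    """
    n_p = int(np.sum(z == 1))
    n_u = int(np.sum(z == 0))
    # Assumed case-control correction: the sample holds about n_p + prevalence * n_u
    # true presences, so the fitted intercept is shifted back to the population scale.
    offset = np.log((n_p + prevalence * n_u) / (prevalence * n_u))
    y = np.where(z == 1, 1.0, prevalence)       # initial imputation for background
    for _ in range(n_em):
        beta = fit_logistic(X, y)               # M-step: fit on imputed responses
        beta[0] -= offset                       # move intercept to population scale
        p = sigmoid(beta[0] + X @ beta[1:])
        y = np.where(z == 1, 1.0, p)            # E-step: re-impute background points
    return beta, y

# Simulated example (hypothetical data, one covariate).
rng = np.random.default_rng(0)
x_pop = rng.normal(size=(20000, 1))
y_pop = rng.random(20000) < sigmoid(-1.0 + 1.0 * x_pop[:, 0])
X = np.vstack([x_pop[y_pop][:300], x_pop[rng.choice(20000, 1000, replace=False)]])
z = np.concatenate([np.ones(300, dtype=int), np.zeros(1000, dtype=int)])
beta, y_hat = em_presence_only(X, z, prevalence=y_pop.mean())
print(beta)
```

The naive model mentioned in the abstract would correspond to stopping after a single fit that treats every background point as an absence; the loop above instead replaces those unknown labels with their current expected values at each pass.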