The Unreasonable Effectiveness of Data
- 24 March 2009
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Intelligent Systems
- Vol. 24 (2), 8-12
- https://doi.org/10.1109/mis.2009.36
Abstract
At Brown University, there was excitement about having access to the Brown Corpus, containing one million English words. Since then, we have seen several notable corpora that are about 100 times larger, and in 2006, Google released a trillion-word corpus with frequency counts for all sequences up to five words long. In some ways this corpus is a step backwards from the Brown Corpus: it's taken from unfiltered Web pages and thus contains incomplete sentences, spelling errors, grammatical errors, and all sorts of other errors. It's not annotated with carefully hand-corrected part-of-speech tags. But the fact that it's a million times larger than the Brown Corpus outweighs these drawbacks. A trillion-word corpus - along with other Web-derived corpora of millions, billions, or trillions of links, videos, images, tables, and user interactions - captures even very rare aspects of human behavior. So, this corpus could serve as the basis of a complete model for certain tasks - if only we knew how to extract the model from the data.