Full-Subtopic Retrieval with Keyphrase-Based Search Results Clustering
- 1 January 2009
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1, 206-213
- https://doi.org/10.1109/wi-iat.2009.37
Abstract
We consider the problem of retrieving multiple documents relevant to each individual subtopic of a given web query, termed "full-subtopic retrieval". To solve this problem, we present a novel search results clustering algorithm that generates clusters labeled by keyphrases. The keyphrases are extracted from the generalized suffix tree built from the search results and merged through an improved hierarchical agglomerative clustering procedure. We also introduce a novel measure for evaluating full-subtopic retrieval performance, namely "Subtopic Search Length under k document sufficiency". Using a test collection specifically designed for evaluating subtopic retrieval, we found that our algorithm outperformed both existing search results clustering algorithms and a search results re-ranking method that emphasizes diversity of results (at least for k > 1, i.e., when we are interested in retrieving more than one relevant document per subtopic). Our approach has been implemented in KeySRC (Keyphrase-based Search Results Clustering), a full web clustering engine available online at http://keysrc.fub.it.
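The abstract names the evaluation measure but does not define it. As a rough illustration only, the Python sketch below assumes a simplified flat-list reading: for each subtopic, count how many results a user reads from the top of a ranked list before seeing k relevant documents, then average over subtopics. The function names, the sample data, and the penalty for subtopics with fewer than k relevant results are all assumptions made for this example. The sketch does not model cluster labels, so it should not be taken as the paper's definition.

```python
from typing import Dict, List, Optional, Set


def search_length_k(ranking: List[str], relevant: Set[str], k: int) -> Optional[int]:
    """Return how many results are read, top to bottom, until k relevant ones are seen.

    Returns None when the ranking contains fewer than k relevant documents.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    found = 0
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            found += 1
            if found == k:
                return position
    return None


def mean_subtopic_search_length(
    ranking: List[str], subtopics: Dict[str, Set[str]], k: int
) -> float:
    """Average the per-subtopic search length over all subtopics of a query."""
    lengths = []
    for relevant in subtopics.values():
        length = search_length_k(ranking, relevant, k)
        # Assumption: a subtopic that never reaches k relevant documents
        # costs the whole list, since the user reads every result.
        lengths.append(length if length is not None else len(ranking))
    return sum(lengths) / len(lengths)


# Hypothetical ranked list and relevance judgments for two subtopics.
ranking = ["d1", "d2", "d3", "d4", "d5"]
subtopics = {"a": {"d1", "d4"}, "b": {"d2", "d3"}}

print(mean_subtopic_search_length(ranking, subtopics, k=1))
print(mean_subtopic_search_length(ranking, subtopics, k=2))
```

Lower values mean less reading effort. Raising k from 1 to 2 in this toy data increases the average, because the second relevant document for each subtopic sits further down the list.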