Efficient query processing on graph databases
- 23 April 2009
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Database Systems
- Vol. 34 (1), 1-48
- https://doi.org/10.1145/1508857.1508859
Abstract
We study the problem of processing subgraph queries on a database that consists of a set of graphs. The answer to a subgraph query is the set of graphs in the database that are supergraphs of the query. In this article, we propose an efficient index, FG*-index , to solve this problem. The cost of processing a subgraph query using most existing indexes mainly consists of two parts: the index probing cost and the candidate verification cost. Index probing is to find the query in the index, or to find the graphs from which we can generate a candidate answer set for the query. Candidate verification is to test whether each graph in the candidate set is indeed a supergraph of the query. We design FG*-index to minimize these two costs as follows. FG*-index consists of three components: the FG-index , the feature-index , and the FAQ-index . First, the FG-index employs the concept of Frequent subGraph ( FG ) to allow the set of queries that are FGs to be answered without candidate verification. We call this set of queries FG-queries . We can enlarge the set of FG-queries so that more queries can be answered without candidate verification; however, a larger set of FG-queries implies a larger FG-index and hence the index probing cost also increases. We propose the feature-index to reduce the index probing cost. The feature-index uses features to filter false results that are matched in the FG-index, so that we can quickly find the truly matching graphs for a query. For processing non-FG-queries, we propose the FAQ-index, which is dynamically constructed from the set of Frequently Asked non-FG-Queries ( FAQs ). Using the FAQ-index, verification is not required for processing FAQs and only a small number of candidates need to be verified for processing non-FG-queries that are not frequently asked . Finally, a comprehensive set of experiments verifies that query processing using FG*-index is up to orders of magnitude more efficient than state-of-the-art indexes and it is also more scalable.Keywords
Funding Information
- Research Grants Council, University Grants Committee, Hong Kong (617808)
This publication has 21 references indexed in Scilit:
- Effective elimination of redundant association rulesData Mining and Knowledge Discovery, 2007
- Fast best-effort pattern matching in large attributed graphsPublished by Association for Computing Machinery (ACM) ,2007
- Correlation search in graph databasesPublished by Association for Computing Machinery (ACM) ,2007
- Maintaining frequent closed itemsets over a sliding windowJournal of Intelligent Information Systems, 2007
- Graph indexing based on discriminative frequent structure analysisACM Transactions on Database Systems, 2005
- Substructure similarity search in graph databasesPublished by Association for Computing Machinery (ACM) ,2005
- CloseGraphPublished by Association for Computing Machinery (ACM) ,2003
- Issues in data stream managementACM SIGMOD Record, 2003
- Covering indexes for branching path queriesPublished by Association for Computing Machinery (ACM) ,2002
- Algorithmics and applications of tree and graph searchingPublished by Association for Computing Machinery (ACM) ,2002