Fast best-effort pattern matching in large attributed graphs

Top Cited Papers

12 August 2007

proceedings article
Published by Association for Computing Machinery (ACM)

p. 737-746
https://doi.org/10.1145/1281192.1281271

Abstract

We focus on large graphs where nodes have attributes, such as a social network where the nodes are labelled with each person's job title. In such a setting, we want to find subgraphs that match a user query pattern. For example, a "star" query would be, "find a CEO who has strong interactions with a Manager, a Lawyer,and an Accountant, or another structure as close to that as possible". Similarly, a "loop" query could help spot a money laundering ring. Traditional SQL-based methods, as well as more recent graph indexing methods, will return no answer when an exact match does not exist. This is the first main feature of our method. It can find exact-, as well as near-matches, and it will present them to the user in our proposed "goodness" order. For example, our method tolerates indirect paths between, say, the "CEO" and the "Accountant" of the above sample query, when direct paths don't exist. Its second feature is scalability. In general, if the query has nq nodes and the data graph has n nodes, the problem needs polynomial time complexity O(n n q), which is prohibitive. Our G-Ray ("Graph X-Ray") method finds high-quality subgraphs in time linear on the size of the data graph. Experimental results on the DLBP author-publication graph (with 356K nodes and 1.9M edges) illustrate both the effectiveness and scalability of our approach. The results agree with our intuition, and the speed is excellent. It takes 4 seconds on average fora 4-node query on the DBLP graph.

Keywords

This publication has 15 references indexed in Scilit:

Measuring and extracting proximity in networks
Published by Association for Computing Machinery (ACM) ,2006
Finding Frequent Patterns in a Large Sparse Graph*
Data Mining and Knowledge Discovery, 2005
Discovering frequent topological structures from graph datasets
Published by Association for Computing Machinery (ACM) ,2005
On mining cross-graph quasi-cliques
Published by Association for Computing Machinery (ACM) ,2005
Template Based Semantic Similarity for Security Applications
Lecture Notes in Computer Science, 2005
Automatic multimedia cross-modal correlation discovery
Published by Association for Computing Machinery (ACM) ,2004
Graph indexing
Published by Association for Computing Machinery (ACM) ,2004
Graph-based technologies for intelligence analysis
Communications of the ACM, 2004
Towards Scaling Fully Personalized PageRank
Lecture Notes in Computer Science, 2004
OntoSeek: content-based access to the Web
IEEE Intelligent Systems and their Applications, 1999

Cited by 170 articles