Graph Summarization with Quality Guarantees

Posted ON March 01, 2017

Authors: Matteo Riondato (Two Sigma), David Garcia-Soriano, Francesco Bonchi

Published in: Data Mining and Knowledge Discovery, 31(2):314–349

Presented at: Department of Information Engineering, University of Padua, Padua (Italy), September 26, 2016

Abstract: We study the problem of graph summarization. Given a large graph we aim at producing a concise lossy representation (a summary) that can be stored in main memory and used to approximately answer queries about the original graph much faster than by using the exact representation. In this work we study a very natural type of summary: the original set of vertices is partitioned into a small number of supernodes connected by superedges to form a complete weighted graph. The superedge weights are the edge densities between vertices in the corresponding supernodes. To quantify the dissimilarity between the original graph and a summary, we adopt the reconstruction error and the cut-norm error. By exposing a connection between graph summarization and geometric clustering problems (i.e., k-means and k-median), we develop the first polynomial-time approximation algorithms to compute the best possible summary of a certain size under both measures. We discuss how to use our summaries to store a (lossy or lossless) compressed graph representation and to approximately answer a large class of queries about the original graph, including adjacency, degree, eigenvector centrality, and triangle and subgraph counting. Using the summary to answer queries is very efficient as the running time to compute the answer depends on the number of supernodes in the summary, rather than the number of nodes in the original graph.


Related Articles

Life at Two Sigma

We’re rigorous about our work and developing our people.

Learn More

Interested in working at Two Sigma? Explore careers.

This website uses cookies to ensure you get the best experience on our website. Learn More
Got It