Graphs naturally represent information ranging from links between web pages, to communication in email networks, to connections between neurons in our brains. These graphs often span billions of nodes and interactions between them. Within this deluge of interconnected data, how can we find the most important structures and summarize them? How can we efficiently visualize them? How can we detect anomalies that indicate critical events, such as an attack on a computer system, disease formation in the human brain, or the fall of a company?
This book presents scalable, principled discovery algorithms that combine globality with locality to make sense of one or more graphs. In addition to fast algorithmic methodologies, we also contribute graph-theoretical ideas and models, and real-world applications in two main areas:
•Individual Graph Mining: We show how to interpretably summarize a single graph by identifying its important graph structures. We complement summarization with inference, which leverages information about few entities (obtained via summarization or other methods) and the network structure to efficiently and effectively learn information about the unknown entities.
•Collective Graph Mining: We extend the idea of individual-graph summarization to time-evolving graphs, and show how to scalably discover temporal patterns. Apart from summarization, we claim that graph similarity is often the underlying problem in a host of applications where multiple graphs occur (e.g., temporal anomaly detection, discovery of behavioral patterns), and we present principled, scalable algorithms for aligning networks and measuring their similarity.
The methods that we present in this book leverage techniques from diverse areas, such as matrix algebra, graph theory, optimization, information theory, machine learning, finance, and social science, to solve real-world problems. We present applications of our exploration algorithms to massive datasets, including a Web graph of 6.6 billion edges, a Twitter graph of 1.8 billion edges, brain graphs with up to 90 million edges, collaboration, peer-to-peer networks, browser logs, all spanning millions of users and interactions.
|Publisher:||Morgan and Claypool Publishers|
|Series:||Synthesis Lectures on Data Mining and Knowledge Discovery|
|Product dimensions:||7.50(w) x 9.30(h) x 0.00(d)|
About the Author
Danai Koutra is an Assistant Professor in Computer Science and Engineering at University of Michigan, Ann Arbor. Her research interests include large-scale graph mining, graph similarity and matching, graph summarization, and anomaly detection. Danai's research has been applied mainly to social, collaboration, and web networks, as well as brain connectivity graphs. She holds one "rate-1" patent and has six (pending) patents on bipartite graph alignment. Danai won the 2016 ACM SIGKDD Dissertation award, and an honorable mention for the SCS Doctoral Dissertation Award (CMU). She has multiple papers in top data mining conferences, including two award-winning papers, she has given three tutorials, and her work has been covered by the popular press, such as the MIT Technology Review. She has worked at IBM Watson, Microsoft Research, and Technicolor. She earned her Ph.D. and M.S. in Computer Science from CMU in 2015 and her diploma in Electrical and Computer Engineering at the National Technical University of Athens in 2010.
Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award by the National Science Foundation (1989), the Research Contributions Award in ICDM 2006, the SIGKDD Innovations Award (2010), 24 "best paper" awards (including 5 "test of time" awards), and 4 teaching awards. Six of his advisees have attracted KDD or SCS dissertation awards, He is an ACM Fellow, he has served as a member of the executive committee of SIGKDD; he has published over 350 refereed articles, 17 book chapters, and 2 monographs. He holds seven patents (and 2 pending), and he has given over 40 tutorials and over 20 invited distinguished lectures. His research interests include large-scale data mining with emphasis on graphs and time sequences; anomaly detection, tensors, and fractals.
University of Illinois at Urbana-Champaign
University of California Santa Cruz
Wei Wang is an associate professor in the Department of Computer Science and a member of the Carolina Center for Genomic Sciences at the University of North Carolina at Chapel Hill. She received a MS degree from the State University of New York at Binghamton in 1995 and a PhD degree in Computer Science from the University of California at Los Angeles in 1999. She was a research staff member at the IBM T. J. Watson Research Center between 1999 and 2002. Dr. Wang's research interests include data mining, bioinformatics, and databases. She has filed seven patents, and has published one monograph and more than one hundred research papers in international journals and major peer-reviewed conference proceedings. Dr. Wang received the IBM Invention Achievement Awards in 2000 and 2001. She was the recipient of a UNC Junior Faculty Development Award in 2003 and an NSF Faculty Early Career Development (CAREER) Award in 2005. She was named a Microsoft Research New Faculty Fellow in 2005. She was recently honored with the 2007 Phillip and Ruth Hettleman Prize for Artistic and Scholarly Achievement at UNC. Dr. Wang is an associate editor of the IEEE Transactions on Knowledge and Data Engineering and ACM Transactions on Knowledge Discovery in Data, and an editorial board member of the International Journal of Data Mining and Bioinformatics. She serves on the program committees of prestigious international conferences such as ACM SIGMOD, ACM SIGKDD, VLDB, ICDE, EDBT, ACM CIKM, IEEE ICDM, and SSDBM.
Table of Contents
Table of Contents: Acknowledgments / Introduction / Summarization of Static Graphs / Inference in a Graph / Summarization of Dynamic Graphs / Graph Similarity / Graph Alignment / Conclusions and Further Research Problems / Bibliography / Authors' Biographies