Xuemin Lin - Selected Publications#
There are 147 papers published in the top venues in the last 5 years (e.g., in 2022, 17 ICDE papers, and 1 SIGMOD pper, 2020, 4xSIGMOD, 10xVLDB, and 9xICDE long papers were published), below are 10 representative publications in the 5 years.
1. HUGE: An Efficient and Scalable Subgraph Enumeration Engine, Z. Yang, L. Lai, X. LIN, K. Hao, W. Zhang, SIGMOD 2021.
This is the first time to propose a framework together with a system to conduct an enumeration of subgraph combining combining a pull base with a push base while minimizing the computation and communication costs with the bounded memory. This is a revolutionary paradigm by combining the strengths of two existing paradigms that can speed-up the existing computation techniques by orders of magnitude.
2. Communication-Efficient Distributed Covariance Sketch, with Applications to Distributed PCA Z. Huang, X. LIN, W. Zhang, Y. Zhang, Journal of Machine Learning Research 2021.
This work bridges the gap between deterministic and randomized communication complexity for computing a covariance sketch of a large matrix distributed across servers. A novel algorithm for distributed PCA with improved communication cost is developed based on a well-known connection between covariance sketch and PCA. Moreover, with the new algorithms, each server only needs to make one pass over the data with limited working space.
3. FAST: FPGA-based Subgraph Matching on Massive Graphs, X. Jin, Z. Yang, X. LIN, S. Yang, L. Qin, Y. Peng, ICDE 2021.
This work tames the NP-hardness of subgraph matching by proposing a CPU-FPGA co-designed framework. The first FPGA-based subgraph matching algorithm is designed and implemented. To take full advantage of the pipeline mechanism on FPGAs, task parallelism optimization and task generator separation strategy are proposed, achieving massive parallelism. 462.0x and 150.0x speedup is reported in experiments compared with the state-of-the-art algorithms. It is anticipated this will be a milestone in subgraph matching algorithms.
4. Maximum Biclique Search at Billion Scale, B. Lv, L. Qin, X. LIN, Y. Zhang, Z. Qian, J. Zhou, VLDB 13(9): 1359-1372, 2020. The best paper runner-up award.
Review from the selection committee: Bicliques are fundamental structures in bipartite graphs that are useful for many different applications, from anomaly detection in e-commerce, gene expression analysis, to social network-based recommendations. In these applications, a fundamental problem is to efficiently determine maximum bi-cliques, which are bi-cliques with the maximum number of edges in a bipartite graph. This paper introduces a practical solution, based on progressive bounding, to find maximum bi-cliques, and shows that their solution works efficiently on big real-world datasets of billions of edges. The progressive bounding framework repeatedly obtains lower-bounds on the number of nodes in a maximum biclique. Then by analyzing the vertex properties together with the lower-bounds, the search space for maximum bicliques can be significantly reduced. They also show that only a logarithmic number of lower-bounds are needed and extensively demonstrate the effectiveness of their technique on many real datasets, some with millions of nodes and over a billion edges. The algorithms are adopted in Ali double 11 for detecting clicking farming.
5. C. Ma, Y. Fang, R. Cheng, L. Lakshmanan, W. Zhang, X. LIN, Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs, SIGMOD2020:1051-1066. (One of best papers in SIGMOD2020, SIGMOD2021 Research Highlight Award).
Given a directed graph G, the directed densest subgraph (DDS) problem refers to the finding of a subgraph from G, whose density is the highest among all the subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from efficiency and scalability problems: on a 3,000-edge graph, it takes three days for one of the best exact algorithms to complete. In this article, we develop an efficient and scalable DDS solution. We introduce the notion of [x, y]-core, which is a dense subgraph for G, and show that the densest subgraph can be accurately located through the [x, y]-core with theoretical guarantees. Based on the [x, y]-core, we develop exact and approximation algorithms. We further study the problems of maintaining the DDS over dynamic directed graphs and finding the weighted DDS on weighted directed graphs, and we develop efficient non-trivial algorithms to solve these two problems by extending our DDS algorithms. We have performed an extensive evaluation of our approaches on 15 real large datasets. The results show that our proposed solutions are up to six orders of magnitude faster than the state-of-the-art.
6. H. Kim, Y. Choi, K. Park, X. LIN, S. Hong, W. Han, Versatile Equivalences: Speeding up Subgraph Query Processing and Subgraph Matching. SIGMOD Conference 2021: 925-937.
Subgraph query processing (also known as subgraph search) and subgraph matching are fundamental graph problems in many application domains. In this paper, we propose a new subgraph search algorithm using equivalences of vertices in order to reduce search space: (1) static equivalence of vertices in a query graph that leads to an efficient matching order of the vertices, and (2) dynamic equivalence of candidate vertices in a data graph, which enables us to capture and remove redundancies in search space. These techniques for subgraph search also lead to an improved algorithm for subgraph matching. Experiments show that our approach outperforms state-of-the-art subgraph search and subgraph matching algorithms by up to several orders of magnitude with respect to query processing time.
7. Q. Linhu, F. Zhang, X. LIN, W. Zhang, Y. Zhang, Global Reinforcement of Social Networks: The Anchored Coreness Problem, SIGMOD2020: 2211-2226.
The stability of a social network has been widely studied as an important indicator for both the network holders and the participants. In this paper, we propose and study the anchored coreness problem in this paper: anchoring a small number of vertices to maximize the coreness gain (the total increment of coreness) of all the vertices in the network. We prove the problem is NP-hard and show it is more challenging than the existing local-view problems. An efficient heuristic algorithm is proposed with novel techniques on pruning search space and reusing the intermediate results. Extensive experiments on real-life data demonstrate that our model is effective for reinforcing social networks and our algorithm is efficient.
8. H. Kim, S. Min, K. Park, X. LIN, S.H. Hong, W.S. Han, IDAR: Fast Supergraph Search Using DAG Integration, PVLDB 13(9): 1456-1468(VLDB2020).
Supergraph search is a fundamental graph query processing problem in many application domains. In existing algorithms, index construction or filtering approaches are computationally expensive, and search methods can cause redundant computations. In this paper, we introduce four new concepts to address these limitations: (1) DAG integration, (2) dynamic programming between integrated DAG and graph, (3) active-first search, and (4) relevance-size order, which together lead to a much faster and scalable algorithm for supergraph search. Extensive experiments with real datasets show that our approach outperforms state-of-the-art algorithms by up to orders of magnitude in terms of indexing time and query processing time.
9. Y. Peng, Y. Zhang, X. LIN, W. Zhang, L. Qin, Answering Billion-Scale Label-Constrained Reachability Queries within Microsecond, PVLDB 13(6): 812-825(VLDB2020).
In this paper, we propose novel label-constrained 2-hop indexing techniques with novel pruning rules and order strategies. It is shown that our worst query time could be bounded by the in-out index entry size. With all these techniques, comprehensive experiments show that our proposed methods significantly outperform the state-of-the-art technique in terms of query response time (up to 5 orders of magnitude speedup), index size and index construction time. In particular, our proposed method can answer LCR queries within microsecond over billion-scale graphs in a single machine.
10. I/O Efficient Core Graph Decomposition at Web Scale, D. Wen, L. Qin, Y. Zhang, X. LIN, J.X. Yu, ICDE2016: 133-144 (BEST PAPER AWARD).
Review: This is the first time we demonstrated that billion scale graph data can be computed efficiently to conduct core decomposition on a single computer. This technique is very important for conducting on-line mining communities in social networks.