The Library
Scaling high-quality pairwise link-based similarity retrieval on billion-edge graphs
Tools
Yu, Weiren, McCann, Julie, Zhang, Chengyuan and Ferhatosmanoglu, Hakan (2022) Scaling high-quality pairwise link-based similarity retrieval on billion-edge graphs. ACM Transactions on Information Systems, 40 (4). 78. doi:10.1145/3495209 ISSN 1046-8188.
Research output not available from this repository.
Request-a-Copy directly from author or use local Library Get it For Me service.
Official URL: http://dx.doi.org/10.1145/3495209
Abstract
SimRank is an attractive link-based similarity measure used in fertile fields of Web search and sociometry. However, the existing deterministic method by Kusumoto et al. [24] for retrieving SimRank does not always produce high-quality similarity results, as it fails to accurately obtain diagonal correction matrix D. Moreover, SimRank has a “connectivity trait” problem: increasing the number of paths between a pair of nodes would decrease its similarity score. The best-known remedy, SimRank++ [1], cannot completely fix this problem, since its score would still be zero if there are no common in-neighbors between two nodes.
In this article, we study fast high-quality link-based similarity search on billion-scale graphs. (1) We first devise a “varied-D” method to accurately compute SimRank in linear memory. We also aggregate duplicate computations, which reduces the time of [24] from quadratic to linear in the number of iterations. (2) We propose a novel “cosine-based” SimRank model to circumvent the “connectivity trait” problem. (3) To substantially speed up the partial-pairs “cosine-based” SimRank search on large graphs, we devise an efficient dimensionality reduction algorithm, PSR#, with guaranteed accuracy. (4) We give mathematical insights to the semantic difference between SimRank and its variant, and correct an argument in [24] that “if D is replaced by a scaled identity matrix (1-Ɣ)I, their top-K rankings will not be affected much”. (5) We propose a novel method that can accurately convert from Li et al. SimRank ~{S} to Jeh and Widom’s SimRank S. (6) We propose GSR#, a generalisation of our “cosine-based” SimRank model, to quantify pairwise similarities across two distinct graphs, unlike SimRank that would assess nodes across two graphs as completely dissimilar. Extensive experiments on various datasets demonstrate the superiority of our proposed approaches in terms of high search quality, computational efficiency, accuracy, and scalability on billion-edge graphs.
Item Type: | Journal Article | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science | ||||||||||||
Journal or Publication Title: | ACM Transactions on Information Systems | ||||||||||||
Publisher: | ACM | ||||||||||||
ISSN: | 1046-8188 | ||||||||||||
Official Date: | October 2022 | ||||||||||||
Dates: |
|
||||||||||||
Volume: | 40 | ||||||||||||
Number: | 4 | ||||||||||||
Number of Pages: | 45 | ||||||||||||
Article Number: | 78 | ||||||||||||
DOI: | 10.1145/3495209 | ||||||||||||
Status: | Peer Reviewed | ||||||||||||
Publication Status: | Published | ||||||||||||
Reuse Statement (publisher, data, author rights): | © ACM, 2022. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Information Systems, Volume 40, Issue 4, October 2022 http://doi.acm.org/10.1145/3495209 | ||||||||||||
Access rights to Published version: | Restricted or Subscription Access | ||||||||||||
Copyright Holders: | Association for Computing Machinery | ||||||||||||
RIOXX Funder/Project Grant: |
|
||||||||||||
Related URLs: |
Request changes or add full text files to a record
Repository staff actions (login required)
View Item |