Skip to content Skip to navigation
University of Warwick
  • Study
  • |
  • Research
  • |
  • Business
  • |
  • Alumni
  • |
  • News
  • |
  • About

University of Warwick
Publications service & WRAP

Highlight your research

  • WRAP
    • Home
    • Search WRAP
    • Browse by Warwick Author
    • Browse WRAP by Year
    • Browse WRAP by Subject
    • Browse WRAP by Department
    • Browse WRAP by Funder
    • Browse Theses by Department
  • Publications Service
    • Home
    • Search Publications Service
    • Browse by Warwick Author
    • Browse Publications service by Year
    • Browse Publications service by Subject
    • Browse Publications service by Department
    • Browse Publications service by Funder
  • Help & Advice
University of Warwick

The Library

  • Login
  • Admin

SimRank*: effective and scalable pairwise similarity search based on graph topology

Tools
- Tools
+ Tools

Yu, Weiren, Lin, Xuemin, Zhang, Wenjie, Pei, Jian and McCann, Julie A. (2019) SimRank*: effective and scalable pairwise similarity search based on graph topology. The VLDB Journal, 28 (3). pp. 401-426. doi:10.1007/s00778-018-0536-3

[img]
Preview
PDF
WRAP-SimRank-effective-scalable-pairwise-graph-topology-Yu-2019.pdf - Published Version - Requires a PDF viewer.
Available under License Creative Commons Attribution 4.0.

Download (2699Kb) | Preview
Official URL: http://dx.doi.org/10.1007/s00778-018-0536-3

Request Changes to record.

Abstract

Given a graph, how can we quantify similarity between two nodes in an effective and scalable way? SimRank is an attractive measure of pairwise similarity based on graph topologies. Its underpinning philosophy that “two nodes are similar if they are pointed to (have incoming edges) from similar nodes” can be regarded as an aggregation of similarities based on incoming paths. Despite its popularity in various applications (e.g., web search and social networks), SimRank has an undesirable trait, i.e., “zero-similarity”: it accommodates only the paths of equal length from a common “center” node, whereas a large portion of other paths are fully ignored. In this paper, we propose an effective and scalable similarity model, SimRank*, to remedy this problem. (1) We first provide a sufficient and necessary condition of the “zero-similarity” problem that exists in Jeh and Widom’s SimRank model, Li et al. ’s SimRank model, Random Walk with Restart (RWR), and ASCOS++. (2) We next present our treatment, SimRank*, which can resolve this issue while inheriting the merit of the simple SimRank philosophy. (3) We reduce the series form of SimRank* to a closed form, which looks simpler than SimRank but which enriches semantics without suffering from increased computational overhead. This leads to an iterative form of SimRank*, which requires O(Knm) time and O(n2) memory for computing all (n2) pairs of similarities on a graph of n nodes and m edges for K iterations. (4) To improve the computational time of SimRank* further, we leverage a novel clustering strategy via edge concentration. Due to its NP-hardness, we devise an efficient heuristic to speed up all-pairs SimRank* computation to O(Knm~) time, where m~ is generally much smaller than m. (5) To scale SimRank* on billion-edge graphs, we propose two memory-efficient single-source algorithms, i.e., ss-gSR* for geometric SimRank*, and ss-eSR* for exponential SimRank*, which can retrieve similarities between all n nodes and a given query on an as-needed basis. This significantly reduces the O(n2) memory of all-pairs search to either O(Kn+m~) for geometric SimRank*, or O(n+m~) for exponential SimRank*, without any loss of accuracy, where m~≪n2 . (6) We also compare SimRank* with another remedy of SimRank that adds self-loops on each node and demonstrate that SimRank* is more effective. (7) Using real and synthetic datasets, we empirically verify the richer semantics of SimRank*, and validate its high computational efficiency and scalability on large graphs with billions of edges.

Item Type: Journal Article
Subjects: Q Science > QA Mathematics
Divisions: Faculty of Science > Computer Science
Library of Congress Subject Headings (LCSH): Similarity (Geometry), Web applications -- Research, Graph algorithms
Journal or Publication Title: The VLDB Journal
Publisher: Springer Berlin Heidelberg
ISSN: 1066-8888
Official Date: June 2019
Dates:
DateEvent
June 2019Published
11 January 2019Available
18 December 2018Accepted
Volume: 28
Number: 3
Page Range: pp. 401-426
DOI: 10.1007/s00778-018-0536-3
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Open Access
RIOXX Funder/Project Grant:
Project/Grant IDRIOXX Funder NameFunder ID
NSFC61702560[NSFC] National Natural Science Foundation of Chinahttp://dx.doi.org/10.13039/501100001809
NSFC61672235[NSFC] National Natural Science Foundation of Chinahttp://dx.doi.org/10.13039/501100001809
DP170101628Australian Research Councilhttp://dx.doi.org/10.13039/501100000923
DP180103096Australian Research Councilhttp://dx.doi.org/10.13039/501100000923

Request changes or add full text files to a record

Repository staff actions (login required)

View Item View Item

Downloads

Downloads per month over past year

View more statistics

twitter

Email us: wrap@warwick.ac.uk
Contact Details
About Us