The Library

SimRank*: effective and scalable pairwise similarity search based on graph topology

Tools

Yu, Weiren, Lin, Xuemin, Zhang, Wenjie, Pei, Jian and McCann, Julie A. (2019) SimRank*: effective and scalable pairwise similarity search based on graph topology. The VLDB Journal, 28 (3). pp. 401-426. doi:10.1007/s00778-018-0536-3 ISSN 1066-8888.

Preview

PDF
WRAP-SimRank-effective-scalable-pairwise-graph-topology-Yu-2019.pdf - Published Version - Requires a PDF viewer.
Available under License Creative Commons Attribution 4.0.
Download (2699Kb) | Preview

Official URL: http://dx.doi.org/10.1007/s00778-018-0536-3

Request Changes to record.

Abstract

Given a graph, how can we quantify similarity between two nodes in an effective and scalable way? SimRank is an attractive measure of pairwise similarity based on graph topologies. Its underpinning philosophy that “two nodes are similar if they are pointed to (have incoming edges) from similar nodes” can be regarded as an aggregation of similarities based on incoming paths. Despite its popularity in various applications (e.g., web search and social networks), SimRank has an undesirable trait, i.e., “zero-similarity”: it accommodates only the paths of equal length from a common “center” node, whereas a large portion of other paths are fully ignored. In this paper, we propose an effective and scalable similarity model, SimRank*, to remedy this problem. (1) We first provide a sufficient and necessary condition of the “zero-similarity” problem that exists in Jeh and Widom’s SimRank model, Li et al. ’s SimRank model, Random Walk with Restart (RWR), and ASCOS++. (2) We next present our treatment, SimRank*, which can resolve this issue while inheriting the merit of the simple SimRank philosophy. (3) We reduce the series form of SimRank* to a closed form, which looks simpler than SimRank but which enriches semantics without suffering from increased computational overhead. This leads to an iterative form of SimRank*, which requires O(Knm) time and O(n2) memory for computing all (n2) pairs of similarities on a graph of n nodes and m edges for K iterations. (4) To improve the computational time of SimRank* further, we leverage a novel clustering strategy via edge concentration. Due to its NP-hardness, we devise an efficient heuristic to speed up all-pairs SimRank* computation to O(Knm~) time, where m~ is generally much smaller than m. (5) To scale SimRank* on billion-edge graphs, we propose two memory-efficient single-source algorithms, i.e., ss-gSR* for geometric SimRank*, and ss-eSR* for exponential SimRank*, which can retrieve similarities between all n nodes and a given query on an as-needed basis. This significantly reduces the O(n2) memory of all-pairs search to either O(Kn+m~) for geometric SimRank*, or O(n+m~) for exponential SimRank*, without any loss of accuracy, where m~≪n2 . (6) We also compare SimRank* with another remedy of SimRank that adds self-loops on each node and demonstrate that SimRank* is more effective. (7) Using real and synthetic datasets, we empirically verify the richer semantics of SimRank*, and validate its high computational efficiency and scalability on large graphs with billions of edges.

Item Type:

Journal Article

Subjects:

Q Science > QA Mathematics

Divisions:

Faculty of Science, Engineering and Medicine > Science > Computer Science

Library of Congress Subject Headings (LCSH):

Similarity (Geometry), Web applications -- Research, Graph algorithms

Journal or Publication Title:

The VLDB Journal

Publisher:

Springer Berlin Heidelberg

ISSN:

1066-8888

Official Date:

June 2019

Dates:

Date	Event
June 2019	Published
11 January 2019	Available
18 December 2018	Accepted

Volume:

Number:

Page Range:

pp. 401-426

DOI:

10.1007/s00778-018-0536-3

Status:

Peer Reviewed

Publication Status:

Published

Access rights to Published version:

Open Access (Creative Commons)

Date of first compliant deposit:

29 January 2020

Date of first compliant Open Access:

11 February 2020

RIOXX Funder/Project Grant:

Project/Grant ID	RIOXX Funder Name	Funder ID
NSFC61702560	[NSFC] National Natural Science Foundation of China	http://dx.doi.org/10.13039/501100001809
NSFC61672235	[NSFC] National Natural Science Foundation of China	http://dx.doi.org/10.13039/501100001809
DP170101628	Australian Research Council	http://dx.doi.org/10.13039/501100000923
DP180103096	Australian Research Council	http://dx.doi.org/10.13039/501100000923

Request changes or add full text files to a record

Repository staff actions (login required)

View Item

Downloads

Downloads per month over past year

View more statistics

University of Warwick
Publications service & WRAP

Highlight your research

The Library

SimRank*: effective and scalable pairwise similarity search based on graph topology

Abstract

Repository staff actions (login required)

Downloads

University of WarwickPublications service & WRAP

Highlight your research

The Library

SimRank*: effective and scalable pairwise similarity search based on graph topology

Abstract

Repository staff actions (login required)

Downloads

University of Warwick
Publications service & WRAP