The Library

Hypergraph-based optimisations for scalable graph analytics and learning

Tools

Haldar, Aparajita (2022) Hypergraph-based optimisations for scalable graph analytics and learning. PhD thesis, University of Warwick.

Preview

PDF
WRAP_Theses_Haldar_2022.pdf - Submitted Version - Requires a PDF viewer.
Download (4Mb) | Preview

Official URL: http://webcat.warwick.ac.uk/record=b3910044

Request Changes to record.

Abstract

Graph-structured data has benefits of capturing inter-connectivity (topology) and hetero geneous knowledge (node/edge features) simultaneously. Hypergraphs may glean even more information reflecting complex non-pairwise relationships and additional metadata. Graph- and hypergraph-based partitioners can model workload or communication patterns of analytics and learning algorithms, enabling data-parallel scalability while preserving the solution quality. Hypergraph-based optimisations remain under-explored for graph neural networks (GNNs), which have complex access patterns compared to analytics workloads. Furthermore, special optimisations are needed when representing dynamic graph topologies and learning incrementally from streaming data. This thesis explores hypergraph-based optimisations for several scalable graph analytics and learning tasks. First, a hypergraph sampling approach is presented that supports large-scale dynamic graphs when modelling information cascades. Next, hypergraph partitioning is applied to scale approximate similarity search, by caching the computed features of replicated vertices. Moving from analytics to learning tasks, a data-parallel GNN training algorithm is developed using hypergraph-based construction and partitioning. Its communication scheme allows scalable distributed full-batch GNN training on static graphs. Sparse adja cency patterns are captured to perform non-blocking asynchronous communications for considerable speedups (10x single machine state-of-the-art baseline) in limited memory and bandwidth environments. Distributing GNNs using the hypergraph approach, compared to the graph approach, halves the running time and achieves 15% lower message volume. A new stochastic hypergraph sampling strategy further improves communication efficiency in distributed mini-batch GNN training. The final contribution is the design of streaming partitioners to handle dynamic data within a dataflow framework. This online partitioning pipeline allows complex graph or hypergraph streams to be processed asynchronously. It facilitates low latency distributed GNNs through replication and caching. Overall, the hypergraph-based optimisations in this thesis enable the development of scalable dynamic graph applications.

Item Type:

Thesis (PhD)

Subjects:

Q Science > QA Mathematics
Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software

Library of Congress Subject Headings (LCSH):

Hypergraphs, Graph theory -- Data processing, Deep learning (Machine learning), Neural networks (Computer science)

Official Date:

December 2022

Dates:

Date	Event
December 2022	UNSPECIFIED

Institution:

University of Warwick

Theses Department:

Department of Computer Science

Thesis Type:

PhD

Publication Status:

Unpublished

Supervisor(s)/Advisor:

Ferhatosmanoglu, Hakan

Sponsors:

Feuer International Scholarship

Format of File:

pdf

Extent:

xv, 121 pages : illustrations (some colour), charts

Language:

eng

Request changes or add full text files to a record

Repository staff actions (login required)

View Item

University of Warwick
Publications service & WRAP

Highlight your research

The Library

Hypergraph-based optimisations for scalable graph analytics and learning

Abstract

Repository staff actions (login required)

University of WarwickPublications service & WRAP

Highlight your research

The Library

Hypergraph-based optimisations for scalable graph analytics and learning

Abstract

Repository staff actions (login required)

University of Warwick
Publications service & WRAP