Hypergraph-based optimisations for scalable graph analytics and learning

WRAP_Theses_Haldar_2022.pdf - Submitted Version (PDF, 4MB)


Abstract

Graph-structured data captures inter-connectivity (topology) and heterogeneous knowledge (node/edge features) simultaneously. Hypergraphs can capture even more information, reflecting complex non-pairwise relationships and additional metadata. Graph- and hypergraph-based partitioners can model the workload or communication patterns of analytics and learning algorithms, enabling data-parallel scalability while preserving solution quality. Hypergraph-based optimisations remain under-explored for graph neural networks (GNNs), which have complex access patterns compared to analytics workloads. Furthermore, special optimisations are needed when representing dynamic graph topologies and learning incrementally from streaming data. This thesis explores hypergraph-based optimisations for several scalable graph analytics and learning tasks. First, a hypergraph sampling approach is presented that supports large-scale dynamic graphs when modelling information cascades. Next, hypergraph partitioning is applied to scale approximate similarity search by caching the computed features of replicated vertices. Moving from analytics to learning tasks, a data-parallel GNN training algorithm is developed using hypergraph-based construction and partitioning. Its communication scheme allows scalable distributed full-batch GNN training on static graphs. Sparse adjacency patterns are captured to perform non-blocking asynchronous communications, yielding considerable speedups (10x over a single-machine state-of-the-art baseline) in limited-memory and limited-bandwidth environments. Distributing GNNs using the hypergraph approach, compared to the graph approach, halves the running time and achieves 15% lower message volume. A new stochastic hypergraph sampling strategy further improves communication efficiency in distributed mini-batch GNN training. The final contribution is the design of streaming partitioners to handle dynamic data within a dataflow framework.
This online partitioning pipeline allows complex graph or hypergraph streams to be processed asynchronously, facilitating low-latency distributed GNNs through replication and caching. Overall, the hypergraph-based optimisations in this thesis enable the development of scalable dynamic graph applications.
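The partitioning idea underlying the abstract can be illustrated with a minimal greedy sketch (all names here are hypothetical, not the thesis's own code; real hypergraph partitioners such as those used in the thesis are considerably more sophisticated). Each vertex's neighbourhood is modelled as a hyperedge, and each vertex is greedily assigned to the part that already holds the most pins of that hyperedge, subject to a balance cap, so that co-accessed vertices tend to land together and cross-part communication is reduced.

```python
import math
from collections import defaultdict

def greedy_hypergraph_partition(hyperedges, num_parts, imbalance=1.1):
    """Toy greedy partitioner.

    hyperedges: dict mapping a hyperedge id to an iterable of vertex pins
                (e.g. one hyperedge per vertex neighbourhood).
    Returns a dict mapping each vertex to a part in [0, num_parts).
    """
    vertices = {v for pins in hyperedges.values() for v in pins}
    # balance cap: no part may exceed the average load by more than `imbalance`
    cap = math.ceil(len(vertices) / num_parts * imbalance)
    assignment = {}                 # vertex -> part
    load = [0] * num_parts          # vertices currently on each part
    for _, pins in sorted(hyperedges.items()):
        for v in pins:
            if v in assignment:
                continue
            # score each part by how many pins of this hyperedge it holds
            scores = defaultdict(int)
            for u in pins:
                if u in assignment:
                    scores[assignment[u]] += 1
            feasible = [p for p in range(num_parts) if load[p] < cap]
            # prefer highest score (most co-located pins), then lightest load
            part = min(feasible, key=lambda p: (-scores[p], load[p]))
            assignment[v] = part
            load[part] += 1
    return assignment

# toy graph: one hyperedge per vertex neighbourhood
h = {0: [0, 1, 2], 1: [1, 2], 2: [2, 3], 3: [3, 4, 5]}
parts = greedy_hypergraph_partition(h, 2)
```

In a distributed GNN or similarity-search setting, vertices whose hyperedges straddle a part boundary would be the candidates for the replication and feature caching the abstract describes.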

Item Type: Thesis [via Doctoral College] (PhD)
Subjects: Q Science > QA Mathematics
Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Library of Congress Subject Headings (LCSH): Hypergraphs, Graph theory -- Data processing, Deep learning (Machine learning), Neural networks (Computer science)
Official Date: December 2022
Dates: December 2022 (Event: UNSPECIFIED)
Institution: University of Warwick
Theses Department: Department of Computer Science
Thesis Type: PhD
Publication Status: Unpublished
Supervisor(s)/Advisor: Ferhatosmanoglu, Hakan
Sponsors: Feuer International Scholarship
Format of File: pdf
Extent: xv, 121 pages : illustrations (some colour), charts
Language: eng
URI: https://wrap.warwick.ac.uk/176730/
