High-dimensional sparse random networks with covariates
Stein, Stefan (2021) High-dimensional sparse random networks with covariates. PhD thesis, University of Warwick.
PDF: WRAP_Theses_Stein_2021.pdf - Submitted Version (6Mb)
Official URL: http://webcat.warwick.ac.uk/record=b3766431
Abstract
An increasingly urgent task in the analysis of network data is to develop statistical models that include contextual information in the form of covariates while respecting degree heterogeneity and network sparsity. We study various stochastic network models with parameters that explicitly account for these stylized features of real-world networks.
To set the tone of the thesis, we highlight in Chapter 1 the fallacy of data-selective inference, a common practice of artificially truncating an observed network by throwing away any nodes that are not well-connected. This constitutes a form of sampling bias, which we quantify theoretically for the Erdős-Rényi model and empirically for the Stochastic Block Model.
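The truncation effect described above can be illustrated with a small simulation. This is a minimal sketch and not the thesis's own analysis: the function names, the graph size, the edge probability and the degree threshold are all hypothetical choices made for illustration. It draws one Erdős-Rényi graph, discards nodes whose degree falls below the threshold, and compares the edge density estimated from the full and the truncated network.

```python
import numpy as np

def simulate_er(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an Erdős-Rényi graph G(n, p), no self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)  # independent edges above the diagonal
    return (upper | upper.T).astype(int)

def edge_density(adj):
    """Fraction of node pairs that are connected (each edge is counted twice in adj)."""
    n = adj.shape[0]
    if n < 2:
        return float("nan")
    return adj.sum() / (n * (n - 1))

def truncate_by_degree(adj, min_degree):
    """Keep only nodes whose degree in the full graph is at least min_degree."""
    keep = adj.sum(axis=1) >= min_degree
    return adj[np.ix_(keep, keep)]

# Illustrative settings (assumptions, not values from the thesis).
rng = np.random.default_rng(0)
n, p = 500, 0.01
adj = simulate_er(n, p, rng)

full_density = edge_density(adj)
truncated = truncate_by_degree(adj, min_degree=8)
truncated_density = edge_density(truncated)

print(f"nodes kept: {truncated.shape[0]} of {n}")
print(f"density, full network:      {full_density:.4f}")
print(f"density, truncated network: {truncated_density:.4f}")
```

In this sketch the density estimated from the truncated network tends to exceed the one from the full network, because the retained nodes were selected for having many edges. That is the kind of sampling bias the paragraph refers to.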
We introduce the sparse β-model with covariates (SβM-C) in Chapter 2. By assuming sparsity of the degree heterogeneity parameter, SβM-C is capable of fitting sparse, undirected networks, enabling us to avoid data-selective inference. For parameter estimation, we propose the use of a penalized likelihood method with an ℓ1-penalty on the nodal parameters. This gives rise to a convex optimization formulation which immediately connects our estimation procedure to the LASSO literature. We provide finite sample bounds on the excess risk and the ℓ1-error of the resulting estimator and develop a central limit theorem for the parameter associated with the covariates.
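As a rough illustration of what such a penalized likelihood can look like, the following sketch assumes a logistic link with a global intercept. The abstract fixes none of the symbols used here: $\mu$, $\beta_i$, $\gamma$, $Z_{ij}$ and $\lambda$ are introduced for illustration, and the normalization and any constraints on $\beta$ may differ in the thesis.

```latex
\[
\Pr(A_{ij} = 1 \mid Z_{ij})
  = \frac{\exp(\eta_{ij})}{1 + \exp(\eta_{ij})},
\qquad
\eta_{ij} = \mu + \beta_i + \beta_j + \gamma^\top Z_{ij},
\qquad 1 \le i < j \le n,
\]
\[
(\hat\mu, \hat\beta, \hat\gamma)
  \in \arg\min_{\mu,\, \beta,\, \gamma}
  \; -\frac{1}{\binom{n}{2}} \sum_{i<j}
     \Bigl[ A_{ij}\,\eta_{ij} - \log\bigl(1 + e^{\eta_{ij}}\bigr) \Bigr]
  \;+\; \lambda \lVert \beta \rVert_1 .
\]
```

Under these assumptions the objective is the sum of a logistic negative log-likelihood, which is convex in $(\mu, \beta, \gamma)$, and an ℓ1-norm on the nodal parameters only, so it has the same structure as a LASSO-type problem.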
In Chapter 3 we zoom in on the special case of SβM-C when no degree heterogeneity parameter is present. We call this the sparse Erdős-Rényi model with covariates (ER-C) and show that it can model networks of almost arbitrary sparsity.
We extend SβM-C to directed networks by introducing the parameter-Sparse Random Graph Model (SRGM) in Chapter 4. We prove that an ℓ1-penalized estimator is model selection consistent for SRGM. We further recover results similar to the ones we established for SβM-C. Special focus is placed on the interplay of the network sparsity, the parameter sparsity and the penalty we use. This allows us to paint a nuanced picture of the effect of different sparsity regimes on parameter estimation.
Chapter 6 presents the results of a collaboration with Tata Steel in Europe and can be read independently from the rest of this thesis. In it we present the initial Guided Analytics for parameter Testing and controlband Extraction (iGATE) framework, a novel feature selection procedure for industry applications that combines expert knowledge with statistical techniques.
Item Type: | Thesis (PhD) |
---|---|
Subjects: | Q Science > QA Mathematics; Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software |
Library of Congress Subject Headings (LCSH): | Analysis of covariance, Sparse matrices, Computer networks -- Statistical methods, Stochastic models, Random graphs |
Official Date: | December 2021 |
Dates: | |
Institution: | University of Warwick |
Theses Department: | Department of Statistics |
Thesis Type: | PhD |
Publication Status: | Unpublished |
Supervisor(s)/Advisor: | Leng, Chenlei |
Sponsors: | Engineering and Physical Sciences Research Council |
Format of File: | |
Extent: | x, 207 leaves : illustrations |
Language: | eng |