Bayesian inference in the M-open world

Abstract

This thesis examines Bayesian inference and its suitability for modern statistical applications. Motivated by the vast quantities of data currently available for analysis, we forgo the M-closed assumption that the model used for inference is correctly specified and place ourselves in the more realistic M-open world. Here, we assume that the model used for statistical inference is at best an approximation.

In the M-open world, Bayes’ rule updating has been shown [Berk et al., 1966; Bissiri et al., 2016] to learn about the model parameters minimising the log-score, or equivalently the Kullback-Leibler divergence (KLD), to the data generating process (DGP). It is also known that minimising the log-score places great emphasis on correctly capturing the tails of the sample distribution of the data. We observe that this emphasis is so great that the majority of the data can be ignored in order to sufficiently account for a single outlier. This is purportedly desirable when inference is the goal of the analysis. However, in Chapter 2 we show that when the goal of the statistical analysis is informed decision making via the minimisation of expected losses, as it so often is, Bayes’ rule inferences are less desirable. This motivates us to consider minimising alternative divergences to the KLD.
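
In notation chosen here for illustration, with g the density of the DGP and f(·; θ) the model, the parameter targeted by Bayes’ rule updating is

\[ \theta^{\star}_{\mathrm{KLD}} \;=\; \arg\min_{\theta}\, \mathrm{KLD}\big(g \,\|\, f(\cdot;\theta)\big) \;=\; \arg\max_{\theta}\, \mathbb{E}_{x \sim g}\big[\log f(x;\theta)\big], \]

and since log f(x; θ) → −∞ in the tails, a single observation lying far enough into the tails of f(·; θ) can dominate the sample analogue of this expectation; this is the sensitivity to outliers described above.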

Bayesian updating minimising divergences other than the KLD has been considered briefly in the literature. However, those methods are neither sufficiently well motivated nor properly justified as a principled updating of beliefs. We are able to use the foundations of general Bayesian inference (GBI) to produce belief updates minimising any statistical divergence. This allows us to treat the choice of divergence as a subjective judgement and to motivate several divergences from a decision-making perspective.
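
As a sketch of the GBI update of Bissiri et al. [2016], in the notation used here and with the learning rate fixed to one, a loss ℓ(θ, x) connecting the parameter to an observation produces the belief update

\[ \pi_{\ell}(\theta \mid x_{1:n}) \;\propto\; \pi(\theta)\, \exp\Big\{ -\sum_{i=1}^{n} \ell(\theta, x_i) \Big\}, \]

where taking ℓ(θ, x) = −log f(x; θ) recovers Bayes’ rule and the KLD target, while losses whose expectation under the DGP is minimised by an alternative divergence produce the alternative belief updates considered here.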

Chapter 3 extends the motivation for minimising divergences other than the KLD. Here, we consider the model to be one among an equivalence class of belief models, all respecting the belief judgements the decision maker (DM) has been able to make. It is therefore desirable for inference to be stable across this equivalence class. This is a well-studied problem with respect to the prior component of a Bayesian analysis, but we believe we are among the first to extend these results to the likelihood model. We prove that, unlike Bayes’ rule updating, inferences designed to minimise the total-variation divergence (TVD), the Hellinger divergence (HD), and the β-divergence (βD) are provably stable across this class.
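
For example, under one common parameterisation (conventions for β differ across the literature), the βD is associated with the loss

\[ \ell^{(\beta)}(\theta, x) \;=\; -\frac{1}{\beta - 1}\, f(x;\theta)^{\beta - 1} \;+\; \frac{1}{\beta} \int f(z;\theta)^{\beta}\, \mathrm{d}z, \]

which, up to θ-independent terms, recovers the log-score as β → 1, while for β > 1 the contribution of any single observation to the update remains bounded.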

Chapter 4 is inspired by the computation required to infer posteriors in modern Bayesian inference. We derive a generalised optimisation problem defining Bayesian inference. This is axiomatically motivated and contains Bayes’ rule inference, GBI and variational inference (VI) as special cases. This generalised Bayesian inference problem is composed of three interpretable components: a loss function defining the limiting parameter of interest for the analysis; a prior-regularising divergence describing how the posterior should quantify uncertainty; and a set of admissible posterior densities to optimise over. Chapters 2 and 3 examined changing the target parameter of inference to deal with model misspecification; Chapter 4 shows that changing the prior-regularising divergence can resolve VI’s tendency to allow posteriors to over-concentrate. We call these methods generalised variational inference (GVI). We also show situations where methods failing to satisfy our axioms produce undesirable and non-transparent inference. We show that GVI is able to improve upon state-of-the-art performance for deep Gaussian processes and Bayesian neural networks.
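
A sketch of this optimisation problem, in the notation used here with loss ℓ, prior π, uncertainty-quantifying divergence D and admissible family Q, is

\[ q^{\star} \;=\; \arg\min_{q \in \mathcal{Q}} \bigg\{ \mathbb{E}_{q(\theta)}\Big[ \sum_{i=1}^{n} \ell(\theta, x_i) \Big] \;+\; D\big(q \,\|\, \pi\big) \bigg\}, \]

where taking ℓ to be the negative log-likelihood, D the KLD and Q unrestricted recovers the standard Bayes posterior; restricting Q (for example, to mean-field Gaussians) recovers VI; and changing ℓ or D yields GBI and GVI respectively.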

The final chapter considers the challenging and widely applicable problem of detecting regime changes in multi-dimensional on-line streaming data: Bayesian online changepoint detection (BOCPD). BOCPD must use simple, computable models in order to run in real time. Current methodology allows the model misspecifications and outliers associated with these simple models to cause the detection of spurious changepoints (CPs). We robustify this analysis using the βD. We prove results demonstrating that greater evidence is required to force the declaration of a CP when using the βD instead of the KLD. Additionally, we deploy a type of GVI algorithm to produce fast and accurate posterior inferences that are suitable for on-line application. Applying this robustified algorithm to data recording air pollution in London finds a changepoint around the introduction of the congestion charge but, unlike previous methods, does not detect any further regime changes.
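
For context, standard BOCPD tracks the posterior over the run length r_t (the time since the last CP) via the recursion of Adams and MacKay [2007], written here in our notation as

\[ p(r_t, x_{1:t}) \;=\; \sum_{r_{t-1}} p\big(x_t \mid r_{t-1}, x^{(r_{t-1})}\big)\, p\big(r_t \mid r_{t-1}\big)\, p\big(r_{t-1}, x_{1:t-1}\big), \]

and, roughly, the robustification replaces the Bayes’ rule (KLD-based) posterior predictive p(x_t | r_{t−1}, ·) with one built from a βD-based general Bayesian posterior, so that a single outlying observation no longer forces the run-length distribution onto a new CP.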

Item Type: Thesis [via Doctoral College] (PhD)
Subjects: H Social Sciences > HA Statistics
Q Science > QA Mathematics
Library of Congress Subject Headings (LCSH): Bayesian statistical decision theory, Decision making -- Mathematical models
Official Date: September 2019
Dates: September 2019 (event: UNSPECIFIED)
Institution: University of Warwick
Theses Department: Department of Statistics
Thesis Type: PhD
Publication Status: Unpublished
Supervisor(s)/Advisor: Smith, J. Q., 1953- ; Holmes, Christopher C.
Format of File: pdf
Extent: xxiv, 224 leaves : illustrations
Language: eng
URI: https://wrap.warwick.ac.uk/147853/
