Analysing clinical data for real-world evidence generation in oncology
Traynor, Carlos (2021) Analysing clinical data for real-world evidence generation in oncology. PhD thesis, University of Warwick.
PDF: WRAP_Theses_Traynor_2021.pdf (Submitted Version, 3283 KB)
Official URL: http://webcat.warwick.ac.uk/record=b3821330
Abstract
Randomised clinical trials (RCTs) are the bedrock of evidence-based medicine and remain the gold standard for determining the efficacy and safety of investigational new drugs in well-defined populations. They have high internal validity and remain crucial for securing regulatory approval. However, RCTs may lack external validity because they exclude subpopulations such as elderly patients or patients with comorbidities. Time constraints limit the assessment of long-term effects, and sample sizes may be too small to identify new biomarkers for personalised treatment. Real-world evidence (RWE) can complement RCT evidence by providing effectiveness and safety data across a wide range of outcomes representative of everyday clinical practice. Similarly, real-world data (RWD) are typically associated with big datasets that can move current medical practice towards personalisation. However, if RWE analysis is used only to predict the most beneficial treatment choice, in the best case it can only match current medical practice. The key challenge in analysing RWD is that individual treatment effects are never observed, and because RWD are non-randomised and observational, analyses are prone to bias from unrecognised factors. Using RWD properly requires better solutions to the particular challenges of clinical data: (1) large amounts of missing data, (2) heterogeneous data, and (3) a ground truth that seldom exists. This dissertation addresses these challenges by constructing a formal causal inference framework to enhance the statistical analysis of RWD. We focus on three problems:
• Missing data imputation
• Accurately predicting the consequences of treatment in biomarker-defined populations
• Assessing how conclusions might change in the presence of hidden factors
To tackle these problems, we propose new methodologies for RWD analysis:
1. We formalise the missing data problem and use this formalisation to design a machine-learning algorithm for missing data imputation.
2. We develop Bayesian modelling techniques for heterogeneous treatment effects on survival outcomes, introducing a new methodology named survival Gaussian processes, which is particularly well suited to inferring treatment effects that vary across a population.
3. We extend the Bayesian approach to infer probabilistic causal bounds for time-varying effects.
To demonstrate the utility of these techniques, we analyse two large real-world cohorts of non-small cell lung cancer (NSCLC) patients treated with immune checkpoint inhibitors (ICIs), with biomarker status for epidermal growth factor receptor (EGFR), anaplastic lymphoma kinase (ALK), Kirsten rat sarcoma viral oncogene homologue (KRAS), B-Raf proto-oncogene (BRAF) and the immunotherapy marker programmed death-ligand 1 (PD-L1). The first study tackled the missing data problem by developing a new algorithm for multiple imputation, applied to synthetic and real-world examples of missing biomarker status. The second study examined the impact of ICIs on the survival time of NSCLC patients stratified by PD-L1 expression, handling missing data with the method from the first study and adopting the Bayesian approach to model heterogeneous and time-varying treatment effects. We show that the proposed methods outperform state-of-the-art methods for missing data imputation in complex datasets with non-linearities; that pooling information across PD-L1 percentage staining levels with Gaussian processes achieves better out-of-sample performance than conventional interaction models; and that estimates of causal bounds are critical for understanding the impact of unobserved confounding when analysing RWD.
| Item Type: | Thesis (PhD) |
|---|---|
| Subjects: | Q Science > QA Mathematics; R Medicine > R Medicine (General) |
| Library of Congress Subject Headings (LCSH): | Medical informatics; Clinical trials -- Data processing; Clinical medicine -- Research -- Data processing; Data Collection; Biomarkers; Missing observations (Statistics); Mathematical statistics |
| Official Date: | December 2021 |
| Institution: | University of Warwick |
| Theses Department: | School of Engineering |
| Thesis Type: | PhD |
| Publication Status: | Unpublished |
| Supervisor(s)/Advisor: | Chappell, M. J. (Michael J.); Evans, Neil D. |
| Sponsors: | Engineering and Physical Sciences Research Council; AstraZeneca (Firm) |
| Extent: | xxiii, 188 leaves : illustrations, charts |
| Language: | eng |