University of Warwick
Publications service & WRAP

Imputing biomarker status from RWE datasets — a comparative study

Traynor, Carlos, Sahota, Tarjinder, Tomkinson, Helen, Gonzalez-Garcia, Ignacio, Evans, Neil D. and Chappell, Michael J. (2021) Imputing biomarker status from RWE datasets — a comparative study. Journal of Personalized Medicine, 11 (12). e1356. doi:10.3390/jpm11121356 ISSN 2075-4426.

WRAP-imputing-biomarker-status-RWE-datasets-2021.pdf - Published Version - Requires a PDF viewer.
Available under License Creative Commons Attribution 4.0.

Official URL: https://doi.org/10.3390/jpm11121356

Abstract

Missing data is a universal problem in analysing Real-World Evidence (RWE) datasets. In RWE datasets, there is a need to understand which features best correlate with clinical outcomes. In this context, the missing status of several biomarkers may appear as gaps in the dataset that hide meaningful values for analysis. Imputation methods are general strategies that replace missing values with plausible values. Using the Flatiron NSCLC dataset, including more than 35,000 subjects, we compare the imputation performance of six such methods on missing data: predictive mean matching, expectation-maximisation, factorial analysis, random forest, generative adversarial networks and multivariate imputations with tabular networks. We also conduct extensive synthetic data experiments with structural causal models. Statistical learning from incomplete datasets should select an appropriate imputation algorithm accounting for the nature of missingness, the impact of missing data, and the distribution shift induced by the imputation algorithm. For our synthetic data experiments, tabular networks had the best overall performance. Methods using neural networks are promising for complex datasets with non-linearities. However, conventional methods such as predictive mean matching work well for the Flatiron NSCLC biomarker dataset.
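
As a rough illustration of one of the conventional methods named in the abstract, the following is a minimal sketch of predictive mean matching. It is not the authors' implementation. It assumes a single incomplete numeric column and a linear model; the function name `pmm_impute`, the parameter `k` and the toy data in the usage note are hypothetical. Unlike full predictive mean matching as used in multiple imputation, this simplified version does not draw the regression coefficients at random before matching.

```python
import numpy as np


def pmm_impute(X, y, k=5, seed=0):
    """Simplified predictive mean matching for one column `y` containing NaNs.

    X : fully observed covariates, shape (n,) or (n, p)
    y : target column with np.nan marking missing entries, shape (n,)
    k : number of candidate donors per missing entry (illustrative default)
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    missing = np.isnan(y)
    observed = ~missing

    # Fit a linear model on the rows where y is observed (with intercept).
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A[observed], y[observed], rcond=None)

    # Predicted means for every row, observed or not.
    pred = A @ beta

    obs_idx = np.flatnonzero(observed)
    out = y.copy()
    for i in np.flatnonzero(missing):
        # Donors: the k observed rows whose predicted mean is closest.
        dist = np.abs(pred[obs_idx] - pred[i])
        donors = obs_idx[np.argsort(dist)[:k]]
        # Impute with the *observed* value of a randomly chosen donor,
        # so imputed values always come from the observed data.
        out[i] = y[rng.choice(donors)]
    return out
```

For example, with a hypothetical covariate `X = np.array([[0.1], [0.2], [0.9], [1.0], [0.15], [0.95]])` and a binary status `y = np.array([0., 0., 1., 1., np.nan, np.nan])`, calling `pmm_impute(X, y, k=1)` fills each gap with the status of the observed subject whose predicted value is nearest. Because donors supply real observed values, a binary biomarker status stays binary after imputation.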

Item Type: Journal Article
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics
R Medicine > R Medicine (General)
Divisions: Faculty of Science, Engineering and Medicine > Engineering > Engineering
SWORD Depositor: Library Publications Router
Library of Congress Subject Headings (LCSH): Evidence-based medicine, Clinical medicine -- Decision making, Systematic reviews (Medical research), Clinical trials -- Reporting, Clinical trials -- Computer simulation, Machine learning, Mathematical statistics
Journal or Publication Title: Journal of Personalized Medicine
Publisher: MDPI
ISSN: 2075-4426
Official Date: 13 December 2021
Dates:
13 December 2021: Published
4 December 2021: Accepted
Volume: 11
Number: 12
Article Number: e1356
DOI: 10.3390/jpm11121356
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Open Access (Creative Commons)
Date of first compliant deposit: 7 February 2022
Date of first compliant Open Access: 8 February 2022
RIOXX Funder/Project Grant:
Project/Grant ID: RESEE3316; RIOXX Funder Name: [EPSRC] Engineering and Physical Sciences Research Council; Funder ID: http://dx.doi.org/10.13039/501100000266
Project/Grant ID: AESEE8817; RIOXX Funder Name: AstraZeneca; Funder ID: http://dx.doi.org/10.13039/100004325
