
Imputing biomarker status from RWE datasets — a comparative study
Traynor, Carlos, Sahota, Tarjinder, Tomkinson, Helen, Gonzalez-Garcia, Ignacio, Evans, Neil D. and Chappell, Michael J. (2021) Imputing biomarker status from RWE datasets — a comparative study. Journal of Personalized Medicine, 11 (12). e1356. doi:10.3390/jpm11121356 ISSN 2075-4426.
PDF: WRAP-imputing-biomarker-status-RWE-datasets-2021.pdf (Published Version, 543Kb). Available under License Creative Commons Attribution 4.0.
Official URL: https://doi.org/10.3390/jpm11121356
Abstract
Missing data is a universal problem in analysing Real-World Evidence (RWE) datasets. In RWE datasets, there is a need to understand which features best correlate with clinical outcomes. In this context, the missing status of several biomarkers may appear as gaps in the dataset that hide meaningful values for analysis. Imputation methods are general strategies that replace missing values with plausible values. Using the Flatiron NSCLC dataset, including more than 35,000 subjects, we compare the imputation performance of six such methods on missing data: predictive mean matching, expectation-maximisation, factorial analysis, random forest, generative adversarial networks and multivariate imputations with tabular networks. We also conduct extensive synthetic data experiments with structural causal models. Statistical learning from incomplete datasets should select an appropriate imputation algorithm accounting for the nature of missingness, the impact of missing data, and the distribution shift induced by the imputation algorithm. For our synthetic data experiments, tabular networks had the best overall performance. Methods using neural networks are promising for complex datasets with non-linearities. However, conventional methods such as predictive mean matching work well for the Flatiron NSCLC biomarker dataset.
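As a rough illustration of the predictive mean matching (PMM) idea named in the abstract, here is a minimal sketch using only NumPy. It is a generic, simplified PMM variant (ordinary least squares point estimates, with no Bayesian draw of the regression coefficients), not the implementation used in the study. The function name `pmm_impute`, the donor-pool size `k`, and the toy data are hypothetical choices for illustration. The key property of PMM is that every imputed value is copied from an observed donor, so imputations stay within the observed range of the variable.

```python
import numpy as np

def pmm_impute(X, y, k=5, rng=None):
    """Impute NaN entries of y by (simplified) predictive mean matching.

    Hypothetical sketch: fit OLS on rows where y is observed, predict y for
    every row, then for each missing row copy the observed y of a donor drawn
    at random from the k observed rows with the closest predicted means.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    missing = np.isnan(y)
    observed = ~missing

    # Design matrix with an intercept column (works for 1-D or 2-D X).
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A[observed], y[observed], rcond=None)
    y_hat = A @ beta

    donor_pred = y_hat[observed]  # predicted means of potential donors
    donor_val = y[observed]       # their actually observed values
    k = min(k, int(observed.sum()))

    for i in np.flatnonzero(missing):
        # Indices of the k donors whose predictions are nearest to row i.
        nearest = np.argsort(np.abs(donor_pred - y_hat[i]))[:k]
        y[i] = donor_val[rng.choice(nearest)]
    return y

# Toy usage: a linear relationship with roughly 20% of y missing completely at random.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)
y_missing = y.copy()
y_missing[rng.random(200) < 0.2] = np.nan
y_imputed = pmm_impute(x, y_missing, k=5, rng=1)
```

Drawing among k nearest donors rather than taking the single closest one keeps some between-imputation variability; full PMM implementations additionally perturb the regression coefficients before matching.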
| Item Type | Journal Article |
|---|---|
| Subjects | Q Science > Q Science (General); Q Science > QA Mathematics; R Medicine > R Medicine (General) |
| Divisions | Faculty of Science, Engineering and Medicine > Engineering > Engineering |
| SWORD Depositor | Library Publications Router |
| Library of Congress Subject Headings (LCSH) | Evidence-based medicine; Clinical medicine -- Decision making; Systematic reviews (Medical research); Clinical trials -- Reporting; Clinical trials -- Computer simulation; Machine learning; Mathematical statistics |
| Journal or Publication Title | Journal of Personalized Medicine |
| Publisher | MDPI |
| ISSN | 2075-4426 |
| Official Date | 13 December 2021 |
| Volume | 11 |
| Number | 12 |
| Article Number | e1356 |
| DOI | 10.3390/jpm11121356 |
| Status | Peer Reviewed |
| Publication Status | Published |
| Access rights to Published version | Open Access (Creative Commons) |
| Date of first compliant deposit | 7 February 2022 |
| Date of first compliant Open Access | 8 February 2022 |