University of Warwick
Publications service & WRAP

An improved machine learning pipeline for urinary volatiles disease detection : diagnosing diabetes

Martinez-Vernon, Andrea, Covington, James A., Arasaradnam, Ramesh P., Esfahani, Siavash, O’Connell, Nicola, Kyrou, Ioannis and Savage, Richard S. (2018) An improved machine learning pipeline for urinary volatiles disease detection : diagnosing diabetes. PLoS One, 13 (9). e0204425. doi:10.1371/journal.pone.0204425

Full text (PDF): WRAP-improved-machine-learning-pipeline-urinary-volatiles-disease-detection-Savage-2018.pdf - Published Version
Available under License Creative Commons Attribution 4.0.

Official URL: https://doi.org/10.1371/journal.pone.0204425

Abstract

Motivation

The measurement of disease biomarkers in easily obtained bodily fluids has opened the door to a new type of non-invasive medical diagnostics. New technologies are being developed and fine-tuned to make this possibility a reality. One such technology is Field Asymmetric Ion Mobility Spectrometry (FAIMS), which measures volatile organic compounds (VOCs) in biological samples such as urine. These VOCs are known to carry a range of information about a person’s metabolism and can in principle be used for disease diagnosis. Key to the effective use of such data are well-developed data processing pipelines, which are needed to extract the most useful information from the complex underlying biological structure.

Results

In this study, we present a new data analysis pipeline for FAIMS data and demonstrate a number of improvements over previously used methods. We evaluate the effect of a series of candidate operational steps during data processing, such as the use of wavelet transforms, principal component analysis (PCA), and classifier ensembles. We also demonstrate the use of FAIMS data in our pipeline to diagnose diabetes from a simple urine sample using machine learning classifiers. We present results for data generated from a case-control study of 115 urine samples, collected from 72 type II diabetic patients and 43 healthy volunteers as negative controls.

The resulting pipeline combines the steps that gave the best classification model performance. These include a two-dimensional discrete wavelet transform and the Wilcoxon rank-sum test for feature selection. We achieve a best ROC curve AUC of 0.825 (0.747–0.9, 95% CI) for classification of diabetes vs control. This result is robust to changes in the data pipeline and across different analysis runs, with AUC > 0.80 achieved in a range of cases. This is a substantial improvement in performance over previously used data processing methods in this area.

Our ability to make strong statements about the ability of FAIMS to diagnose diabetes is limited, however, as we found confounding effects from the demographics when including these data in the pipeline. The demographics alone produced a best AUC of 0.87 (0.795–0.94, 95% CI). While combining the demographics with the FAIMS data improved the AUC (0.907; 0.848–0.97, 95% CI), the difference was not statistically significant. Nevertheless, the pipeline itself shows a significant improvement in performance over more basic methods that have been used with FAIMS data in the past.
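To make two of the named steps concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single-level Haar wavelet (the abstract does not name the wavelet family) and ranks features by the normalised Mann-Whitney U statistic, which is the statistic underlying the Wilcoxon rank-sum test and equals the ROC AUC of a single feature. Ranking by this score stands in for ranking by rank-sum p-values. All function names (`haar_dwt2`, `wavelet_features`, `rank_sum_auc`, `select_features`) are hypothetical, and the data in the usage lines are toy values, not study data.

```python
def haar_dwt2(image):
    """Single-level 2D Haar transform of a matrix with even dimensions.

    Returns four sub-bands (approximation, column detail, row detail,
    diagonal detail), each half the input size in both directions.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("expected even numbers of rows and columns")
    approx, col_detail, row_detail, diag_detail = [], [], [], []
    for r in range(0, rows, 2):
        bands = ([], [], [], [])
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            bands[0].append((a + b + d + e) / 2)  # scaled sum of the 2x2 block
            bands[1].append((a - b + d - e) / 2)  # change between columns
            bands[2].append((a + b - d - e) / 2)  # change between rows
            bands[3].append((a - b - d + e) / 2)  # diagonal change
        approx.append(bands[0])
        col_detail.append(bands[1])
        row_detail.append(bands[2])
        diag_detail.append(bands[3])
    return approx, col_detail, row_detail, diag_detail


def wavelet_features(image):
    """Flatten all four sub-bands into one feature vector."""
    return [v for band in haar_dwt2(image) for row in band for v in row]


def rank_sum_auc(pos, neg):
    """Mann-Whitney U divided by n_pos * n_neg.

    This is the fraction of (positive, negative) pairs in which the
    positive value is larger, with ties counted as one half.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_features(X, y, k):
    """Indices of the k features whose AUC is furthest from 0.5."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(rank_sum_auc(pos, neg) - 0.5))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


# Toy usage (illustrative values only).
features = wavelet_features([[1, 2], [3, 4]])  # [5.0, -1.0, -2.0, 0.0]
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [0.8, 2.0]]
y = [0, 0, 1, 1]
best = select_features(X, y, k=1)  # [0]
```

In practice one would more likely reach for an established wavelet library and a statistics package for the rank-sum test. The pure-Python version is used here only so the sketch has no dependencies. Any feature selection of this kind should be fitted inside each cross-validation fold to avoid an optimistic AUC estimate.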

Item Type: Journal Article
Subjects: Q Science > Q Science (General); R Medicine > RC Internal medicine
Divisions: Faculty of Science, Engineering and Medicine > Engineering > Engineering; Faculty of Science, Engineering and Medicine > Science > Statistics; Faculty of Science, Engineering and Medicine > Medicine > Warwick Medical School
Library of Congress Subject Headings (LCSH): Non-insulin-dependent diabetes -- Diagnosis, Biochemical markers, Machine learning, Urine -- Analysis
Journal or Publication Title: PLoS One
Publisher: Public Library of Science
ISSN: 1932-6203
Official Date: 27 September 2018
Dates: 27 September 2018 (Published); 9 September 2018 (Accepted)
Volume: 13
Number: 9
Article Number: e0204425
DOI: 10.1371/journal.pone.0204425
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Open Access
RIOXX Funder/Project Grant (Project/Grant ID, RIOXX Funder Name, Funder ID):
UNSPECIFIED, Consejo Nacional de Ciencia y Tecnología, http://dx.doi.org/10.13039/501100003141
UNSPECIFIED, University of Warwick, http://dx.doi.org/10.13039/501100000741
UNSPECIFIED, [MRC] Medical Research Council, http://dx.doi.org/10.13039/501100000265
Open Access Version:
  • https://journals.plos.org/plosone/articl...
