University of Warwick
Publications service & WRAP

Data for A method for machine learning generation of realistic synthetic datasets for validating healthcare applications

Arvanitis, Theodoros N., White, Sean, Harrison, Stuart, Chaplin, Rupert and Despotou, George (2022) Data for A method for machine learning generation of realistic synthetic datasets for validating healthcare applications. [Dataset]

Microsoft Word (Readme file): README.rtf - Published Version (718b)
Embargoed item. Restricted access to Repository staff only

Archive (ZIP) (Dataset): RSDGM Experiments.zip - Published Version (9Mb)
Embargoed item. Restricted access to Repository staff only

Official URL: http://wrap.warwick.ac.uk/162871/

Abstract

Background: Digital health applications can improve the quality and effectiveness of healthcare by offering a range of tools to patients, professionals, and the healthcare system. The introduction of new technologies is not without risk, and digital health applications are often considered medical devices. Assuring their safe operation requires, among other things, clinical validation, which needs large datasets to test the applications in realistic clinical scenarios. Access to such datasets is challenging because of concerns about patient privacy. Synthetic datasets that are sufficiently realistic to test digital applications are seen as a potential alternative that would enable their deployment.

Objective: The aim of this work was to develop a method for generating realistic synthetic datasets that are statistically equivalent to real clinical datasets, and to demonstrate that a generative adversarial network (GAN) based approach is fit for purpose.

Method: A generative adversarial network was implemented and trained, in a series of six experiments, using numerical and categorical variables from three clinically relevant datasets, including ICD-9 and laboratory codes from the MIMIC-III dataset. A number of contextual steps provided the success criteria for the synthetic dataset.

Results: The approach created a synthetic dataset with statistical characteristics very similar to those of the real dataset. Pairwise associations between variables are also very similar. A high degree of Jaccard similarity and a successful Kolmogorov-Smirnov (K-S) test further support this.

Conclusions: The proof of concept for generating realistic synthetic datasets was successful, and the approach shows promise for further work.
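
The Results mention Jaccard similarity and the K-S test without describing how they were computed. The following is a minimal sketch of what such checks could look like when comparing a real column with its synthetic counterpart. It is not the authors' implementation: the function names `jaccard_similarity` and `ks_statistic` and all sample values are hypothetical, chosen only for illustration, and the sketch reports the K-S statistic alone, with no p-value.

```python
from bisect import bisect_right


def jaccard_similarity(real_codes, synthetic_codes):
    """Jaccard similarity |A & B| / |A | B| between two collections of codes."""
    a, b = set(real_codes), set(synthetic_codes)
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets count as identical
    return len(a & b) / len(a | b)


def ks_statistic(real_values, synthetic_values):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(real_values), sorted(synthetic_values)
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Empirical CDFs are step functions, so the largest gap occurs at a data point.
    for v in set(xs) | set(ys):
        cdf_x = bisect_right(xs, v) / len(xs)
        cdf_y = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(cdf_x - cdf_y))
    return gap


# Toy values, invented for illustration only (not taken from the dataset).
real_codes = ["4019", "25000", "41401", "4280"]
synth_codes = ["4019", "25000", "41401", "5849"]
print(jaccard_similarity(real_codes, synth_codes))  # 3 shared / 5 distinct = 0.6

real_lab = [1.0, 2.0, 3.0, 4.0]
synth_lab = [1.5, 2.5, 3.5, 4.5]
print(ks_statistic(real_lab, synth_lab))  # 0.25
```

A Jaccard value near 1 means the two columns use almost the same set of codes, and a K-S statistic near 0 means the two numerical distributions are close. Any pass or fail threshold would have to come from the success criteria described in the paper.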

Item Type: Dataset
Subjects: Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
R Medicine > R Medicine (General)
Divisions: Faculty of Science, Engineering and Medicine > Engineering > WMG (Formerly the Warwick Manufacturing Group)
Type of Data: Synthetically generated data based on the experiments
Library of Congress Subject Headings (LCSH): Neural networks (Computer science), Machine learning, Medical Informatics, Medicine -- Data processing
Publisher: University of Warwick, Warwick Manufacturing Group
Official Date: 25 February 2022
Dates:
Date | Event
25 February 2022 | Published
15 February 2022 | Available
15 February 2022 | Created
Status: Not Peer Reviewed
Publication Status: Published
Media of Output (format): .log
Access rights to Published version: Open Access (Creative Commons)
Copyright Holders: University of Warwick, NHS Digital
Description:

The dataset contains the synthetic data produced in experiments 3-6, as described in: A Method for Machine Learning Generation of Realistic Synthetic Datasets for Validating Healthcare Applications (DOI: 10.1177/14604582221077000) in Health Informatics Journal, SAGE.
Please contact the corresponding author for access. Access is granted after approval by all authors.

Date of first compliant deposit: 15 February 2022
RIOXX Funder/Project Grant:
Project/Grant ID | RIOXX Funder Name | Funder ID
UNSPECIFIED | [MRC] Medical Research Council | http://dx.doi.org/10.13039/501100000265
UNSPECIFIED | [EPSRC] Engineering and Physical Sciences Research Council | http://dx.doi.org/10.13039/501100000266
UNSPECIFIED | [ESRC] Economic and Social Research Council | http://dx.doi.org/10.13039/501100000269
UNSPECIFIED | Department of Health and Social Care (England) | UNSPECIFIED
UNSPECIFIED | Chief Scientist Office, Scottish Government Health and Social Care Directorate | http://dx.doi.org/10.13039/100014589
UNSPECIFIED | Health and Social Care Research and Development Division | http://dx.doi.org/10.13039/501100010756
UNSPECIFIED | Public Health Agency | http://dx.doi.org/10.13039/501100001626
UNSPECIFIED | British Heart Foundation | http://dx.doi.org/10.13039/501100000274
UNSPECIFIED | Wellcome Trust | http://dx.doi.org/10.13039/100010269
Related URLs:
  • Other
  • Related item in WRAP
Contributors:
Contribution | Name | Contributor ID
Depositor | Despotou, George | 65139
