Relative performance of machine learning and linear regression in predicting quality of life and academic performance of school children in Norway : data analysis of a quasi-experimental study
Froud, Robert J., Hansen, Solveig Hakestad, Ruud, Hans Kristian, Foss, Jonathan G. K., Ferguson, Leila and Fredriksen, Per Morten (2021) Relative performance of machine learning and linear regression in predicting quality of life and academic performance of school children in Norway : data analysis of a quasi-experimental study. Journal of Medical Internet Research, 23 (7). 22021. doi:10.2196/22021 ISSN 1438-8871.
PDF (Published Version, 455 KB): available under a Creative Commons Attribution 4.0 licence.
PDF (Accepted Version, 1222 KB): embargoed; access restricted to repository staff.
Official URL: http://dx.doi.org/10.2196/22021
Abstract
Background:
Machine learning (ML) approaches are increasingly being used in health research, but it is not clear how useful they are for modelling continuous health outcomes. Child quality of life (QoL) is associated with parental socioeconomic status and child activity levels, and may be associated with aerobic fitness and strength. It is not clear whether diet or academic performance (AP) is associated with QoL.
Objective:
To compare the predictive performance of ML approaches with that of linear regression for modelling QoL and AP using parental education and lifestyle data.
Methods:
We modelled data from children attending nine schools in a quasi-experimental study (NCT02495714). We randomly split the data into training and validation sets and simulated curvilinear, non-linear, and heteroscedastic variables. We examined the relative performance of ML approaches using R², making comparisons with mixed and fixed models and with regression using splines, both with and without imputation. We also examined the effect of training set size on overfitting.
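The validation procedure described above can be sketched in a few lines of plain Python. This is a minimal sketch, not the authors' code: it assumes synthetic data, uses a single-predictor least-squares fit as a stand-in for the study's mixed, fixed and spline models, and the helper names (`train_validation_split`, `fit_simple_ols`, `r_squared`) are hypothetical. What it shows is that R² is computed on rows held out from fitting.

```python
import random
from typing import List, Sequence, Tuple


def train_validation_split(n: int, train_fraction: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly partition row indices 0..n-1 into training and validation sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(round(n * train_fraction))
    return indices[:cut], indices[cut:]


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope for a single predictor."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data only (hypothetical): one predictor, weak linear signal plus noise.
    x = [rng.gauss(0, 1) for _ in range(500)]
    y = [0.5 * xi + rng.gauss(0, 1) for xi in x]

    train, valid = train_validation_split(len(x), train_fraction=0.7, seed=2)
    b0, b1 = fit_simple_ols([x[i] for i in train], [y[i] for i in train])

    # Score on the held-out validation rows, not on the rows used for fitting.
    preds = [b0 + b1 * x[i] for i in valid]
    print(f"validation R^2 = {r_squared([y[i] for i in valid], preds):.3f}")
```

Here SS_tot uses the validation-set mean, the same convention as scikit-learn's `r2_score`; under it, a model that predicts worse than the mean receives a negative R². Reading "variance explained in the validation set" as validation R² of this kind is an assumption of this sketch.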
Results:
We had 1,711 cases. Using real data, our regression models explained 24% of AP variance in the complete-case validation set and up to 15% of QoL variance. While ML models explained high proportions of variance in the training sets, in the validation sets they explained approximately 0% of AP variance and between 3% and 8% of QoL variance. Following imputation, ML models explained up to 15% of AP variance. ML models outperformed regression only when modelling the simulated non-linear and heteroscedastic variables. A smaller training set did not lead to increased overfitting. The best predictors of QoL were 7-point self-reported activity (P<.001; β=1.09, 95% CI 0.53 to 1.66) and TV/computer use (P=.002; β=-0.95, 95% CI -1.55 to -0.36). For AP, these were the mother having master’s-level education (P<.001; β=1.98, 95% CI 0.25 to 3.71) and dichotomised self-reported activity (P=.001; β=2.47, 95% CI 1.08 to 3.87). Adjusted academic performance was associated with QoL (P=.02; β=0.12, 95% CI 0.02 to 0.22).
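A high training R² paired with a near-zero validation R² is the classic signature of overfitting. The toy example below assumes a 1-nearest-neighbour regressor as a stand-in for a flexible ML model (the abstract does not name the ML methods at this point) and synthetic data with a weak signal; the function names are hypothetical. Because each training point is its own nearest neighbour, training R² is exactly 1, while R² on held-out rows collapses and can fall below zero.

```python
import random
from typing import List, Sequence, Tuple


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot on whichever set is passed in."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def nearest_neighbour_predict(
    train_x: Sequence[float], train_y: Sequence[float], query_x: Sequence[float]
) -> List[float]:
    """1-nearest-neighbour regression: copy the outcome of the closest training point."""
    preds = []
    for q in query_x:
        j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - q))
        preds.append(train_y[j])
    return preds


def overfitting_gap(n: int = 400, seed: int = 0) -> Tuple[float, float]:
    """Return (training R^2, validation R^2) for 1-NN on weak-signal synthetic data."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.3 * xi + rng.gauss(0, 1) for xi in x]  # mostly noise, little signal
    # Rows are i.i.d., so taking the first and second halves acts as a random split.
    half = n // 2
    train_x, train_y = x[:half], y[:half]
    valid_x, valid_y = x[half:], y[half:]
    # Each training point is its own nearest neighbour, so the training fit is memorised.
    train_r2 = r_squared(train_y, nearest_neighbour_predict(train_x, train_y, train_x))
    valid_r2 = r_squared(valid_y, nearest_neighbour_predict(train_x, train_y, valid_x))
    return train_r2, valid_r2


if __name__ == "__main__":
    train_r2, valid_r2 = overfitting_gap()
    print(f"training R^2 = {train_r2:.3f}, validation R^2 = {valid_r2:.3f}")
```

This does not reproduce the study's figures; it only illustrates why a high training R² says little about how well a model will predict outcomes for new children.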
Conclusions:
Exercising enough to sweat once per week and 2 hours per day of TV or computer use are associated with small-to-medium increases and decreases in child QoL, respectively. An increase in AP of 20 units is associated with a small increase in QoL. A mother having higher and master’s-level education, 2 hours per day of TV or computer use, and taking at least 2 hours of exercise are each associated with small-to-medium increases in AP. Differences between the effects of computer/TV use for work and for leisure need further investigation. Linear regression is less prone to overfitting and performs better than ML in predicting continuous health outcomes in a dataset containing missing data. Imputation improves ML performance, but not enough for ML to outperform regression. ML outperformed regression on simulated non-linear and heteroscedastic data and may be of use when such relationships exist and where imputation is sensible or there are no missing data.
Clinical Trial:
The data come from a quasi-experimental study rather than a randomized controlled trial (RCT); the study is nevertheless registered: NCT02495714.
Item Type: Journal Article
Subjects: H Social Sciences > HN Social history and conditions. Social problems. Social reform; L Education > LB Theory and practice of education; Q Science > Q Science (General); Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Divisions: Faculty of Science, Engineering and Medicine > Science > Computer Science; Faculty of Science, Engineering and Medicine > Medicine > Warwick Medical School > Clinical Trials Unit; Faculty of Science, Engineering and Medicine > Medicine > Warwick Medical School > Health Sciences; Faculty of Science, Engineering and Medicine > Medicine > Warwick Medical School
Library of Congress Subject Headings (LCSH): School children -- Norway; School children -- Norway -- Social conditions -- Data processing; School children -- Intelligence testing -- Norway -- Data processing; Quality of life -- Norway -- Data processing; Academic achievement -- Norway -- Evaluation -- Data processing; Regression analysis; Artificial intelligence
Journal or Publication Title: Journal of Medical Internet Research
Publisher: JMIR Publications
ISSN: 1438-8871
Official Date: 16 July 2021
Volume: 23
Number: 7
Article Number: 22021
DOI: 10.2196/22021
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Open Access (Creative Commons)
Date of first compliant deposit: 27 May 2021
Date of first compliant Open Access: 19 August 2021