The Library

Technical metrics used to evaluate health care chatbots : a scoping review

Abd-Alrazaq, Alaa, Safi, Zeineb, Alajlani, Mohannad, Warren, Jim, Househ, Mowafa and Denecke, Kerstin (2020) Technical metrics used to evaluate health care chatbots : a scoping review. Journal of Medical Internet Research, 22 (6). e18301. doi:10.2196/18301 ISSN 1438-8871.

Research output not available from this repository.

Official URL: https://doi.org/10.2196/18301

Abstract

Dialog agents (chatbots) have a long history of application in health care, where they have been used for tasks such as supporting patient self-management and providing counseling. Their use is expected to grow with increasing demands on health systems and improving artificial intelligence (AI) capability. Approaches to the evaluation of health care chatbots, however, appear to be diverse and haphazard, resulting in a potential barrier to the advancement of the field.

This study aims to identify the technical (nonclinical) metrics used by previous studies to evaluate health care chatbots.

Studies were identified by searching 7 bibliographic databases (eg, MEDLINE and PsycINFO) in addition to conducting backward and forward reference list checking of the included studies and relevant reviews. The studies were independently selected by two reviewers who then extracted data from the included studies. Extracted data were synthesized narratively by grouping the identified metrics into categories based on the aspect of chatbots that the metrics evaluated.

Of the 1498 citations retrieved, 65 studies were included in this review. Chatbots were evaluated using 27 technical metrics, which were related to chatbots as a whole (eg, usability, classifier performance, speed), response generation (eg, comprehensibility, realism, repetitiveness), response understanding (eg, chatbot understanding as assessed by users, word error rate, concept error rate), and esthetics (eg, appearance of the virtual agent, background color, and content).

The technical metrics of health chatbot studies were diverse, with survey designs and global usability metrics dominating. The lack of standardization and paucity of objective measures make it difficult to compare the performance of health chatbots and could inhibit advancement of the field. We suggest that researchers more frequently include metrics computed from conversation logs. In addition, we recommend the development of a framework of technical metrics with recommendations for specific circumstances for their inclusion in chatbot studies.
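The abstract lists word error rate among the response-understanding metrics and suggests computing metrics from conversation logs. As an illustration only, and not the procedure of any study in the review, the following minimal sketch computes word error rate as the word-level edit distance between a reference utterance and the text the chatbot recognized, divided by the number of reference words. The function name `word_error_rate`, the lowercasing, and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption for this sketch: tokens are lowercased, whitespace-separated words.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical log entry: what the user said vs. what the chatbot recognized.
print(word_error_rate("i have a headache", "i have headache"))
```

Averaging this value over every user turn in a conversation log would give one log-derived, objective figure of the kind the authors recommend, provided reference transcripts are available.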

Item Type: Journal Article
Divisions: Faculty of Science, Engineering and Medicine > Engineering > WMG (Formerly the Warwick Manufacturing Group)
SWORD Depositor: Library Publications Router
Journal or Publication Title: Journal of Medical Internet Research
Publisher: Journal of Medical Internet Research
ISSN: 1438-8871
Official Date: June 2020
Dates:

Date	Event
June 2020	Published
15 April 2020	Available

Volume: 22
Number: 6
Article Number: e18301

DOI: 10.2196/18301
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Restricted or Subscription Access

University of Warwick
Publications service & WRAP
