The utility of machine learning and text mining to expedite systematic reviews in injury recovery research

Giummarra, Melita

doi:10.4225/03/5abc4fb307975

presentation

posted on 2018-03-29, 02:30 authored by Melita Giummarra

Systematic reviews are an enormously valuable method for establishing the level of scientific evidence on a specific problem. However, the exponential growth in publications is a major barrier to conducting and updating high-quality systematic reviews in a timely manner. Several machine learning and text mining tools have been developed to address this problem. One such tool is Abstrackr, a free web-based platform hosted at Brown University, USA. It uses an active learning algorithm to predict each citation's relevance from the words in its title, abstract and keywords (represented as unigrams and bigrams). Abstrackr then sorts citations by predicted relevance. This lets researchers identify relevant articles quickly and reduces the need to screen articles of very low predicted relevance. Previous studies have shown that Abstrackr can reduce the burden of conducting and updating systematic reviews on specific health topics (e.g., genetics) without compromising the sensitivity and specificity with which relevant citations are identified for full-text review.
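
The relevance ranking described above can be illustrated with a minimal Python sketch. It assumes a multinomial naive Bayes scorer over unigram and bigram counts and a "screen the highest-ranked citation next" query strategy. Both choices are assumptions made for illustration: the abstract does not specify Abstrackr's model or query strategy, and this is not Abstrackr's implementation. The names `ngrams`, `NaiveBayesScorer` and `screen_with_ranking` are hypothetical, as is the `oracle` callable that stands in for the human reviewer.

```python
import math
import re
from collections import Counter
from typing import Callable


def ngrams(text: str) -> list[str]:
    """Lower-cased word unigrams plus adjacent-word bigrams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


class NaiveBayesScorer:
    """Multinomial naive Bayes over n-gram counts, Laplace-smoothed.

    Hypothetical stand-in for illustration, not Abstrackr's model.
    """

    def __init__(self) -> None:
        self.counts = {True: Counter(), False: Counter()}  # n-gram counts per label
        self.docs = {True: 0, False: 0}  # number of labelled citations per label

    def update(self, text: str, relevant: bool) -> None:
        """Add one screened citation to the training data."""
        self.counts[relevant].update(ngrams(text))
        self.docs[relevant] += 1

    def score(self, text: str) -> float:
        """Smoothed log-odds that a citation is relevant (higher = more relevant)."""
        vocab_size = len(set(self.counts[True]) | set(self.counts[False]))
        totals = {label: sum(c.values()) for label, c in self.counts.items()}
        log_odds = math.log((self.docs[True] + 1) / (self.docs[False] + 1))
        for gram in ngrams(text):
            p_rel = (self.counts[True][gram] + 1) / (totals[True] + vocab_size + 1)
            p_irr = (self.counts[False][gram] + 1) / (totals[False] + vocab_size + 1)
            log_odds += math.log(p_rel / p_irr)
        return log_odds


def screen_with_ranking(
    citations: dict[str, str],
    oracle: Callable[[str], bool],
    seed_ids: list[str],
) -> list[str]:
    """Screen the seed citations, then always screen the top-ranked remaining one.

    `oracle` is a hypothetical callable standing in for the human reviewer.
    The model is retrained after every decision. Returns the screening order.
    """
    model = NaiveBayesScorer()
    order = []
    for cid in seed_ids:
        model.update(citations[cid], oracle(cid))
        order.append(cid)
    remaining = [cid for cid in citations if cid not in seed_ids]
    while remaining:
        # Re-rank everything still unscreened using the latest model.
        remaining.sort(key=lambda c: model.score(citations[c]), reverse=True)
        cid = remaining.pop(0)
        model.update(citations[cid], oracle(cid))
        order.append(cid)
    return order


# Toy example (invented titles, not records from the review).
citations = {
    "a": "fault attribution and recovery after road traffic injury",
    "b": "genetic variants in breast cancer",
    "c": "perceived fault predicts pain after transport injury",
    "d": "soil microbiome diversity in wheat fields",
    "e": "injury compensation claims and recovery after crash",
}
relevant = {"a", "c", "e"}
order = screen_with_ranking(citations, lambda cid: cid in relevant, seed_ids=["a", "b"])
print(order)  # ['a', 'b', 'e', 'c', 'd']
```

In this toy run the sketch is seeded with one relevant citation (`a`) and one irrelevant citation (`b`). The scorer ranks `e` first because it shares the most n-grams with `a`, then `c`, and leaves the off-topic `d` until last. Retraining after every decision is what makes the loop "active": each new label reshapes the ranking of the citations that remain. A practical tool would also apply a stopping rule rather than screening every record, as this sketch does.
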
We used Abstrackr to support screening in a systematic review examining the role of fault attributions in recovery from transport injury. A comprehensive search of five databases identified 10,559 citations. Two reviewers screened the citations. One screened every citation for relevance (the “gold standard” method). The other screened citations until a prediction-based stopping rule was met (no new predictions in Abstrackr). The presentation will give an overview of our experience using Abstrackr and text mining in health research. It will focus on what we learned about workload efficiency and precision, and on the false positives we observed with machine-learning-assisted screening.
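
Comparing the assisted reviewer's decisions with those of the reviewer who screened every citation reduces to a confusion matrix. The Python sketch below assumes that any citation left unscreened when the stopping rule fired counts as excluded. The function names `compare_to_gold` and `ratio`, the `ScreeningMetrics` fields and the toy labels are all hypothetical; they are not data or code from this review.

```python
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    screened_fraction: float  # share of all citations the assisted reviewer read
    sensitivity: float  # gold-standard includes that were also included
    specificity: float  # gold-standard excludes that were also excluded
    precision: float  # assisted includes that were truly relevant
    false_positives: int  # assisted includes that the gold standard excluded


def ratio(num: int, den: int) -> float:
    """Safe division; returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def compare_to_gold(gold: dict[str, bool], assisted: dict[str, bool]) -> ScreeningMetrics:
    """Compare machine-assisted screening with full (gold standard) screening.

    gold:     citation id -> include decision from the reviewer who screened everything
    assisted: citation id -> include decision for citations screened before stopping
    """
    tp = fp = fn = tn = 0
    for cid, truly_relevant in gold.items():
        included = assisted.get(cid, False)  # assumption: unscreened means excluded
        if included and truly_relevant:
            tp += 1
        elif included:
            fp += 1
        elif truly_relevant:
            fn += 1
        else:
            tn += 1
    return ScreeningMetrics(
        screened_fraction=ratio(len(assisted), len(gold)),
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        precision=ratio(tp, tp + fp),
        false_positives=fp,
    )


# Toy labels for 10 citations (invented, not results from the review).
gold = {f"c{i}": i <= 2 for i in range(1, 11)}  # only c1 and c2 are relevant
assisted = {"c1": True, "c2": True, "c3": True, "c4": False}  # stopped after 4
metrics = compare_to_gold(gold, assisted)
print(metrics)
```

In this toy case the assisted reviewer read 4 of 10 citations (`screened_fraction` 0.4) and found both relevant ones (sensitivity 1.0). The reviewer also included one irrelevant citation, giving one false positive and a precision of about 0.67. `screened_fraction` is only one simple proxy for workload. Its complement, `1 - screened_fraction`, expresses the same saving as the share of citations that did not need reading.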