Machine learning for automating subjective clinical assessment of gait impairment in people with acquired brain injury - a comparison of an image extraction and classification system to expert scoring.
Ashleigh Mobbs, Michelle Kahn, Gavin Williams, Benjamin F Mentiplay, Yong-Hao Pua, Ross A Clark
Journal of NeuroEngineering and Rehabilitation, 21(1):124. Published 2024-07-23. DOI: 10.1186/s12984-024-01406-w (https://doi.org/10.1186/s12984-024-01406-w). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11264460/pdf/
Abstract
Background: Walking impairment is a common disability after acquired brain injury (ABI), and visually evident arm movement abnormality has been identified as negatively affecting a range of psychological factors. The International Classification of Functioning, Disability and Health (ICF) qualifiers scale has been used to subjectively assess arm movement abnormality, showing strong intra-rater and test-retest reliability but only moderate inter-rater reliability. This limits its clinical utility as a measurement tool. To automate the analysis and overcome this inter-rater variability, the primary aim of this study was to evaluate the ability of a novel two-level machine learning model to assess arm movement abnormality during walking in people with ABI.
Methods: Frontal plane gait videos were used to train four networks with 50%, 75%, 90%, and 100% of participants (ABI: n = 42, healthy controls: n = 34) to automatically identify anatomical landmarks using DeepLabCut™ and calculate two-dimensional kinematic joint angles. These joint angles, together with assessment scores from three experienced neurorehabilitation clinicians, were used to train random forest networks with nested cross-validation to predict assessor scores for all videos. Agreement between the predictions for unseen participants (i.e. test group participants who were not used to train the model) and each individual assessor's scores was evaluated using quadratic weighted kappa. One-sample t-tests (to determine over- or under-prediction against clinician ratings) and one-way ANOVA (to determine differences between networks) were applied to the four networks.
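Two of the steps named in the Methods, the two-dimensional joint angle and the quadratic weighted kappa, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the landmark coordinates, the example scores, and the assumption that scores are integers from 0 to 4 are all hypothetical and chosen only for illustration.

```python
import math


def joint_angle_2d(proximal, joint, distal):
    """Angle in degrees at `joint` between the segments joint->proximal
    and joint->distal, from 2D (x, y) landmark coordinates."""
    ax, ay = proximal[0] - joint[0], proximal[1] - joint[1]
    bx, by = distal[0] - joint[0], distal[1] - joint[1]
    dot = ax * bx + ay * by
    cross = ax * by - ay * bx
    # atan2 on |cross| and dot gives an unsigned angle in [0, 180].
    return math.degrees(math.atan2(abs(cross), dot))


def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Quadratic weighted kappa for two lists of integer scores in
    the range 0 .. n_categories - 1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and equal in length")
    k, n = n_categories, len(rater_a)
    # Observed contingency table of score pairs.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all scores fall in one category")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical landmark positions (arbitrary pixel units).
shoulder, elbow, wrist = (0.0, 0.0), (0.0, 1.0), (1.0, 1.0)
elbow_angle = joint_angle_2d(shoulder, elbow, wrist)

# Hypothetical clinician and model scores on an assumed 0-4 scale.
clinician_scores = [0, 1, 2, 3, 4, 2]
model_scores = [0, 1, 1, 3, 4, 2]
kappa = quadratic_weighted_kappa(clinician_scores, model_scores, n_categories=5)
```

The quadratic weights penalise a two-point disagreement four times as heavily as a one-point disagreement, which is why this statistic suits ordinal scores of this kind.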
Results: The machine learning predictions showed similar agreement to experienced human assessors, with no statistically significant difference (at the p < 0.05 level) for any match contingency. There was no statistically significant difference between the predictions from the four networks (F = 0.119; p = 0.949). The four networks did, however, under-predict scores, with small effect sizes (p range = 0.007 to 0.040; Cohen's d range = 0.156 to 0.217).
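As a rough illustration of how an under-prediction test and its effect size relate, the sketch below computes a one-sample t statistic and Cohen's d from prediction-minus-clinician differences. It is not the authors' analysis. The function name and the difference values are hypothetical, and the p-value is left out (a statistics package such as SciPy's `scipy.stats.ttest_1samp` could supply it).

```python
import math
import statistics


def one_sample_t_and_cohens_d(differences, mu0=0.0):
    """Return (t statistic, Cohen's d, degrees of freedom) for testing
    whether the mean of `differences` departs from `mu0`."""
    n = len(differences)
    if n < 2:
        raise ValueError("at least two differences are required")
    mean_diff = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("differences have zero variance")
    cohens_d = (mean_diff - mu0) / sd
    t_stat = cohens_d * math.sqrt(n)  # same as (mean - mu0) / (sd / sqrt(n))
    return t_stat, cohens_d, n - 1


# Hypothetical differences: model score minus clinician score per video.
differences = [-1, -1, 0, 0, -1, 1, 0, -2]
t_stat, cohens_d, dof = one_sample_t_and_cohens_d(differences)
```

A negative mean difference, and so a negative t and d under this sign convention, corresponds to the model scoring lower than the clinician, i.e. under-prediction.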
Conclusions: This study demonstrated that machine learning can perform similarly to experienced clinicians when subjectively assessing arm movement abnormality in people with ABI. The relatively small sample size may have resulted in under-prediction of some scores, albeit with small effect sizes. Larger studies are needed that objectively and automatically assess dynamic movement in both local and telerehabilitation settings, for example using smartphones and edge-based machine learning, to reduce measurement error and healthcare access inequality.
About the journal:
Journal of NeuroEngineering and Rehabilitation considers manuscripts on all aspects of research that result from cross-fertilization of the fields of neuroscience, biomedical engineering, and physical medicine & rehabilitation.