Video-Based Pose Estimation for Gait Analysis in Stroke Survivors during Clinical Assessments: A Proof-of-Concept Study

Digital Biomarkers (JCR Q1, Computer Science) | Pub Date: 2022-01-13 | DOI: 10.1159/000520732

L. Lonini, Y. Moon, Kyle R. Embry, R. Cotton, K. McKenzie, Sophia Jenz, A. Jayaraman

Digital Biomarkers, 6(1): 9-18. Journal Article.

Citations: 27

Abstract

Recent advances in deep learning have produced significant progress in markerless human pose estimation, making it possible to estimate human kinematics from single-camera videos without reflective markers or specialized labs equipped with motion capture systems. Such algorithms have the potential to enable the quantification of clinical metrics from videos recorded with a handheld camera. Here we used DeepLabCut, an open-source framework for markerless pose estimation, to fine-tune a deep network to track 5 body keypoints (hip, knee, ankle, heel, and toe) in 82 below-waist videos of 8 patients with stroke performing overground walking during clinical assessments. We trained the pose estimation model by labeling the keypoints in 2 frames per video and then trained a convolutional neural network to estimate 5 clinically relevant gait parameters (cadence, double support time, swing time, stance time, and walking speed) from the trajectories of these keypoints. These estimates were then compared to those obtained from a clinical system for gait analysis (GAITRite®, CIR Systems). Absolute accuracy (mean error) and precision (standard deviation of error) for swing, stance, and double support time were within 0.04 ± 0.11 s; Pearson’s correlation with the reference system was moderate for swing times (r = 0.4–0.66), but stronger for stance and double support time (r = 0.93–0.95). Cadence mean error was −0.25 ± 3.9 steps/min (r = 0.97), while walking speed mean error was −0.02 ± 0.11 m/s (r = 0.92). These preliminary results suggest that single-camera videos and pose estimation models based on deep networks could be used to quantify clinically relevant gait metrics in individuals poststroke, even when they use assistive devices in uncontrolled environments. This development opens the door to gait analysis applications both inside and outside clinical settings, without the need for sophisticated equipment.
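
The abstract reports agreement with the reference system as accuracy (mean error), precision (standard deviation of error), and Pearson's r. The following is a minimal sketch of how such metrics might be computed for one gait parameter, not the authors' code. The function name `agreement_metrics`, the sign convention (video estimate minus reference), the use of the sample standard deviation (`ddof=1`), and the example values are all assumptions made for illustration.

```python
import numpy as np


def agreement_metrics(estimate, reference):
    """Compare paired measurements of one gait parameter.

    Returns (mean_error, sd_error, pearson_r), where error is defined
    here as estimate - reference (an assumed sign convention).
    """
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if estimate.ndim != 1 or estimate.shape != reference.shape:
        raise ValueError("inputs must be 1-D arrays of equal length")
    if estimate.size < 2:
        raise ValueError("need at least two paired measurements")

    error = estimate - reference
    mean_error = error.mean()        # accuracy: systematic bias
    sd_error = error.std(ddof=1)     # precision: sample SD of the error (assumed)
    pearson_r = np.corrcoef(estimate, reference)[0, 1]
    return mean_error, sd_error, pearson_r


if __name__ == "__main__":
    # Synthetic cadence values in steps/min, invented for illustration only;
    # they are not data from the study.
    video_cadence = [88.0, 95.5, 101.2, 79.4, 110.3]
    walkway_cadence = [87.1, 96.0, 103.0, 80.2, 109.5]
    bias, spread, r = agreement_metrics(video_cadence, walkway_cadence)
    print(f"mean error = {bias:.2f} steps/min, SD = {spread:.2f}, r = {r:.3f}")
```

Reporting mean error and its standard deviation separately distinguishes a consistent offset, which could in principle be calibrated away, from trial-to-trial scatter, which could not. Pearson's r alone would hide a constant offset, since it is unaffected by adding a constant to one series.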
