The effect of the number of labelled frames on the accuracy of 2D markerless pose estimation (DeepLabCut) during treadmill walking

Maud Van Den Bogaart, Maaike M. Eken, Rachel H.J. Senden, Rik G.J. Marcellis, Kenneth Meijer, Pieter Meyns, Hans M.N. Essers
{"title":"在跑步机上行走时,标记帧数对二维无标记姿态估计(DeepLabCut)精度的影响","authors":"Maud Van Den Bogaart, Maaike M. Eken, Rachel H.J. Senden, Rik G.J. Marcellis, Kenneth Meijer, Pieter Meyns, Hans M.N. Essers","doi":"10.1016/j.gaitpost.2023.07.254","DOIUrl":null,"url":null,"abstract":"Gait analysis is imperative for tailoring evidence-based interventions in individuals with and without a physical disability.1 The gold standard for gait analysis is optoelectronic three-dimensional motion analysis, which requires expertise, is laboratory based, and requires expensive equipment, which is not available in all settings, particularly in low to middle-income countries. New techniques based on deep learning to track body landmarks in simple video recordings allow recordings in a natural environment.2,3 Deeplabcut is a free and open-source toolbox to track user-defined features in videofiles.4,5 What is the minimal number of additional labelled frames needed for good tracking accuracy of markerless pose estimation (DeepLabCut) during treadmill walking? An increasing number of videos (1, 2, 5, 10, 15 and 20 videos) from typically developed adults (mean age = 50.7±17.3 years) were included in the analysis. Participants walked at comfortable walking speed on a dual-belt instrumented treadmill (Computer Assisted Rehabilitation Environment (CAREN), Motekforce Link, Amsterdam, The Netherlands). 2D video recordings were conducted in the sagittal plane with a gray-scale camera (50 Hz, Basler scA640-74gm, Basler, Germany). Using the pre-trained MPII human model (ResNet101; pcut-off = 0.8) in DeepLabCut, the following joints and anatomical landmarks were tracked unilaterally (left side): Ankle, knee, hip, shoulder, elbow and wrist (chin and forehead were excluded). An increasing number of frames was labeled per video (1 and 5 frames per video) and added to the pre-trained MPII human model, which was then retrained till 500.000 iterations. 95% of the labelled frames were used for training, 5% for testing. For each scenario with an increasing number of videos and manually labelled frames, the train and test error was calculated. Good tracking accuracy was defined as an error smaller then the diameter of a retroreflective marker (= 1.4 cm). The results of the train and test pixel errors are presented in Fig. 1 for 11 different scenarios. When the number of videos increased to 5 videos with 1 or 5 labelled frames, the train pixel error reduced to 1.11 and 1.16 pixels, respectively (corresponding to an error of < 1 cm). From labelling at least 20 frames, the test pixel error was less then 5 pixels (corresponding to an error of < 3 cm).Download : Download high-res image (91KB)Download : Download full-size image A good tracking accuracy (error < 1 cm) in the training set was achieved from 5 additionally labeled videos. The tracking accuracy for the test dataset remained constant (≈ 2-3 cm) from labelling 20 frames or more. Further research is needed and ongoing to determine the optimal number of training iterations and additional labelled videos and frames for good test and train tracking accuracy (< 1.4 cm). 
This optimal setup will then be used to validate DeepLabCut to measure joint centres and angles during walking with respect to the gold standard.","PeriodicalId":94018,"journal":{"name":"Gait & posture","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The effect of the number of labelled frames on the accuracy of 2D markerless pose estimation (DeepLabCut) during treadmill walking\",\"authors\":\"Maud Van Den Bogaart, Maaike M. Eken, Rachel H.J. Senden, Rik G.J. Marcellis, Kenneth Meijer, Pieter Meyns, Hans M.N. Essers\",\"doi\":\"10.1016/j.gaitpost.2023.07.254\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Gait analysis is imperative for tailoring evidence-based interventions in individuals with and without a physical disability.1 The gold standard for gait analysis is optoelectronic three-dimensional motion analysis, which requires expertise, is laboratory based, and requires expensive equipment, which is not available in all settings, particularly in low to middle-income countries. New techniques based on deep learning to track body landmarks in simple video recordings allow recordings in a natural environment.2,3 Deeplabcut is a free and open-source toolbox to track user-defined features in videofiles.4,5 What is the minimal number of additional labelled frames needed for good tracking accuracy of markerless pose estimation (DeepLabCut) during treadmill walking? An increasing number of videos (1, 2, 5, 10, 15 and 20 videos) from typically developed adults (mean age = 50.7±17.3 years) were included in the analysis. Participants walked at comfortable walking speed on a dual-belt instrumented treadmill (Computer Assisted Rehabilitation Environment (CAREN), Motekforce Link, Amsterdam, The Netherlands). 2D video recordings were conducted in the sagittal plane with a gray-scale camera (50 Hz, Basler scA640-74gm, Basler, Germany). Using the pre-trained MPII human model (ResNet101; pcut-off = 0.8) in DeepLabCut, the following joints and anatomical landmarks were tracked unilaterally (left side): Ankle, knee, hip, shoulder, elbow and wrist (chin and forehead were excluded). An increasing number of frames was labeled per video (1 and 5 frames per video) and added to the pre-trained MPII human model, which was then retrained till 500.000 iterations. 95% of the labelled frames were used for training, 5% for testing. For each scenario with an increasing number of videos and manually labelled frames, the train and test error was calculated. Good tracking accuracy was defined as an error smaller then the diameter of a retroreflective marker (= 1.4 cm). The results of the train and test pixel errors are presented in Fig. 1 for 11 different scenarios. When the number of videos increased to 5 videos with 1 or 5 labelled frames, the train pixel error reduced to 1.11 and 1.16 pixels, respectively (corresponding to an error of < 1 cm). From labelling at least 20 frames, the test pixel error was less then 5 pixels (corresponding to an error of < 3 cm).Download : Download high-res image (91KB)Download : Download full-size image A good tracking accuracy (error < 1 cm) in the training set was achieved from 5 additionally labeled videos. The tracking accuracy for the test dataset remained constant (≈ 2-3 cm) from labelling 20 frames or more. 
Further research is needed and ongoing to determine the optimal number of training iterations and additional labelled videos and frames for good test and train tracking accuracy (< 1.4 cm). This optimal setup will then be used to validate DeepLabCut to measure joint centres and angles during walking with respect to the gold standard.\",\"PeriodicalId\":94018,\"journal\":{\"name\":\"Gait & posture\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Gait & posture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1016/j.gaitpost.2023.07.254\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Gait & posture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1016/j.gaitpost.2023.07.254","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Gait analysis is imperative for tailoring evidence-based interventions in individuals with and without a physical disability.1 The gold standard for gait analysis is optoelectronic three-dimensional motion analysis, which requires expertise, is laboratory-based, and relies on expensive equipment that is not available in all settings, particularly in low- to middle-income countries. New techniques based on deep learning can track body landmarks in simple video recordings and thus allow recordings in a natural environment.2,3 DeepLabCut is a free and open-source toolbox for tracking user-defined features in video files.4,5 What is the minimal number of additional labelled frames needed for good tracking accuracy of markerless pose estimation (DeepLabCut) during treadmill walking?

An increasing number of videos (1, 2, 5, 10, 15 and 20 videos) from typically developed adults (mean age = 50.7 ± 17.3 years) was included in the analysis. Participants walked at a comfortable walking speed on a dual-belt instrumented treadmill (Computer Assisted Rehabilitation Environment (CAREN), Motekforce Link, Amsterdam, The Netherlands). 2D video recordings were made in the sagittal plane with a grey-scale camera (50 Hz, Basler scA640-74gm, Basler, Germany). Using the pre-trained MPII human model (ResNet-101; p-cutoff = 0.8) in DeepLabCut, the following joints and anatomical landmarks were tracked unilaterally (left side): ankle, knee, hip, shoulder, elbow and wrist (chin and forehead were excluded). An increasing number of frames was labelled per video (1 and 5 frames per video) and added to the pre-trained MPII human model, which was then retrained for up to 500,000 iterations. 95% of the labelled frames were used for training and 5% for testing. For each scenario with an increasing number of videos and manually labelled frames, the train and test errors were calculated. Good tracking accuracy was defined as an error smaller than the diameter of a retroreflective marker (1.4 cm).

The train and test pixel errors for the 11 scenarios are presented in Fig. 1. When the number of videos increased to 5 videos with 1 or 5 labelled frames, the train pixel error decreased to 1.11 and 1.16 pixels, respectively (corresponding to an error of < 1 cm). From labelling at least 20 frames, the test pixel error was less than 5 pixels (corresponding to an error of < 3 cm).

Good tracking accuracy (error < 1 cm) in the training set was achieved from 5 additionally labelled videos. The tracking accuracy for the test dataset remained constant (≈ 2–3 cm) from labelling 20 frames or more. Further research is needed and ongoing to determine the optimal number of training iterations and additional labelled videos and frames for good test and train tracking accuracy (< 1.4 cm). This optimal setup will then be used to validate DeepLabCut for measuring joint centres and angles during walking against the gold standard.
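The retraining procedure described above follows the standard DeepLabCut project workflow: start from the pre-trained MPII-based human model, manually label a small number of frames per video, rebuild the training set, retrain, and evaluate. The following Python sketch outlines that workflow with the deeplabcut package; the project name, video file names and config path are placeholders, and the exact function signatures (in particular create_pretrained_project and the "full_human" model name) depend on the installed DeepLabCut version and should be checked against its documentation rather than taken from this abstract.

import deeplabcut

# Sagittal-plane treadmill-walking videos (file names are placeholders).
videos = ["walk_01.avi", "walk_02.avi", "walk_03.avi", "walk_04.avi", "walk_05.avi"]

# Create a project initialised from the pre-trained human model
# (MPII-based, ResNet backbone); model name is a version-dependent assumption.
deeplabcut.create_pretrained_project("treadmill-walking", "lab", videos, model="full_human")

# Config file created by the call above (placeholder path; the real folder
# name includes the creation date).
config_path = "treadmill-walking-lab-2023-09-01/config.yaml"

# Manually extract and label a small number of frames per video
# (1 or 5 frames per video in this study).
deeplabcut.extract_frames(config_path, mode="manual")
deeplabcut.label_frames(config_path)

# Build the training set; the 95%/5% train/test split corresponds to
# TrainingFraction: 0.95 in config.yaml.
deeplabcut.create_training_dataset(config_path)

# Retrain for up to 500,000 iterations, then report train/test pixel errors.
deeplabcut.train_network(config_path, maxiters=500_000)
deeplabcut.evaluate_network(config_path, plotting=False)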
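The accuracy criterion itself is simple: a pixel error is converted to centimetres using a camera calibration factor and compared against the 1.4 cm marker diameter. The short sketch below illustrates that check; the cm-per-pixel factor is a placeholder chosen only for illustration, since the true scale of the sagittal camera is not reported in the abstract.

# Convert a DeepLabCut pixel error to centimetres and compare it with the
# 1.4 cm marker diameter used as the accuracy threshold in this study.
CM_PER_PIXEL = 0.6        # placeholder calibration factor (not reported)
MARKER_DIAMETER_CM = 1.4  # retroreflective marker diameter

def is_good_accuracy(pixel_error: float, cm_per_pixel: float = CM_PER_PIXEL) -> bool:
    """True if the tracking error stays below the marker diameter."""
    return pixel_error * cm_per_pixel < MARKER_DIAMETER_CM

# Reported errors: ~1.11 px (train, 5 videos) and ~5 px (test, >= 20 labelled frames).
for label, err_px in [("train", 1.11), ("test", 5.0)]:
    err_cm = err_px * CM_PER_PIXEL
    print(f"{label}: {err_px:.2f} px ~= {err_cm:.2f} cm, good accuracy: {is_good_accuracy(err_px)}")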