The effect of the number of labelled frames on the accuracy of 2D markerless pose estimation (DeepLabCut) during treadmill walking

Maud Van Den Bogaart, Maaike M. Eken, Rachel H.J. Senden, Rik G.J. Marcellis, Kenneth Meijer, Pieter Meyns, Hans M.N. Essers

Gait & Posture, September 2023. DOI: 10.1016/j.gaitpost.2023.07.254
Abstract
Gait analysis is imperative for tailoring evidence-based interventions in individuals with and without a physical disability [1]. The gold standard for gait analysis is optoelectronic three-dimensional motion analysis, which requires expertise and expensive, laboratory-based equipment that is not available in all settings, particularly in low- and middle-income countries. New deep-learning techniques that track body landmarks in simple video recordings allow recording in a natural environment [2,3]. DeepLabCut is a free and open-source toolbox for tracking user-defined features in video files [4,5].

What is the minimal number of additional labelled frames needed for good tracking accuracy of markerless pose estimation (DeepLabCut) during treadmill walking?

An increasing number of videos (1, 2, 5, 10, 15 and 20) from typically developed adults (mean age 50.7 ± 17.3 years) was included in the analysis. Participants walked at a comfortable speed on a dual-belt instrumented treadmill (Computer Assisted Rehabilitation Environment (CAREN), Motekforce Link, Amsterdam, The Netherlands). 2D video recordings were made in the sagittal plane with a gray-scale camera (50 Hz, Basler scA640-74gm, Basler, Germany). Using the pre-trained MPII human model (ResNet-101; p-cutoff = 0.8) in DeepLabCut, the following joints and anatomical landmarks were tracked unilaterally (left side): ankle, knee, hip, shoulder, elbow and wrist (chin and forehead were excluded). An increasing number of frames per video (1 and 5) was labelled and added to the pre-trained MPII human model, which was then retrained for 500,000 iterations. Of the labelled frames, 95% were used for training and 5% for testing. For each scenario with an increasing number of videos and manually labelled frames, the train and test errors were calculated. Good tracking accuracy was defined as an error smaller than the diameter of a retroreflective marker (1.4 cm).

The train and test pixel errors for the 11 scenarios are presented in Fig. 1. When the number of videos increased to 5, with 1 or 5 labelled frames per video, the train pixel error reduced to 1.11 and 1.16 pixels, respectively (corresponding to an error of < 1 cm). When at least 20 frames were labelled, the test pixel error was less than 5 pixels (corresponding to an error of < 3 cm).

Good tracking accuracy (error < 1 cm) in the training set was achieved from 5 additionally labelled videos onwards. The tracking accuracy for the test dataset remained constant (≈ 2-3 cm) from 20 labelled frames onwards. Further research is needed and ongoing to determine the optimal number of training iterations and of additional labelled videos and frames to achieve good train and test tracking accuracy (< 1.4 cm). This optimal setup will then be used to validate DeepLabCut for measuring joint centres and angles during walking against the gold standard.
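For readers unfamiliar with the toolbox, the label-retrain-evaluate workflow described above maps onto DeepLabCut's standard Python API roughly as follows. This is a minimal sketch under stated assumptions: the project name, experimenter name and video paths are illustrative, and the sketch initialises a network from ImageNet weights via `net_type`, whereas the authors fine-tuned the pre-trained MPII human model. It is not the authors' actual configuration.

```python
# Minimal sketch of a DeepLabCut retraining workflow (assumed setup,
# not the study's exact configuration).
import deeplabcut

videos = ["videos/walk01.avi", "videos/walk02.avi"]  # hypothetical recordings

# Create a project; the tracked bodyparts (ankle, knee, hip, shoulder,
# elbow, wrist) and pcutoff (0.8) are then edited in the generated
# config.yaml.
config = deeplabcut.create_new_project(
    "treadmill-gait", "lab", videos, copy_videos=False
)

# Extract a small number of frames per video for manual labelling
# (the study compared 1 vs. 5 labelled frames per video).
deeplabcut.extract_frames(config, mode="automatic", userfeedback=False)
deeplabcut.label_frames(config)  # opens the labelling GUI

# Build the training set; TrainingFraction: [0.95] in config.yaml
# reproduces the 95% / 5% train-test split used in the abstract.
deeplabcut.create_training_dataset(config, net_type="resnet_101")

# Retrain for 500,000 iterations, then report train/test pixel errors.
deeplabcut.train_network(config, maxiters=500000)
deeplabcut.evaluate_network(config, plotting=False)
```

In practice, `evaluate_network` is what produces the train and test pixel errors that Fig. 1 compares across scenarios.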
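The abstract reports accuracy both in pixels and in centimetres, which presumes a known image scale for the fixed sagittal camera. A minimal sketch of that conversion is shown below; the scale factor is a hypothetical calibration value chosen so that 5 pixels corresponds to about 3 cm, consistent with the figures quoted above, and is not taken from the study.

```python
# Convert DeepLabCut pixel errors to centimetres, assuming a fixed
# camera and a known image scale (hypothetical calibration value).
CM_PER_PIXEL = 0.6  # e.g. obtained by imaging an object of known length

def pixel_error_to_cm(err_px: float) -> float:
    """Convert a mean Euclidean pixel error to centimetres."""
    return err_px * CM_PER_PIXEL

# Train errors at 5 videos (1.11 and 1.16 px) and the 5 px test threshold.
for err_px in (1.11, 1.16, 5.0):
    print(f"{err_px:.2f} px -> {pixel_error_to_cm(err_px):.2f} cm")

# Good tracking accuracy was defined as an error below the diameter of a
# retroreflective marker (1.4 cm); the train errors clear that bar.
MARKER_DIAMETER_CM = 1.4
assert pixel_error_to_cm(1.16) < MARKER_DIAMETER_CM
```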