The Classification of Abnormal Hand Movement to Aid in Autism Detection: Machine Learning Study
Anish Lakkapragada, A. Kline, O. Mutlu, K. Paskov, B. Chrisman, N. Stockham, P. Washington, D. Wall
JMIR Biomedical Engineering. Published August 18, 2021. DOI: 10.2196/33771
Citations: 15
Abstract
Background
A formal autism diagnosis can be an inefficient and lengthy process. Families may wait several months or longer before receiving a diagnosis for their child, despite evidence that earlier intervention leads to better treatment outcomes. Digital technologies that detect the presence of behaviors related to autism can scale access to pediatric diagnoses. Self-stimulatory behaviors such as hand flapping are a strong indicator of the presence of autism.
Objective
This study aims to demonstrate the feasibility of deep learning technologies for detecting hand flapping in unstructured home videos, as a first step toward validating whether statistical models coupled with digital technologies can aid in the automatic behavioral analysis of autism. To support the widespread sharing of such home videos, we explored privacy-preserving modifications to the input space by converting each video to hand landmark coordinates, and we measured the performance of the corresponding time series classifiers.
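To make the privacy-preserving input concrete, the sketch below shows one way a video could be reduced to per-frame hand landmark coordinates with MediaPipe Hands. This is an illustrative reconstruction, not the authors' released code; the function name, the frame cap, and the zero-padding scheme for undetected hands are our assumptions.

```python
import cv2
import mediapipe as mp
import numpy as np

def video_to_landmarks(video_path, max_frames=90):
    """Reduce a video to per-frame hand landmark coordinates.

    Each frame becomes a flat vector of 2 hands x 21 landmarks x (x, y, z)
    = 126 values. The raw pixels are discarded, which is what makes this
    representation privacy preserving.
    """
    hands = mp.solutions.hands.Hands(static_image_mode=False,
                                     max_num_hands=2,
                                     min_detection_confidence=0.5)
    cap = cv2.VideoCapture(video_path)
    sequence = []
    while cap.isOpened() and len(sequence) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        coords = np.zeros((2, 21, 3), dtype=np.float32)  # zeros pad missing hands
        if result.multi_hand_landmarks:
            for h, hand in enumerate(result.multi_hand_landmarks[:2]):
                for i, lm in enumerate(hand.landmark):
                    coords[h, i] = (lm.x, lm.y, lm.z)
        sequence.append(coords.flatten())
    cap.release()
    hands.close()
    return np.stack(sequence) if sequence else np.zeros((0, 126), np.float32)
```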
Methods
We used the Self-Stimulatory Behavior Dataset (SSBD), which contains 75 videos of hand flapping, head banging, and spinning exhibited by children. From this dataset, we extracted 100 hand flapping videos and 100 control videos, each between 2 and 5 seconds in duration. We evaluated five separate feature representations: four privacy-preserved subsets of the hand landmarks detected by MediaPipe and one feature representation obtained from the output of the penultimate layer of a MobileNetV2 model fine-tuned on the SSBD. We fed these feature vectors into a long short-term memory (LSTM) network that predicted the presence of hand flapping in each video clip.
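A minimal Keras sketch of the classifier stage follows, with layer sizes that are our assumptions rather than the paper's reported architecture. The LSTM consumes one feature vector per frame: 126 landmark values from the conversion above, or 1,280 values per frame if the features instead come from MobileNetV2's penultimate global-pooling layer.

```python
import tensorflow as tf

def build_lstm_classifier(time_steps=90, num_features=126):
    """Binary classifier over a sequence of per-frame feature vectors."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(time_steps, num_features)),
        tf.keras.layers.Masking(mask_value=0.0),          # skip zero-padded frames
        tf.keras.layers.LSTM(64),                         # summarize the sequence
        tf.keras.layers.Dense(1, activation="sigmoid"),   # P(hand flapping)
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.Precision(),
                           tf.keras.metrics.Recall()])
    return model
```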
Results
The highest-performing model used MobileNetV2 to extract features and achieved a test F1 score of 84 (SD 3.7; precision 89.6, SD 4.3; recall 80.4, SD 6) using 5-fold cross-validation for 100 random seeds on the SSBD data (500 distinct folds in total). Of the models trained on privacy-preserved data, the model trained with all hand landmarks reached an F1 score of 66.6 (SD 3.35), and a model trained with a select subset of 6 landmarks reached an F1 score of 68.3 (SD 3.6). A model trained on a single landmark at the base of each hand and a model trained on the average of the locations of all hand landmarks reached F1 scores of 64.9 (SD 6.5) and 64.2 (SD 6.8), respectively.
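The reported means and SDs suggest an evaluation loop along the following lines. Here `train_and_predict` is a hypothetical helper that fits a fresh model on the training split and returns binary predictions for the test split; the use of stratified folds is our assumption.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

def repeated_cv_f1(X, y, train_and_predict, n_seeds=100, n_splits=5):
    """5-fold cross-validation repeated over random seeds.

    With n_seeds=100 and n_splits=5 this evaluates 500 distinct folds and
    returns the mean and standard deviation of the per-fold F1 scores.
    """
    scores = []
    for seed in range(n_seeds):
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        for train_idx, test_idx in skf.split(X, y):
            y_pred = train_and_predict(X[train_idx], y[train_idx], X[test_idx])
            scores.append(f1_score(y[test_idx], y_pred))
    return np.mean(scores), np.std(scores)
```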
Conclusions
We created five lightweight neural networks that can detect hand flapping from unstructured videos. Training an LSTM network with convolutional feature vectors outperformed training with feature vectors of hand coordinates and used almost 900,000 fewer model parameters. This study provides the first step toward developing precise deep learning methods for activity detection of autism-related behaviors.