Detecting aggression in clinical treatment videos
Walker S. Arce, Seth G. Walker, Jordan DeBrine, Benjamin S. Riggan, James E. Gehringer
Machine Learning with Applications, Volume 14, Article 100515. Published 2023-11-22.
DOI: 10.1016/j.mlwa.2023.100515
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2666827023000683
Citations: 0
Abstract
Many clinical spaces are outfitted with centralized video recording systems to monitor patient–client interactions. Given the increasing interest in video-based machine learning methods, the potential of using these clinical recordings to automate observational data collection is apparent. To explore this, seven patients had videos of their functional assessment and treatment sessions annotated by coders trained by our clinical team. Commonly used clinical software has inherent limitations in aligning behavioral and video data, so a custom software tool was employed to address this functionality gap. After developing a Canvas-based coder training course for this tool, a team of six trained coders annotated 82.33 h of data. Two machine learning approaches were considered, both using a convolutional neural network as a video feature extractor. The first approach used a recurrent network as the classifier on the extracted features and the second used a Transformer architecture. Both models produced promising metrics indicating that detecting aggression from clinical videos is feasible and generalizable. Model performance is directly tied to the feature extractor’s performance on ImageNet, where ConvNeXtXL produced the best performing models. This has applications in automating patient incident response to improve patient and clinician safety and could be directly integrated into existing video management systems for real-time analysis.
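The two-stage pipeline the abstract describes (a per-frame CNN feature extractor feeding a sequence classifier) can be sketched as follows. This is not the authors' code: the layer sizes, the GRU head, and the tiny stand-in backbone are illustrative assumptions; the paper's best results used a pretrained ConvNeXt-XL as the extractor, which would replace `FrameFeatureExtractor` here.

```python
# Hedged sketch of a CNN-then-RNN aggression detector over video clips.
# The real system would swap in a pretrained backbone (e.g. ConvNeXt-XL);
# a tiny CNN is used so the example runs quickly without downloads.
import torch
import torch.nn as nn


class FrameFeatureExtractor(nn.Module):
    """Placeholder for a pretrained image backbone such as ConvNeXt-XL."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, H, W) -> (batch, feat_dim)
        return self.net(x)


class AggressionDetector(nn.Module):
    """CNN features per frame -> GRU over time -> one binary logit per clip."""

    def __init__(self, feat_dim: int = 64, hidden: int = 32):
        super().__init__()
        self.extractor = FrameFeatureExtractor(feat_dim)
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.extractor(clip.flatten(0, 1))  # (b*t, feat_dim)
        feats = feats.view(b, t, -1)                # (b, t, feat_dim)
        _, last = self.rnn(feats)                   # last hidden: (1, b, hidden)
        return self.head(last[-1])                  # (b, 1) logit


model = AggressionDetector()
logit = model(torch.randn(2, 8, 3, 64, 64))  # 2 clips of 8 frames each
print(logit.shape)
```

The second approach in the paper replaces the GRU with a Transformer encoder over the same frame features; in this sketch that would mean substituting `nn.TransformerEncoder` for `self.rnn` and pooling its outputs before the classification head.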