Identifying Children With Autism Spectrum Disorder via Transformer-Based Representation Learning From Dynamic Facial Cues
Chen Xia; Hexu Chen; Junwei Han; Dingwen Zhang; Kuan Li
IEEE Transactions on Affective Computing, vol. 16, no. 1, pp. 83-97. Published: 2024-06-11. DOI: 10.1109/TAFFC.2024.3412032
Abstract
Recognizing autism spectrum disorder (ASD) faces great challenges due to a shortage of professional clinicians and complex diagnostic procedures. Automated data-driven ASD recognition models can reduce the subjectivity and physician dependency of traditional evaluation methods. Facial data, which can encode important perceptual and social behaviors, have emerged in ASD research as a source of novel biomarkers for screening, diagnosing, and treating ASD. However, existing research mainly focuses on extracting low-level hand-crafted facial features for analysis and classification. Determining how to learn discriminative deep representations from dynamic facial data for computational model construction remains an unresolved challenge. In this study, we propose an ASD recognition model based on facial videos to address the lack of temporal correlation learning among facial features. First, we utilize a vision transformer to extract frame-based global facial features. Then, we use a Longformer to establish the correlation of facial features over time. In the experiment, we recruited 146 subjects between 2 and 8 years of age to record their facial videos during a computer-based eye-tracking experiment, and 76 subjects for a smartphone-based experiment. Quantitative comparisons demonstrate the effectiveness and reliability of the proposed model. Furthermore, we confirm the correlation between facial and eye-tracking modalities in visual attention.
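The two-stage pipeline described above lends itself to a compact implementation: a vision transformer encodes each video frame into a global feature vector, and a Longformer then models long-range temporal correlations across the frame sequence. The sketch below illustrates this design in PyTorch with Hugging Face transformers; the backbone checkpoint, Longformer depth, attention window, pooling strategy, and classification head are all illustrative assumptions, since the abstract does not specify the authors' exact configuration.

```python
# Minimal sketch of the ViT-then-Longformer pipeline from the abstract.
# All hyperparameters below are assumptions for illustration, not the
# authors' reported configuration.
import torch
import torch.nn as nn
from transformers import ViTModel, LongformerConfig, LongformerModel

class ASDVideoClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Stage 1: frame-level global facial features (assumed ViT-Base backbone).
        self.vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
        hidden = self.vit.config.hidden_size  # 768 for ViT-Base
        # Stage 2: temporal modeling over the frame sequence with windowed attention.
        self.longformer = LongformerModel(
            LongformerConfig(
                hidden_size=hidden,
                num_hidden_layers=4,           # assumption: shallow temporal encoder
                num_attention_heads=8,
                intermediate_size=4 * hidden,
                attention_window=[32] * 4,     # local attention window per layer
                max_position_embeddings=4096,
                vocab_size=2,                  # unused; we feed embeddings directly
            )
        )
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, 224, 224)
        b, t = frames.shape[:2]
        # Encode each frame independently; keep the ViT [CLS] token per frame.
        feats = self.vit(pixel_values=frames.flatten(0, 1)).last_hidden_state[:, 0]
        feats = feats.view(b, t, -1)                      # (batch, time, hidden)
        # Establish temporal correlations across frame features.
        out = self.longformer(inputs_embeds=feats).last_hidden_state
        return self.classifier(out.mean(dim=1))           # mean-pool over time

model = ASDVideoClassifier()
logits = model(torch.randn(1, 16, 3, 224, 224))  # one 16-frame clip
```

The Longformer's windowed attention keeps the cost of attending over long clips roughly linear in sequence length, which is the usual motivation for preferring it over a vanilla transformer encoder when videos span many frames.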
About the Journal:
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.