Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling
Tiantian Feng, Anfeng Xu, Xuan Shi, Somer Bishop, Shrikanth Narayanan
arXiv:2409.09340 · arXiv - CS - Sound · 2024-09-14
Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by challenges in social communication, repetitive behavior, and sensory processing. One important research area in ASD is evaluating children's behavioral changes over time during treatment. A standard protocol for this purpose is the Brief Observation of Social Communication Change (BOSCC), which involves dyadic interactions between a child and a clinician performing a pre-defined set of activities. A fundamental step in understanding children's behavior in these interactions is automatic speech understanding, particularly identifying who speaks and when. Conventional approaches in this area rely heavily on speech samples recorded from a spectator perspective, and there is limited research on egocentric speech modeling. In this study, we design an experiment to collect speech samples in BOSCC interviews from an egocentric perspective using wearable sensors, and we explore pre-training on Ego4D speech samples to enhance child-adult speaker classification in dyadic interactions. Our findings highlight the potential of egocentric speech collection and pre-training to improve speaker classification accuracy.
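The abstract describes the modeling pipeline only at a high level. As an illustration, the sketch below shows one plausible form of a child-adult speaker classifier built on a pre-trained speech encoder. It is not the authors' released code: the backbone checkpoint, mean pooling, and linear head are assumptions made here for clarity, and the paper's actual backbone is further pre-trained on Ego4D egocentric audio.

```python
# Hypothetical sketch of a binary child-vs-adult speaker classifier.
# Assumption: a generic wav2vec 2.0 checkpoint stands in for the paper's
# Ego4D-pre-trained encoder, which is not publicly specified here.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model


class ChildAdultClassifier(nn.Module):
    def __init__(self, backbone_name: str = "facebook/wav2vec2-base"):
        super().__init__()
        # Pre-trained speech encoder producing frame-level embeddings.
        self.encoder = Wav2Vec2Model.from_pretrained(backbone_name)
        hidden = self.encoder.config.hidden_size
        # Linear head over the pooled utterance embedding: 0 = child, 1 = adult.
        self.head = nn.Linear(hidden, 2)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples), mono audio at 16 kHz.
        frames = self.encoder(waveform).last_hidden_state  # (B, T, H)
        pooled = frames.mean(dim=1)                        # (B, H)
        return self.head(pooled)                           # (B, 2) logits


# Usage: classify one 2-second segment from a dyadic-interaction recording.
model = ChildAdultClassifier()
segment = torch.randn(1, 32000)  # placeholder for a real 16 kHz clip
logits = model(segment)
speaker = "child" if logits.argmax(dim=-1).item() == 0 else "adult"
```

In such a setup, swapping the generic checkpoint for one continued-pre-trained on egocentric speech is the kind of change the paper evaluates: the classifier head stays the same, and only the encoder's initialization differs.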