Pub Date : 2025-07-08 DOI: 10.1109/TCSS.2025.3575939
Yutian Li;Zhuopan Yang;Zhenguo Yang;Xiaoping Li;Wenyin Liu;Qing Li
Addressing the bias problem in multimodal zero-shot learning tasks is challenging due to the domain shift between seen and unseen classes, as well as the semantic gap across different modalities. To tackle these challenges, we propose a multimodal disentangled fusion network (MDFN) that unifies the class embedding space for multimodal zero-shot learning. MDFN exploits a feature-disentangled variational autoencoder (FD-VAE) in two branches to disentangle unimodal features into semantically consistent and semantically unrelated modality-specific representations, where semantics are shared within classes. In particular, the semantically consistent representations and the unimodal features are integrated to retain the semantics of the original features in the form of residuals. Furthermore, a multimodal conditional VAE (MC-VAE) in two branches is adopted to learn cross-modal interactions with modality-specific conditions. Finally, the complementary multimodal representations produced by MC-VAE are encoded into a fusion network (FN) with a self-adaptive margin center loss (SAMC-loss) to predict target class labels in embedding form. By learning the distances among domain samples, SAMC-loss promotes intraclass compactness and interclass separability. Experiments on zero-shot and news event datasets demonstrate the superior performance of MDFN, with the harmonic mean improved by 27.2% on the MMED dataset and 5.1% on the SUN dataset.
{"title":"Multimodal Disentangled Fusion Network via VAEs for Multimodal Zero-Shot Learning","authors":"Yutian Li;Zhuopan Yang;Zhenguo Yang;Xiaoping Li;Wenyin Liu;Qing Li","doi":"10.1109/TCSS.2025.3575939","DOIUrl":"https://doi.org/10.1109/TCSS.2025.3575939","url":null,"abstract":"Addressing the bias problem in multimodal zero-shot learning tasks is challenging due to the domain shift between seen and unseen classes, as well as the semantic gap across different modalities. To tackle these challenges, we propose a multimodal disentangled fusion network (MDFN) that unifies the class embedding space for multimodal zero-shot learning. MDFN exploits feature disentangled variational autoencoder (FD-VAE) in two branches to distangle unimodal features into modality-specific representations that are semantically consistent and unrelated, where semantics are shared within classes. In particular, semantically consistent representations and unimodal features are integrated to retain the semantics of the original features in the form of residuals. Furthermore, multimodal conditional VAE (MC-VAE) in two branches is adopted to learn cross-modal interactions with modality-specific conditions. Finally, the complementary multimodal representations achieved by MC-VAE are encoded into a fusion network (FN) with a self-adaptive margin center loss (SAMC-loss) to predict target class labels in embedding forms. By learning the distance among domain samples, SAMC-loss promotes intraclass compactness and interclass separability. Experiments on zero-shot and news event datasets demonstrate the superior performance of MDFN, with the harmonic mean improved by 27.2% on the MMED dataset and 5.1% on the SUN dataset.","PeriodicalId":13044,"journal":{"name":"IEEE Transactions on Computational Social Systems","volume":"12 5","pages":"3684-3697"},"PeriodicalIF":4.5,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145230085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-07-08 DOI: 10.1109/TCSS.2025.3561078
Emiko Uchiyama;Wataru Takano;Yoshihiko Nakamura;Tomoki Tanaka;Katsuya Iijima;Gentiane Venture;Vincent Hernandez;Kenta Kamikokuryo;Ken-ichiro Yabu;Takahiro Miura;Kimitaka Nakazawa;Bo-Kyung Son
In the current era of AI technology, where systems increasingly rely on big data to process vast amounts of societal information, efficient methods for integrating and utilizing diverse datasets are essential. This article presents a novel approach for transforming the feature space of different datasets through singular value decomposition (SVD) to extract common and hidden features using prior domain knowledge. Specifically, we apply this method to two datasets: 1) one related to physical and cognitive frailty in the elderly; and 2) another focusing on identifying IKIGAI (happiness, self-efficacy, and sense of contribution) in volunteer staff of a civic health promotion activity. Both datasets consist of multiple subdatasets measured using different modalities, such as facial expressions, sound, activity, and heart rate. By defining feature extraction methods for each subdataset, we compare and integrate the overlapping data. The results demonstrate that our method can effectively preserve common characteristics across different data types, offering a more interpretable solution than traditional dimensionality reduction methods based on linear and nonlinear transformations. This approach has significant implications for data integration in multidisciplinary fields and opens the door for future applications to a wide range of datasets.
Title: Coordinate System Transformation Method for Comparing Different Types of Data in Different Dataset Using Singular Value Decomposition. IEEE Transactions on Computational Social Systems, vol. 12, no. 5, pp. 3610-3626.
Pub Date : 2025-06-17 DOI: 10.1109/TCSS.2025.3561073
Mazin Abdalla;Parya Abadeh;Zeinab Noorian;Amira Ghenai;Fattane Zarrinkalam;Soroush Zamani Alavijeh
The intersection of music and mental health has gained increasing attention, with previous studies highlighting music’s potential to reduce stress and anxiety. Despite these promising findings, many of these studies are limited by small sample sizes and traditional observational methods, leaving a gap in our understanding of music’s broader impact on mental health. In response to these limitations, this study introduces a novel approach that combines generalized linear mixed models (GLMM) with propensity score matching (PSM) to explore the relationship between music listening and stress levels among social media users diagnosed with anxiety, depression, and posttraumatic stress disorder (PTSD). Our research not only identifies associative patterns between music listening and stress but also provides a more rigorous examination of potential causal effects, taking into account demographic factors such as education level, gender, and age. Our findings reveal that across all mental health conditions, music listening is significantly associated with reduced stress levels, with an observed 21.3% reduction for anxiety, 15.4% for depression, and 19.3% for PTSD. Additionally, users who listened to music were more likely to report a zero stress score, indicating a stronger relaxation effect. Further, our analysis of demographic variations shows that age and education level influence the impact of music on stress reduction, highlighting the potential for personalized interventions. These findings contribute to a deeper understanding of music’s therapeutic potential, particularly in crafting interventions tailored to the diverse needs of different populations.
{"title":"The Impact of Listening to Music on Stress Level for Anxiety, Depression, and PTSD: Mixed-Effect Models and Propensity Score Analysis","authors":"Mazin Abdalla;Parya Abadeh;Zeinab Noorian;Amira Ghenai;Fattane Zarrinkalam;Soroush Zamani Alavijeh","doi":"10.1109/TCSS.2025.3561073","DOIUrl":"https://doi.org/10.1109/TCSS.2025.3561073","url":null,"abstract":"The intersection of music and mental health has gained increasing attention, with previous studies highlighting music’s potential to reduce stress and anxiety. Despite these promising findings, many of these studies are limited by small sample sizes and traditional observational methods, leaving a gap in our understanding of music’s broader impact on mental health. In response to these limitations, this study introduces a novel approach that combines generalized linear mixed models (GLMM) with propensity score matching (PSM) to explore the relationship between music listening and stress levels among social media users diagnosed with anxiety, depression, and posttraumatic stress disorder (PTSD). Our research not only identifies associative patterns between music listening and stress but also provides a more rigorous examination of potential causal effects, taking into account demographic factors such as education level, gender, and age. Our findings reveal that across all mental health conditions, music listening is significantly associated with reduced stress levels, with an observed 21.3% reduction for anxiety, 15.4% for depression, and 19.3% for PTSD. Additionally, users who listened to music were more likely to report a zero stress score, indicating a stronger relaxation effect. Further, our analysis of demographic variations shows that age and education level influence the impact of music on stress reduction, highlighting the potential for personalized interventions. These findings contribute to a deeper understanding of music’s therapeutic potential, particularly in crafting interventions tailored to the diverse needs of different populations.","PeriodicalId":13044,"journal":{"name":"IEEE Transactions on Computational Social Systems","volume":"12 5","pages":"3816-3830"},"PeriodicalIF":4.5,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145230080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-06-06 DOI: 10.1109/TCSS.2025.3565414
Ye Zhang;Qing Gao;Rong Hu;Qingtang Ding;Boyang Li;Yulan Guo
Sensor-based human activity recognition (HAR) usually suffers from the problem of insufficient annotated data, due to the difficulty of labeling raw wearable sensor signals. To this end, recent advances have adopted handcrafted operations or generative models for data augmentation. The handcrafted operations are driven by physical priors of human activities, e.g., action distortion and strength fluctuations. However, these approaches may face challenges in maintaining semantic data properties. Although generative models have better data adaptability, it is difficult for them to incorporate important action priors into data generation. This article proposes a differentiable prior-driven data augmentation framework for HAR. First, we embed the handcrafted augmentation operations into a differentiable module, which adaptively selects and optimizes the operations to be combined. Then, we construct a generative module to add controllable perturbations to the data derived from the handcrafted operations, further improving the diversity of data augmentation. By integrating the handcrafted operation module and the generative module into one learnable framework, the generalization performance of the recognition models is enhanced effectively. Extensive experimental results with three different classifiers on five public datasets demonstrate the effectiveness of the proposed framework. Project page: https://github.com/crocodilegogogo/DriveData-Under-Review.
{"title":"Differentiable Prior-Driven Data Augmentation for Sensor-Based Human Activity Recognition","authors":"Ye Zhang;Qing Gao;Rong Hu;Qingtang Ding;Boyang Li;Yulan Guo","doi":"10.1109/TCSS.2025.3565414","DOIUrl":"https://doi.org/10.1109/TCSS.2025.3565414","url":null,"abstract":"Sensor-based human activity recognition (HAR) usually suffers from the problem of insufficient annotated data, due to the difficulty in labeling the intuitive signals of wearable sensors. To this end, recent advances have adopted handcrafted operations or generative models for data augmentation. The handcrafted operations are driven by some physical priors of human activities, e.g., action distortion and strength fluctuations. However, these approaches may face challenges in maintaining semantic data properties. Although the generative models have better data adaptability, it is difficult for them to incorporate important action priors into data generation. This article proposes a differentiable prior-driven data augmentation framework for HAR. First, we embed the handcrafted augmentation operations into a differentiable module, which adaptively selects and optimizes the operations to be combined together. Then, we construct a generative module to add controllable perturbations to the data derived by the handcrafted operations and further improve the diversity of data augmentation. By integrating the handcrafted operation module and the generative module into one learnable framework, the generalization performance of the recognition models is enhanced effectively. Extensive experimental results with three different classifiers on five public datasets demonstrate the effectiveness of the proposed framework. Project page: <uri>https://github.com/crocodilegogogo/DriveData-Under-Review</uri>.","PeriodicalId":13044,"journal":{"name":"IEEE Transactions on Computational Social Systems","volume":"12 5","pages":"3778-3790"},"PeriodicalIF":4.5,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145230069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-06-04 DOI: 10.1109/TCSS.2025.3563733
Yueran Pan;Biyuan Chen;Wenxing Liu;Ming Cheng;Dong Zhang;Hongzhu Deng;Xiaobing Zou;Ming Li
The World Health Organization (WHO) has established the caregiver skill training (CST) program, designed to equip families of children diagnosed with autism spectrum disorder with essential caregiving skills. The joint engagement rating inventory (JERI) protocol evaluates participants' engagement levels within the CST initiative. Traditionally, rating the expressive language level and use (EXLA) item in JERI relies on retrospective video analysis conducted by qualified professionals, thus incurring substantial labor costs. This study introduces a multimodal behavioral signal-processing framework designed to analyze both child and caregiver behaviors automatically, thereby rating EXLA. Initially, raw audio and video signals are segmented into concise intervals via voice activity detection, speaker diarization, and speaker age classification, serving the dual purpose of eliminating nonspeech content and tagging each segment with its respective speaker. Subsequently, we extract an array of audio-visual features, encompassing our proposed interpretable, hand-crafted textual features, end-to-end audio embeddings, and end-to-end video embeddings. Finally, these features are fused at the feature level to train a linear regression model aimed at predicting the EXLA scores. Our framework has been evaluated on the largest in-the-wild database currently available under the CST program. Experimental results indicate that the proposed system achieves a Pearson correlation coefficient of 0.768 against the expert ratings, evidencing promising performance comparable to that of human experts.
{"title":"Assessing the Expressive Language Levels of Autistic Children in Home Intervention","authors":"Yueran Pan;Biyuan Chen;Wenxing Liu;Ming Cheng;Dong Zhang;Hongzhu Deng;Xiaobing Zou;Ming Li","doi":"10.1109/TCSS.2025.3563733","DOIUrl":"https://doi.org/10.1109/TCSS.2025.3563733","url":null,"abstract":"The World Health Organization (WHO) has established the caregiver skill training (CST) program, designed to empower families with children diagnosed with autism spectrum disorder the essential caregiving skills. The joint engagement rating inventory (JERI) protocol evaluates participants’ engagement levels within the CST initiative. Traditionally, rating the expressive language level and use (EXLA) item in JERI relies on retrospective video analysis conducted by qualified professionals, thus incurring substantial labor costs. This study introduces a multimodal behavioral signal-processing framework designed to analyze both child and caregiver behaviors automatically, thereby rating EXLA. Initially, raw audio and video signals are segmented into concise intervals via voice activity detection, speaker diarization and speaker age classification, serving the dual purpose of eliminating nonspeech content and tagging each segment with its respective speaker. Subsequently, we extract an array of audio-visual features, encompassing our proposed interpretable, hand-crafted textual features, end-to-end audio embeddings and end-to-end video embeddings. Finally, these features are fused at the feature level to train a linear regression model aimed at predicting the EXLA scores. Our framework has been evaluated on the largest in-the-wild database currently available under the CST program. Experimental results indicate that the proposed system achieves a Pearson correlation coefficient of 0.768 against the expert ratings, evidencing promising performance comparable to that of human experts.","PeriodicalId":13044,"journal":{"name":"IEEE Transactions on Computational Social Systems","volume":"12 5","pages":"3647-3659"},"PeriodicalIF":4.5,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145230078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-06-04 DOI: 10.1109/TCSS.2025.3568467
Parkala Vishnu Bharadwaj Bayari;Yashmita Sangwan;Gaurav Bhatnagar;Chiranjoy Chattopadhyay
The advent of digital technology, augmented by connected devices, has catalyzed a dramatic increase in multimedia content consumption, facilitating on-the-go access and communication. However, this surge also heightens the risks of unauthorized access, privacy breaches, and cyberattacks. Consequently, ensuring the secure and efficient transmission and storage of multimedia content is of paramount importance. This article presents a robust encryption scheme for secure image transmission, utilizing a novel one-dimensional chaotic map characterized by random and complex dynamics, validated through NIST tests and meticulous evaluation. Key matrices are derived from the chaotic map, with the SHA-256 hash of random, nonoverlapping blocks of the input image influencing the initial conditions, thereby ensuring resistance to differential cryptanalysis. The encryption process encompasses a dual shuffling mechanism: an adaptive shuffling guided by the chaotic key, followed by orbital shuffling, which rearranges pixel positions by segmenting the image into distinct orbital patterns. This is complemented by a feedback diffusion technique that ensures each pixel's encryption is influenced by neighboring values and the keys employed. Extensive evaluation with multimodal images demonstrates the scheme's versatility, with significant resilience against various cryptographic attacks, as evidenced by thorough assessments. Comparative analysis further highlights the superiority of the proposed scheme over state-of-the-art approaches. These attributes position the proposed scheme as a highly effective solution for contemporary digital security challenges.
Title: A Novel Chaotic Map and Its Application to Secure Transmission of Multimodal Images. IEEE Transactions on Computational Social Systems, vol. 12, no. 5, pp. 3765-3777.
Pub Date : 2025-06-03 DOI: 10.1109/TCSS.2025.3561921
Xuan Luo;Bin Liang;Qianlong Wang;Jing Li;Erik Cambria;Xiaojun Zhang;Yulan He;Min Yang;Ruifeng Xu
Sexism has become a pressing issue, driven by the rapid-spreading influence of societal norms, media portrayals, and online platforms that perpetuate and amplify gender biases. Curbing sexism has emerged as a critical challenge globally. Being capable of recognizing sexist statements and behaviors is of particular importance, since recognition is the first step toward changing minds. This survey provides an extensive overview of recent advancements in sexism detection. We present details of the various resources used in this field and the methodologies applied to the task, covering different languages, modalities, models, and approaches. Moreover, we examine the specific challenges these models encounter in accurately identifying and classifying sexism. Additionally, we highlight areas that require further research and propose potential new directions for future exploration in the domain of sexism detection. Through this comprehensive exploration, we strive to contribute to the advancement of interdisciplinary research, fostering a collective effort to combat sexism in its multifaceted manifestations.
{"title":"A Literature Survey on Multimodal and Multilingual Sexism Detection","authors":"Xuan Luo;Bin Liang;Qianlong Wang;Jing Li;Erik Cambria;Xiaojun Zhang;Yulan He;Min Yang;Ruifeng Xu","doi":"10.1109/TCSS.2025.3561921","DOIUrl":"https://doi.org/10.1109/TCSS.2025.3561921","url":null,"abstract":"Sexism has become a pressing issue, driven by the rapid-spreading influence of societal norms, media portrayals, and online platforms that perpetuate and amplify gender biases. Curbing sexism has emerged as a critical challenge globally. Being capable of recognizing sexist statements and behaviors is of particular importance since it is the first step in mind change. This survey provides an extensive overview of recent advancements in sexism detection. We present details of the various resources used in this field and methodologies applied to the task, covering different languages, modalities, models, and approaches. Moreover, we examine the specific challenges these models encounter in accurately identifying and classifying sexism. Additionally, we highlight areas that require further research and propose potential new directions for future exploration in the domain of sexism detection. Through this comprehensive exploration, we strive to contribute to the advancement of interdisciplinary research, fostering a collective effort to combat sexism in its multifaceted manifestations.","PeriodicalId":13044,"journal":{"name":"IEEE Transactions on Computational Social Systems","volume":"12 5","pages":"3709-3727"},"PeriodicalIF":4.5,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145230029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-04-08 DOI: 10.1109/TCSS.2025.3555419
Zijian Long;Haopeng Wang;Haiwei Dong;Abdulmotaleb El Saddik
The social metaverse is a growing digital ecosystem that blends virtual and physical worlds. It allows users to interact socially, work, shop, and enjoy entertainment. However, privacy remains a major challenge, as immersive interactions require continuous collection of biometric and behavioral data. At the same time, ensuring high-quality, low-latency streaming is difficult due to the demands of real-time interaction, immersive rendering, and bandwidth optimization. To address these issues, we propose adaptive social metaverse streaming (ASMS), a novel streaming system based on federated multiagent proximal policy optimization (F-MAPPO). ASMS leverages F-MAPPO, which integrates federated learning (FL) and deep reinforcement learning (DRL) to dynamically adjust streaming bit rates while preserving user privacy. Experimental results show that ASMS improves user experience by at least 14% compared to existing streaming methods across various network conditions. Therefore, ASMS enhances the social metaverse experience by providing seamless and immersive streaming, even in dynamic and resource-constrained networks, while ensuring that sensitive user data remain on local devices.
{"title":"Adaptive Social Metaverse Streaming Based on Federated Multiagent Deep Reinforcement Learning","authors":"Zijian Long;Haopeng Wang;Haiwei Dong;Abdulmotaleb El Saddik","doi":"10.1109/TCSS.2025.3555419","DOIUrl":"https://doi.org/10.1109/TCSS.2025.3555419","url":null,"abstract":"The social metaverse is a growing digital ecosystem that blends virtual and physical worlds. It allows users to interact socially, work, shop, and enjoy entertainment. However, privacy remains a major challenge, as immersive interactions require continuous collection of biometric and behavioral data. At the same time, ensuring high-quality, low-latency streaming is difficult due to the demands of real-time interaction, immersive rendering, and bandwidth optimization. To address these issues, we propose adaptive social metaverse streaming (ASMS), a novel streaming system based on federated multiagent proximal policy optimization (F-MAPPO). ASMS leverages F-MAPPO, which integrates federated learning (FL) and deep reinforcement learning (DRL) to dynamically adjust streaming bit rates while preserving user privacy. Experimental results show that ASMS improves user experience by at least 14% compared to existing streaming methods across various network conditions. Therefore, ASMS enhances the social metaverse experience by providing seamless and immersive streaming, even in dynamic and resource-constrained networks, while ensuring that sensitive user data remain on local devices.","PeriodicalId":13044,"journal":{"name":"IEEE Transactions on Computational Social Systems","volume":"12 5","pages":"3804-3815"},"PeriodicalIF":4.5,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145230077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-04-08 DOI: 10.1109/TCSS.2025.3531718
Xin Wang;Jianhui Lyu;J. Dinesh Peter;Byung-Gyu Kim;B.D. Parameshachari;Keqin Li;Wei Wei
Sentiment analysis of social media platforms is crucial for extracting actionable insights from unstructured textual data. However, modern sentiment analysis models using deep learning lack explainability, acting as black boxes and limiting trust. This study focuses on improving the explainability of sentiment analysis models for social media platforms by leveraging explainable artificial intelligence (XAI). We propose a novel explainable sentiment analysis (XSA) framework incorporating intrinsic and posthoc XAI methods, i.e., local interpretable model-agnostic explanations (LIME) and counterfactual explanations. Specifically, to address the lack of local fidelity and stability in interpretations caused by LIME's random perturbation sampling, a new model-independent interpretation method is proposed, which replaces LIME's random perturbation sampling with an isometric-mapping virtual sample generation method based on manifold learning. Additionally, a generative link tree is presented to create counterfactual explanations that maintain strong data fidelity; it constructs counterfactual narratives from examples in the training data, employing a divide-and-conquer strategy combined with local greedy search. Experiments conducted on social media datasets from Twitter, YouTube comments, Yelp, and Amazon demonstrate XSA's ability to provide local aspect-level explanations while maintaining sentiment analysis performance. The analyses reveal improved model explainability and enhanced user trust, demonstrating XAI's potential in sentiment analysis of social media platforms. The proposed XSA framework provides a valuable direction for developing transparent and trustworthy sentiment analysis models for social media platforms.
Title: Explaining Sentiments: Improving Explainability in Sentiment Analysis Using Local Interpretable Model-Agnostic Explanations and Counterfactual Explanations. IEEE Transactions on Computational Social Systems, vol. 12, no. 3, pp. 1390-1403.
Pub Date : 2025-04-04 DOI: 10.1109/TCSS.2025.3548862
Xiaohong Guan;Xiaobing Li;Björn W. Schuller;Xinran Zhang
{"title":"Guest Editorial: Special Issue on Music Intelligence and Social Computation","authors":"Xiaohong Guan;Xiaobing Li;Björn W. Schuller;Xinran Zhang","doi":"10.1109/TCSS.2025.3548862","DOIUrl":"https://doi.org/10.1109/TCSS.2025.3548862","url":null,"abstract":"","PeriodicalId":13044,"journal":{"name":"IEEE Transactions on Computational Social Systems","volume":"12 2","pages":"847-850"},"PeriodicalIF":4.5,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10949083","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143777858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}