Pub Date: 2024-12-02 DOI: 10.1109/TCSS.2024.3497725
Jinpeng Hu;Tengteng Dong;Gang Luo;Hui Ma;Peng Zou;Xiao Sun;Dan Guo;Xun Yang;Meng Wang
Mental health has attracted substantial attention in recent years, and large language models (LLMs) can be an effective technology for alleviating mental health problems owing to their capabilities in text understanding and dialogue. However, existing research in this domain often suffers from limitations, such as training on datasets lacking crucial prior knowledge and evidence, and the absence of comprehensive evaluation methods. In this article, we propose a specialized psychological LLM, named PsycoLLM, trained on a proposed high-quality psychological dataset, including single-turn QA, multiturn dialogues, and knowledge-based QA. Specifically, we construct multiturn dialogues through a three-step pipeline comprising multiturn QA generation, evidence judgment, and dialogue refinement. We augment this process with real-world psychological case backgrounds extracted from online platforms, enhancing the relevance and applicability of the generated data. Additionally, to compare the performance of PsycoLLM with other LLMs, we develop a comprehensive psychological benchmark based on authoritative psychological counseling examinations in China, which includes assessments of professional ethics, theoretical proficiency, and case analysis. The experimental results on the benchmark illustrate the effectiveness of PsycoLLM, which demonstrates superior performance compared with other LLMs.
PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation. IEEE Transactions on Computational Social Systems, vol. 12, no. 2, pp. 539-551.
Given the escalating diversity, sophistication, and frequency of cyber attacks, it is imperative for critical infrastructure entities, e.g., smart grids, to recognize the inherent risks of operating in isolation. Sharing cyber threat intelligence (CTI) helps them stand together and build a collective cyber defense by pooling knowledge, skills, and experience, including information related to identifying and evaluating cyber and physical threats. Existing studies lack robust CTI sharing strategies for smart grid systems. To address the critical need for secure and effective CTI sharing in smart grid systems, this article proposes a novel approach. Our solution leverages encrypted federated learning (FL) with integrated malicious client detection mechanisms. This approach facilitates collaborative learning of a threat detection model while preserving the privacy of raw CTI data. Employing real-world, heterogeneous smart grid datasets, we rigorously evaluated our approach under two distinct attack scenarios. The results demonstrate resilience against both man-in-the-middle attacks and malicious clients, exceeding the performance typically observed in traditional FL models.
Robust Cyber Threat Intelligence Sharing Using Federated Learning for Smart Grids. Saifur Rahman;Shantanu Pal;Zahra Jadidi;Chandan Karmakar. IEEE Transactions on Computational Social Systems, vol. 12, no. 2, pp. 635-644. Pub Date: 2024-12-02 DOI: 10.1109/TCSS.2024.3496746
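The abstract above does not say how malicious clients are detected or how updates are encrypted. As a rough illustration of the general idea only, the sketch below runs one round of federated averaging that skips client updates lying far from the coordinate-wise median. The function name `robust_fedavg`, the `tolerance` parameter, and the median-distance rule are all assumptions made for this example, and encryption is left out entirely.

```python
from math import sqrt
from statistics import median


def robust_fedavg(updates, sizes, tolerance=3.0):
    """Average client weight vectors, skipping suspected malicious clients.

    updates: list of equal-length lists of floats (one per client)
    sizes:   local sample counts used as FedAvg weights
    """
    dim = len(updates[0])
    # Coordinate-wise median: a reference point that a minority of outliers cannot move far.
    center = [median(u[i] for u in updates) for i in range(dim)]
    # Euclidean distance of every client update from that reference.
    dists = [sqrt(sum((u[i] - center[i]) ** 2 for i in range(dim))) for u in updates]
    # Hypothetical rule: flag clients farther than `tolerance` times the median distance.
    cutoff = tolerance * median(dists) + 1e-12
    kept = [k for k, d in enumerate(dists) if d <= cutoff]
    # Standard FedAvg weighting by local sample count, restricted to the kept clients.
    total = sum(sizes[k] for k in kept)
    aggregate = [sum(sizes[k] * updates[k][i] for k in kept) / total for i in range(dim)]
    return aggregate, kept


updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [50.0, -50.0]]  # last client is poisoned
aggregate, kept = robust_fedavg(updates, sizes=[10, 10, 10, 10])
print(kept)  # indices of the clients that were averaged
```

In this toy round the fourth client's update is far from the other three, so it is excluded before averaging. A real deployment would also need secure aggregation or encryption of the updates, which this sketch does not attempt.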
Pub Date: 2024-12-02 DOI: 10.1109/TCSS.2024.3502357
Xuan Zhang;Tingshao Zhu;Baobin Li
Bot accounts on microblogging platforms significantly impact information reliability and cyberspace security. Accurately identifying these bots is essential for effective community governance and opinion management. This article introduces a category of online social behavior features (OSBF), derived from microblog behaviors such as emotional expression, language organization, and self-description. Through a series of experiments, OSBF has demonstrated stable and robust performance in characterizing and detecting microblog bots on Twitter and Chinese Weibo. By identifying significant differences in OSBF between bot and human accounts, we established an OSBF-based detection model. This model showed excellent performance across multitask and multiscale challenges in two English Twitter datasets. Additionally, we explored cross-language and cross-dataset applications using two Chinese Weibo datasets, further affirming the model's effectiveness and robustness. The experimental results confirm that our OSBF-based model surpasses existing methods in detecting microblog bots.
Online Social Behaviors: Robust and Stable Features for Detecting Microblog Bots. IEEE Transactions on Computational Social Systems, vol. 12, no. 2, pp. 671-681.
Pub Date: 2024-12-02 DOI: 10.1109/TCSS.2024.3493355
IEEE Transactions on Computational Social Systems Publication Information. IEEE Transactions on Computational Social Systems, vol. 11, no. 6, p. C2. Open access: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10772355
Guest Editorial: Special Issue on Social Manufacturing After ChatGPT. Fei-Yue Wang;Pingyu Jiang;Gang Xiong;MengChu Zhou;Bernd Kuhlenkötter;Petri Helo;Zhen Shen. IEEE Transactions on Computational Social Systems, vol. 11, no. 6, pp. 7892-7897. Pub Date: 2024-12-02 DOI: 10.1109/TCSS.2024.3496032. Open access: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10772366
Pub Date: 2024-12-02 DOI: 10.1109/TCSS.2024.3494265
Zhongsheng Qian;Hui Zhu;Jinping Liu;Zilong Wan
Generative large language models (GLLMs) have achieved remarkable success in the academic community of recommender systems. However, the application of such a powerful tool in the industrial world is still nascent. In the Chinese home renovation industry, advisory consultants engage in offline conversations to fully understand the intentions of potential clients before recommending designers to them. Although conventional recommender systems can somewhat substitute for the consultants, they fall short in addressing two significant challenges. First, clients frequently revise their intentions during conversations, complicating the accurate capture of key intentions. Second, the process of recommending designers, which relies heavily on consultants’ manual efforts, is not only time-consuming but also prone to inaccuracies. To address these challenges, we present a recommendation agent, named DCICDRec, which leverages the robust conversational understanding and generation capabilities of the large language model MOSS. The creation of this agent involves two key steps. The first step is to prepare the corpus from the renovation domain by organizing it into conversational graphs, to which balanced sampling and profile normalization mechanisms are applied. This preparation ensures that the corpus is well-structured and unbiased before proceeding to fine-tune MOSS. The second step is to utilize the fine-tuned MOSS as a recommendation agent. In this capacity, the agent engages in conversations with potential clients and recommends designers, providing detailed reasons for each recommendation. Furthermore, if the client is dissatisfied with the recommended designers, the agent will delve deeper into understanding the client's true intentions and continually update the recommendations until the client is satisfied. We evaluate the agent's effectiveness on CRM, a real dialog dataset between clients and consultants, as well as two publicly available datasets, INSPIRED and ReDIAL. In comprehensive experiments against six baseline models, the DCICDRec agent demonstrates superior performance on the three datasets. These results indicate that the DCICDRec agent holds significant potential for generalization and commercial value. Moreover, the results of a case study with 11 offline tests illustrate the scalability and efficiency of the agent in real-time scenarios.
Harnessing Generative Large Language Models for Dynamic Intention Understanding in Recommender Systems: Insights From a Client–Designer Interaction Case Study. IEEE Transactions on Computational Social Systems, vol. 12, no. 2, pp. 807-817.
Music recommendation systems aim to suggest tracks that users may enjoy. However, the accuracy of recommendation results is affected by popularity bias. Previous studies have focused on mitigating the direct effect of single-item popularity in video, news, or e-commerce recommendations, but have overlooked the multisource popularity biases in music recommendations. This article proposes a causal inference-based method to reduce the influence of both track and artist popularity. First, we construct a causal graph that encompasses users, tracks, and artists within the context of music recommendations. Next, we employ matrix factorization in conjunction with counterfactual inference theory to mitigate the popularity effects of artists and tracks, taking into account both the natural direct and indirect effects of these entities on music recommendations. Experimental results evaluated on four music recommendation datasets indicate that our method outperforms other baselines and effectively alleviates the popularity bias of both tracks and artists.
Counterfactual Music Recommendation for Mitigating Popularity Bias. Jidong Yuan;Bingyu Gao;Xiaokang Wang;Haiyang Liu;Lingyin Zhang. IEEE Transactions on Computational Social Systems, vol. 12, no. 2, pp. 851-861. Pub Date: 2024-12-02 DOI: 10.1109/TCSS.2024.3491800
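The abstract on counterfactual music recommendation gives no formulas, so the following is only a sketch of one common way counterfactual debiasing is set up; it is not the authors' model. It assumes the training-time score multiplies a user-track match (a matrix-factorization dot product) by popularity gates for the track and the artist, and that inference subtracts the score the model would give if the match were replaced by a reference value. The names `factual_score`, `debiased_score`, and `reference` are hypothetical.

```python
from math import exp


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def factual_score(user_vec, track_vec, track_pop, artist_pop):
    """Score under popularity influence: match gated by track and artist popularity."""
    match = dot(user_vec, track_vec)
    return match * sigmoid(track_pop) * sigmoid(artist_pop)


def debiased_score(user_vec, track_vec, track_pop, artist_pop, reference):
    """Factual score minus the counterfactual score obtained with a fixed reference match."""
    gate = sigmoid(track_pop) * sigmoid(artist_pop)
    counterfactual = reference * gate  # what popularity alone would contribute
    return factual_score(user_vec, track_vec, track_pop, artist_pop) - counterfactual


user = [1.0, 0.0]
niche = dict(track_vec=[0.9, 0.0], track_pop=-1.0, artist_pop=-1.0)   # good match, unpopular
popular = dict(track_vec=[0.5, 0.0], track_pop=3.0, artist_pop=3.0)   # weaker match, popular

print(factual_score(user, **niche) < factual_score(user, **popular))
print(debiased_score(user, reference=0.6, **niche) > debiased_score(user, reference=0.6, **popular))
```

Under these assumed numbers the popular track wins on the factual score, while the better-matching niche track wins once the popularity-only contribution is subtracted.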
Pub Date: 2024-12-02 DOI: 10.1109/TCSS.2024.3502662
Mengyi Zhang;Qingxing Dong;Xiaozhen Wu
The advent of information distribution mechanisms constituted by self-exploration, network neighbors, and especially algorithms has aroused widespread concerns about the reinforcement of misinformation beliefs and the resulting polarization. However, few existing studies fully consider the inherent characteristics of misinformation (e.g., evoking repulsive effects) and the adaptive nature of social relationships, or examine the impacts of algorithmic interventions on online misinformation and the formation process of social groups. To comprehensively investigate the coevolution of user misinformation beliefs and social relationships under algorithmic interventions, we propose a novel model with the following configurations: 1) a nonlinear social influence function constructed to reflect the process of reinforcing misinformation beliefs; 2) probabilities for the rewiring of links among individuals determined by their opinion distance and social distance; and 3) multiple reformulated algorithmic mechanisms, covering five recommendation processes and information distribution rules that integrate three information sources. Extensive numerical simulation experiments reveal diversification, radicalization, and polarization of misinformation. We observe that the introduction of moderate repulsive interactions fosters the emergence of diverse opinions. In the absence of algorithmic interventions, misinformation naturally evolves into radicalization, while the introduction of algorithmic interventions exacerbates polarization, particularly with extensive reliance on content-based recommendations and excessive allowance of distributed opinions from recommendations. Notably, we find that encouraging recommendations based on predetermined information effectively reverses the trend of misinformation evolution. Our research contributes to clarifying the interaction between human behavior and artificial intelligence, as well as providing insights for misinformation supervision and governance.
How Misinformation Diffuses on Online Social Networks: Radical Opinions, Adaptive Relationship, and Algorithmic Intervention. IEEE Transactions on Computational Social Systems, vol. 12, no. 5, pp. 2047-2061.
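The misinformation abstract names a nonlinear influence function with repulsive effects and link-rewiring probabilities driven by opinion distance and social distance, but it does not give their functional forms. The sketch below is a generic bounded-confidence update with repulsion plus a simple link-formation probability, offered only to make the mechanism concrete. The thresholds `attract` and `repel`, the step size `rate`, and the product form in `link_probability` are assumptions for illustration, not the paper's equations.

```python
def update_opinion(x_i, x_j, rate=0.2, attract=0.5, repel=1.2):
    """Return agent i's opinion in [-1, 1] after seeing agent j's opinion."""
    gap = x_j - x_i
    if abs(gap) <= attract:
        x_new = x_i + rate * gap      # similar views pull the agent closer
    elif abs(gap) >= repel:
        x_new = x_i - rate * gap      # distant views push the agent further away
    else:
        x_new = x_i                   # in between, the message is ignored
    return max(-1.0, min(1.0, x_new))


def link_probability(opinion_gap, social_distance, max_gap=2.0, max_hops=6):
    """Chance of forming a link: higher for close opinions and close network positions."""
    g = min(abs(opinion_gap) / max_gap, 1.0)    # normalized opinion distance
    s = min(social_distance / max_hops, 1.0)    # normalized social (hop) distance
    return (1.0 - g) * (1.0 - s)


print(update_opinion(0.0, 0.4))    # attraction
print(update_opinion(0.0, 1.5))    # repulsion
print(update_opinion(-0.9, 0.9))   # repulsion, clipped at the boundary
```

Repeated repulsive updates drive opinions toward the boundaries of the interval, which is one simple way radicalization can appear in models of this family.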
In the film industry, movie posters have been an essential part of advertising and marketing for many decades and continue to play a vital role even today in the form of digital posters through online, social media, and over-the-top (OTT) platforms. Typically, movie posters can effectively promote and communicate the essence of a film, such as its genre, visual style/tone, vibe, and storyline cue/theme, which are essential to attract potential viewers. Identifying the genres of a movie often has significant practical applications in recommending the film to target audiences. Previous studies on genre identification have primarily focused on sources such as plot synopses, subtitles, metadata, movie scenes, and trailer videos; however, posters precede the availability of these sources and provide prerelease implicit information to generate mass interest. In this article, we address automated multilabel movie genre identification from poster images alone, without the aid of any additional textual, metadata, or video information about the movies, which is one of the earliest attempts of its kind. We present a deep transformer network with a probabilistic module to identify movie genres exclusively from the poster. For experiments, we collected 13,882 posters across 13 genres from the Internet Movie Database (IMDb); our model's performance was encouraging and even surpassed that of some major contemporary architectures.
Demystifying Visual Features of Movie Posters for Multilabel Genre Identification. Utsav Kumar Nareti;Chandranath Adak;Soumi Chattopadhyay. IEEE Transactions on Computational Social Systems, vol. 12, no. 5, pp. 2120-2129. Pub Date: 2024-11-27 DOI: 10.1109/TCSS.2024.3481157
Multimodal recommendation systems have made significant progress by leveraging graph convolutional networks to integrate user behavior with item content, including images and text. However, these systems still encounter two major challenges: noise edges in interaction graphs and noise in multimodal features of items. Existing works tend to address only one type of noise problem to enhance recommendation performance. This article proposes a new Dual Denoising Multimodal Graph Recommendation (DDRec) model, designed to enhance multimodal recommendation systems by tackling both challenges simultaneously. Specifically, we design two denoising techniques: hard denoising and soft denoising. For noise edges in interaction graphs, the hard denoising method uses preference scores of user nodes and item nodes in different modality interaction graphs as edge weights and prunes edges below a certain threshold to eliminate noise. For noise in multimodal features, the soft denoising method leverages item and item semantic graph information to denoise modal features, thus obtaining modality features related to user preferences. Finally, we employ contrastive learning to compare user and item representations derived from the denoised modality interaction graphs against those from the original graph, ensuring the consistency of nodes across various views. Our comprehensive experiments across four public datasets validate the enhanced performance and effectiveness of the DDRec model.
DDRec: Dual Denoising Multimodal Graph Recommendation. Yuchao Ping;Shuqin Wang;Ziyi Yang;Bugui He;Nan Zhou;Yongquan Dong. IEEE Transactions on Computational Social Systems, vol. 12, no. 3, pp. 1100-1114. Pub Date: 2024-11-26 DOI: 10.1109/TCSS.2024.3490801
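The hard denoising step in the DDRec abstract (preference scores as edge weights, with edges below a threshold pruned) can be sketched as follows. The abstract does not define the preference score, so cosine similarity between user and item embeddings in a single modality is used here as a stand-in; `prune_edges`, the embedding dictionaries, and the threshold value are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def prune_edges(edges, user_emb, item_emb, threshold=0.2):
    """Keep only user-item edges whose assumed preference score reaches the threshold.

    Returns (user, item, score) triples so the score can serve as an edge weight.
    """
    kept = []
    for user, item in edges:
        score = cosine(user_emb[user], item_emb[item])
        if score >= threshold:
            kept.append((user, item, score))
    return kept


user_emb = {"u1": [1.0, 0.0]}
item_emb = {"a": [0.9, 0.1], "b": [-0.2, 1.0]}   # "b" looks like an accidental click
edges = [("u1", "a"), ("u1", "b")]

kept = prune_edges(edges, user_emb, item_emb)
print([(u, i) for u, i, _ in kept])
```

In a multimodal setting this pruning would be run once per modality graph (for example, image and text), each with its own embeddings, before graph convolution is applied.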