Shenghai Zhong, Shu Guo, Jing Liu, Hongren Huang, Lihong Wang, Jianxin Li, Chen Li, Yiming Hei
Bipartite graph representation learning aims to obtain node embeddings by compressing sparse vectorized representations of interactions between two types of nodes, e.g., users and items. Incorporating structural attributes among homogeneous nodes, such as user communities, improves the identification of similar interaction preferences, namely, user/item embeddings, for downstream tasks. However, existing methods often fail to proactively discover and fully utilize these latent structural attributes. Moreover, the manual collection and labeling of structural attributes is always costly. In this paper, we propose a novel approach called Dirichlet Max-margin Matrix Factorization (DM3F), which adopts a self-supervised strategy to discover latent structural attributes and model discriminative node representations. Specifically, in self-supervised learning, our approach generates pseudo group labels (i.e., structural attributes) as a supervised signal using the Dirichlet process without relying on manual collection and labeling, and employs them in a max-margin classification. Additionally, we introduce a Variational Markov Chain Monte Carlo algorithm (Variational MCMC) to effectively update the parameters. The experimental results on six real datasets demonstrate that, in the majority of cases, the proposed method outperforms existing approaches based on matrix factorization and neural networks. Furthermore, the modularity analysis confirms the effectiveness of our model in capturing structural attributes to produce high-quality user embeddings.
Shenghai Zhong, Shu Guo, Jing Liu, Hongren Huang, Lihong Wang, Jianxin Li, Chen Li, Yiming Hei. "Self-supervised Bipartite Graph Representation Learning: A Dirichlet Max-margin Matrix Factorization Approach." ACM Transactions on Intelligent Systems and Technology, 2024-03-08. https://doi.org/10.1145/3645098
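The DM3F abstract above says pseudo group labels are generated with a Dirichlet process. As a rough illustration of how a Dirichlet process prior can assign nodes to an open-ended number of groups, the sketch below draws labels from a Chinese restaurant process, a standard constructive view of the Dirichlet process. It is not the paper's inference procedure (no max-margin classifier, no Variational MCMC); the function name `crp_assignments` and the parameters `n_nodes`, `alpha`, and `seed` are hypothetical names chosen for this example.

```python
import random


def crp_assignments(n_nodes, alpha, seed=0):
    """Draw group labels for n_nodes from a Chinese restaurant process.

    alpha > 0 is the concentration parameter: larger values make
    opening a new group more likely. (Illustrative names, not from the paper.)
    """
    rng = random.Random(seed)
    labels = []  # labels[i] is the group index assigned to node i
    sizes = []   # sizes[k] is the number of nodes currently in group k
    for _ in range(n_nodes):
        # An existing group k is chosen with weight sizes[k];
        # a brand-new group is chosen with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new group
        else:
            sizes[k] += 1     # join an existing group
        labels.append(k)
    return labels, sizes


labels, sizes = crp_assignments(n_nodes=20, alpha=1.0)
print("pseudo group labels:", labels)
print("group sizes:", sizes)
```

The number of groups is not fixed in advance, which is the property that makes a Dirichlet process attractive when group structure has to be discovered rather than supplied.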
Object-oriented micro-video background music recommendation is a complicated task where the matching degree between videos and background music is a major issue. However, music selections in user-generated content (UGC) are prone to selection bias caused by the historical preferences of uploaders. Since historical preferences are not fully reliable and may reflect obsolete behaviors, over-reliance on them should be avoided as knowledge and interests dynamically evolve. In this paper, we propose a Deconfounded Cross-Modal (DecCM) matching model to mitigate such bias. Specifically, uploaders’ personal preferences for music genres are identified as confounders that spuriously correlate music embeddings and background music selections, causing the learned system to over-recommend music from majority groups. To resolve such confounders, backdoor adjustment is used to deconfound the spurious correlation between music embeddings and prediction scores. We further use a Monte Carlo (MC) estimator with batch-level averaging as an approximation, which avoids integrating over the entire confounder space that the adjustment requires. Furthermore, we design a teacher-student network that exploits the matching in music videos, which are professionally generated content (PGC) with specialized matching, to better recommend content-matching background music. The PGC data is modeled by a teacher network, which guides the student network's matching of uploader-selected UGC data through Kullback-Leibler-based knowledge transfer. Extensive experiments on the TT-150k-genre dataset demonstrate the effectiveness of the proposed method. The code is publicly available at https://github.com/jing-1/DecCM.
Jing Yi, Zhenzhong Chen. "Deconfounded Cross-modal Matching for Content-based Micro-video Background Music Recommendation." ACM Transactions on Intelligent Systems and Technology, 2024-03-06. https://doi.org/10.1145/3650042
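The DecCM abstract above describes backdoor adjustment approximated by a Monte Carlo estimator with a batch-level average. In general terms, backdoor adjustment replaces P(y | x) with the sum over confounder values z of P(y | x, z) P(z), and a Monte Carlo estimate averages P(y | x, z) over sampled z. The sketch below shows that averaging step only. The scoring function `score`, its logistic form, and all variable names are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math


def score(music_emb, video_emb, confounder):
    """Hypothetical matching score in (0, 1).

    A dot-product match between music and video, shifted by a
    confounder-dependent bias (for example, an uploader's genre preference).
    """
    match = sum(m * v for m, v in zip(music_emb, video_emb))
    bias = sum(m * c for m, c in zip(music_emb, confounder))
    return 1.0 / (1.0 + math.exp(-(match + bias)))


def deconfounded_score(music_emb, video_emb, batch_confounders):
    """Monte Carlo estimate of sum_z P(y | x, z) P(z).

    The confounder values observed in the current batch are treated as
    samples from P(z), so the sum becomes a plain batch average.
    """
    total = sum(score(music_emb, video_emb, z) for z in batch_confounders)
    return total / len(batch_confounders)


music = [0.2, -0.1]
video = [1.0, 0.5]
batch = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # confounders seen in one batch
print(deconfounded_score(music, video, batch))
```

Averaging over the batch means no single uploader's preference determines the score, at the cost of an estimate whose variance depends on batch size.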
Ivan Sekulić, Mohammad Aliannejadi, Fabio Crestani
Clarifying the underlying user information need by asking clarifying questions is an important feature of modern conversational search systems. However, evaluation of such systems through answering prompted clarifying questions requires significant human effort, which can be time-consuming and expensive. In our recent work, we proposed an approach to tackle these issues with a user simulator, USi. Given a description of an information need, USi is capable of automatically answering clarifying questions about the topic throughout the search session. However, while the answers generated by USi are both in line with the underlying information need and in natural language, a deeper understanding of such utterances is lacking. Thus, in this work, we explore utterance formulation of large language model (LLM) based user simulators. To this end, we first analyze the differences between USi, based on GPT-2, and the next generation of generative LLMs, such as GPT-3. Then, to gain a deeper understanding of LLM-based utterance generation, we compare the generated answers to the recently proposed set of patterns of human-based query reformulations. Finally, we discuss potential applications, as well as limitations, of LLM-based user simulators and outline promising directions for future work on the topic.
Ivan Sekulić, Mohammad Aliannejadi, Fabio Crestani. "Analysing Utterances in LLM-based User Simulation for Conversational Search." ACM Transactions on Intelligent Systems and Technology, 2024-03-05. https://doi.org/10.1145/3650041
Tieliang Gao, Li Duan, Lufeng Feng, Wei Ni, Quan Z. Sheng
Service composition platforms play a crucial role in creating personalized service processes. Challenges, including the risk of tampering with service data during service invocation and the potential single point of failure in centralized service registration centers, hinder the efficient and responsible creation of service processes. This paper presents a novel framework called Context-Aware Responsible Service Process Creation and Recommendation (SPCR-CA), which incorporates blockchain, Recurrent Neural Networks (RNNs), and a Skip-Gram model holistically to enhance the security, efficiency, and quality of service process creation and recommendation. Specifically, the blockchain establishes a trusted service provision environment, ensuring transparent and secure transactions between services and mitigating the risk of tampering. The RNN trains responsible service processes, contextualizing service components and producing coherent recommendations of linkage components. The Skip-Gram model trains responsible user-service process records, generating semantic vectors that facilitate the recommendation of similar service processes to users. Experiments using the Programmable-Web dataset demonstrate the superiority of the SPCR-CA framework to existing benchmarks in precision and recall. The proposed framework enhances the reliability, efficiency, and quality of service process creation and recommendation, enabling users to create responsible and tailored service processes. The SPCR-CA framework offers promising potential to provide users with secure and user-centric service creation and recommendation capabilities.
Tieliang Gao, Li Duan, Lufeng Feng, Wei Ni, Quan Z. Sheng. "A Novel Blockchain-Based Responsible Recommendation System for Service Process Creation and Recommendation." ACM Transactions on Intelligent Systems and Technology, 2024-03-02. https://doi.org/10.1145/3643858
Erica Coppolillo, Marco Minici, Ettore Ritacco, Luciano Caroprese, Francesco Sergio Pisani, Giuseppe Manco
Popularity bias is the tendency of recommender systems to further suggest popular items while disregarding niche ones, hence giving no chance for items with low popularity to emerge. Although the literature is rich in debiasing techniques, it still lacks quality measures that effectively enable their analyses and comparisons.
In this paper, we first introduce a formal, data-driven, and parameter-free strategy for classifying items into low, medium, and high popularity categories. Then we introduce BQS, a quality measure that rewards the debiasing techniques that successfully push a recommender system to suggest niche items, without losing points in its predictive capability in terms of global accuracy.
We conduct tests of BQS on three distinct baseline collaborative filtering (CF) frameworks: one based on history-embedding and two on user/item-embedding modeling. These evaluations are performed on multiple benchmark datasets and against various state-of-the-art competitors, demonstrating the effectiveness of BQS.
Erica Coppolillo, Marco Minici, Ettore Ritacco, Luciano Caroprese, Francesco Sergio Pisani, Giuseppe Manco. "Balanced Quality Score (BQS): Measuring Popularity Debiasing in Recommendation." ACM Transactions on Intelligent Systems and Technology, 2024-03-01. https://doi.org/10.1145/3650043
Saira Bano, Nicola Tonellotto, Pietro Cassarà, Alberto Gotta
Emotion recognition has attracted a lot of interest in recent years in various application areas such as healthcare and autonomous driving. Existing approaches to emotion recognition are based on visual, speech, or psychophysiological signals. However, recent studies are looking at multimodal techniques that combine different modalities for emotion recognition. In this work, we address the problem of recognizing the user’s emotion as a driver from unlabeled videos using multimodal techniques. We propose a collaborative training method based on cross-modal distillation, i.e., “FedCMD” (Federated Cross-Modal Distillation). Federated Learning (FL) is an emerging collaborative decentralized learning technique that allows each participant to train their model locally to build a better generalized global model without sharing their data. The main advantage of FL is that only local data is used for training, thus maintaining privacy and providing a secure and efficient emotion recognition system. The local model in FL is trained for each vehicle device with unlabeled video data by using sensor data as a proxy. Specifically, for each local model, we show how driver emotional annotations can be transferred from the sensor domain to the visual domain by using cross-modal distillation. The key idea is based on the observation that a driver’s emotional state indicated by a sensor correlates with facial expressions shown in videos. The proposed “FedCMD” approach is tested on the multimodal dataset “BioVid Emo DB” and achieves state-of-the-art performance. Experimental results show that our approach is robust to non-identically distributed data, achieving 96.67% and 90.83% accuracy in classifying five different emotions with IID (independently and identically distributed) and non-IID data, respectively. Moreover, our model is much more robust to overfitting, resulting in better generalization than the other existing methods.
Saira Bano, Nicola Tonellotto, Pietro Cassarà, Alberto Gotta. "FedCMD: A Federated Cross-Modal Knowledge Distillation for Drivers Emotion Recognition." ACM Transactions on Intelligent Systems and Technology, 2024-03-01. https://doi.org/10.1145/3650040
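The FedCMD abstract above combines two ideas: a sensor-based model supplies soft labels that a video-based model learns to match (cross-modal distillation), and locally trained models are merged into a global one without sharing data (federated learning). The sketch below shows one common way each step is written. The abstract does not state the exact loss or aggregation rule, so the temperature-softened KL divergence and the FedAvg-style weighted average used here are assumptions, and every function and parameter name is hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between softened class distributions.

    Teacher: model on sensor signals. Student: model on video frames.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def fed_avg(client_weights, client_sizes):
    """Average flat parameter lists, weighted by local dataset size (FedAvg-style)."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


# One local step: the student is penalised for disagreeing with the teacher.
print(distillation_loss([2.0, 0.5, -1.0, 0.0, 0.3], [1.0, 1.0, 0.0, 0.0, 0.0]))

# One server step: two vehicles with 1 and 3 local samples respectively.
print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Only the parameter lists cross the network in the second step; the videos and sensor traces stay on each vehicle.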
Yunji Liang, Nengzhen Chen, Zhiwen Yu, Lei Tang, Hongkai Yu, Bin Guo, Daniel Dajun Zeng
As one of the fundamental tasks of autonomous driving, depth perception aims to perceive physical objects in three dimensions and to judge their distances from the ego vehicle. Although great efforts have been made in depth perception, LiDAR-based and camera-based solutions are limited by low accuracy and poor robustness to noisy input. With the integration of monocular cameras and LiDAR sensors in autonomous vehicles, in this paper, we introduce a two-stream architecture to learn the modality interaction representation under the guidance of an image reconstruction task to compensate for the deficiencies of each modality in a parallel manner. Specifically, in the two-stream architecture, the multi-scale cross-modality interactions are preserved via a cascading interaction network under the guidance of the reconstruction task. Next, the shared representation of modality interaction is integrated to infer the dense depth map, exploiting the complementarity and heterogeneity of the two modalities. We evaluated the proposed solution on the KITTI dataset and CALAR synthetic dataset. Our experimental results show that learning the coupled interaction of modalities under the guidance of an auxiliary task can lead to significant performance improvements. Furthermore, our approach is competitive with state-of-the-art models and robust to noisy input. The source code is available at https://github.com/tonyFengye/Code/tree/master.
Yunji Liang, Nengzhen Chen, Zhiwen Yu, Lei Tang, Hongkai Yu, Bin Guo, Daniel Dajun Zeng. "Learning Cross-Modality Interaction for Robust Depth Perception of Autonomous Driving." ACM Transactions on Intelligent Systems and Technology, 2024-03-01. https://doi.org/10.1145/3650039
Heterogeneous graph convolutional networks have gained great popularity in tackling various network analytical tasks on heterogeneous graph data, ranging from link prediction to node classification. However, most existing works ignore the relation heterogeneity with multiplex networks between multi-typed nodes and the different importance of relations in meta-paths for node embedding, which can hardly capture the heterogeneous structure signals across different relations. To tackle this challenge, this work proposes a Multiplex Heterogeneous Graph Convolutional Network (MHGCN+) for multiplex heterogeneous network embedding. Our MHGCN+ can automatically learn the useful heterogeneous meta-path interactions of different lengths with different importance in multiplex heterogeneous networks through multi-layer convolution aggregation. Additionally, we effectively integrate both multi-relation structural signals and attribute semantics into the learned node embeddings with both unsupervised and semi-supervised learning paradigms. Extensive experiments on seven real-world datasets with various network analytical tasks demonstrate the significant superiority of MHGCN+ against state-of-the-art embedding baselines in terms of all evaluation metrics. The source code of our method is available at: https://github.com/FuChF/MHGCN-plus.
MHGCN+: Multiplex Heterogeneous Graph Convolutional Network. Chaofan Fu, Pengyang Yu, Yanwei Yu, Chao Huang, Zhongying Zhao, Junyu Dong. ACM Transactions on Intelligent Systems and Technology, published 2024-02-29. DOI: 10.1145/3650046 (https://doi.org/10.1145/3650046).
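The MHGCN+ abstract above mentions learning meta-path interactions of different lengths through multi-layer convolution aggregation. One way to read this, sketched below under stated assumptions, is that per-relation adjacency matrices are combined with relation weights and the combined matrix is applied once per layer, so that layer l mixes paths of length l. The function names, the fixed weights, and the averaging of layer outputs are illustrative assumptions, not the authors' implementation; a real model would learn the weights and add trainable layer parameters.

```python
# Illustrative sketch only: relation weights are fixed here, whereas a real
# model would learn them, and there are no trainable layer parameters.

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def aggregate_relations(adjacencies, weights):
    """Weighted sum of per-relation adjacency matrices (hypothetical helper)."""
    n = len(adjacencies[0])
    return [
        [sum(w * adj[i][j] for adj, w in zip(adjacencies, weights)) for j in range(n)]
        for i in range(n)
    ]


def propagate(adjacency, features, num_layers):
    """Apply the aggregated adjacency once per layer and average the layer outputs."""
    layer = features
    outputs = []
    for _ in range(num_layers):
        layer = matmul(adjacency, layer)
        outputs.append(layer)
    rows, cols = len(features), len(features[0])
    return [
        [sum(out[i][j] for out in outputs) / num_layers for j in range(cols)]
        for i in range(rows)
    ]


# Toy multiplex graph on three nodes with two relation types (assumed data).
click = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # node 0 <-> node 1
buy = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]    # node 1 <-> node 2
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

combined = aggregate_relations([click, buy], [0.5, 2.0])
embeddings = propagate(combined, identity, num_layers=2)
```

In this toy graph, nodes 0 and 2 share no direct edge, yet the second layer connects them through the length-2 click-then-buy path, and that path's contribution is the product of the two relation weights.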
Despite the benefits of personalizing items and information to users’ needs, recommender systems have been found to introduce biases that favor popular items, certain categories of items, and dominant user groups. In this study, we aim to characterize the systematic errors of a recommendation system and how they manifest in various accountability issues, such as stereotypes, biases, and miscalibration. We propose a unified framework that decomposes the sources of prediction errors into a set of key measures that quantify the various types of system-induced effects, at both the individual and collective levels. Based on our measuring framework, we examine the most widely adopted algorithms in the context of movie recommendation. Our research reveals three important findings: (1) Differences between algorithms: recommendations generated by simpler algorithms tend to be more stereotypical but less biased than those generated by more complex algorithms. (2) Disparate impact on groups and individuals: system-induced biases and stereotypes have a disproportionate effect on atypical users and minority groups (e.g., women and older users). (3) Mitigation opportunity: using structural equation modeling, we identify the interactions between user characteristics (typicality and diversity), system-induced effects, and miscalibration. We further investigate the possibility of mitigating system-induced effects by oversampling underrepresented groups and individuals, which was found to be effective in reducing stereotypes and improving recommendation quality. Our research is the first systematic examination of not only system-induced effects and miscalibration but also the stereotyping issue in recommender systems.
Break Out of a Pigeonhole: A Unified Framework for Examining Miscalibration, Bias, and Stereotype in Recommender Systems. Yongsu Ahn, Yu-Ru Lin. ACM Transactions on Intelligent Systems and Technology, published 2024-02-29. DOI: 10.1145/3650044 (https://doi.org/10.1145/3650044).
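The abstract above does not define its measures. As a rough illustration of what miscalibration can mean in movie recommendation, one widely used notion compares the genre distribution of a user's history with the genre distribution of their recommendations using KL divergence. The sketch below assumes that reading; the function names and the smoothing constant are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def category_distribution(categories):
    """Turn a list of category labels into a probability distribution."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def miscalibration(history, recommended, smoothing=0.01):
    """KL divergence from the history distribution to the recommended one.

    The recommended distribution is mixed with a small share of the history
    distribution so that the logarithm stays finite when a category from the
    history is missing from the recommendations (an assumed design choice).
    """
    p = category_distribution(history)
    q = category_distribution(recommended)
    score = 0.0
    for category, p_c in p.items():
        q_c = (1 - smoothing) * q.get(category, 0.0) + smoothing * p_c
        score += p_c * math.log(p_c / q_c)
    return score


# Assumed toy data: a user who watches drama and comedy equally often.
history = ["drama", "drama", "comedy", "comedy"]
balanced = miscalibration(history, ["drama", "comedy"])
skewed = miscalibration(history, ["drama", "drama", "drama", "drama"])
```

Under this measure, a recommendation list that mirrors the user's genre mix scores close to zero, while a list that contains only drama scores well above zero.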
Recommender systems have become important tools in daily life because they address information overload and help users discover relevant and useful items. Their success largely relies on the interaction history between users and items, which is expected to accurately reflect users' preferences for items. In practice, however, this expectation is easily broken by corruptions in the interaction history, resulting in unreliable and untrustworthy recommender systems. Previous works either ignore this issue (assuming that the interaction history is precise) or are limited to handling additive noise. Motivated by this, in this paper we study rating flip noise, which widely exists in the interaction history of recommender systems, and combat it by modelling the noise generation process. Specifically, rating flip noise allows a rating to be flipped to any other rating within the given rating set, which reflects various real-world situations of rating corruption, e.g., a user may randomly click a rating from the rating set and then submit it. The noise generation process is modelled by a noise transition matrix that denotes the probabilities of a clean rating flipping into a noisy rating. A statistically consistent algorithm is then applied with the estimated transition matrix to learn a recommender system that is robust to rating flip noise. Comprehensive experiments on multiple benchmarks confirm the superiority of our method.
Robust Recommender Systems with Rating Flip Noise. Shanshan Ye, Jie Lu. ACM Transactions on Intelligent Systems and Technology, published 2024-02-29. DOI: 10.1145/3641285 (https://doi.org/10.1145/3641285).
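The abstract above describes a noise transition matrix whose entry (i, j) is the probability that clean rating i is observed as noisy rating j. The abstract does not say which consistent algorithm is used; the sketch below assumes forward loss correction, a common technique from label-noise learning, in which the model's clean-rating probabilities are pushed through the transition matrix before being compared with the observed rating. The function names and the example matrix are hypothetical.

```python
import math


def noisy_distribution(clean_probs, transition):
    """P(noisy = j) = sum over i of P(clean = i) * transition[i][j]."""
    k = len(transition)
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(k))
        for j in range(k)
    ]


def forward_corrected_loss(clean_probs, transition, observed_rating):
    """Negative log-likelihood of the observed (possibly flipped) rating."""
    noisy = noisy_distribution(clean_probs, transition)
    return -math.log(noisy[observed_rating])


# Assumed example: three rating levels, each kept with probability 0.8 and
# flipped to either of the other two levels with probability 0.1.
transition = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
clean_probs = [0.7, 0.2, 0.1]  # hypothetical model output for one user-item pair

noisy = noisy_distribution(clean_probs, transition)
loss = forward_corrected_loss(clean_probs, transition, observed_rating=0)
```

With an identity transition matrix the corrected loss reduces to the ordinary negative log-likelihood, so the correction only changes training when flips are assumed to occur.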