Session-based recommendation (SBR), which mainly relies on a user’s limited interactions with items to generate recommendations, is a widely investigated task. Existing methods often apply RNNs or GNNs to model user’s sequential behavior or transition relationship between items to capture her current preference. For training such models, the supervision signals are merely generated from the sequential interactions inside a session, neglecting the correlations of different sessions, which we argue can provide additional supervisions for learning the item representations. Moreover, previous methods mainly adopt the cross-entropy loss for training, where the user’s ground truth preference distribution towards items is regarded as a one-hot vector of the target item, easily making the network over-confident and leading to a serious overfitting problem. Thus, in this article, we propose a Collaborative Graph Learning (CGL) approach for session-based recommendation. CGL first applies the Gated Graph Neural Networks (GGNNs) to learn item embeddings and then is trained by considering both the main supervision as well as the self-supervision signals simultaneously. The main supervisions are produced by the sequential order while the self-supervisions are derived from the global graph constructed by all sessions. In addition, to prevent overfitting, we propose a Target-aware Label Confusion (TLC) learning method in the main supervised component. Extensive experiments are conducted on three publicly available datasets, i.e., Retailrocket, Diginetica, and Gowalla. The experimental results show that CGL can outperform the state-of-the-art baselines in terms of Recall and MRR.
{"title":"Collaborative Graph Learning for Session-based Recommendation","authors":"Zhiqiang Pan, Fei Cai, Wanyu Chen, Chonghao Chen, Honghui Chen","doi":"10.1145/3490479","DOIUrl":"https://doi.org/10.1145/3490479","url":null,"abstract":"Session-based recommendation (SBR), which mainly relies on a user’s limited interactions with items to generate recommendations, is a widely investigated task. Existing methods often apply RNNs or GNNs to model user’s sequential behavior or transition relationship between items to capture her current preference. For training such models, the supervision signals are merely generated from the sequential interactions inside a session, neglecting the correlations of different sessions, which we argue can provide additional supervisions for learning the item representations. Moreover, previous methods mainly adopt the cross-entropy loss for training, where the user’s ground truth preference distribution towards items is regarded as a one-hot vector of the target item, easily making the network over-confident and leading to a serious overfitting problem. Thus, in this article, we propose a Collaborative Graph Learning (CGL) approach for session-based recommendation. CGL first applies the Gated Graph Neural Networks (GGNNs) to learn item embeddings and then is trained by considering both the main supervision as well as the self-supervision signals simultaneously. The main supervisions are produced by the sequential order while the self-supervisions are derived from the global graph constructed by all sessions. In addition, to prevent overfitting, we propose a Target-aware Label Confusion (TLC) learning method in the main supervised component. Extensive experiments are conducted on three publicly available datasets, i.e., Retailrocket, Diginetica, and Gowalla. The experimental results show that CGL can outperform the state-of-the-art baselines in terms of Recall and MRR.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"36 1","pages":"1 - 26"},"PeriodicalIF":0.0,"publicationDate":"2022-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84466519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiarui Jin, Kounianhua Du, Weinan Zhang, Jiarui Qin, Yuchen Fang, Yong Yu, Zheng Zhang, Alexander J. Smola
Heterogeneous information network (HIN) has been widely used to characterize entities of various types and their complex relations. Recent attempts either rely on explicit path reachability to leverage path-based semantic relatedness or graph neighborhood to learn heterogeneous network representations before predictions. These weakly coupled manners overlook the rich interactions among neighbor nodes, which introduces an early summarization issue. In this article, we propose GraphHINGE (Heterogeneous INteract and aggreGatE), which captures and aggregates the interactive patterns between each pair of nodes through their structured neighborhoods. Specifically, we first introduce Neighborhood-based Interaction (NI) module to model the interactive patterns under the same metapaths, and then extend it to Cross Neighborhood-based Interaction (CNI) module to deal with different metapaths. Next, in order to address the complexity issue on large-scale networks, we formulate the interaction modules via a convolutional framework and learn the parameters efficiently with fast Fourier transform. Furthermore, we design a novel neighborhood-based selection (NS) mechanism, a sampling strategy, to filter high-order neighborhood information based on their low-order performance. The extensive experiments on six different types of heterogeneous graphs demonstrate the performance gains by comparing with state-of-the-arts in both click-through rate prediction and top-N recommendation tasks.
{"title":"GraphHINGE: Learning Interaction Models of Structured Neighborhood on Heterogeneous Information Network","authors":"Jiarui Jin, Kounianhua Du, Weinan Zhang, Jiarui Qin, Yuchen Fang, Yong Yu, Zheng Zhang, Alexander J. Smola","doi":"10.1145/3472956","DOIUrl":"https://doi.org/10.1145/3472956","url":null,"abstract":"Heterogeneous information network (HIN) has been widely used to characterize entities of various types and their complex relations. Recent attempts either rely on explicit path reachability to leverage path-based semantic relatedness or graph neighborhood to learn heterogeneous network representations before predictions. These weakly coupled manners overlook the rich interactions among neighbor nodes, which introduces an early summarization issue. In this article, we propose GraphHINGE (Heterogeneous INteract and aggreGatE), which captures and aggregates the interactive patterns between each pair of nodes through their structured neighborhoods. Specifically, we first introduce Neighborhood-based Interaction (NI) module to model the interactive patterns under the same metapaths, and then extend it to Cross Neighborhood-based Interaction (CNI) module to deal with different metapaths. Next, in order to address the complexity issue on large-scale networks, we formulate the interaction modules via a convolutional framework and learn the parameters efficiently with fast Fourier transform. Furthermore, we design a novel neighborhood-based selection (NS) mechanism, a sampling strategy, to filter high-order neighborhood information based on their low-order performance. The extensive experiments on six different types of heterogeneous graphs demonstrate the performance gains by comparing with state-of-the-arts in both click-through rate prediction and top-N recommendation tasks.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"143 1","pages":"1 - 35"},"PeriodicalIF":0.0,"publicationDate":"2022-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75040559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peng Zhang, Wenjie Hui, Benyou Wang, Donghao Zhao, Dawei Song, C. Lioma, J. Simonsen
Language modeling is essential in Natural Language Processing and Information Retrieval related tasks. After the statistical language models, Quantum Language Model (QLM) has been proposed to unify both single words and compound terms in the same probability space without extending term space exponentially. Although QLM achieved good performance in ad hoc retrieval, it still has two major limitations: (1) QLM cannot make use of supervised information, mainly due to the iterative and non-differentiable estimation of the density matrix, which represents both queries and documents in QLM. (2) QLM assumes the exchangeability of words or word dependencies, neglecting the order or position information of words. This article aims to generalize QLM and make it applicable to more complicated matching tasks (e.g., Question Answering) beyond ad hoc retrieval. We propose a complex-valued neural network-based QLM solution called C-NNQLM to employ an end-to-end approach to build and train density matrices in a light-weight and differentiable manner, and it can therefore make use of external well-trained word vectors and supervised labels. Furthermore, C-NNQLM adopts complex-valued word vectors whose phase vectors can directly encode the order (or position) information of words. Note that complex numbers are also essential in the quantum theory. We show that the real-valued NNQLM (R-NNQLM) is a special case of C-NNQLM. The experimental results on the QA task show that both R-NNQLM and C-NNQLM achieve much better performance than the vanilla QLM, and C-NNQLM’s performance is on par with state-of-the-art neural network models. We also evaluate the proposed C-NNQLM on text classification and document retrieval tasks. The results on most datasets show that the C-NNQLM can outperform R-NNQLM, which demonstrates the usefulness of the complex representation for words and sentences in C-NNQLM.
语言建模在自然语言处理和信息检索相关的任务中是必不可少的。继统计语言模型之后,又提出了量子语言模型(Quantum language Model, QLM),该模型在不以指数方式扩展词空间的情况下,将单个词和复合词统一在同一概率空间中。尽管QLM在临时检索方面取得了良好的性能,但它仍然存在两个主要的局限性:(1)QLM不能利用监督信息,主要是由于密度矩阵的迭代和不可微估计,而密度矩阵在QLM中既代表查询,也代表文档。(2) QLM假设词的互换性或词的依赖性,忽略词的顺序或位置信息。本文旨在概括QLM,并使其适用于更复杂的匹配任务(例如,问答),而不是特别检索。我们提出了一种基于复值神经网络的QLM解决方案,称为C-NNQLM,采用端到端方法以轻量级和可微的方式构建和训练密度矩阵,因此它可以利用外部训练良好的词向量和监督标签。此外,C-NNQLM采用复值词向量,其相位向量可以直接编码词的顺序(或位置)信息。请注意,复数在量子理论中也是必不可少的。我们证明了实值NNQLM (R-NNQLM)是C-NNQLM的一个特例。在QA任务上的实验结果表明,R-NNQLM和C-NNQLM的性能都比普通的QLM好得多,C-NNQLM的性能与最先进的神经网络模型相当。我们还评估了C-NNQLM在文本分类和文档检索任务上的性能。在大多数数据集上的结果表明,C-NNQLM可以优于R-NNQLM,这证明了C-NNQLM对单词和句子的复杂表示的有用性。
{"title":"Complex-valued Neural Network-based Quantum Language Models","authors":"Peng Zhang, Wenjie Hui, Benyou Wang, Donghao Zhao, Dawei Song, C. Lioma, J. Simonsen","doi":"10.1145/3505138","DOIUrl":"https://doi.org/10.1145/3505138","url":null,"abstract":"Language modeling is essential in Natural Language Processing and Information Retrieval related tasks. After the statistical language models, Quantum Language Model (QLM) has been proposed to unify both single words and compound terms in the same probability space without extending term space exponentially. Although QLM achieved good performance in ad hoc retrieval, it still has two major limitations: (1) QLM cannot make use of supervised information, mainly due to the iterative and non-differentiable estimation of the density matrix, which represents both queries and documents in QLM. (2) QLM assumes the exchangeability of words or word dependencies, neglecting the order or position information of words. This article aims to generalize QLM and make it applicable to more complicated matching tasks (e.g., Question Answering) beyond ad hoc retrieval. We propose a complex-valued neural network-based QLM solution called C-NNQLM to employ an end-to-end approach to build and train density matrices in a light-weight and differentiable manner, and it can therefore make use of external well-trained word vectors and supervised labels. Furthermore, C-NNQLM adopts complex-valued word vectors whose phase vectors can directly encode the order (or position) information of words. Note that complex numbers are also essential in the quantum theory. We show that the real-valued NNQLM (R-NNQLM) is a special case of C-NNQLM. The experimental results on the QA task show that both R-NNQLM and C-NNQLM achieve much better performance than the vanilla QLM, and C-NNQLM’s performance is on par with state-of-the-art neural network models. We also evaluate the proposed C-NNQLM on text classification and document retrieval tasks. The results on most datasets show that the C-NNQLM can outperform R-NNQLM, which demonstrates the usefulness of the complex representation for words and sentences in C-NNQLM.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"29 1","pages":"1 - 31"},"PeriodicalIF":0.0,"publicationDate":"2022-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86861288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yang Fang, Xiang Zhao, Peixin Huang, W. Xiao, M. de Rijke
Content representation is a fundamental task in information retrieval. Representation learning is aimed at capturing features of an information object in a low-dimensional space. Most research on representation learning for heterogeneous information networks (HINs) focuses on static HINs. In practice, however, networks are dynamic and subject to constant change. In this article, we propose a novel and scalable representation learning model, M-DHIN, to explore the evolution of a dynamic HIN. We regard a dynamic HIN as a series of snapshots with different time stamps. We first use a static embedding method to learn the initial embeddings of a dynamic HIN at the first time stamp. We describe the features of the initial HIN via metagraphs, which retains more structural and semantic information than traditional path-oriented static models. We also adopt a complex embedding scheme to better distinguish between symmetric and asymmetric metagraphs. Unlike traditional models that process an entire network at each time stamp, we build a so-called change dataset that only includes nodes involved in a triadic closure or opening process, as well as newly added or deleted nodes. Then, we utilize the above metagraph-based mechanism to train on the change dataset. As a result of this setup, M-DHIN is scalable to large dynamic HINs since it only needs to model the entire HIN once while only the changed parts need to be processed over time. Existing dynamic embedding models only express the existing snapshots and cannot predict the future network structure. To equip M-DHIN with this ability, we introduce an LSTM-based deep autoencoder model that processes the evolution of the graph via an LSTM encoder and outputs the predicted graph. Finally, we evaluate the proposed model, M-DHIN, on real-life datasets and demonstrate that it significantly and consistently outperforms state-of-the-art models.
{"title":"Scalable Representation Learning for Dynamic Heterogeneous Information Networks via Metagraphs","authors":"Yang Fang, Xiang Zhao, Peixin Huang, W. Xiao, M. de Rijke","doi":"10.1145/3485189","DOIUrl":"https://doi.org/10.1145/3485189","url":null,"abstract":"Content representation is a fundamental task in information retrieval. Representation learning is aimed at capturing features of an information object in a low-dimensional space. Most research on representation learning for heterogeneous information networks (HINs) focuses on static HINs. In practice, however, networks are dynamic and subject to constant change. In this article, we propose a novel and scalable representation learning model, M-DHIN, to explore the evolution of a dynamic HIN. We regard a dynamic HIN as a series of snapshots with different time stamps. We first use a static embedding method to learn the initial embeddings of a dynamic HIN at the first time stamp. We describe the features of the initial HIN via metagraphs, which retains more structural and semantic information than traditional path-oriented static models. We also adopt a complex embedding scheme to better distinguish between symmetric and asymmetric metagraphs. Unlike traditional models that process an entire network at each time stamp, we build a so-called change dataset that only includes nodes involved in a triadic closure or opening process, as well as newly added or deleted nodes. Then, we utilize the above metagraph-based mechanism to train on the change dataset. As a result of this setup, M-DHIN is scalable to large dynamic HINs since it only needs to model the entire HIN once while only the changed parts need to be processed over time. Existing dynamic embedding models only express the existing snapshots and cannot predict the future network structure. To equip M-DHIN with this ability, we introduce an LSTM-based deep autoencoder model that processes the evolution of the graph via an LSTM encoder and outputs the predicted graph. Finally, we evaluate the proposed model, M-DHIN, on real-life datasets and demonstrate that it significantly and consistently outperforms state-of-the-art models.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"49 1","pages":"1 - 27"},"PeriodicalIF":0.0,"publicationDate":"2022-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85276104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Srivatsa Ramesh Jayashree, G. Dias, J. Andrew, S. Saha, Fabrice Maurel, S. Ferrari
Web page segmentation (WPS) aims to break a web page into different segments with coherent intra- and inter-semantics. By evidencing the morpho-dispositional semantics of a web page, WPS has traditionally been used to demarcate informative from non-informative content, but it has also evidenced its key role within the context of non-linear access to web information for visually impaired people. For that purpose, a great deal of ad hoc solutions have been proposed that rely on visual, logical, and/or text cues. However, such methodologies highly depend on manually tuned heuristics and are parameter-dependent. To overcome these drawbacks, principled frameworks have been proposed that provide the theoretical bases to achieve optimal solutions. However, existing methodologies only combine few discriminant features and do not define strategies to automatically select the optimal number of segments. In this article, we present a multi-objective clustering technique called MCS that relies on ( K ) -means, in which (1) visual, logical, and text cues are all combined in a early fusion manner and (2) an evolutionary process automatically discovers the optimal number of clusters (segments) as well as the correct positioning of seeds. As such, our proposal is parameter-free, combines many different modalities, does not depend on manually tuned heuristics, and can be run on any web page without any constraint. An exhaustive evaluation over two different tasks, where (1) the number of segments must be discovered or (2) the number of clusters is fixed with respect to the task at hand, shows that MCS drastically improves over most competitive and up-to-date algorithms for a wide variety of external and internal validation indices. In particular, results clearly evidence the impact of the visual and logical modalities towards segmentation performance.
网页分割(Web page segmentation, WPS)的目的是将网页分割成具有连贯的内语义和间语义的不同部分。通过证明网页的形态-倾向语义,WPS传统上被用来区分信息和非信息内容,但它也证明了它在视障人士非线性访问网络信息的背景下的关键作用。为此,已经提出了大量依赖于视觉、逻辑和/或文本线索的特殊解决方案。然而,这种方法高度依赖于手动调整的启发式,并且依赖于参数。为了克服这些缺点,提出了原则性框架,为实现最优解提供了理论基础。然而,现有的方法只结合了很少的判别特征,并且没有定义自动选择最优段数量的策略。在本文中,我们提出了一种称为MCS的多目标聚类技术,该技术依赖于( K ) -means,其中(1)视觉、逻辑和文本线索都以早期融合的方式组合在一起;(2)进化过程自动发现聚类(片段)的最佳数量以及种子的正确定位。因此,我们的建议是无参数的,结合了许多不同的模式,不依赖于手动调整的启发式,并且可以在任何网页上不受任何约束地运行。对两个不同的任务进行详尽的评估,其中(1)必须发现的片段数量或(2)相对于手头的任务,集群的数量是固定的,表明MCS在各种外部和内部验证指标上比大多数竞争激烈和最新的算法有了巨大的改进。特别是,结果清楚地证明了视觉和逻辑模式对分割性能的影响。
{"title":"Multimodal Web Page Segmentation Using Self-organized Multi-objective Clustering","authors":"Srivatsa Ramesh Jayashree, G. Dias, J. Andrew, S. Saha, Fabrice Maurel, S. Ferrari","doi":"10.1145/3480966","DOIUrl":"https://doi.org/10.1145/3480966","url":null,"abstract":"Web page segmentation (WPS) aims to break a web page into different segments with coherent intra- and inter-semantics. By evidencing the morpho-dispositional semantics of a web page, WPS has traditionally been used to demarcate informative from non-informative content, but it has also evidenced its key role within the context of non-linear access to web information for visually impaired people. For that purpose, a great deal of ad hoc solutions have been proposed that rely on visual, logical, and/or text cues. However, such methodologies highly depend on manually tuned heuristics and are parameter-dependent. To overcome these drawbacks, principled frameworks have been proposed that provide the theoretical bases to achieve optimal solutions. However, existing methodologies only combine few discriminant features and do not define strategies to automatically select the optimal number of segments. In this article, we present a multi-objective clustering technique called MCS that relies on ( K ) -means, in which (1) visual, logical, and text cues are all combined in a early fusion manner and (2) an evolutionary process automatically discovers the optimal number of clusters (segments) as well as the correct positioning of seeds. As such, our proposal is parameter-free, combines many different modalities, does not depend on manually tuned heuristics, and can be run on any web page without any constraint. An exhaustive evaluation over two different tasks, where (1) the number of segments must be discovered or (2) the number of clusters is fixed with respect to the task at hand, shows that MCS drastically improves over most competitive and up-to-date algorithms for a wide variety of external and internal validation indices. In particular, results clearly evidence the impact of the visual and logical modalities towards segmentation performance.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"20 1","pages":"1 - 49"},"PeriodicalIF":0.0,"publicationDate":"2022-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89470802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the development of e-commerce, fraud behaviors have been becoming one of the biggest threats to the e-commerce business. Fraud behaviors seriously damage the ranking system of e-commerce platforms and adversely influence the shopping experience of users. It is of great practical value to detect fraud behaviors on e-commerce platforms. However, the task is non-trivial, since the adversarial action taken by fraudsters. Existing fraud detection systems used in the e-commerce industry easily suffer from performance decay and can not adapt to the upgrade of fraud patterns, as they take already known fraud behaviors as supervision information to detect other suspicious behaviors. In this article, we propose a competitive graph neural networks (CGNN)-based fraud detection system (eFraudCom) to detect fraud behaviors at one of the largest e-commerce platforms, “Taobao”1. In the eFraudCom system, (1) the competitive graph neural networks (CGNN) as the core part of eFraudCom can classify behaviors of users directly by modeling the distributions of normal and fraud behaviors separately; (2) some normal behaviors will be utilized as weak supervision information to guide the CGNN to build the profile for normal behaviors that are more stable than fraud behaviors. The algorithm dependency on fraud behaviors will be eliminated, which enables eFraudCom to detect fraud behaviors in presence of the new fraud patterns; (3) the mutual information regularization term can maximize the separability between normal and fraud behaviors to further improve CGNN. eFraudCom is implemented into a prototype system and the performance of the system is evaluated by extensive experiments. The experiments on two Taobao and two public datasets demonstrate that the proposed deep framework CGNN is superior to other baselines in detecting fraud behaviors. A case study on Taobao datasets verifies that CGNN is still robust when the fraud patterns have been upgraded.
{"title":"eFraudCom: An E-commerce Fraud Detection System via Competitive Graph Neural Networks","authors":"Ge Zhang, Zhao Li, Jiaming Huang, Jia Wu, Chuan Zhou, Jian Yang, Jianliang Gao","doi":"10.1145/3474379","DOIUrl":"https://doi.org/10.1145/3474379","url":null,"abstract":"With the development of e-commerce, fraud behaviors have been becoming one of the biggest threats to the e-commerce business. Fraud behaviors seriously damage the ranking system of e-commerce platforms and adversely influence the shopping experience of users. It is of great practical value to detect fraud behaviors on e-commerce platforms. However, the task is non-trivial, since the adversarial action taken by fraudsters. Existing fraud detection systems used in the e-commerce industry easily suffer from performance decay and can not adapt to the upgrade of fraud patterns, as they take already known fraud behaviors as supervision information to detect other suspicious behaviors. In this article, we propose a competitive graph neural networks (CGNN)-based fraud detection system (eFraudCom) to detect fraud behaviors at one of the largest e-commerce platforms, “Taobao”1. In the eFraudCom system, (1) the competitive graph neural networks (CGNN) as the core part of eFraudCom can classify behaviors of users directly by modeling the distributions of normal and fraud behaviors separately; (2) some normal behaviors will be utilized as weak supervision information to guide the CGNN to build the profile for normal behaviors that are more stable than fraud behaviors. The algorithm dependency on fraud behaviors will be eliminated, which enables eFraudCom to detect fraud behaviors in presence of the new fraud patterns; (3) the mutual information regularization term can maximize the separability between normal and fraud behaviors to further improve CGNN. eFraudCom is implemented into a prototype system and the performance of the system is evaluated by extensive experiments. The experiments on two Taobao and two public datasets demonstrate that the proposed deep framework CGNN is superior to other baselines in detecting fraud behaviors. A case study on Taobao datasets verifies that CGNN is still robust when the fraud patterns have been upgraded.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"40 1","pages":"1 - 29"},"PeriodicalIF":0.0,"publicationDate":"2022-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84905164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yutao Zhu, Ruihua Song, J. Nie, Pan Du, Zhicheng Dou, Jin Zhou
Generating a text based on a predefined guideline is an interesting but challenging problem. A series of studies have been carried out in recent years. In dialogue systems, researchers have explored driving a dialogue based on a plan, while in story generation, a storyline has also been proved to be useful. In this article, we address a new task—generating movie scripts based on a predefined narrative. As an early exploration, we study this problem in a “retrieval-based” setting. We propose a model (ScriptWriter-CPre) to select the best response (i.e., next script line) among the candidates that fit the context (i.e., previous script lines) as well as the given narrative. Our model can keep track of what in the narrative has been said and what is to be said. Besides, it can also predict which part of the narrative should be paid more attention to when selecting the next line of script. In our study, we find the narrative plays a different role than the context. Therefore, different mechanisms are designed for deal with them. Due to the unavailability of data for this new application, we construct a new large-scale data collection GraphMovie from a movie website where end-users can upload their narratives freely when watching a movie. This new dataset is made available publicly to facilitate other studies in text generation under the guideline. Experimental results on the dataset show that our proposed approach based on narratives significantly outperforms the baselines that simply use the narrative as a kind of context.
{"title":"Leveraging Narrative to Generate Movie Script","authors":"Yutao Zhu, Ruihua Song, J. Nie, Pan Du, Zhicheng Dou, Jin Zhou","doi":"10.1145/3507356","DOIUrl":"https://doi.org/10.1145/3507356","url":null,"abstract":"Generating a text based on a predefined guideline is an interesting but challenging problem. A series of studies have been carried out in recent years. In dialogue systems, researchers have explored driving a dialogue based on a plan, while in story generation, a storyline has also been proved to be useful. In this article, we address a new task—generating movie scripts based on a predefined narrative. As an early exploration, we study this problem in a “retrieval-based” setting. We propose a model (ScriptWriter-CPre) to select the best response (i.e., next script line) among the candidates that fit the context (i.e., previous script lines) as well as the given narrative. Our model can keep track of what in the narrative has been said and what is to be said. Besides, it can also predict which part of the narrative should be paid more attention to when selecting the next line of script. In our study, we find the narrative plays a different role than the context. Therefore, different mechanisms are designed for deal with them. Due to the unavailability of data for this new application, we construct a new large-scale data collection GraphMovie from a movie website where end-users can upload their narratives freely when watching a movie. This new dataset is made available publicly to facilitate other studies in text generation under the guideline. Experimental results on the dataset show that our proposed approach based on narratives significantly outperforms the baselines that simply use the narrative as a kind of context.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"41 1","pages":"1 - 32"},"PeriodicalIF":0.0,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86275280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hossein A. Rahmani, Mohammad Aliannejadi, Mitra Baratchi, F. Crestani
As the popularity of Location-based Social Networks increases, designing accurate models for Point-of-Interest (POI) recommendation receives more attention. POI recommendation is often performed by incorporating contextual information into previously designed recommendation algorithms. Some of the major contextual information that has been considered in POI recommendation are the location attributes (i.e., exact coordinates of a location, category, and check-in time), the user attributes (i.e., comments, reviews, tips, and check-in made to the locations), and other information, such as the distance of the POI from user’s main activity location and the social tie between users. The right selection of such factors can significantly impact the performance of the POI recommendation. However, previous research does not consider the impact of the combination of these different factors. In this article, we propose different contextual models and analyze the fusion of different major contextual information in POI recommendation. The major contributions of this article are as follows: (i) providing an extensive survey of context-aware location recommendation; (ii) quantifying and analyzing the impact of different contextual information (e.g., social, temporal, spatial, and categorical) in the POI recommendation on available baselines and two new linear and non-linear models, which can incorporate all the major contextual information into a single recommendation model; and (iii) evaluating the considered models using two well-known real-world datasets. Our results indicate that while modeling geographical and temporal influences can improve recommendation quality, fusing all other contextual information into a recommendation model is not always the best strategy.
{"title":"A Systematic Analysis on the Impact of Contextual Information on Point-of-Interest Recommendation","authors":"Hossein A. Rahmani, Mohammad Aliannejadi, Mitra Baratchi, F. Crestani","doi":"10.1145/3508478","DOIUrl":"https://doi.org/10.1145/3508478","url":null,"abstract":"As the popularity of Location-based Social Networks increases, designing accurate models for Point-of-Interest (POI) recommendation receives more attention. POI recommendation is often performed by incorporating contextual information into previously designed recommendation algorithms. Some of the major contextual information that has been considered in POI recommendation are the location attributes (i.e., exact coordinates of a location, category, and check-in time), the user attributes (i.e., comments, reviews, tips, and check-in made to the locations), and other information, such as the distance of the POI from user’s main activity location and the social tie between users. The right selection of such factors can significantly impact the performance of the POI recommendation. However, previous research does not consider the impact of the combination of these different factors. In this article, we propose different contextual models and analyze the fusion of different major contextual information in POI recommendation. The major contributions of this article are as follows: (i) providing an extensive survey of context-aware location recommendation; (ii) quantifying and analyzing the impact of different contextual information (e.g., social, temporal, spatial, and categorical) in the POI recommendation on available baselines and two new linear and non-linear models, which can incorporate all the major contextual information into a single recommendation model; and (iii) evaluating the considered models using two well-known real-world datasets. Our results indicate that while modeling geographical and temporal influences can improve recommendation quality, fusing all other contextual information into a recommendation model is not always the best strategy.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"28 1","pages":"1 - 35"},"PeriodicalIF":0.0,"publicationDate":"2022-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88779310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the context of depth-k pooling for constructing web search test collections, we compare two approaches to ordering pooled documents for relevance assessors: The prioritisation strategy (PRI) used widely at NTCIR, and the simple randomisation strategy (RND). In order to address research questions regarding PRI and RND, we have constructed and released the WWW3E8 dataset, which contains eight independent relevance labels for 32,375 topic-document pairs, i.e., a total of 259,000 labels. Four of the eight relevance labels were obtained from PRI-based pools; the other four were obtained from RND-based pools. Using WWW3E8, we compare PRI and RND in terms of inter-assessor agreement, system ranking agreement, and robustness to new systems that did not contribute to the pools. We also utilise an assessor activity log we obtained as a byproduct of WWW3E8 to compare the two strategies in terms of assessment efficiency. Our main findings are: (a) The presentation order has no substantial impact on assessment efficiency; (b) While the presentation order substantially affects which documents are judged (highly) relevant, the difference between the inter-assessor agreement under the PRI condition and that under the RND condition is of no practical significance; (c) Different system rankings under the PRI condition are substantially more similar to one another than those under the RND condition; and (d) PRI-based relevance assessment files (qrels) are substantially and statistically significantly more robust to new systems than RND-based ones. Finding (d) suggests that PRI helps the assessors identify relevant documents that affect the evaluation of many existing systems, including those that did not contribute to the pools. Hence, if researchers need to evaluate their current IR systems using legacy IR test collections, we recommend the use of those constructed using the PRI approach unless they have a good reason to believe that their systems retrieve relevant documents that are vastly different from the pooled documents. While this robustness of PRI may also mean that the PRI-based pools are biased against future systems that retrieve highly novel relevant documents, one should note that there is no evidence that RND is any better in this respect.
{"title":"Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents?","authors":"T. Sakai, Sijie Tao, Zhaohao Zeng","doi":"10.1145/3494833","DOIUrl":"https://doi.org/10.1145/3494833","url":null,"abstract":"In the context of depth-k pooling for constructing web search test collections, we compare two approaches to ordering pooled documents for relevance assessors: The prioritisation strategy (PRI) used widely at NTCIR, and the simple randomisation strategy (RND). In order to address research questions regarding PRI and RND, we have constructed and released the WWW3E8 dataset, which contains eight independent relevance labels for 32,375 topic-document pairs, i.e., a total of 259,000 labels. Four of the eight relevance labels were obtained from PRI-based pools; the other four were obtained from RND-based pools. Using WWW3E8, we compare PRI and RND in terms of inter-assessor agreement, system ranking agreement, and robustness to new systems that did not contribute to the pools. We also utilise an assessor activity log we obtained as a byproduct of WWW3E8 to compare the two strategies in terms of assessment efficiency. Our main findings are: (a) The presentation order has no substantial impact on assessment efficiency; (b) While the presentation order substantially affects which documents are judged (highly) relevant, the difference between the inter-assessor agreement under the PRI condition and that under the RND condition is of no practical significance; (c) Different system rankings under the PRI condition are substantially more similar to one another than those under the RND condition; and (d) PRI-based relevance assessment files (qrels) are substantially and statistically significantly more robust to new systems than RND-based ones. Finding (d) suggests that PRI helps the assessors identify relevant documents that affect the evaluation of many existing systems, including those that did not contribute to the pools. Hence, if researchers need to evaluate their current IR systems using legacy IR test collections, we recommend the use of those constructed using the PRI approach unless they have a good reason to believe that their systems retrieve relevant documents that are vastly different from the pooled documents. While this robustness of PRI may also mean that the PRI-based pools are biased against future systems that retrieve highly novel relevant documents, one should note that there is no evidence that RND is any better in this respect.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"90 1","pages":"1 - 35"},"PeriodicalIF":0.0,"publicationDate":"2022-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88199515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, conversational agents have provided a natural and convenient access to useful information in people’s daily life, along with a broad and new research topic, conversational question answering (QA). On the shoulders of conversational QA, we study the conversational open-domain QA problem, where users’ information needs are presented in a conversation and exact answers are required to extract from the Web. Despite its significance and value, building an effective conversational open-domain QA system is non-trivial due to the following challenges: (1) precisely understand conversational questions based on the conversation context; (2) extract exact answers by capturing the answer dependency and transition flow in a conversation; and (3) deeply integrate question understanding and answer extraction. To address the aforementioned issues, we propose an end-to-end Dynamic Graph Reasoning approach to Conversational open-domain QA (DGRCoQA for short). DGRCoQA comprises three components, i.e., a dynamic question interpreter (DQI), a graph reasoning enhanced retriever (GRR), and a typical Reader, where the first one is developed to understand and formulate conversational questions while the other two are responsible to extract an exact answer from the Web. In particular, DQI understands conversational questions by utilizing the QA context, sourcing from predicted answers returned by the Reader, to dynamically attend to the most relevant information in the conversation context. Afterwards, GRR attempts to capture the answer flow and select the most possible passage that contains the answer by reasoning answer paths over a dynamically constructed context graph. Finally, the Reader, a reading comprehension model, predicts a text span from the selected passage as the answer. DGRCoQA demonstrates its strength in the extensive experiments conducted on a benchmark dataset. It significantly outperforms the existing methods and achieves the state-of-the-art performance.
{"title":"Dynamic Graph Reasoning for Conversational Open-Domain Question Answering","authors":"Yongqi Li, Wenjie Li, Liqiang Nie","doi":"10.1145/3498557","DOIUrl":"https://doi.org/10.1145/3498557","url":null,"abstract":"In recent years, conversational agents have provided a natural and convenient access to useful information in people’s daily life, along with a broad and new research topic, conversational question answering (QA). On the shoulders of conversational QA, we study the conversational open-domain QA problem, where users’ information needs are presented in a conversation and exact answers are required to extract from the Web. Despite its significance and value, building an effective conversational open-domain QA system is non-trivial due to the following challenges: (1) precisely understand conversational questions based on the conversation context; (2) extract exact answers by capturing the answer dependency and transition flow in a conversation; and (3) deeply integrate question understanding and answer extraction. To address the aforementioned issues, we propose an end-to-end Dynamic Graph Reasoning approach to Conversational open-domain QA (DGRCoQA for short). DGRCoQA comprises three components, i.e., a dynamic question interpreter (DQI), a graph reasoning enhanced retriever (GRR), and a typical Reader, where the first one is developed to understand and formulate conversational questions while the other two are responsible to extract an exact answer from the Web. In particular, DQI understands conversational questions by utilizing the QA context, sourcing from predicted answers returned by the Reader, to dynamically attend to the most relevant information in the conversation context. Afterwards, GRR attempts to capture the answer flow and select the most possible passage that contains the answer by reasoning answer paths over a dynamically constructed context graph. Finally, the Reader, a reading comprehension model, predicts a text span from the selected passage as the answer. DGRCoQA demonstrates its strength in the extensive experiments conducted on a benchmark dataset. It significantly outperforms the existing methods and achieves the state-of-the-art performance.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"13 1","pages":"1 - 24"},"PeriodicalIF":0.0,"publicationDate":"2022-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81882102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}