ACM Transactions on Information Systems (TOIS): Latest Articles

Collaborative Graph Learning for Session-based Recommendation

ACM Transactions on Information Systems (TOIS)

Pub Date : 2022-04-12 DOI: 10.1145/3490479

Zhiqiang Pan, Fei Cai, Wanyu Chen, Chonghao Chen, Honghui Chen

Session-based recommendation (SBR), which relies mainly on a user’s limited interactions with items to generate recommendations, is a widely investigated task. Existing methods often apply RNNs or GNNs to model the user’s sequential behavior or the transition relationships between items in order to capture the user’s current preference. When training such models, the supervision signals are generated only from the sequential interactions inside a session, neglecting the correlations between different sessions, which we argue can provide additional supervision for learning the item representations. Moreover, previous methods mainly adopt the cross-entropy loss for training, where the user’s ground-truth preference distribution over items is regarded as a one-hot vector of the target item, which easily makes the network over-confident and leads to serious overfitting. Thus, in this article, we propose a Collaborative Graph Learning (CGL) approach for session-based recommendation. CGL first applies Gated Graph Neural Networks (GGNNs) to learn item embeddings and is then trained on the main supervision and the self-supervision signals simultaneously. The main supervision is produced by the sequential order, while the self-supervision is derived from the global graph constructed from all sessions. In addition, to prevent overfitting, we propose a Target-aware Label Confusion (TLC) learning method in the main supervised component. Extensive experiments are conducted on three publicly available datasets, i.e., Retailrocket, Diginetica, and Gowalla. The experimental results show that CGL can outperform the state-of-the-art baselines in terms of Recall and MRR.
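
As a rough illustration of the "global graph constructed from all sessions" mentioned in the abstract above, the sketch below counts item-to-item transitions across every session. The abstract does not say how CGL builds its graph, so the directed, count-weighted edges, the function name `build_global_graph`, and the toy sessions are all assumptions made for illustration.

```python
from collections import defaultdict


def build_global_graph(sessions):
    """Count directed item-to-item transitions over all sessions.

    Assumption: an edge src -> dst is added whenever dst directly follows
    src in any session, weighted by how often that happens.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # Pair each item with its immediate successor in the session.
        for src, dst in zip(session, session[1:]):
            if src != dst:  # ignore repeated clicks on the same item
                graph[src][dst] += 1
    # Convert nested defaultdicts to plain dicts for readability.
    return {src: dict(dsts) for src, dsts in graph.items()}


# Hypothetical toy sessions; item ids are arbitrary strings.
sessions = [["a", "b", "c"], ["b", "c", "d"], ["a", "b"]]
global_graph = build_global_graph(sessions)
```

Because the edge `b -> c` is observed in two different sessions, it carries information that no single session provides, which is the kind of cross-session signal the abstract argues for.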

Citations: 21

GraphHINGE: Learning Interaction Models of Structured Neighborhood on Heterogeneous Information Network

ACM Transactions on Information Systems (TOIS)

Pub Date : 2022-03-24 DOI: 10.1145/3472956

Jiarui Jin, Kounianhua Du, Weinan Zhang, Jiarui Qin, Yuchen Fang, Yong Yu, Zheng Zhang, Alexander J. Smola

Heterogeneous information networks (HINs) have been widely used to characterize entities of various types and their complex relations. Recent attempts rely either on explicit path reachability to leverage path-based semantic relatedness or on graph neighborhoods to learn heterogeneous network representations before making predictions. These weakly coupled approaches overlook the rich interactions among neighbor nodes, which introduces an early summarization issue. In this article, we propose GraphHINGE (Heterogeneous INteract and aggreGatE), which captures and aggregates the interactive patterns between each pair of nodes through their structured neighborhoods. Specifically, we first introduce a Neighborhood-based Interaction (NI) module to model the interactive patterns under the same metapaths, and then extend it to a Cross Neighborhood-based Interaction (CNI) module to deal with different metapaths. Next, to address the complexity issue on large-scale networks, we formulate the interaction modules via a convolutional framework and learn the parameters efficiently with the fast Fourier transform. Furthermore, we design a novel neighborhood-based selection (NS) mechanism, a sampling strategy, to filter high-order neighborhood information based on its low-order performance. Extensive experiments on six different types of heterogeneous graphs demonstrate performance gains over state-of-the-art methods in both click-through rate prediction and top-N recommendation tasks.
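
The abstract above mentions formulating interactions as convolutions and computing them with the fast Fourier transform. The sketch below only illustrates the underlying convolution theorem (a circular convolution equals an element-wise product in the Fourier domain); it is not GraphHINGE's interaction module. For clarity it uses a naive O(n^2) discrete Fourier transform rather than a real FFT, and all function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), for illustration only."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_conv_direct(a, b):
    """Circular convolution computed directly from its definition."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def circular_conv_fourier(a, b):
    """Same convolution via the convolution theorem: transform both inputs,
    multiply element-wise, and transform back."""
    fa, fb = dft(a), dft(b)
    product = [x * y for x, y in zip(fa, fb)]
    return [v.real for v in dft(product, inverse=True)]


a = [1.0, 2.0, 3.0, 4.0]
b = [0.0, 1.0, 0.0, 0.0]  # a one-step shift kernel
direct = circular_conv_direct(a, b)
fourier = circular_conv_fourier(a, b)
```

With a true FFT the transform costs O(n log n) instead of O(n^2), which is where the efficiency gain for large inputs comes from.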

Citations: 4

Complex-valued Neural Network-based Quantum Language Models

ACM Transactions on Information Systems (TOIS)

Pub Date : 2022-03-10 DOI: 10.1145/3505138

Peng Zhang, Wenjie Hui, Benyou Wang, Donghao Zhao, Dawei Song, C. Lioma, J. Simonsen

Language modeling is essential in Natural Language Processing and Information Retrieval related tasks. Following statistical language models, the Quantum Language Model (QLM) was proposed to unify single words and compound terms in the same probability space without extending the term space exponentially. Although QLM achieved good performance in ad hoc retrieval, it still has two major limitations: (1) QLM cannot make use of supervised information, mainly due to the iterative and non-differentiable estimation of the density matrix, which represents both queries and documents in QLM. (2) QLM assumes the exchangeability of words or word dependencies, neglecting the order or position information of words. This article aims to generalize QLM and make it applicable to more complicated matching tasks (e.g., Question Answering) beyond ad hoc retrieval. We propose a complex-valued neural network-based QLM solution called C-NNQLM, which employs an end-to-end approach to build and train density matrices in a light-weight and differentiable manner and can therefore make use of external well-trained word vectors and supervised labels. Furthermore, C-NNQLM adopts complex-valued word vectors whose phase vectors can directly encode the order (or position) information of words. Note that complex numbers are also essential in quantum theory. We show that the real-valued NNQLM (R-NNQLM) is a special case of C-NNQLM. The experimental results on the QA task show that both R-NNQLM and C-NNQLM achieve much better performance than the vanilla QLM, and C-NNQLM’s performance is on par with state-of-the-art neural network models. We also evaluate the proposed C-NNQLM on text classification and document retrieval tasks. The results on most datasets show that C-NNQLM can outperform R-NNQLM, which demonstrates the usefulness of the complex representation for words and sentences in C-NNQLM.
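
To make the idea of a density matrix built from complex-valued word vectors more concrete, here is a minimal sketch. It assumes that each word vector has real amplitudes and a phase equal to position times a per-dimension frequency, and that the density matrix is a weighted sum of outer products of unit-length vectors. These choices, and the names `complex_word_vector`, `density_matrix`, and `freqs`, are illustrative assumptions; the abstract does not give C-NNQLM's actual parameterization.

```python
import cmath
import math


def complex_word_vector(amplitudes, position, freqs):
    """Unit-length complex vector: amplitude r_k, phase position * freqs[k].

    Assumption: the phase depends linearly on the word's position.
    """
    vec = [r * cmath.exp(1j * position * w) for r, w in zip(amplitudes, freqs)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in vec))
    return [z / norm for z in vec]


def density_matrix(vectors, weights):
    """rho = sum_i p_i |v_i><v_i|, with weights p_i summing to 1."""
    d = len(vectors[0])
    rho = [[0j] * d for _ in range(d)]
    for v, p in zip(vectors, weights):
        for i in range(d):
            for j in range(d):
                rho[i][j] += p * v[i] * v[j].conjugate()
    return rho


# Hypothetical two-word sentence with 2-dimensional embeddings.
amps = [[1.0, 2.0], [2.0, 1.0]]
freqs = [0.5, 1.0]
vectors = [complex_word_vector(a, pos, freqs) for pos, a in enumerate(amps)]
rho = density_matrix(vectors, [0.5, 0.5])

# With all frequencies set to zero the phases vanish and rho is real-valued.
real_vectors = [complex_word_vector(a, pos, [0.0, 0.0]) for pos, a in enumerate(amps)]
rho_real = density_matrix(real_vectors, [0.5, 0.5])
```

The resulting matrix is Hermitian with unit trace, and every step is differentiable arithmetic. Setting the phases to zero yields a real matrix, which is one way to read the abstract's remark that the real-valued model is a special case.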

Citations: 1

Scalable Representation Learning for Dynamic Heterogeneous Information Networks via Metagraphs

ACM Transactions on Information Systems (TOIS)

Pub Date : 2022-03-10 DOI: 10.1145/3485189

Yang Fang, Xiang Zhao, Peixin Huang, W. Xiao, M. de Rijke

Content representation is a fundamental task in information retrieval. Representation learning is aimed at capturing features of an information object in a low-dimensional space. Most research on representation learning for heterogeneous information networks (HINs) focuses on static HINs. In practice, however, networks are dynamic and subject to constant change. In this article, we propose a novel and scalable representation learning model, M-DHIN, to explore the evolution of a dynamic HIN. We regard a dynamic HIN as a series of snapshots with different time stamps. We first use a static embedding method to learn the initial embeddings of a dynamic HIN at the first time stamp. We describe the features of the initial HIN via metagraphs, which retain more structural and semantic information than traditional path-oriented static models. We also adopt a complex embedding scheme to better distinguish between symmetric and asymmetric metagraphs. Unlike traditional models that process an entire network at each time stamp, we build a so-called change dataset that only includes nodes involved in a triadic closure or opening process, as well as newly added or deleted nodes. Then, we use the above metagraph-based mechanism to train on the change dataset. As a result of this setup, M-DHIN is scalable to large dynamic HINs, since it needs to model the entire HIN only once and afterwards processes only the changed parts over time. Existing dynamic embedding models only express the existing snapshots and cannot predict the future network structure. To equip M-DHIN with this ability, we introduce an LSTM-based deep autoencoder model that processes the evolution of the graph via an LSTM encoder and outputs the predicted graph. Finally, we evaluate the proposed model, M-DHIN, on real-life datasets and demonstrate that it significantly and consistently outperforms state-of-the-art models.
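
The "change dataset" described above can be sketched as a set difference between two consecutive snapshots. The version below assumes an undirected, untyped graph without self-loops, treats a new edge between two nodes that already shared a neighbor as a triadic closure, and treats a removed edge that belonged to a triangle as a triadic opening. The function names and these definitions are assumptions for illustration; the paper's exact criteria may differ.

```python
def neighbors(edges):
    """Adjacency sets for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def change_nodes(old_nodes, old_edges, new_nodes, new_edges):
    """Nodes touched by node addition/deletion or by a triadic change.

    Assumes no self-loops, so every edge has two distinct endpoints.
    """
    old_e = {frozenset(e) for e in old_edges}
    new_e = {frozenset(e) for e in new_edges}
    old_adj = neighbors(old_edges)

    # Newly added or deleted nodes.
    changed = set(old_nodes) ^ set(new_nodes)

    # Triadic closure: a new edge u-w whose endpoints already shared a neighbor.
    # Triadic opening: a removed edge u-w that was part of a triangle.
    for edge in (new_e - old_e) | (old_e - new_e):
        u, w = tuple(edge)
        common = old_adj.get(u, set()) & old_adj.get(w, set())
        if common:
            changed |= {u, w} | common
    return changed


# Hypothetical snapshots: edge 1-3 closes the open triad 1-2-3; node 5 is new.
old_nodes = {1, 2, 3, 4}
old_edges = [(1, 2), (2, 3), (3, 4)]
new_nodes = {1, 2, 3, 4, 5}
new_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
changed = change_nodes(old_nodes, old_edges, new_nodes, new_edges)
```

Only the returned nodes would need to be re-embedded at the new time stamp, which is the source of the scalability claim in the abstract.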

Citations: 7

Multimodal Web Page Segmentation Using Self-organized Multi-objective Clustering

ACM Transactions on Information Systems (TOIS)

Pub Date : 2022-03-07 DOI: 10.1145/3480966

Srivatsa Ramesh Jayashree, G. Dias, J. Andrew, S. Saha, Fabrice Maurel, S. Ferrari

Web page segmentation (WPS) aims to break a web page into different segments with coherent intra- and inter-semantics. By evidencing the morpho-dispositional semantics of a web page, WPS has traditionally been used to demarcate informative from non-informative content, but it has also shown its key role in non-linear access to web information for visually impaired people. For that purpose, a great many ad hoc solutions have been proposed that rely on visual, logical, and/or text cues. However, such methodologies depend heavily on manually tuned heuristics and are parameter-dependent. To overcome these drawbacks, principled frameworks have been proposed that provide the theoretical bases to achieve optimal solutions. However, existing methodologies combine only a few discriminant features and do not define strategies to automatically select the optimal number of segments. In this article, we present a multi-objective clustering technique called MCS that relies on K-means, in which (1) visual, logical, and text cues are all combined in an early fusion manner and (2) an evolutionary process automatically discovers the optimal number of clusters (segments) as well as the correct positioning of seeds. As such, our proposal is parameter-free, combines many different modalities, does not depend on manually tuned heuristics, and can be run on any web page without any constraint. An exhaustive evaluation over two different tasks, where (1) the number of segments must be discovered or (2) the number of clusters is fixed with respect to the task at hand, shows that MCS drastically improves over most competitive and up-to-date algorithms for a wide variety of external and internal validation indices. In particular, the results clearly show the impact of the visual and logical modalities on segmentation performance.
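
"Early fusion" in the abstract above means combining the modalities into one feature vector per page element before clustering, rather than clustering each modality separately and merging the results. The sketch below shows one simple way to do that: min-max normalize each feature column so that no modality dominates by scale, then concatenate. The feature choices (position, DOM depth, a text score) and the function names are hypothetical, and the abstract does not state which normalization MCS uses.

```python
def min_max(column):
    """Scale a list of numbers to [0, 1]; constant columns map to 0.0."""
    lo, hi = min(column), max(column)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in column]


def early_fusion(*modalities):
    """Concatenate normalized features from several modalities.

    Each modality is a list with one feature vector per page element;
    all modalities must describe the same elements in the same order.
    """
    fused = [[] for _ in modalities[0]]
    for feats in modalities:
        # Normalize column by column, then append to each element's vector.
        columns = [min_max(list(col)) for col in zip(*feats)]
        for i, row in enumerate(zip(*columns)):
            fused[i].extend(row)
    return fused


# Hypothetical features for three page elements.
visual = [[0, 10], [50, 10], [100, 30]]   # x and y position in pixels
logical = [[1], [2], [3]]                 # depth in the DOM tree
text = [[0.0], [0.5], [1.0]]              # some text-based score
fused = early_fusion(visual, logical, text)
```

The fused vectors could then be passed to any K-means implementation; choosing the number of clusters automatically, as the abstract describes, is a separate step not shown here.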

Citations: 2

eFraudCom: An E-commerce Fraud Detection System via Competitive Graph Neural Networks

ACM Transactions on Information Systems (TOIS)

Pub Date : 2022-03-07 DOI: 10.1145/3474379

Ge Zhang, Zhao Li, Jiaming Huang, Jia Wu, Chuan Zhou, Jian Yang, Jianliang Gao

With the development of e-commerce, fraud behaviors have become one of the biggest threats to the e-commerce business. Fraud behaviors seriously damage the ranking systems of e-commerce platforms and adversely affect the shopping experience of users. Detecting fraud behaviors on e-commerce platforms is therefore of great practical value. However, the task is non-trivial because of the adversarial actions taken by fraudsters. Existing fraud detection systems used in the e-commerce industry easily suffer from performance decay and cannot adapt to the upgrade of fraud patterns, as they take already known fraud behaviors as supervision information to detect other suspicious behaviors. In this article, we propose a competitive graph neural network (CGNN)-based fraud detection system (eFraudCom) to detect fraud behaviors at one of the largest e-commerce platforms, “Taobao”. In the eFraudCom system, (1) the CGNN, as the core part of eFraudCom, can classify the behaviors of users directly by modeling the distributions of normal and fraud behaviors separately; (2) some normal behaviors are used as weak supervision information to guide the CGNN in building a profile of normal behaviors, which are more stable than fraud behaviors. This eliminates the algorithm’s dependency on known fraud behaviors, which enables eFraudCom to detect fraud behaviors in the presence of new fraud patterns; (3) the mutual information regularization term can maximize the separability between normal and fraud behaviors to further improve CGNN. eFraudCom is implemented as a prototype system, and the performance of the system is evaluated by extensive experiments. The experiments on two Taobao and two public datasets demonstrate that the proposed deep framework CGNN is superior to other baselines in detecting fraud behaviors. A case study on Taobao datasets verifies that CGNN remains robust when the fraud patterns have been upgraded.

Citations: 48

Leveraging Narrative to Generate Movie Script

ACM Transactions on Information Systems (TOIS)

Pub Date : 2022-02-01 DOI: 10.1145/3507356

Yutao Zhu, Ruihua Song, J. Nie, Pan Du, Zhicheng Dou, Jin Zhou

Generating a text based on a predefined guideline is an interesting but challenging problem, and a series of studies have been carried out in recent years. In dialogue systems, researchers have explored driving a dialogue based on a plan, while in story generation, a storyline has also proved useful. In this article, we address a new task: generating movie scripts based on a predefined narrative. As an early exploration, we study this problem in a “retrieval-based” setting. We propose a model (ScriptWriter-CPre) to select the best response (i.e., next script line) among the candidates that fit the context (i.e., previous script lines) as well as the given narrative. Our model can keep track of what in the narrative has been said and what is still to be said. It can also predict which part of the narrative deserves more attention when selecting the next line of script. In our study, we find that the narrative plays a different role than the context; therefore, different mechanisms are designed to deal with them. Because no data is available for this new application, we construct a new large-scale data collection, GraphMovie, from a movie website where end-users can freely upload their narratives when watching a movie. This new dataset is made publicly available to facilitate other studies on guideline-based text generation. Experimental results on the dataset show that our proposed approach based on narratives significantly outperforms the baselines that simply use the narrative as a kind of context.
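To make the “retrieval-based” setting concrete, the following is a minimal sketch of choosing the next script line from a candidate set by scoring each candidate against both the context and the narrative. It is not the ScriptWriter-CPre model: the bag-of-words cosine similarity, the mixing weight `alpha`, the function names, and the toy sentences are all assumptions made for illustration only.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts for a whitespace-tokenised, lower-cased text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_next_line(narrative, context, candidates, alpha=0.5):
    """Return the candidate that best matches the context and the narrative.

    alpha is a hypothetical weight: context and narrative are scored
    separately and then mixed, since they play different roles.
    """
    ctx_vec = bow(" ".join(context))
    nar_vec = bow(narrative)

    def score(candidate):
        vec = bow(candidate)
        return alpha * cosine(vec, ctx_vec) + (1 - alpha) * cosine(vec, nar_vec)

    return max(candidates, key=score)


# Toy data, invented for this example.
narrative = "the detective finds the missing letter"
context = ["where did you look", "i searched the whole office"]
candidates = ["i found the letter in the desk", "let us order pizza tonight"]
selected = select_next_line(narrative, context, candidates)
```

Scoring the context and the narrative with separate terms mirrors, in a very crude way, the observation that the two inputs play different roles; a learned model would replace both similarity functions.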

Citations: 6

A Systematic Analysis on the Impact of Contextual Information on Point-of-Interest Recommendation

ACM Transactions on Information Systems (TOIS)

Pub Date : 2022-01-20 DOI: 10.1145/3508478

Hossein A. Rahmani, Mohammad Aliannejadi, Mitra Baratchi, F. Crestani

As the popularity of Location-based Social Networks increases, designing accurate models for Point-of-Interest (POI) recommendation receives more attention. POI recommendation is often performed by incorporating contextual information into previously designed recommendation algorithms. The major contextual information considered in POI recommendation includes the location attributes (i.e., exact coordinates of a location, category, and check-in time), the user attributes (i.e., comments, reviews, tips, and check-ins made at the locations), and other information, such as the distance of the POI from the user’s main activity location and the social ties between users. The right selection of such factors can significantly impact the performance of POI recommendation. However, previous research does not consider the impact of combining these different factors. In this article, we propose different contextual models and analyze the fusion of different major contextual information in POI recommendation. The major contributions of this article are as follows: (i) providing an extensive survey of context-aware location recommendation; (ii) quantifying and analyzing the impact of different contextual information (e.g., social, temporal, spatial, and categorical) in POI recommendation on available baselines and two new linear and non-linear models, which can incorporate all the major contextual information into a single recommendation model; and (iii) evaluating the considered models using two well-known real-world datasets. Our results indicate that while modeling geographical and temporal influences can improve recommendation quality, fusing all other contextual information into a recommendation model is not always the best strategy.
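As a rough illustration of what a linear model that incorporates several kinds of contextual information into one recommendation score could look like, here is a minimal sketch. It is not the model proposed in the article: the min-max normalisation, the weights, the POI names, and the per-context scores are all hypothetical values chosen for the example.

```python
def min_max(scores):
    """Rescale a {poi: score} dict to [0, 1]; constant inputs map to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {poi: 0.0 for poi in scores}
    return {poi: (s - lo) / (hi - lo) for poi, s in scores.items()}


def linear_fusion(context_scores, weights):
    """Weighted sum of the normalised per-context scores of each POI."""
    fused = {}
    for context, scores in context_scores.items():
        for poi, s in min_max(scores).items():
            fused[poi] = fused.get(poi, 0.0) + weights[context] * s
    return fused


# Hypothetical per-context scores for one user (toy values).
context_scores = {
    "geographical": {"cafe": 0.9, "museum": 0.4, "park": 0.1},
    "temporal": {"cafe": 0.2, "museum": 0.8, "park": 0.5},
    "social": {"cafe": 0.3, "museum": 0.3, "park": 0.3},
}
# Hypothetical fusion weights; in practice these would be tuned or learned.
weights = {"geographical": 0.5, "temporal": 0.4, "social": 0.1}

fused = linear_fusion(context_scores, weights)
ranking = sorted(fused, key=fused.get, reverse=True)
```

In this toy setup the social scores are identical for every POI, so they contribute nothing to the ranking. That is one simple way to see why adding a further context does not automatically help.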

Citations: 12

Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents?

ACM Transactions on Information Systems (TOIS)

Pub Date : 2022-01-12 DOI: 10.1145/3494833

T. Sakai, Sijie Tao, Zhaohao Zeng

In the context of depth-k pooling for constructing web search test collections, we compare two approaches to ordering pooled documents for relevance assessors: The prioritisation strategy (PRI) used widely at NTCIR, and the simple randomisation strategy (RND). In order to address research questions regarding PRI and RND, we have constructed and released the WWW3E8 dataset, which contains eight independent relevance labels for 32,375 topic-document pairs, i.e., a total of 259,000 labels. Four of the eight relevance labels were obtained from PRI-based pools; the other four were obtained from RND-based pools. Using WWW3E8, we compare PRI and RND in terms of inter-assessor agreement, system ranking agreement, and robustness to new systems that did not contribute to the pools. We also utilise an assessor activity log we obtained as a byproduct of WWW3E8 to compare the two strategies in terms of assessment efficiency. Our main findings are: (a) The presentation order has no substantial impact on assessment efficiency; (b) While the presentation order substantially affects which documents are judged (highly) relevant, the difference between the inter-assessor agreement under the PRI condition and that under the RND condition is of no practical significance; (c) Different system rankings under the PRI condition are substantially more similar to one another than those under the RND condition; and (d) PRI-based relevance assessment files (qrels) are substantially and statistically significantly more robust to new systems than RND-based ones. Finding (d) suggests that PRI helps the assessors identify relevant documents that affect the evaluation of many existing systems, including those that did not contribute to the pools. 
Hence, if researchers need to evaluate their current IR systems using legacy IR test collections, we recommend the use of those constructed using the PRI approach unless they have a good reason to believe that their systems retrieve relevant documents that are vastly different from the pooled documents. While this robustness of PRI may also mean that the PRI-based pools are biased against future systems that retrieve highly novel relevant documents, one should note that there is no evidence that RND is any better in this respect.
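The two ordering strategies can be sketched as follows. This is a minimal illustration, not the tooling used to build WWW3E8: the abstract does not spell out the PRI sort keys, so the sketch assumes that documents returned by more runs come first, with ties broken by a smaller sum of ranks. The run names, document IDs, and helper function names are hypothetical.

```python
import random
from collections import defaultdict


def depth_k_pool(runs, k):
    """Depth-k pooling: the union of the top-k documents of every run."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def order_rnd(pool, seed=0):
    """RND: present the pooled documents to assessors in random order."""
    docs = sorted(pool)  # sort first so the seeded shuffle is reproducible
    random.Random(seed).shuffle(docs)
    return docs


def order_pri(runs, k):
    """PRI (assumed keys): more contributing runs first, then smaller rank sum."""
    n_runs = defaultdict(int)
    rank_sum = defaultdict(int)
    for ranking in runs.values():
        for rank, doc in enumerate(ranking[:k], start=1):
            n_runs[doc] += 1
            rank_sum[doc] += rank
    # The document ID is the last key only to make the order deterministic.
    return sorted(n_runs, key=lambda d: (-n_runs[d], rank_sum[d], d))


# Three hypothetical runs for a single topic.
runs = {
    "run_a": ["d1", "d2", "d3"],
    "run_b": ["d2", "d1", "d4"],
    "run_c": ["d2", "d5", "d1"],
}
pri_order = order_pri(runs, k=2)                 # ['d2', 'd1', 'd5']
rnd_order = order_rnd(depth_k_pool(runs, k=2))   # a permutation of the pool

# Label count from the abstract: 32,375 topic-document pairs x 8 labels each.
total_labels = 32_375 * 8                        # 259000
```

Both functions judge exactly the same pooled documents; only the presentation order differs, which is the variable the study isolates.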

Citations: 5

Dynamic Graph Reasoning for Conversational Open-Domain Question Answering

ACM Transactions on Information Systems (TOIS)

Pub Date : 2022-01-12 DOI: 10.1145/3498557

Yongqi Li, Wenjie Li, Liqiang Nie

In recent years, conversational agents have provided natural and convenient access to useful information in people’s daily life, along with a broad and new research topic, conversational question answering (QA). Building on conversational QA, we study the conversational open-domain QA problem, where users’ information needs are presented in a conversation and exact answers must be extracted from the Web. Despite its significance and value, building an effective conversational open-domain QA system is non-trivial due to the following challenges: (1) precisely understanding conversational questions based on the conversation context; (2) extracting exact answers by capturing the answer dependency and transition flow in a conversation; and (3) deeply integrating question understanding and answer extraction. To address these issues, we propose an end-to-end Dynamic Graph Reasoning approach to Conversational open-domain QA (DGRCoQA for short). DGRCoQA comprises three components, i.e., a dynamic question interpreter (DQI), a graph reasoning enhanced retriever (GRR), and a typical Reader, where the first is developed to understand and formulate conversational questions while the other two are responsible for extracting an exact answer from the Web. In particular, DQI understands conversational questions by utilizing the QA context, sourced from the predicted answers returned by the Reader, to dynamically attend to the most relevant information in the conversation context. Afterwards, GRR attempts to capture the answer flow and select the passage most likely to contain the answer by reasoning over answer paths in a dynamically constructed context graph. Finally, the Reader, a reading comprehension model, predicts a text span from the selected passage as the answer. DGRCoQA demonstrates its strength in extensive experiments conducted on a benchmark dataset. It significantly outperforms the existing methods and achieves state-of-the-art performance.

Citations: 17