ACM Transactions on Information Systems (TOIS): Latest Articles

Fast Filtering of Search Results Sorted by Attribute

ACM Transactions on Information Systems (TOIS)

Pub Date : 2021-11-24 DOI: 10.1145/3477982

F. M. Nardini, Roberto Trani, Rossano Venturini

Modern search services often provide multiple options to rank the search results, e.g., sort “by relevance”, “by price”, or “by discount” in e-commerce. While the traditional ranking by relevance effectively places the relevant results in the top positions of the results list, ranking by attribute can place many marginally relevant results at the head of the list, leading to a poor user experience. In the past, this issue has been addressed by investigating the relevance-aware filtering problem, which asks to select the subset of results that maximizes the relevance of the attribute-sorted list. Recently, an exact algorithm has been proposed to solve this problem optimally. However, its high computational cost makes it impractical for Web search, which is characterized by huge result lists and strict time constraints. For this reason, the problem is often solved using efficient yet inaccurate heuristic algorithms. In this article, we first prove performance bounds for the existing heuristics. We then propose two efficient and effective algorithms to solve the relevance-aware filtering problem. First, we propose OPT-Filtering, a novel exact algorithm that is faster than the existing state-of-the-art optimal algorithm. Second, we propose an approximate and even more efficient algorithm, ϵ-Filtering, which, given an allowed approximation error ϵ, finds a (1-ϵ)-optimal filtering, i.e., the relevance of its solution is at least (1-ϵ) times the optimum. We conduct a comprehensive evaluation of the two proposed algorithms against state-of-the-art competitors on two real-world public datasets. Experimental results show that OPT-Filtering achieves a speedup of up to two orders of magnitude with respect to the existing optimal solution, while ϵ-Filtering further improves this result by trading effectiveness for efficiency. In particular, experiments show that ϵ-Filtering can achieve quasi-optimal solutions while being faster than all state-of-the-art competitors in most of the tested configurations.
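
To make the relevance-aware filtering problem concrete, here is a small exact dynamic program that keeps an order-preserving subsequence of an attribute-sorted list. It is a textbook O(n²) baseline and not OPT-Filtering or ϵ-Filtering. As a hypothetical choice, it scores the filtered list with DCG using exponential gain (2^rel − 1) / log2(pos + 1). The function names are illustrative only. The DP works because an item's contribution depends only on its position, and that position is one plus the number of items kept before it.

```python
import math
from typing import List, Tuple


def dcg_gain(rel: float, pos: int) -> float:
    # Assumed exponential-gain DCG contribution of an item at 1-based position `pos`.
    return (2.0 ** rel - 1.0) / math.log2(pos + 1)


def optimal_filter(rels: List[float]) -> Tuple[float, List[int]]:
    """Exact O(n^2) DP: keep a subsequence of the attribute-sorted list
    (original order preserved) that maximizes the DCG of the filtered list."""
    n = len(rels)
    neg = float("-inf")
    # best[i][j]: max DCG using the first i items with exactly j of them kept.
    best = [[neg] * (n + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(0, i + 1):
            skip = best[i - 1][j]
            # Keeping item i-1 places it at position j of the filtered list.
            keep = best[i - 1][j - 1] + dcg_gain(rels[i - 1], j) if j > 0 else neg
            best[i][j] = max(skip, keep)
    k = max(range(n + 1), key=lambda j: best[n][j])
    # Backtrack to recover which original indices were kept.
    kept, j = [], k
    for i in range(n, 0, -1):
        if j > 0 and best[i][j] == best[i - 1][j - 1] + dcg_gain(rels[i - 1], j):
            kept.append(i - 1)
            j -= 1
    return best[n][k], kept[::-1]
```

For example, `optimal_filter([3, 0, 2])` drops the zero-relevance middle item. Keeping that item would push the relevant third item one position further down and lower the DCG.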

Citations: 0

Personalized, Sequential, Attentive, Metric-Aware Product Search

ACM Transactions on Information Systems (TOIS)

Pub Date : 2021-11-24 DOI: 10.1145/3473337

Yaoxin Pan, Shangsong Liang, Jiaxin Ren, Zaiqiao Meng, Qiang Zhang

The task of personalized product search aims at retrieving a ranked list of products given a user’s input query and his/her purchase history. To address this task, we propose PSAM, a Personalized, Sequential, Attentive and Metric-aware model. PSAM learns semantic representations of three categories of entities, i.e., users, queries, and products, from users’ sequential purchase histories and the corresponding sequential queries. Specifically, a query-based attentive LSTM (QA-LSTM) model and an attention mechanism are designed to infer users’ dynamic embeddings, which capture both their short-term and long-term preferences. To obtain more fine-grained embeddings of the three categories of entities, a metric-aware objective is deployed in our model to force the inferred embeddings to satisfy the triangle inequality, which yields a more realistic distance measure for product search. Experiments conducted on four benchmark datasets show that PSAM significantly outperforms state-of-the-art product search baselines, with improvements of up to 50.9% in NDCG@20. Our visualization experiments further illustrate that the learned product embeddings are able to distinguish different types of products.
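
The sketch below illustrates what a metric-aware objective buys. A Euclidean distance is a true metric and obeys the triangle inequality, whereas an inner-product score is not a distance and does not. Two details are hypothetical and not PSAM's published formulation: user and query embeddings are combined by vector addition, and training uses a triplet hinge loss against one sampled negative product.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance is a metric, so it satisfies the triangle inequality.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def metric_hinge_loss(user: Sequence[float], query: Sequence[float],
                      pos_item: Sequence[float], neg_item: Sequence[float],
                      margin: float = 1.0) -> float:
    """Hypothetical triplet hinge loss: the purchased product should be at least
    `margin` closer to the user+query point than a sampled negative product."""
    # Assumption: user and query embeddings are combined by vector addition.
    target = [u + q for u, q in zip(user, query)]
    return max(0.0, margin + euclidean(target, pos_item) - euclidean(target, neg_item))
```

The loss is zero once the purchased product is sufficiently closer than the negative one. Otherwise it grows linearly with the violation.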

Citations: 7

Profiling Users for Question Answering Communities via Flow-Based Constrained Co-Embedding Model

ACM Transactions on Information Systems (TOIS)

Pub Date : 2021-11-24 DOI: 10.1145/3470565

Shangsong Liang, Yupeng Luo, Zaiqiao Meng

In this article, we study the task of user profiling in question answering communities (QACs). Previous user profiling algorithms suffer from a number of defects. They regard users and words as atomic units, leading to a mismatch between them. They are designed for other applications rather than for QACs. Some semantic profiling algorithms also do not co-embed users and words, which makes measuring the affinity between them difficult. To improve profiling performance, we propose a neural Flow-based Constrained Co-embedding Model, abbreviated as FCCM. FCCM jointly co-embeds the vector representations of both users and words in QACs such that the affinities between them can be semantically measured. Specifically, FCCM extends the standard variational auto-encoder model to enforce a voting constraint on the inferred embeddings of users and words. Given a question and the users who answer it in the community, the representations of users whose answers receive more votes should be closer to the representations of the words associated with these answers than the representations of users whose answers receive fewer votes. In addition, FCCM integrates normalizing flows into the variational auto-encoder framework to avoid assuming that the distributions of the embeddings are Gaussian, so that the inferred embeddings fit the real distributions of the data better. Experimental results on a Chinese Zhihu question answering dataset demonstrate the effectiveness of the proposed FCCM model for user profiling in QACs.
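
The voting constraint can be illustrated as a pairwise ranking penalty. This sketch is not FCCM's variational formulation. It uses a plain Euclidean hinge over all pairs of answerers to one question, and as a simplifying assumption it represents the words associated with the answers by a single centroid vector. The names and the margin value are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def voting_constraint_loss(word_centroid: Sequence[float],
                           user_embs: List[Sequence[float]],
                           votes: List[int],
                           margin: float = 0.5) -> float:
    """Hypothetical pairwise hinge: for each pair of answerers with different
    vote counts, the higher-voted user should be `margin` closer to the words."""
    loss = 0.0
    for i, j in combinations(range(len(votes)), 2):
        if votes[i] == votes[j]:
            continue  # no ordering to enforce between equally voted answers
        hi, lo = (i, j) if votes[i] > votes[j] else (j, i)
        d_hi = euclidean(user_embs[hi], word_centroid)
        d_lo = euclidean(user_embs[lo], word_centroid)
        # Penalize only when the constraint is violated or satisfied by less than the margin.
        loss += max(0.0, margin + d_hi - d_lo)
    return loss
```

In a full model, this term would be added to the training objective so that gradients pull highly voted answerers toward the answer vocabulary.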

Citations: 9

Interpretable Aspect-Aware Capsule Network for Peer Review Based Citation Count Prediction

ACM Transactions on Information Systems (TOIS)

Pub Date : 2021-11-24 DOI: 10.1145/3466640

Siqing Li, Yaliang Li, Wayne Xin Zhao, Bolin Ding, Ji-rong Wen

Citation count prediction is an important task for estimating the future impact of research papers. Most existing works use information extracted from the paper itself. In this article, we focus on how to use another kind of useful data signal (i.e., peer review text) to improve both the performance and the interpretability of prediction models. Specifically, we propose a novel aspect-aware capsule network for citation count prediction based on review text. It contains two major capsule layers, namely the feature capsule layer and the aspect capsule layer, which use two different routing approaches. Feature capsules encode the local semantics of review sentences as the input of the aspect capsule layer, whereas aspect capsules aim to capture high-level semantic features that serve as the final representations for prediction. Besides predictive capacity, we also enhance model interpretability with two strategies. First, we use the topic distribution of the review text to guide the learning of aspect capsules so that each aspect capsule represents a specific aspect of the review. Then, we use the learned aspect capsules to generate readable text explaining the predicted citation count. Extensive experiments on two real-world datasets demonstrate the effectiveness of the proposed model in both performance and interpretability.
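
The abstract does not say which two routing approaches the capsule layers use. For orientation, the sketch below shows the standard routing-by-agreement ("dynamic routing") with the squash nonlinearity from the original capsule network work. It is a generic illustration, not necessarily either of this paper's routing schemes, and the function names are hypothetical. `u_hat[i][j]` is assumed to be input capsule i's prediction vector for output capsule j.

```python
import math
from typing import List

Vec = List[float]


def squash(s: Vec, eps: float = 1e-9) -> Vec:
    # Shrinks short vectors toward 0 and long vectors to just under unit length.
    n2 = sum(x * x for x in s)
    scale = n2 / (1.0 + n2) / math.sqrt(n2 + eps)
    return [x * scale for x in s]


def dynamic_routing(u_hat: List[List[Vec]], iterations: int = 3) -> List[Vec]:
    """Routing by agreement; u_hat[i][j] is capsule i's prediction for output j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v: List[Vec] = []
    for _ in range(iterations):
        # Coupling coefficients: softmax of each input capsule's logits over outputs.
        c = []
        for i in range(n_in):
            m = max(b[i])
            exps = [math.exp(x - m) for x in b[i]]
            z = sum(exps)
            c.append([e / z for e in exps])
        # Output capsules: squashed, coupling-weighted sums of the predictions.
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        # Agreement update: raise logits where a prediction aligns with the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v
```

Output capsules that receive consistent predictions end up with longer vectors. Conflicting predictions cancel and yield short ones.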

Citations: 1

Social Context-aware Person Search in Videos via Multi-modal Cues

ACM Transactions on Information Systems (TOIS)

Pub Date : 2021-11-22 DOI: 10.1145/3480967

Dan Li, Tong Xu, Peilun Zhou, Weidong He, Y. Hao, Yi Zheng, Enhong Chen

Person search has long been treated as a crucial and challenging task that supports deeper insight into personalized summarization and personality discovery. Traditional methods, e.g., person re-identification and face recognition techniques, profile video characters based on visual information. These methods are often limited to relatively fixed poses or small variations in viewpoint, and they struggle with more realistic scenes of high motion complexity (e.g., movies). At the same time, long videos such as movies often have logical story lines and are composed of continuously developing plots. In this setting, different persons usually meet on specific occasions in which informative social cues are exhibited. We notice that these social cues can semantically profile their personalities and benefit the person search task in two respects. First, persons with certain relationships usually co-occur within short intervals; if one of them is easier to identify, the social relation cues extracted from their co-occurrence can further help identify the harder one. Second, social relations can reveal the association between certain scenes and characters (e.g., a classmate relationship may only exist among students), which can narrow the candidates down to persons with a specific relationship. In this way, high-level social relation cues can improve the effectiveness of person search. Along this line, in this article, we propose a social context-aware framework that fuses visual and social contexts to profile persons from more semantic perspectives and to better handle the person search task in complex scenarios. Specifically, we first segment videos into several independent scene units and abstract out the social contexts within these scene units. Then, we construct inner-personal links through a graph formulation operation for each scene unit, in which both visual cues and relation cues are considered. Finally, we perform a relation-aware label propagation to identify characters’ occurrences, combining low-level semantic cues (i.e., visual cues) and high-level semantic cues (i.e., relation cues) to further enhance accuracy. Experiments on real-world datasets validate that our solution outperforms several competitive baselines.
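
The final step combines visual and relation cues during label propagation. The sketch below shows that idea with classic iterative label propagation on a graph of person occurrences. Two choices are hypothetical and not the paper's method: edge weights are a convex combination `alpha * visual + (1 - alpha) * relation`, and the identities of already-recognized occurrences (the seeds) are clamped.

```python
from typing import Dict, List


def propagate_labels(vis: List[List[float]], rel: List[List[float]],
                     seeds: Dict[int, int], num_labels: int,
                     alpha: float = 0.5, iters: int = 50) -> List[int]:
    """Label propagation over occurrences; `seeds` maps node -> known identity."""
    n = len(vis)
    # Hypothetical cue fusion: convex combination of visual and relation affinities.
    W = [[alpha * vis[i][j] + (1 - alpha) * rel[i][j] if i != j else 0.0
          for j in range(n)] for i in range(n)]
    F = [[0.0] * num_labels for _ in range(n)]
    for node, lab in seeds.items():
        F[node][lab] = 1.0
    for _ in range(iters):
        new = []
        for i in range(n):
            total = sum(W[i])
            if i in seeds or total == 0.0:
                new.append(F[i][:])  # clamp seeds; isolated nodes keep their scores
                continue
            # Weighted average of neighbors' label distributions.
            new.append([sum(W[i][j] * F[j][k] for j in range(n)) / total
                        for k in range(num_labels)])
        F = new
    return [max(range(num_labels), key=lambda k: F[i][k]) for i in range(n)]
```

In this sketch, `alpha` controls how much a strong relation cue can override visual similarity. With `alpha = 1.0`, only visual cues are used.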

Citations: 7

Dynamic Structural Role Node Embedding for User Modeling in Evolving Networks

ACM Transactions on Information Systems (TOIS)

Pub Date : 2021-11-22 DOI: 10.1145/3472955

Lili Wang, Chenghan Huang, Ying Lu, Weicheng Ma, Ruibo Liu, Soroush Vosoughi

Complex user behavior, especially in settings such as social media, can be organized as time-evolving networks. Through network embedding, we can extract general-purpose vector representations of these dynamic networks which allow us to analyze them without extensive feature engineering. Prior work has shown how to generate network embeddings while preserving the structural role proximity of nodes. These methods, however, cannot capture the temporal evolution of the structural identity of the nodes in dynamic networks. Other works, on the other hand, have focused on learning microscopic dynamic embeddings. Though these methods can learn node representations over dynamic networks, these representations capture the local context of nodes and do not learn the structural roles of nodes. In this article, we propose a novel method for learning structural node embeddings in discrete-time dynamic networks. Our method, called HR2vec, tracks historical topology information in dynamic networks to learn dynamic structural role embeddings. Through experiments on synthetic and real-world temporal datasets, we show that our method outperforms other well-known methods in tasks where structural equivalence and historical information both play important roles. HR2vec can be used to model dynamic user behavior in any networked setting where users can be represented as nodes. Additionally, we propose a novel method (called network fingerprinting) that uses HR2vec embeddings for modeling whole (or partial) time-evolving networks. We showcase our network fingerprinting method on synthetic and real-world networks. Specifically, we demonstrate how our method can be used for detecting foreign-backed information operations on Twitter.
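
Structural role proximity means that nodes are compared by the shape of their neighborhoods rather than by who their neighbors are. The sketch below borrows struc2vec's signature, the ordered degree sequence of a node's neighbors compared with dynamic time warping under the cost max(x, y) / min(x, y) − 1. It then adds a hypothetical exponential decay over discrete-time snapshots so that recent structure counts more. This is an illustration of "structural role plus history" under those assumptions, not HR2vec's actual procedure.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected adjacency sets (assumed symmetric)


def neighbor_degree_seq(g: Graph, v: int) -> List[int]:
    # Ordered degree sequence of v's 1-hop neighbors (struc2vec-style signature).
    return sorted(len(g[u]) for u in g[v])


def dtw(a: List[int], b: List[int]) -> float:
    """Dynamic time warping with the struc2vec cost max(x, y) / min(x, y) - 1."""
    inf = float("inf")
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0 if n == m else inf
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            cost = max(x, y) / min(x, y) - 1.0  # neighbor degrees are always >= 1
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def historical_role_distance(snapshots: List[Graph], u: int, v: int,
                             decay: float = 0.5) -> float:
    """Hypothetical: decay-weighted sum of per-snapshot structural distances,
    so that the most recent snapshot has weight 1 and older ones less."""
    last = len(snapshots) - 1
    total = 0.0
    for t, g in enumerate(snapshots):
        if u in g and v in g:
            total += decay ** (last - t) * dtw(neighbor_degree_seq(g, u),
                                               neighbor_degree_seq(g, v))
    return total
```

In a star graph, two leaves occupy the same structural role and get distance 0, even though they are not adjacent. The hub and a leaf get a positive distance.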

Citations: 7

Embedding Hierarchical Structures for Venue Category Representation

ACM Transactions on Information Systems (TOIS)

Pub Date : 2021-11-22 DOI: 10.1145/3478285

Meng Chen, Lei Zhu, Ronghui Xu, Yang Liu, Xiaohui Yu, Yilong Yin

Venue categories used in location-based social networks often exhibit a hierarchical structure, and users’ check-ins additionally yield sequences of categories. These two data modalities provide a wealth of information for capturing the semantic relationships between categories. To understand venue semantics, existing methods usually embed venue categories into low-dimensional spaces by modeling the linear context (i.e., the positional neighbors of the given category) in check-in sequences. However, the hierarchical structure of venue categories, which inherently encodes the relationships between categories, remains largely untapped. In this article, we propose a venue Category Embedding Model named Hier-CEM, which generates a latent representation for each venue category by embedding the Hierarchical structure of categories and utilizing multiple types of context. Specifically, we investigate two kinds of hierarchical context based on any given venue category hierarchy and show how to model them collaboratively together with the linear context. We apply Hier-CEM to three tasks on two real check-in datasets collected from Foursquare. Experimental results show that Hier-CEM captures both the semantic and the sequential information inherent in venues better than state-of-the-art embedding methods.
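
To show how linear and hierarchical context can feed one embedding model, the sketch below generates skip-gram style (target, context) pairs from a check-in sequence and a category tree. The abstract does not name its two kinds of hierarchical context. Ancestors and siblings are used here purely as a hypothetical choice, and the tree is given as a child-to-parent map assumed to be acyclic. The category names are illustrative.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def linear_context_pairs(seq: List[str], window: int = 1) -> List[Pair]:
    # Positional neighbors within `window` in a user's check-in sequence.
    pairs = []
    for i, c in enumerate(seq):
        for j in range(max(0, i - window), min(len(seq), i + window + 1)):
            if j != i:
                pairs.append((c, seq[j]))
    return pairs


def ancestors(cat: str, parent: Dict[str, str]) -> List[str]:
    # Walk up the (assumed acyclic) hierarchy from `cat` to the root.
    out = []
    while cat in parent:
        cat = parent[cat]
        out.append(cat)
    return out


def siblings(cat: str, parent: Dict[str, str]) -> List[str]:
    # Other categories sharing the same parent.
    p = parent.get(cat)
    if p is None:
        return []
    return sorted(c for c, q in parent.items() if q == p and c != cat)


def training_pairs(seq: List[str], parent: Dict[str, str], window: int = 1) -> List[Pair]:
    """Hypothetical mix of linear context and two kinds of hierarchical context."""
    pairs = linear_context_pairs(seq, window)
    for c in seq:
        pairs += [(c, a) for a in ancestors(c, parent)]
        pairs += [(c, s) for s in siblings(c, parent)]
    return pairs
```

The resulting pairs could be fed to any skip-gram trainer. Rare categories then still receive a training signal through their position in the hierarchy.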

Citations: 7

Component-based Analysis of Dynamic Search Performance

ACM Transactions on Information Systems (TOIS)

Pub Date : 2021-11-22 DOI: 10.1145/3483237

Ameer Albahem, Damiano Spina, Falk Scholer, L. Cavedon

In many search scenarios, such as exploratory, comparative, or survey-oriented search, users interact with dynamic search systems to satisfy multi-aspect information needs. These systems use different dynamic approaches that exploit user feedback at various levels of granularity. Although prior studies have provided insights into the role of many components of these systems, they relied on black-box and isolated experimental setups. Therefore, the effects of these components and their interactions are still not well understood. We address this gap with a methodology based on Analysis of Variance (ANOVA). We built a Grid of Points consisting of systems that instantiate three components in different ways: initial rankers, dynamic rerankers, and user feedback granularity. Using evaluation scores based on the TREC Dynamic Domain collections, we built several ANOVA models to estimate the effects. We found that (i) although all components significantly affect search effectiveness, the initial ranker has the largest effect size; (ii) the effect sizes of these components vary with the length of the search session and the effectiveness metric used; and (iii) initial rankers and dynamic rerankers have more prominent effects than user feedback granularity. To improve effectiveness, we therefore recommend improving the quality of initial rankers and dynamic rerankers, which does not require eliciting detailed user feedback that might be expensive or invasive.
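
To make "effect size" concrete: in a balanced grid, the eta-squared of a factor is the share of the total sum of squares explained by that factor's main effect. The sketch below computes it on a hypothetical three-factor grid whose scores are synthetic and built additively; the component names and numbers are placeholders, not TREC Dynamic Domain results. It omits interaction terms, topic effects, and significance tests, which a full ANOVA analysis would normally include, usually via a statistics package rather than hand-written code.

```python
from itertools import product
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical per-level contributions to an effectiveness score. Scores are
# built additively, so the true ordering of the effects is known in advance.
INITIAL_RANKER = {"ranker_a": 0.30, "ranker_b": 0.40}
RERANKER = {"none": 0.00, "reranker_x": 0.05}
FEEDBACK = {"document": 0.00, "passage": 0.01}

Cell = Tuple[str, str, str]


def build_grid() -> Dict[Cell, float]:
    """Score every combination of the three components (a full-factorial grid)."""
    return {
        (r, rr, fb): INITIAL_RANKER[r] + RERANKER[rr] + FEEDBACK[fb]
        for r, rr, fb in product(INITIAL_RANKER, RERANKER, FEEDBACK)
    }


def main_effect_eta_squared(grid: Dict[Cell, float], n_factors: int) -> List[float]:
    """Eta-squared (SS_factor / SS_total) for each factor's main effect."""
    scores = list(grid.values())
    grand = mean(scores)
    ss_total = sum((y - grand) ** 2 for y in scores)
    effects = []
    for f in range(n_factors):
        # Group scores by the level this factor takes in each grid cell.
        by_level: Dict[str, List[float]] = {}
        for cell, y in grid.items():
            by_level.setdefault(cell[f], []).append(y)
        ss_factor = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in by_level.values())
        effects.append(ss_factor / ss_total)
    return effects


if __name__ == "__main__":
    names = ["initial ranker", "dynamic reranker", "feedback granularity"]
    for name, eta in zip(names, main_effect_eta_squared(build_grid(), 3)):
        print(f"{name}: eta^2 = {eta:.3f}")
```

Because the synthetic scores contain no interactions or noise, the three values sum to 1, and the initial ranker receives the largest share by construction rather than as an empirical finding.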

Citations: 1

Clarifying Ambiguous Keywords with Personal Word Embeddings for Personalized Search

ACM Transactions on Information Systems (TOIS)

Pub Date : 2021-11-22 DOI: 10.1145/3470564

Jing Yao, Zhicheng Dou, Ji-rong Wen

Personalized search tailors document ranking lists to each individual user based on her interests and query intent, to better satisfy the user’s information need. Many personalized search models have been proposed. They first build a user interest profile from the user’s search history and then re-rank the documents based on personalized matching scores between the profile and the candidate documents. In this article, we approach personalized search from an alternative perspective: clarifying the user’s intention behind the current query. Natural language contains many ambiguous words, such as “Apple,” and people with different knowledge backgrounds and interests understand these words in personalized ways. We therefore propose a personalized search model with personal word embeddings for each individual user; these embeddings mainly encode the word meanings the user already knows and reflect the user’s interests. To learn high-quality personal word embeddings, we design a pre-training model that captures both the textual information in the query log and the information about user interests contained in the click-through data, represented as a graph structure. With personal word embeddings, we obtain personalized word-level and context-aware representations of the query and documents. Furthermore, we employ the current session as a short-term search context to dynamically disambiguate the current query. Finally, we use a matching model to compute the matching score between the personalized query and document representations for ranking. Experimental results on two large-scale query logs show that our model significantly outperforms state-of-the-art personalization models.
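
The core idea, that the same query word can map to different vectors for different users, can be illustrated with a toy lookup. This is a minimal sketch under stated assumptions: the two-dimensional vectors are hand-set rather than learned by the article's pre-training model, a user-specific vector overrides a shared global one when present, and cosine similarity stands in for the article's matching model. All names (`GLOBAL`, `PERSONAL`, `embed`, `rank`) are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical 2-d vectors: dimension 0 ~ "technology", dimension 1 ~ "food".
GLOBAL: Dict[str, Vector] = {"apple": [0.5, 0.5]}
PERSONAL: Dict[str, Dict[str, Vector]] = {
    "tech_user": {"apple": [0.9, 0.1]},
    "cook_user": {"apple": [0.1, 0.9]},
}
DOCS: Dict[str, Vector] = {
    "smartphone review": [1.0, 0.0],
    "apple pie recipe": [0.0, 1.0],
}


def embed(user: str, word: str) -> Vector:
    """Use the user's personal vector for `word` if one exists, else the global one."""
    return PERSONAL.get(user, {}).get(word, GLOBAL[word])


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank(user: str, query: List[str], docs: Dict[str, Vector]) -> List[str]:
    """Average the personalized word vectors of the query, then sort docs by cosine."""
    vectors = [embed(user, w) for w in query]
    q = [sum(dim) / len(vectors) for dim in zip(*vectors)]
    return sorted(docs, key=lambda d: cosine(q, docs[d]), reverse=True)


if __name__ == "__main__":
    print(rank("tech_user", ["apple"], DOCS))
    print(rank("cook_user", ["apple"], DOCS))
```

A user without a personal vector falls back to the global embedding, for which both toy documents tie; the article additionally uses the current session as short-term context to help resolve such ambiguity.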

Citations: 5

Personalizing Medication Recommendation with a Graph-Based Approach

ACM Transactions on Information Systems (TOIS)

Pub Date : 2021-11-22 DOI: 10.1145/3488668

Suman Bhoi, M. Lee, W. Hsu, A. Fang, N. Tan

The broad adoption of electronic health records (EHRs) has led to vast amounts of data being accumulated on patients’ histories, diagnoses, prescriptions, and lab tests. Advances in recommender technologies could use this information to help doctors personalize prescribed medications. However, existing medication recommendation systems have yet to use all these information sources seamlessly, and they do not justify why a particular medication is recommended. In this work, we design PREMIER, a two-stage personalized medication recommender system that incorporates information from the EHR. We use the various weights in the system to compute the contributions of the information sources to the recommended medications. Our system models drug interactions from an external drug database and drug co-occurrences from the EHR as graphs. Experimental results on MIMIC-III and a proprietary outpatient dataset show that PREMIER outperforms state-of-the-art medication recommendation systems while achieving the best tradeoff between accuracy and drug-drug interaction. Case studies demonstrate that the justifications provided by PREMIER are appropriate and aligned with clinical practice.
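
The two graphs and the accuracy versus drug-drug interaction tradeoff can be made concrete with a small sketch. It assumes the co-occurrence graph weights an edge by the number of visits that prescribe both drugs, and it measures interaction risk with a DDI rate (the fraction of drug pairs in a recommendation that appear in the interaction graph), a measure commonly reported in medication recommendation work; the abstract does not specify PREMIER's exact graph construction or metrics. The function names are hypothetical, and the drug names and visits are illustrative toy data, not clinical guidance.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple


def cooccurrence_graph(visits: Iterable[List[str]]) -> Counter:
    """Weighted undirected edges: number of visits in which two drugs co-occur."""
    edges: Counter = Counter()
    for meds in visits:
        # Sorting makes each undirected edge a canonical (a, b) key.
        for a, b in combinations(sorted(set(meds)), 2):
            edges[(a, b)] += 1
    return edges


def ddi_rate(recommended: List[str], ddi_pairs: Set[FrozenSet[str]]) -> float:
    """Fraction of drug pairs in a recommendation that are known to interact."""
    pairs: List[Tuple[str, str]] = list(combinations(sorted(set(recommended)), 2))
    if not pairs:
        return 0.0
    interacting = sum(1 for a, b in pairs if frozenset((a, b)) in ddi_pairs)
    return interacting / len(pairs)


# Illustrative toy data: three EHR visits and one known interaction.
VISITS = [
    ["aspirin", "warfarin"],
    ["aspirin", "metformin"],
    ["aspirin", "metformin", "warfarin"],
]
DDI = {frozenset(("aspirin", "warfarin"))}

if __name__ == "__main__":
    print(cooccurrence_graph(VISITS))
    print(ddi_rate(["aspirin", "metformin", "warfarin"], DDI))
```

Dropping one drug of the interacting pair from the recommended set would lower the DDI rate to 0, possibly at the cost of accuracy, which is the kind of tradeoff the abstract refers to.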

Citations: 19