
Proceedings of the 30th ACM International Conference on Information & Knowledge Management: Latest Publications

Evaluating Human-AI Hybrid Conversational Systems with Chatbot Message Suggestions
Zihan Gao, Jiepu Jiang
AI chatbots can offer suggestions to help humans answer questions by reducing text entry effort and providing relevant knowledge for unfamiliar questions. We study whether chatbot suggestions can help people answer knowledge-demanding questions in a conversation and influence response quality and efficiency. We conducted a large-scale crowdsourcing user study and evaluated 20 hybrid system variants and a human-only baseline. The hybrid systems used four chatbots of varied response quality and differed in the number of suggestions and whether to preset the message box with top suggestions. Experimental results show that chatbot suggestions, even when using poor-performing chatbots, have consistently improved response efficiency. Compared with the human-only setting, hybrid systems have reduced response time by 12%-35% and keystrokes by 33%-60%, and users have adopted a suggestion for the final response without any changes in 44%-68% of the cases. In contrast, crowd workers in the human-only setting typed most of the response texts and copied 5% of the answers from other sites. However, we also found that chatbot suggestions did not always help response quality. Specifically, in hybrid systems equipped with poor-performing chatbots, users responded with lower-quality answers than those in the human-only setting. It seems that users would not simply ignore poor suggestions and compose responses as they would have without seeing the suggestions. In addition, presetting the message box improved reply efficiency without hurting response quality. We did not find that showing more suggestions consistently helps or hurts response quality or efficiency. Our study reveals how and when AI chatbot suggestions can help people answer questions in hybrid conversational systems.
{"title":"Evaluating Human-AI Hybrid Conversational Systems with Chatbot Message Suggestions","authors":"Zihan Gao, Jiepu Jiang","doi":"10.1145/3459637.3482340","DOIUrl":"https://doi.org/10.1145/3459637.3482340","url":null,"abstract":"AI chatbots can offer suggestions to help humans answer questions by reducing text entry effort and providing relevant knowledge for unfamiliar questions. We study whether chatbot suggestions can help people answer knowledge-demanding questions in a conversation and influence response quality and efficiency. We conducted a large-scale crowdsourcing user study and evaluated 20 hybrid system variants and a human-only baseline. The hybrid systems used four chatbots of varied response quality and differed in the number of suggestions and whether to preset the message box with top suggestions. Experimental results show that chatbot suggestions---even using poor-performing chatbots---have consistently improved response efficiency. Compared with the human-only setting, hybrid systems have reduced response time by 12%--35% and keystrokes by 33%--60%, and users have adopted a suggestion for the final response without any changes in 44%--68% of the cases. In contrast, crowd workers in the human-only setting typed most of the response texts and copied 5% of the answers from other sites. However, we also found that chatbot suggestions did not always help response quality. Specifically, in hybrid systems equipped with poor-performing chatbots, users responded with lower-quality answers than others in the human-only setting. It seems that users would not simply ignore poor suggestions and compose responses as they could without seeing the suggestions. Besides, presetting the message box has improved reply efficiency without hurting response quality. We did not find that showing more suggestions helps or hurts response quality or efficiency consistently. Our study reveals how and when AI chatbot suggestions can help people answer questions in hybrid conversational systems.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115093375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
LiteGT
Cong Chen, Chaofan Tao, Ngai Wong
Transformers have shown great potential for modeling long-term dependencies for natural language processing and computer vision. However, few studies have applied transformers to graphs, which is challenging due to the poor scalability of the attention mechanism and the under-exploration of graph inductive bias. To bridge this gap, we propose a Lite Graph Transformer (LiteGT) that learns on arbitrary graphs efficiently. First, a node sampling strategy is proposed to sparsify the considered nodes in self-attention with only O(N log N) time. Second, we devise two kernelization approaches to form two-branch attention blocks, which not only leverage graph-specific topology information, but also reduce computation further to O(1/2 N log N). Third, the nodes are updated with different attention schemes during training, thus largely mitigating over-smoothing problems when the model layers deepen. Extensive experiments demonstrate that LiteGT achieves competitive performance on both node classification and link prediction on datasets with millions of nodes. Specifically, the Jaccard + Sampling + Dim. reducing setting reduces computation by more than 100x and halves the model size without performance degradation.
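The abstract does not spell out the sampling procedure, so the following is only a minimal numpy sketch of the general idea behind sparsifying self-attention via node sampling: each node attends to a randomly sampled subset of roughly log N nodes instead of all N, so the score matrix costs O(N log N) entries rather than O(N^2). All function and variable names are illustrative assumptions, not LiteGT's actual implementation.

```python
import numpy as np

def sampled_self_attention(x, num_samples, rng=None):
    """Toy sparse self-attention: each node attends to a random subset of
    nodes instead of all N, so the pairwise score matrix has shape
    (N, num_samples) rather than (N, N).

    x: (N, d) node feature matrix.
    num_samples: size of the sampled key set, e.g. int(np.log2(N)).
    """
    rng = rng or np.random.default_rng(0)
    n, d = x.shape
    # Sample a candidate key set per node (here: uniform, without replacement).
    idx = np.stack([rng.choice(n, size=num_samples, replace=False) for _ in range(n)])
    keys = x[idx]                                  # (N, num_samples, d)
    scores = np.einsum("nd,nsd->ns", x, keys) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over sampled keys only
    return np.einsum("ns,nsd->nd", weights, keys)  # (N, d) updated node features

# N nodes, each attending to ~log2(N) sampled nodes: O(N log N) score entries.
x = np.random.default_rng(1).normal(size=(1024, 64))
out = sampled_self_attention(x, num_samples=int(np.log2(1024)))
print(out.shape)  # (1024, 64)
```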
{"title":"LiteGT","authors":"Cong Chen, Chaofan Tao, Ngai Wong","doi":"10.1145/3459637.3482272","DOIUrl":"https://doi.org/10.1145/3459637.3482272","url":null,"abstract":"Transformers have shown great potential for modeling long-term dependencies for natural language processing and computer vision. However, little study has applied transformers to graphs, which is challenging due to the poor scalability of the attention mechanism and the under-exploration of graph inductive bias. To bridge this gap, we propose a Lite Graph Transformer (LiteGT) that learns on arbitrary graphs efficiently. First, a node sampling strategy is proposed to sparsify the considered nodes in self-attention with only O (Nlog N) time. Second, we devise two kernelization approaches to form two-branch attention blocks, which not only leverage graph-specific topology information, but also reduce computation further to O (1 over 2 Nlog N). Third, the nodes are updated with different attention schemes during training, thus largely mitigating over-smoothing problems when the model layers deepen. Extensive experiments demonstrate that LiteGT achieves competitive performance on both node classification and link prediction on datasets with millions of nodes. Specifically, Jaccard + Sampling + Dim. reducing setting reduces more than 100x computation and halves the model size without performance degradation.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114257420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Failure Prediction for Large-scale Water Pipe Networks Using GNN and Temporal Failure Series
Shuming Liang, Zhidong Li, Binxin Liang, Yu Ding, Yang Wang, Fang Chen
Pipe failure prediction in the water industry aims to prioritize the pipes that are at high risk of failure for proactive maintenance. However, existing statistical or machine learning models that rely on historical failures and asset attributes can hardly leverage the structure information of pipe networks. In this work, we develop a failure prediction framework for pipe networks by jointly considering the pipes' features, the network structure, the geographical neighboring effect, and the temporal failure series. We apply a multi-hop Graph Neural Network (GNN) to failure prediction. We propose a method of constructing a geographical graph structure depending on not only the physical connections but also geographical distances between pipes. To differentiate the pipes with diverse properties, we employ an attention mechanism in the neighborhood aggregation process of each GNN layer. Also, residual connections and layer-wise aggregation are used to avoid the over-smoothing issue in deep GNNs. The historical failures exhibit a strong temporal pattern. Inspired by point process, we develop a module to learn the pipes' evolutionary effect and the time-decayed excitement of historical failures on the current state of the pipe. The proposed framework is evaluated on two real-world large-scale pipe networks. It outperforms the existing statistical, machine learning, and state-of-the-art GNN baselines. Our framework provides the water utility with core data-driven support for proactive maintenance including regular pipe inspection, pipe renewal planning, and sensor system deployment. It can be extended to other infrastructure networks in the future.
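As a rough illustration of the geographical graph construction described above (physical connections plus distance-based neighbors), the sketch below connects pipes that are physically joined and additionally links pipes whose centroids lie within a distance threshold. The function name, the centroid representation, and the threshold are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def build_geographic_graph(pipe_coords, physical_edges, dist_threshold):
    """Toy construction of a geographical pipe graph: keep the physical
    connections and additionally connect pipes whose centroid coordinates
    lie within `dist_threshold` of each other.

    pipe_coords: (N, 2) array of pipe centroid coordinates.
    physical_edges: iterable of (i, j) index pairs for physically joined pipes.
    Returns a set of undirected edges (i, j) with i < j.
    """
    edges = {tuple(sorted(e)) for e in physical_edges}
    n = len(pipe_coords)
    for i in range(n):
        # Distances from pipe i to all later pipes (avoids double counting).
        d = np.linalg.norm(pipe_coords[i + 1:] - pipe_coords[i], axis=1)
        for offset in np.nonzero(d <= dist_threshold)[0]:
            edges.add((i, i + 1 + int(offset)))
    return edges

coords = np.array([[0.0, 0.0], [0.0, 50.0], [300.0, 0.0]])
print(build_geographic_graph(coords, physical_edges=[(0, 2)], dist_threshold=100.0))
# Two edges: the physical (0, 2) plus the geographic neighbour (0, 1).
```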
{"title":"Failure Prediction for Large-scale Water Pipe Networks Using GNN and Temporal Failure Series","authors":"Shuming Liang, Zhidong Li, Binxin Liang, Yu Ding, Yang Wang, Fang Chen","doi":"10.1145/3459637.3481918","DOIUrl":"https://doi.org/10.1145/3459637.3481918","url":null,"abstract":"Pipe failure prediction in the water industry aims to prioritize the pipes that are at high risk of failure for proactive maintenance. However, existing statistical or machine learning models that rely on historical failures and asset attributes can hardly leverage the structure information of pipe networks. In this work, we develop a failure prediction framework for pipe networks by jointly considering the pipes' features, the network structure, the geographical neighboring effect, and the temporal failure series. We apply a multi-hop Graph Neural Network (GNN) to failure prediction. We propose a method of constructing a geographical graph structure depending on not only the physical connections but also geographical distances between pipes. To differentiate the pipes with diverse properties, we employ an attention mechanism in the neighborhood aggregation process of each GNN layer. Also, residual connections and layer-wise aggregation are used to avoid the over-smoothing issue in deep GNNs. The historical failures exhibit a strong temporal pattern. Inspired by point process, we develop a module to learn the pipes' evolutionary effect and the time-decayed excitement of historical failures on the current state of the pipe. The proposed framework is evaluated on two real-world large-scale pipe networks. It outperforms the existing statistical, machine learning, and state-of-the-art GNN baselines. Our framework provides the water utility with core data-driven support for proactive maintenance including regular pipe inspection, pipe renewal planning, and sensor system deployment. It can be extended to other infrastructure networks in the future.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"206 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114213850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
UQJG
Toon Koppelaars, Xavier Oriol, Ernest Teniente, Sérgio Curto, E. Pujol
An SQL assertion is a declarative statement about data that must always be satisfied in any database state. Assertions were introduced in the SQL92 standard but no commercial DBMS has implemented them so far. Some approaches have been proposed to incrementally determine whether a transaction violates an SQL assertion, but they assume that transactions are applied in isolation, hence not considering the problem of concurrent transaction executions that collaborate to violate an assertion. This is the main stopper for its commercial implementation. To handle this problem, we have developed a technique for efficiently serializing concurrent transactions that might interact to violate an SQL assertion.
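The concurrency problem the authors target can be pictured with a toy simulation: two transactions that each preserve a made-up assertion (total amount <= 100) against the snapshot they read, yet jointly violate it once both commit. This is only an illustration of the problem, not the serialization technique the paper develops.

```python
# Two "transactions" each check a made-up assertion (SUM(amount) <= 100) against
# the snapshot they read, then write. Each check passes in isolation, but the
# combined result violates the assertion, which is exactly the concurrent
# interaction the proposed technique serializes.
LIMIT = 100
committed = [30]                      # existing rows (amounts)

def transaction(snapshot, new_amount):
    assert sum(snapshot) + new_amount <= LIMIT, "assertion would be violated"
    return new_amount                 # row to append on commit

# Both transactions start from the same snapshot of the database state.
snapshot = list(committed)
row_a = transaction(snapshot, 40)     # 30 + 40 <= 100: passes in isolation
row_b = transaction(snapshot, 50)     # 30 + 50 <= 100: passes in isolation
committed += [row_a, row_b]

print(sum(committed))                 # 120: the assertion no longer holds
print(sum(committed) <= LIMIT)        # False
```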
{"title":"UQJG","authors":"Toon Koppelaars, Xavier Oriol, Ernest Teniente, Sérgio Curto, E. Pujol","doi":"10.1145/3459637.3482210","DOIUrl":"https://doi.org/10.1145/3459637.3482210","url":null,"abstract":"An SQL assertion is a declarative statement about data that must always be satisfied in any database state. Assertions were introduced in the SQL92 standard but no commercial DBMS has implemented them so far. Some approaches have been proposed to incrementally determine whether a transaction violates an SQL assertion, but they assume that transactions are applied in isolation, hence not considering the problem of concurrent transaction executions that collaborate to violate an assertion. This is the main stopper for its commercial implementation. To handle this problem, we have developed a technique for efficiently serializing concurrent transactions that might interact to violate an SQL assertion.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114850514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
To Be or not to Be, Tail Labels in Extreme Multi-label Learning
Zhiqi Ge, Ximing Li
EXtreme Multi-label Learning (XML) aims to predict, for each instance, its most relevant subset of labels from an extremely large label space, often exceeding one million labels in many real applications. In XML scenarios, the labels exhibit a long-tail distribution, where a significant number of labels appear in very few instances; these are referred to as tail labels. Unfortunately, due to the lack of positive instances, the tail labels are intractable to learn as well as to predict. Several previous studies even suggested that tail labels can be removed outright based on their label frequencies. We consider that such a crude principle may miss many significant tail labels, because predictive accuracy is not strictly consistent with label frequency, especially for tail labels. In this paper, we are interested in finding a reasonable principle to determine whether a tail label should be removed that does not depend only on label frequency. To this end, we investigate a method named Nearest Neighbor Positive Proportion Score (N2P2S) to score the tail labels by the annotations of an instance's neighbors. Extensive empirical results indicate that the proposed N2P2S can effectively screen the tail labels, and many preserved tail labels can be learned and accurately predicted even with very few positive instances.
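The abstract does not give the exact formula for N2P2S, so the following is one plausible reading, offered purely as an assumption: score a tail label by how often the nearest neighbors of its positive instances are also annotated with that label. The function and variable names are illustrative.

```python
import numpy as np

def n2p2s(features, label_matrix, label_idx, k=5):
    """Illustrative score for one tail label: for each of its positive
    instances, take the k nearest other instances (Euclidean distance) and
    measure how often they are also positive for that label; return the mean
    proportion. A high score suggests the label is supported by its
    neighbourhood and may be worth keeping.
    """
    pos = np.nonzero(label_matrix[:, label_idx] == 1)[0]
    if len(pos) == 0:
        return 0.0
    proportions = []
    for i in pos:
        dists = np.linalg.norm(features - features[i], axis=1)
        dists[i] = np.inf                       # exclude the instance itself
        neighbors = np.argsort(dists)[:k]
        proportions.append(label_matrix[neighbors, label_idx].mean())
    return float(np.mean(proportions))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
Y = (rng.random((200, 4)) < 0.05).astype(int)   # sparse labels, i.e. tail-like
print(n2p2s(X, Y, label_idx=0, k=5))
```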
{"title":"To Be or not to Be, Tail Labels in Extreme Multi-label Learning","authors":"Zhiqi Ge, Ximing Li","doi":"10.1145/3459637.3482303","DOIUrl":"https://doi.org/10.1145/3459637.3482303","url":null,"abstract":"EXtreme Multi-label Learning (XML) aims to predict each instance its most relevant subset of labels from an extremely huge label space, often exceeding one million or even larger in many real applications. In XML scenarios, the labels exhibit a long tail distribution, where a significant number of labels appear in very few instances, referred to as tail labels. Unfortunately, due to the lack of positive instances, the tail labels are intractable to learn as well as predict. Several previous studies even suggested that the tail labels can be directly removed by referring to their label frequencies. We consider that such violent principle may miss many significant tail labels, because the predictive accuracy is not strictly consistent with the label frequency especially for tail labels. In this paper, we are interested in finding a reasonable principle to determine whether a tail label should be removed, not only depending on their label frequencies. To this end, we investigate a method named Nearest Neighbor Positive Proportion Score (N2P2S) to score the tail labels by annotations of the instance neighbors. Extensive empirical results indicate that the proposed N2P2S can effectively screen the tail labels, where many preserved tail labels can be learned and accurately predicted even with very few positive instances.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115877915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Self-Supervised Learning based on Sentiment Analysis with Word Weight Calculation
Dongcheol Son, Youngjoong Ko
Learning domain information for a downstream task is important to improve the performance of sentiment analysis. However, labeling a sufficient amount of training data in an application domain tends to be highly time-consuming and tedious. To solve this problem, we propose a novel method to effectively learn domain information and improve sentiment analysis performance with a small amount of training data. We use the masked language model (MLM), which is a self-supervised learning model, to calculate word weights and improve a downstream fine-tuning task for sentiment analysis. In particular, the MLM with the calculated word weights is executed simultaneously with the fine-tuning task. The results show that the proposed model achieves better performance than previous models on four different sentiment analysis datasets.
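One way to picture "running the MLM with word weights simultaneously with fine-tuning" is a combined objective: the sentence-level classification loss plus a word-weighted masked-LM cross-entropy. The exact form used in the paper is not given in the abstract, so the sketch below (including the weighting scheme and the lambda trade-off) is an assumption.

```python
import numpy as np

def cross_entropy(logits, target):
    """Cross-entropy of one softmax distribution against an integer target."""
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[target]

def joint_loss(cls_logits, cls_target, mlm_logits, mlm_targets, word_weights, lam=1.0):
    """Hypothetical combined objective: sentence-level classification loss plus
    a word-weighted masked-LM loss over the masked positions, trained together.
    """
    cls_loss = cross_entropy(cls_logits, cls_target)
    mlm_losses = np.array([cross_entropy(l, t) for l, t in zip(mlm_logits, mlm_targets)])
    weighted_mlm = (word_weights * mlm_losses).sum() / word_weights.sum()
    return cls_loss + lam * weighted_mlm

rng = np.random.default_rng(0)
loss = joint_loss(
    cls_logits=rng.normal(size=3), cls_target=1,                       # 3 sentiment classes
    mlm_logits=rng.normal(size=(4, 100)), mlm_targets=[7, 42, 3, 99],  # 4 masked tokens, vocab of 100
    word_weights=np.array([1.5, 0.5, 1.0, 2.0]),                       # calculated per-token weights
)
print(float(loss))
```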
{"title":"Self-Supervised Learning based on Sentiment Analysis with Word Weight Calculation","authors":"Dongcheol Son, Youngjoong Ko","doi":"10.1145/3459637.3482180","DOIUrl":"https://doi.org/10.1145/3459637.3482180","url":null,"abstract":"Learning domain information for a downstream task is important to improve the performance of sentiment analysis. However, the labeling task to obtain a sufficient amount of training data in an application domain tends to be highly time-consuming and tedious. To solve this problem, we propose a novel method to effectively learn domain information and improve sentiment analysis performance with a small amount of training data. We use the masked language model (MLM), which is a self-supervised learning model, to calculate word weights and improve a downstream fine-tuning task for sentiment analysis. In particular, the MLM with the calculated word weights is executed simultaneously with the fine-tuning task. The results show that the proposed model achieves better performances than previous models in four different datasets for sentiment analysis.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115140127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
DLQ
You Peng, Wenjie Zhao, Wenjie Zhang, Xuemin Lin, Ying Zhang
The Label-Constrained Reachability (LCR) query, which extracts reachability information from large edge-labeled graphs, has attracted tremendous interest. Various LCR algorithms have been proposed to solve this fundamental query, which has a wide range of applications in social networks, biological networks, economic networks, etc. In this paper, we implement the state-of-the-art P2H+ algorithm as well as functions to analyze its effectiveness. Moreover, our Dynamic LCR Query (DLQ) system also supports dynamic updates with the 2-hop labeling method. In this demonstration, we present the DLQ system for Label-Constrained Reachability Queries, which utilizes the 2-hop labeling algorithm with dynamic graph maintenance.
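The P2H+ index itself is not reproduced here; as context for what an LCR query asks, the sketch below is a plain BFS baseline that answers "is target reachable from source using only edges whose labels are in the allowed set?". The graph encoding and names are illustrative.

```python
from collections import deque

def lcr_query(graph, source, target, allowed_labels):
    """Baseline label-constrained reachability check: can `target` be reached
    from `source` using only edges whose label is in `allowed_labels`?
    graph: dict mapping node -> list of (neighbor, edge_label).
    """
    allowed = set(allowed_labels)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v, label in graph.get(u, []):
            if label in allowed and v not in seen:
                seen.add(v)
                queue.append(v)
    return False

g = {
    "a": [("b", "follows"), ("c", "blocks")],
    "b": [("d", "follows")],
    "c": [("d", "follows")],
}
print(lcr_query(g, "a", "d", {"follows"}))   # True  (a -> b -> d via "follows" edges)
print(lcr_query(g, "a", "d", {"blocks"}))    # False (the last hop needs a "follows" edge)
```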
{"title":"DLQ","authors":"You Peng, Wenjie Zhao, Wenjie Zhang, Xuemin Lin, Ying Zhang","doi":"10.1145/3459637.3481978","DOIUrl":"https://doi.org/10.1145/3459637.3481978","url":null,"abstract":"Label-Constraint Reachability query (LCR) which extracts of reachability information from large edge-labeled graphs, has attracted tremendous interest. Various LCR algorithms have been proposed to solve this fundamental query, which has a wide range of applications in social networks, biological networks, economic networks, etc. In this paper, we implement the state-of-the-art P2H+ algorithm as well as functions to analyze the effectiveness. Moreover, our Dynamic LCR Query (DLQ) system also supports dynamic updates with the 2-hop labeling method. In this demonstration, we present the DLQ system for Label-Constrained Reachability Queries that utilize the 2-hop labeling algorithm with dynamic graph maintenance.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115394549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Task Allocation with Geographic Partition in Spatial Crowdsourcing
Guanyu Ye, Yan Zhao, Xuanhao Chen, Kai Zheng
Recent years have witnessed a revolution in Spatial Crowdsourcing (SC), in which people with mobile connectivity can perform spatio-temporal tasks that involve travel to specified locations. In this paper, we identify and study in depth a new multi-center-based task allocation problem in the context of SC, where multiple allocation centers exist. In particular, we aim to maximize the total number of the allocated tasks while minimizing the average allocated task number difference. To solve the problem, we propose a two-phase framework, called Task Allocation with Geographic Partition, consisting of a geographic partition phase and a task allocation phase. The first phase is to divide the whole study area based on the allocation centers by using both a basic Voronoi diagram-based algorithm and an adaptive weighted Voronoi diagram-based algorithm. In the allocation phase, we utilize a Reinforcement Learning method to achieve the task allocation, where a graph neural network with the attention mechanism is used to learn the embeddings of allocation centers, delivery points and workers. Extensive experiments give insight into the effectiveness and efficiency of the proposed solutions.
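The basic Voronoi-diagram partition in the first phase amounts to assigning each location to its nearest allocation center; the sketch below also shows one simple additively weighted variant in which a per-center weight enlarges or shrinks its cell. This is a generic illustration, not the paper's adaptive weighting scheme.

```python
import numpy as np

def voronoi_partition(points, centers, weights=None):
    """Assign each point to an allocation center. With no weights this is the
    basic Voronoi partition (nearest center by Euclidean distance); passing
    per-center additive weights sketches one simple weighted variant that
    enlarges or shrinks a center's cell.
    """
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    if weights is not None:
        dists = dists - weights[None, :]   # larger weight -> larger cell
    return dists.argmin(axis=1)            # center index for each point

points = np.array([[1.0, 1.0], [9.0, 9.0], [5.0, 5.2]])
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
print(voronoi_partition(points, centers))                        # [0 1 1]
print(voronoi_partition(points, centers, np.array([3.0, 0.0])))  # [0 1 0]: the third point flips to center 0
```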
{"title":"Task Allocation with Geographic Partition in Spatial Crowdsourcing","authors":"Guanyu Ye, Yan Zhao, Xuanhao Chen, Kai Zheng","doi":"10.1145/3459637.3482300","DOIUrl":"https://doi.org/10.1145/3459637.3482300","url":null,"abstract":"Recent years have witnessed a revolution in Spatial Crowdsourcing (SC), in which people with mobile connectivity can perform spatio-temporal tasks that involve travel to specified locations. In this paper, we identify and study in depth a new multi-center-based task allocation problem in the context of SC, where multiple allocation centers exist. In particular, we aim to maximize the total number of the allocated tasks while minimizing the average allocated task number difference. To solve the problem, we propose a two-phase framework, called Task Allocation with Geographic Partition, consisting of a geographic partition phase and a task allocation phase. The first phase is to divide the whole study area based on the allocation centers by using both a basic Voronoi diagram-based algorithm and an adaptive weighted Voronoi diagram-based algorithm. In the allocation phase, we utilize a Reinforcement Learning method to achieve the task allocation, where a graph neural network with the attention mechanism is used to learn the embeddings of allocation centers, delivery points and workers. Extensive experiments give insight into the effectiveness and efficiency of the proposed solutions.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115468905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 19
Low-dimensional Alignment for Cross-Domain Recommendation
Tian-Qi Wang, Fuzhen Zhuang, Zhiqiang Zhang, Daixin Wang, Jun Zhou, Qing He
The cold start problem is one of the most challenging and long-standing problems in recommender systems, and cross-domain recommendation (CDR) methods are effective for tackling it. Most cold-start-related CDR methods require training a mapping function between high-dimensional embedding spaces using overlapping user data. However, overlapping data is scarce in many recommendation tasks, which makes it difficult to train the mapping function. In this paper, we propose a new approach for CDR that aims to alleviate this training difficulty. The proposed method can be viewed as a special parameterization of the mapping function that preserves expressiveness, makes use of non-overlapping user data, and leads to effective optimization. Extensive experiments on two real-world CDR tasks are performed to evaluate the proposed method. When there are few overlapping data, the proposed method outperforms the existing state-of-the-art method by 14% (relative improvement).
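For context, the conventional approach the abstract contrasts with (training a mapping function between the two domains' embedding spaces on overlapping users) can be sketched as a least-squares linear map fitted on the overlap and applied to cold-start users. This is the baseline idea, not the proposed low-dimensional parameterization; all names and sizes are illustrative.

```python
import numpy as np

# Conventional mapping-function baseline (not the proposed method): learn a map
# from source-domain user embeddings to target-domain ones using the
# overlapping users, then apply it to cold-start users.
rng = np.random.default_rng(0)
d = 32
true_map = rng.normal(size=(d, d))               # synthetic "ground truth" relation

overlap_src = rng.normal(size=(50, d))           # 50 overlapping users
overlap_tgt = overlap_src @ true_map + 0.01 * rng.normal(size=(50, d))

# Least-squares linear mapping fitted on the overlap.
W, *_ = np.linalg.lstsq(overlap_src, overlap_tgt, rcond=None)

cold_src = rng.normal(size=(5, d))               # cold-start users: source embeddings only
predicted_tgt = cold_src @ W                     # their inferred target-domain embeddings
rel_err = np.linalg.norm(predicted_tgt - cold_src @ true_map) / np.linalg.norm(cold_src @ true_map)
print(rel_err)                                   # small relative error on cold-start users
```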
{"title":"Low-dimensional Alignment for Cross-Domain Recommendation","authors":"Tian-Qi Wang, Fuzhen Zhuang, Zhiqiang Zhang, Daixin Wang, Jun Zhou, Qing He","doi":"10.1145/3459637.3482137","DOIUrl":"https://doi.org/10.1145/3459637.3482137","url":null,"abstract":"Cold start problem is one of the most challenging and long-standing problems in recommender systems, and cross-domain recommendation (CDR) methods are effective for tackling it. Most cold-start related CDR methods require training a mapping function between high-dimensional embedding space using overlapping user data. However, the overlapping data is scarce in many recommendation tasks, which makes it difficult to train the mapping function. In this paper, we propose a new approach for CDR, which aims to alleviate the training difficulty. The proposed method can be viewed as a special parameterization of the mapping function without hurting expressiveness, which makes use of non-overlapping user data and leads to effective optimization. Extensive experiments on two real-world CDR tasks are performed to evaluate the proposed method. In the case that there are few overlapping data, the proposed method outperforms the existed state-of-the-art method by 14% (relative improvement).","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"618 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123201056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Matches Made in Heaven: Toolkit and Large-Scale Datasets for Supervised Query Reformulation
Negar Arabzadeh, A. Bigdeli, Shirin Seyedsalehi, Morteza Zihayat, E. Bagheri
Researchers have already shown that it is possible to improve retrieval effectiveness through the systematic reformulation of users' queries. Traditionally, most query reformulation techniques relied on unsupervised approaches such as query expansion through pseudo-relevance feedback. More recently and with the increasing effectiveness of neural sequence-to-sequence architectures, the problem of query reformulation has been studied as a supervised query translation problem, which learns to rewrite a query into a more effective alternative. While quite effective in practice, such supervised query reformulation methods require a large number of training instances. In this paper, we present three large-scale query reformulation datasets, namely Diamond, Platinum and Gold datasets, based on the queries in the MS MARCO dataset. The Diamond dataset consists of over 188,000 query pairs where the original source query is matched with an alternative query that has a perfect retrieval effectiveness (an average precision of 1). To the best of our knowledge, this is the first set of datasets for supervised query reformulation that offers perfect query reformulations for a large number of queries. The implementation of our fully automated tool, which is based on a transformer architecture, and our three datasets are made publicly available. We also establish a neural query reformulation baseline performance on our datasets by reporting the performance of strong neural query reformulation baselines. It is our belief that our datasets will significantly impact the development of supervised query reformulation methods in the future.
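The selection criterion for the Diamond dataset, an average precision of 1, means every relevant document is ranked above every non-relevant one. A small helper showing how average precision is computed (a standard definition, not code from the released toolkit):

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Average precision of a ranked list: mean of precision@k over the ranks k
    at which a relevant document appears, divided by the number of relevant
    documents. AP = 1 exactly when all relevant documents are ranked ahead of
    every non-relevant one.
    """
    relevant = set(relevant_ids)
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked_doc_ids, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

print(average_precision(["d3", "d1", "d7"], {"d3", "d1"}))  # 1.0: a "perfect" reformulation
print(average_precision(["d7", "d3", "d1"], {"d3", "d1"}))  # ~0.583
```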
{"title":"Matches Made in Heaven: Toolkit and Large-Scale Datasets for Supervised Query Reformulation","authors":"Negar Arabzadeh, A. Bigdeli, Shirin Seyedsalehi, Morteza Zihayat, E. Bagheri","doi":"10.1145/3459637.3482009","DOIUrl":"https://doi.org/10.1145/3459637.3482009","url":null,"abstract":"Researchers have already shown that it is possible to improve retrieval effectiveness through the systematic reformulation of users' queries. Traditionally, most query reformulation techniques relied on unsupervised approaches such as query expansion through pseudo-relevance feedback. More recently and with the increasing effectiveness of neural sequence-to-sequence architectures, the problem of query reformulation has been studied as a supervised query translation problem, which learns to rewrite a query into a more effective alternative. While quite effective in practice, such supervised query reformulation methods require a large number of training instances. In this paper, we present three large-scale query reformulation datasets, namely Diamond, Platinum and Gold datasets, based on the queries in the MS MARCO dataset. The Diamond dataset consists of over 188,000 query pairs where the original source query is matched with an alternative query that has a perfect retrieval effectiveness (an average precision of 1). To the best of our knowledge, this is the first set of datasets for supervised query reformulation that offers perfect query reformulations for a large number of queries. The implementation of our fully automated tool, which is based on a transformer architecture, and our three datasets are made publicly available. We also establish a neural query reformulation baseline performance on our datasets by reporting the performance of strong neural query reformulation baselines. It is our belief that our datasets will significantly impact the development of supervised query reformulation methods in the future.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121638382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6