
Proceedings of the 30th ACM International Conference on Information & Knowledge Management: Latest Publications

DashBot
S. Da Col, Radu Ciucanu, Marta Soare, Nassim Bouarour, Sihem Amer-Yahia
Data summarization provides a bird's eye view of data, and groupby queries have been the method of choice for data summarization. Such queries provide the ability to group by some attributes and aggregate by others, and their results can be coupled with a visualization to convey insights. The number of possible groupbys that can be computed over a dataset is quite large, which naturally calls for developing approaches to aid users in choosing which groupbys best summarize data. We demonstrate DashBot, a system that leverages Machine Learning to guide users in generating data-driven and customized dashboards. A dashboard contains a set of panels, each of which is a groupby query. DashBot iteratively recommends the most relevant panel while ensuring coverage. Relevance is computed based on intrinsic measures of the dataset, and coverage aims to provide comprehensive summaries. DashBot relies on a Multi-Armed Bandit (MAB) approach to balance exploitation of relevance and exploration of different regions of the data to achieve coverage. Users can provide feedback and explanations to customize recommended panels. We demonstrate the utility and features of DashBot on different datasets.
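To make the bandit-style panel selection concrete, here is a minimal Python sketch of an epsilon-greedy trade-off between relevance (exploitation) and coverage (exploration); the function names, the epsilon-greedy strategy, and the toy relevance scores are illustrative assumptions, not DashBot's actual MAB formulation.

```python
import random

def recommend_panels(groupbys, relevance, n_panels=3, epsilon=0.3, seed=0):
    """Epsilon-greedy sketch of MAB-style panel selection: exploit observed
    relevance, explore attributes not yet covered by the dashboard."""
    rng = random.Random(seed)
    covered, dashboard = set(), []
    for _ in range(min(n_panels, len(groupbys))):
        remaining = [g for g in groupbys if g not in dashboard]
        uncovered = [g for g in remaining if not set(g) <= covered]
        if uncovered and rng.random() < epsilon:
            choice = rng.choice(uncovered)                       # explore for coverage
        else:
            choice = max(remaining, key=lambda g: relevance[g])  # exploit relevance
        dashboard.append(choice)
        covered.update(choice)
    return dashboard

# usage: groupbys are tuples of grouping attributes with precomputed relevance scores
groupbys = [("country",), ("country", "year"), ("product",), ("year",)]
relevance = dict(zip(groupbys, [0.9, 0.7, 0.4, 0.6]))
print(recommend_panels(groupbys, relevance))
```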
{"title":"DashBot","authors":"S. Da Col, Radu Ciucanu, Marta Soare, Nassim Bouarour, Sihem Amer-Yahia","doi":"10.1145/3459637.3481968","DOIUrl":"https://doi.org/10.1145/3459637.3481968","url":null,"abstract":"Data summarization provides a bird's eye view of data and groupby queries have been the method of choice for data summarization. Such queries provide the ability to group by some attributes and aggregate by others, and their results can be coupled with a visualization to convey insights. The number of possible groupbys that can be computed over a dataset is quite large which naturally calls for developing approaches to aid users in choosing which groupbys best summarize data. We demonstrate DashBot, a system that leverages Machine Learning to guide users in generating data-driven and customized dashboards. A dashboard contains a set of panels, each of which is a groupby query. DashBot iteratively recommends the most relevant panel while ensuring coverage. Relevance is computed based on intrinsic measures of the dataset and coverage aims to provide comprehensive summaries. DashBot relies on a Multi-Armed Bandits (MABs) approach to balance exploitation of relevance and exploration of different regions of the data to achieve coverage. Users can provide feedback and explanations to customize recommended panels. We demonstrate the utility and features of DashBot on different datasets.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"12 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134363027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
MDFEND: Multi-domain Fake News Detection
Qiong Nan, Juan Cao, Yongchun Zhu, Yanyan Wang, Jintao Li
Fake news spreads widely on social media across various domains, leading to real-world threats in many aspects such as politics, disasters, and finance. Most existing approaches focus on single-domain fake news detection (SFND), which leads to unsatisfying performance when these methods are applied to multi-domain fake news detection. As an emerging field, multi-domain fake news detection (MFND) is increasingly attracting attention. However, data distributions, such as word frequency and propagation patterns, vary from domain to domain, namely domain shift. Facing the challenge of serious domain shift, existing fake news detection techniques perform poorly in multi-domain scenarios. Therefore, it is demanding to design a specialized model for MFND. In this paper, we first design a benchmark fake news dataset for MFND with domain labels annotated, namely Weibo21, which consists of 4,488 fake news and 4,640 real news items from 9 different domains. We further propose an effective Multi-domain Fake News Detection Model (MDFEND) by utilizing a domain gate to aggregate multiple representations extracted by a mixture of experts. The experiments show that MDFEND can significantly improve the performance of multi-domain fake news detection. Our dataset and code are available at https://github.com/kennqiang/MDFEND-Weibo21.
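The core mechanism, a domain gate weighting a mixture of experts, can be sketched in PyTorch as below; the layer sizes, expert count, and the use of plain linear experts over random features are illustrative assumptions, not the full MDFEND model, which feeds representations from a pre-trained text encoder.

```python
import torch
import torch.nn as nn

class DomainGatedMoE(nn.Module):
    """Sketch of a domain-gated mixture of experts: the gate, conditioned on a
    domain embedding, weights expert views of the news representation."""
    def __init__(self, dim, n_experts, n_domains):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.domain_emb = nn.Embedding(n_domains, dim)
        self.gate = nn.Linear(dim, n_experts)
        self.classifier = nn.Linear(dim, 2)   # fake vs. real

    def forward(self, text_repr, domain_id):
        weights = torch.softmax(self.gate(self.domain_emb(domain_id)), dim=-1)
        expert_out = torch.stack([e(text_repr) for e in self.experts], dim=1)
        fused = (weights.unsqueeze(-1) * expert_out).sum(dim=1)
        return self.classifier(fused)

# usage with random features standing in for a sentence encoder's output
model = DomainGatedMoE(dim=64, n_experts=4, n_domains=9)
logits = model(torch.randn(8, 64), torch.randint(0, 9, (8,)))
print(logits.shape)  # torch.Size([8, 2])
```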
{"title":"MDFEND: Multi-domain Fake News Detection","authors":"Qiong Nan, Juan Cao, Yongchun Zhu, Yanyan Wang, Jintao Li","doi":"10.1145/3459637.3482139","DOIUrl":"https://doi.org/10.1145/3459637.3482139","url":null,"abstract":"Fake news spread widely on social media in various domains, which lead to real-world threats in many aspects like politics, disasters, and finance. Most existing approaches focus on single-domain fake news detection (SFND), which leads to unsatisfying performance when these methods are applied to multi-domain fake news detection. As an emerging field, multi-domain fake news detection (MFND) is increasingly attracting attention. However, data distributions, such as word frequency and propagation patterns, vary from domain to domain, namely domain shift. Facing the challenge of serious domain shift, existing fake news detection techniques perform poorly for multi-domain scenarios. Therefore, it is demanding to design a specialized model for MFND. In this paper, we first design a benchmark of fake news dataset for MFDN with domain label annotated, namely Weibo21, which consists of 4,488 fake news and 4,640 real news from 9 different domains. We further propose an effective Multi-domain Fake News Detection Model (MDFEND) by utilizing domain gate to aggregate multiple representations extracted by a mixture of experts. The experiments show that MDFEND can significantly improve the performance of multi-domain fake news detection. Our dataset and code are available at https://github.com/kennqiang/MDFEND-Weibo21.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134434889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 47
Unsupervised Domain Adaptation for Static Malware Detection based on Gradient Boosting Trees
Panpan Qi, Wei Wang, Lei Zhu, See-Kiong Ng
Static malware detection is important for protection against malware by allowing malicious files to be detected prior to execution. It is also especially suitable for machine learning-based approaches. Recently, gradient boosting decision tree (GBDT) models, e.g., LightGBM (a popular implementation of GBDT), have shown outstanding performance for malware detection. However, as malware programs are known to evolve rapidly, malware classification models trained on the (source) training data often fail to generalize to the target domain, i.e., the deployed environment. To handle the underlying data distribution drifts, unsupervised domain adaptation techniques have been proposed for machine learning models including deep learning models. However, unsupervised domain adaptation for GBDT has remained challenging. In this paper, we adapt the adversarial learning framework for unsupervised domain adaptation to enable GBDT to learn domain-invariant features and alleviate performance degradation in the target domain. In addition, to fully exploit the unlabelled target data, we merge them into the training dataset after pseudo-labelling. We propose a new weighting scheme integrated into GBDT for sampling instances in each boosting round to reduce the negative impact of wrongly labelled target instances. Experiments on two large malware datasets demonstrate the superiority of our proposed method.
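The pseudo-labelling step can be sketched with LightGBM as follows; the confidence threshold, the fixed 0.5 weight for pseudo-labelled instances, and the function name are illustrative assumptions, and the paper's per-round weighting scheme and adversarial feature learning are not reproduced here.

```python
import numpy as np
import lightgbm as lgb

def pseudo_label_and_retrain(X_src, y_src, X_tgt, confidence=0.9, params=None):
    """Sketch of pseudo-labelling: label unlabelled target samples with a
    source-trained GBDT, keep confident ones, and retrain with per-instance
    weights that down-weight the pseudo-labelled data."""
    params = params or {"objective": "binary", "verbosity": -1}
    base = lgb.train(params, lgb.Dataset(X_src, label=y_src), num_boost_round=100)

    proba = base.predict(X_tgt)
    keep = np.maximum(proba, 1 - proba) >= confidence   # keep confident predictions only
    X_new = np.vstack([X_src, X_tgt[keep]])
    y_new = np.concatenate([y_src, (proba[keep] >= 0.5).astype(int)])
    # source instances get full weight, pseudo-labelled ones a smaller weight
    w_new = np.concatenate([np.ones(len(y_src)), 0.5 * np.ones(int(keep.sum()))])

    return lgb.train(params, lgb.Dataset(X_new, label=y_new, weight=w_new),
                     num_boost_round=100)
```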
{"title":"Unsupervised Domain Adaptation for Static Malware Detection based on Gradient Boosting Trees","authors":"Panpan Qi, Wei Wang, Lei Zhu, See-Kiong Ng","doi":"10.1145/3459637.3482400","DOIUrl":"https://doi.org/10.1145/3459637.3482400","url":null,"abstract":"Static malware detection is important for protection against malware by allowing for malicious files to be detected prior to execution. It is also especially suitable for machine learning-based approaches. Recently, gradient boosting decision trees (GBDT) models, e.g., LightGBM (a popular implementation of GBDT), have shown outstanding performance for malware detection. However, as malware programs are known to evolve rapidly, malware classification models trained on the (source) training data often fail to generalize to the target domain, i.e., the deployed environment. To handle the underlying data distribution drifts, unsupervised domain adaptation techniques have been proposed for machine learning models including deep learning models. However, unsupervised domain adaptation for GBDT has remained challenging. In this paper, we adapt the adversarial learning framework for unsupervised domain adaptation to enable GBDT learn domain-invariant features and alleviate performance degradation in the target domain. In addition, to fully exploit the unlabelled target data, we merge them into the training dataset after pseudo-labelling. We propose a new weighting scheme integrated into GBDT for sampling instances in each boosting round to reduce the negative impact of wrongly labelled target instances. Experiments on two large malware datasets demonstrate the superiority of our proposed method.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"196 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134504181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
A Semantic Data Marketplace for Easy Data Sharing within a Smart City
André Pomp, A. Paulus, Andreas Burgdorf, Tobias Meisen
Today, smart city applications are largely based on data collected from different stakeholders. This presupposes that the required data sources are publicly available. While open data platforms already provide a number of urban data sources, enterprises and citizens have few opportunities to make their data available. To complicate things further, even when data is published, processing it is extremely time-consuming today, as the data sources are heterogeneous and the corresponding homogenization has to be carried out by the data consumers themselves. In this paper, we present a data marketplace that enables different stakeholders (public institutions, enterprises, citizens) to easily provide data that can contribute to the further realization of smart cities. This marketplace is based on the principles of semantic data management, i.e., data providers annotate the data they add with semantic models. With the help of these models, the data sources can be found and understood by data consumers and finally homogenized in a way that is suitable for their application.
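A minimal sketch of what concept-based source discovery could look like, assuming each source's semantic model is a simple mapping from column names to shared concepts; the catalog structure and concept names are invented for illustration and do not reflect the marketplace's actual data model.

```python
def find_sources(catalog, needed_concepts):
    """Return sources whose semantic annotations cover all requested concepts."""
    needed = set(needed_concepts)
    return [name for name, semantic_model in catalog.items()
            if needed <= set(semantic_model.values())]

# usage: each source is annotated with a column-to-concept semantic model
catalog = {
    "city_sensors.csv": {"ts": "Timestamp", "pm10": "ParticulateMatter", "loc": "GeoLocation"},
    "bike_rentals.json": {"time": "Timestamp", "station": "GeoLocation", "count": "RentalCount"},
}
print(find_sources(catalog, ["Timestamp", "GeoLocation"]))  # both sources match
```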
{"title":"A Semantic Data Marketplace for Easy Data Sharing within a Smart City","authors":"André Pomp, A. Paulus, Andreas Burgdorf, Tobias Meisen","doi":"10.1145/3459637.3481995","DOIUrl":"https://doi.org/10.1145/3459637.3481995","url":null,"abstract":"Today, smart city applications are largely based on data collected from different stakeholders. This presupposes that the required data sources are publicly available. While open data platforms already provide a number of urban data sources, enterprises and citizens have few opportunities to make their data available. To complicate things further, if the data is published, the processing of this data is already extremely time-consuming today, as the data sources are heterogeneous and the corresponding homogenization has to be carried out by the data consumers themselves. In this paper, we present a data marketplace that enables different stakeholders (public institutions, enterprises, citizens) to easily provide data that can especially contribute to the further realization of smart cities. This marketplace is based on the principles of semantic data management, i.e., data providers annotate their added data with semantic models. With the help of these models, the data sources can be found and understood by data consumers and finally homogenized in a way that is suitable for their application.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"238 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131509292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Health Claims Unpacked: A toolkit to Enhance the Communication of Health Claims for Food
Xiao Li, Huizhi Liang, Zehao Liu
Health claims are statements on food product packages about nutrients and their benefits. Consumers in different European contexts often have difficulties understanding health claims, leading to increased confusion about and decreased trust in the food they buy. Focusing on this problem, we develop a toolkit for improving the communication of health claims to consumers. The toolkit provides (1) interactive activities to disseminate knowledge about health claims to the public, and (2) an NLP-based analysis and prediction engine that food manufacturers can use to estimate how much consumers like the health claims they have created. By using the AI-powered toolkit, consumers, manufacturers, and food safety regulators are engaged in determining the different linguistic and cultural barriers to the effective communication of health claims and in formulating solutions that can be implemented on multiple levels, including regulation, enforcement, marketing, and consumer education.
{"title":"Health Claims Unpacked: A toolkit to Enhance the Communication of Health Claims for Food","authors":"Xiao Li, Huizhi Liang, Zehao Liu","doi":"10.1145/3459637.3481984","DOIUrl":"https://doi.org/10.1145/3459637.3481984","url":null,"abstract":"Health claims are sentences on the food product packages to claim the nutrition and the benefits of the nutrition. Consumers in different European contexts often have difficulties understanding health claims, leading to increased confusion about and decreased trust in the food they buy. Focusing on this problem, we develop a toolkit for improving the communication of health claims for consumers. The toolkit provides (1) interactive activities to disseminate knowledge about health claims to the public, and (2) an NLP-based analysis and prediction engine that food manufacturers can use to estimate how consumers like the health claims that the manufacturers created. By using the AI-powered toolkit, consumers, manufacturers, and food safety regulators are engaged in determining the different linguistic and cultural barriers to the effective communication of health claims and formulating solutions that can be implemented on multiple levels, including regulation, enforcement, marketing, and consumer education.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115908465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Top-k Tree Similarity Join
Jianhua Wang, Jianye Yang, Wenjie Zhang
Tree similarity join is useful for analyzing tree-structured data. The traditional threshold-based tree similarity join requires a similarity threshold, which is usually difficult for users to choose. To remedy this issue, we advocate the problem of top-k tree similarity join. Given a collection of trees and a parameter k, the top-k tree similarity join aims to find the k tree pairs with minimum tree edit distance (TED). Although we show that this problem can be resolved by utilizing the threshold-based join, the efficiency is unsatisfactory. In this paper, we propose an efficient algorithm, namely TopKTJoin, which generates the candidate tree pairs incrementally using an inverted index. We also derive a TED lower bound for unseen tree pairs. Together with the TED value of the k-th best join result seen so far, we have a chance to terminate the algorithm early without missing any correct results. To further improve the efficiency, we propose two optimization techniques in terms of index structure and verification mechanism. We conduct comprehensive performance studies on real and synthetic datasets. The experimental results demonstrate that TopKTJoin significantly outperforms the baseline method.
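The early-termination idea can be sketched as follows, assuming an exact tree-edit-distance routine `ted(t1, t2)` is supplied by the caller (a hypothetical helper, not from the paper); the size-difference lower bound and the brute-force pair enumeration stand in for the paper's inverted-index-based candidate generation.

```python
import heapq
from itertools import combinations

def tree_size(t):
    """Count nodes of a tree given as (label, [children])."""
    return 1 + sum(tree_size(c) for c in t[1])

def topk_tree_join(trees, k, ted):
    """Examine pairs in order of a cheap size-difference TED lower bound and
    stop once the bound cannot beat the k-th best distance seen so far."""
    sizes = {i: tree_size(t) for i, t in enumerate(trees)}
    candidates = sorted(combinations(range(len(trees)), 2),
                        key=lambda p: abs(sizes[p[0]] - sizes[p[1]]))
    top = []                                   # max-heap of k best via negated distances
    for i, j in candidates:
        bound = abs(sizes[i] - sizes[j])       # |size(t1) - size(t2)| <= TED(t1, t2)
        if len(top) == k and bound >= -top[0][0]:
            break                              # no remaining pair can improve the result
        d = ted(trees[i], trees[j])
        heapq.heappush(top, (-d, i, j))
        if len(top) > k:
            heapq.heappop(top)                 # drop the current worst pair
    return sorted((-d, i, j) for d, i, j in top)
```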
{"title":"Top-k Tree Similarity Join","authors":"Jianhua Wang, Jianye Yang, Wenjie Zhang","doi":"10.1145/3459637.3482304","DOIUrl":"https://doi.org/10.1145/3459637.3482304","url":null,"abstract":"Tree similarity join is useful for analyzing tree structured data. The traditional threshold-based tree similarity join requires a similarity threshold, which is usually a difficult task for users. To remedy this issue, we advocate the problem of top-k tree similarity join. Given a collection of trees and a parameter k, the top-k tree similarity join aims to find k tree pairs with minimum tree edit distance (TED). Although we show that this problem can be resolved by utilizing the threshold-based join, the efficiency is unsatisfactory. In this paper, we propose an efficient algorithm, namely TopKTJoin, which generates the candidate tree pairs incrementally using an inverted index. We also derive TED lower bound for the unseen tree pairs. Together with TED value of the k-th best join result seen so far, we have a chance to terminate the algorithm early without missing any correct results. To further improve the efficiency, we propose two optimization techniques in terms of index structure and verification mechanism. We conduct comprehensive performance studies on real and synthetic datasets. The experimental results demonstrate that TopKTJoin significantly outperforms the baseline method.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134397017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Template-guided Clarifying Question Generation for Web Search Clarification
Jian Wang, Wenjie Li
Clarification has attracted much attention because of its many potential applications, especially in Web search. Since search queries are very short, the underlying user intents are often ambiguous. This makes it challenging for search engines to return the appropriate results that pertain to the users' actual information needs. To address this issue, asking clarifying questions has been recognized as a critical technique. Although previous studies have analyzed the importance of asking to clarify, generating clarifying questions for Web search remains under-explored. In this paper, we tackle this problem in a template-guided manner. Our objective is to jointly learn to select question templates and fill question slots, using Transformer-based networks. We conduct experiments on MIMICS, a collection of datasets containing real Web search queries sampled from Bing's search logs. Our method is demonstrated to achieve significant improvements over various competitive baselines.
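At a very high level, the template-guided pipeline amounts to scoring templates against the query and filling the slot; the sketch below uses a trivial stand-in scorer where the paper uses Transformer-based networks, and the templates, facets, and function names are invented examples.

```python
def clarify(query, facets, templates, score):
    """Pick the best-scoring template for the query and fill its slot;
    `score(query, template)` is an assumed relevance model."""
    best = max(templates, key=lambda t: score(query, t))
    return best.format(query=query), facets

# usage with toy templates and a placeholder scorer
templates = ["What do you want to know about {query}?",
             "Which aspect of {query} are you interested in?"]
question, options = clarify("dinosaur", ["types", "habitat", "extinction"],
                            templates, score=lambda q, t: len(t))
print(question, options)
```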
{"title":"Template-guided Clarifying Question Generation for Web Search Clarification","authors":"Jian Wang, Wenjie Li","doi":"10.1145/3459637.3482199","DOIUrl":"https://doi.org/10.1145/3459637.3482199","url":null,"abstract":"Clarification has attracted much attention because of its many potential applications especially in Web search. Since search queries are very short, the underlying user intents are often ambiguous. This makes it challenging for search engines to return the appropriate results that pertain to the users' actual information needs. To address this issue, asking clarifying questions has been recognized as a critical technique. Although previous studies have analyzed the importance of asking to clarify, generating clarifying questions for Web search remains under-explored. In this paper, we tackle this problem in a template-guided manner. Our objective is jointly learning to select question templates and fill question slots, using Transformer-based networks. We conduct experiments on MIMICS, a collection of datasets containing real Web search queries sampled from Bing's search logs. Our method is demonstrated to achieve significant improvements over various competitive baselines.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132168450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 18
RABERT: Relation-Aware BERT for Target-Oriented Opinion Words Extraction
Taegwan Kang, Minwoo Lee, Nakyeong Yang, Kyomin Jung
Targeted Opinion Word Extraction (TOWE) is a subtask of aspect-based sentiment analysis, which aims to identify the corresponding opinion terms for given opinion targets in a review. To solve the TOWE task, recent works mainly focus on learning the target-aware context representation that infuses target information into context representation by using various neural networks. However, it has been unclear how to encode the target information to BERT, a powerful pre-trained language model. In this paper, we propose a novel TOWE model, RABERT (Relation-Aware BERT), that can fully utilize BERT to obtain target-aware context representations. To introduce the target information into BERT layers clearly, we design a simple but effective encoding method that adds target markers indicating the opinion targets to the sentence. In addition, we find that the neighbor word information is also important for extracting the opinion terms. Therefore, RABERT employs the target-sentence relation network and the neighbor-aware relation network to consider both the opinion target and the neighbor word information. Our experimental results on four benchmark datasets show that RABERT significantly outperforms the other baselines and achieves state-of-the-art performance. We also demonstrate the effectiveness of each component of RABERT in further analysis.
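The target-marker encoding can be illustrated with a few lines of Python; the marker strings `[TGT]` and `[/TGT]` are placeholders, not necessarily the exact special tokens used in the paper.

```python
def add_target_markers(tokens, target_span, open_marker="[TGT]", close_marker="[/TGT]"):
    """Wrap the opinion target with marker tokens so a BERT-style encoder
    can see which words are the target; span is (inclusive start, exclusive end)."""
    start, end = target_span
    return (tokens[:start] + [open_marker] + tokens[start:end]
            + [close_marker] + tokens[end:])

tokens = "the battery life is great but the screen is dim".split()
print(add_target_markers(tokens, (1, 3)))
# ['the', '[TGT]', 'battery', 'life', '[/TGT]', 'is', 'great', ...]
```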
{"title":"RABERT: Relation-Aware BERT for Target-Oriented Opinion Words Extraction","authors":"Taegwan Kang, Minwoo Lee, Nakyeong Yang, Kyomin Jung","doi":"10.1145/3459637.3482165","DOIUrl":"https://doi.org/10.1145/3459637.3482165","url":null,"abstract":"Targeted Opinion Word Extraction (TOWE) is a subtask of aspect-based sentiment analysis, which aims to identify the correspondingopinion terms for given opinion targets in a review. To solve theTOWE task, recent works mainly focus on learning the target-aware context representation that infuses target information intocontext representation by using various neural networks. However,it has been unclear how to encode the target information to BERT,a powerful pre-trained language model. In this paper, we proposea novel TOWE model, RABERT (Relation-Aware BERT), that canfully utilize BERT to obtain target-aware context representations.To introduce the target information into BERT layers clearly, wedesign a simple but effective encoding method that adds targetmarkers indicating the opinion targets to the sentence. In addi-tion, we find that the neighbor word information is also importantfor extracting the opinion terms. Therefore, RABERT employs thetarget-sentence relation network and the neighbor-aware relationnetwork to consider both the opinion target and the neighbor wordsinformation. Our experimental results on four benchmark datasetsshow that RABERT significantly outperforms the other baselinesand achieves state-of-the-art performance. We also demonstrate theeffectiveness of each component of RABERT in further analysis","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131674561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
'Could You Describe the Reason for the Transfer?': A Reinforcement Learning Based Voice-Enabled Bot Protecting Customers from Financial Frauds
Zihao Wang, Fudong Wang, Haipeng Zhang, Minghui Yang, Shaosheng Cao, Zujie Wen, Zhe Zhang
With the booming of Internet finance and e-payment businesses, telecom and online fraud has become a serious and rapidly growing problem. In China, 351 billion RMB (approximately 0.3% of China's GDP) was lost in 2018 due to telecommunication and online fraud, affecting tens of millions of individual customers. Anti-fraud algorithms have been widely adopted by major Internet finance companies to detect and block transactions induced by scams. However, due to limited contextual information, most systems would probably mistakenly block normal transactions, leading to poor user experience. On the other hand, if the transactions induced by scams are detected yet not fully explained to the users, the users will continue to pay, suffering direct financial losses. To address these problems, we design a voice-enabled bot that interacts with the customers who are involved in potential telecommunication and online frauds identified by the back-end system. The bot seeks additional information from the customers through natural conversations to confirm whether the customers are being scammed and to identify the actual fraud types. Details about the frauds are then provided to convince the customers that they are on the edge of being scammed. Our bot adopts offline reinforcement learning (RL) to learn dialogue policies from real-world human-human chat logs. During the conversations, our bot also identifies fraud types at every turn based on the dialogue state. The proposed bot outperforms baseline dialogue strategies by 2.8% in terms of task success rate, and by 5% in terms of dialogue accuracy in offline evaluations. Furthermore, in 8 months of real-world deployment, our bot lowers the dissatisfaction rate by 25% and increases the fraud prevention rate by 135% relatively, indicating a significant improvement in user experience as well as anti-fraud effectiveness. More importantly, we help prevent millions of users from being deceived and avoid trillions of financial losses.
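The offline policy learning can be caricatured as tabular Q-learning over logged transitions; the state names, actions, rewards, and hyperparameters below are invented for illustration and are far simpler than the deployed system's dialogue states and model.

```python
from collections import defaultdict

def offline_q_learning(transitions, actions, alpha=0.1, gamma=0.9, epochs=10):
    """Learn Q-values from logged (state, action, reward, next_state) tuples
    without further environment interaction."""
    Q = defaultdict(float)
    for _ in range(epochs):
        for s, a, r, s_next in transitions:
            target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

# toy log: asking for the transfer reason and then explaining the fraud is rewarded
transitions = [("opening", "ask_reason", 0.0, "reason_given"),
               ("reason_given", "explain_fraud", 1.0, "terminated"),
               ("reason_given", "release_block", -1.0, "terminated")]
policy = offline_q_learning(transitions,
                            actions=["ask_reason", "explain_fraud", "release_block"])
print(max(policy, key=policy.get))  # the highest-valued (state, action) pair
```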
{"title":"'Could You Describe the Reason for the Transfer?': A Reinforcement Learning Based Voice-Enabled Bot Protecting Customers from Financial Frauds","authors":"Zihao Wang, Fudong Wang, Haipeng Zhang, Minghui Yang, Shaosheng Cao, Zujie Wen, Zhe Zhang","doi":"10.1145/3459637.3481906","DOIUrl":"https://doi.org/10.1145/3459637.3481906","url":null,"abstract":"With the booming of the Internet finance and e-payment business, telecom and online fraud has become a serious problem which grows rapidly. In China, 351 billion RMB (approximately 0.3% of China's GDP) was lost in 2018 due to telecommunication and online fraud, influencing tens of millions of individual customers. Anti-fraud algorithms have been widely adopted by major Internet finance companies to detect and block transactions induced by scam. However, due to limited contextual information, most systems would probably mistakenly block the normal transactions, leading to poor user experience. On the other hand, if the transactions induced by scam are detected yet not fully explained to the users, the users will continue to pay, suffering from direct financial losses. To address these problems, we design a voice-enabled bot that interacts with the customers who are involved with potential telecommunication and online frauds decided by the back-end system. The bot seeks additional information from the customers through natural conversations to confirm whether the customers are scammed and identify the actual fraud types. The details about the frauds are then provided to convince the customers that they are on the edge of being scammed. Our bot adopts offline reinforcement learning (RL) to learn dialogue policies from real-world human-human chat logs. During the conversations, our bot also identifies fraud types every turn based on the dialogue state. The bot proposed outperforms baseline dialogue strategies by 2.8% in terms of task success rate, and 5% in terms of dialogue accuracy in offline evaluations. Furthermore, in the 8 months of real-world deployment, our bot lowers the dissatisfaction rate by 25% and increases the fraud prevention rate by 135% relatively, indicating a significant improvement in user experience as well as anti-fraud effectiveness. More importantly, we help prevent millions of users from being deceived, and avoid trillions of financial losses.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132618705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
PATROL: A Velocity Control Framework for Autonomous Vehicle via Spatial-Temporal Reinforcement Learning
Zhi Xu, Shuncheng Liu, Ziniu Wu, Xu Chen, Kai Zeng, K. Zheng, Han Su
The largest portion of urban congestion is caused by 'phantom' traffic jams, causing significant travel delays, fuel waste, and air pollution. They frequently occur in high-density traffic without any obvious signs of accidents or roadworks. The root cause of 'phantom' traffic jams in one-lane traffic is the sudden change in velocity of some vehicles (i.e., harsh driving behavior (HDB)), which may generate a chain reaction with accumulated impact throughout the vehicles along the lane. This paper makes the first attempt to address this notorious problem in a one-lane traffic environment through velocity control of autonomous vehicles. Specifically, we propose a velocity control framework, called PATROL (sPAtial-temporal ReinfOrcement Learning). First, we design a spatial-temporal graph inside the reinforcement learning model to process and extract the information (e.g., velocity and distance differences) of multiple vehicles ahead across several historical time steps in the interactive environment. Then, we propose an attention mechanism to characterize the vehicle interactions and an LSTM structure to capture the vehicles' driving patterns over time. At last, we modify the reward function used in previous velocity control works to enable the autonomous driving agent to predict the HDB of preceding vehicles and smoothly adjust its velocity, which could alleviate the chain reaction caused by HDB. We conduct extensive experiments to demonstrate the effectiveness and superiority of PATROL in alleviating 'phantom' traffic jams in simulation environments. Further, on a real-world velocity control dataset, our method significantly outperforms existing methods in terms of driving safety, comfort, and efficiency.
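The reward-shaping idea, penalising harsh velocity changes alongside efficiency and safety terms, can be sketched as below; the weights, target speed, and safe-gap threshold are illustrative assumptions rather than PATROL's actual reward function.

```python
def velocity_reward(v, v_prev, gap, v_target=15.0, safe_gap=10.0,
                    w_speed=1.0, w_harsh=2.0, w_gap=5.0):
    """Shaped reward discouraging harsh driving behaviour: reward tracking a
    target speed, penalise abrupt velocity changes, penalise unsafe headway."""
    speed_term = -w_speed * abs(v - v_target)        # efficiency
    harsh_term = -w_harsh * abs(v - v_prev)          # comfort / HDB penalty
    gap_term = -w_gap * max(0.0, safe_gap - gap)     # safety
    return speed_term + harsh_term + gap_term

print(velocity_reward(v=14.0, v_prev=13.5, gap=12.0))   # mild change, safe gap
print(velocity_reward(v=8.0, v_prev=14.0, gap=6.0))     # harsh braking, close gap
```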
{"title":"PATROL: A Velocity Control Framework for Autonomous Vehicle via Spatial-Temporal Reinforcement Learning","authors":"Zhi Xu, Shuncheng Liu, Ziniu Wu, Xu Chen, Kai Zeng, K. Zheng, Han Su","doi":"10.1145/3459637.3482283","DOIUrl":"https://doi.org/10.1145/3459637.3482283","url":null,"abstract":"The largest portion of urban congestion is caused by 'phantom' traffic jams, causing significant delay travel time, fuel waste, and air pollution. It frequently occurs in high-density traffics without any obvious signs of accidents or roadworks. The root cause of 'phantom' traffic jams in one-lane traffics is the sudden change in velocity of some vehicles (i.e. harsh driving behavior (HDB)), which may generate a chain reaction with accumulated impact throughout the vehicles along the lane. This paper makes the first attempt to address this notorious problem in a one-lane traffic environment through velocity control of autonomous vehicles. Specifically, we propose a velocity control framework, called PATROL (sPAtial-temporal ReinfOrcement Learning). First, we design a spatial-temporal graph inside the reinforcement learning model to process and extract the information (e.g. velocity and distance difference) of multiple vehicles ahead across several historical time steps in the interactive environment. Then, we propose an attention mechanism to characterize the vehicle interactions and an LSTM structure to understand the vehicles' driving patterns through time. At last, we modify the reward function used in previous velocity control works to enable the autonomous driving agent to predict the HDB of preceding vehicles and smoothly adjust its velocity, which could alleviate the chain reaction caused by HDB. We conduct extensive experiments to demonstrate the effectiveness and superiority of PATROL in alleviating the 'phantom' traffic jam in simulation environments. Further, on the real-world velocity control dataset, our method significantly outperforms the existing methods in terms of driving safety, comfortability, and efficiency.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"338 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133084416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10