
Proceedings of the 13th International Conference on Web Search and Data Mining: Latest Articles

Beyond Sessions: Exploiting Hybrid Contextual Information for Web Search
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3372179
Jia Chen
Fully understanding user intent is essential for optimizing downstream web search tasks such as document ranking and query suggestion. As users tend to submit ambiguous queries, numerous studies use contextual information such as query sequences and user clicks to assist user intent modeling. Most of these works adopt Recurrent Neural Network (RNN) based frameworks to encode sequential information within a session, which makes parallel computation hard to realize. To this end, we plan to adopt attention-based units to generate context-aware representations for elements in sessions. As intra-session contexts are insufficient for handling the data sparsity and cold-start problems in session search, we will also attempt to integrate cross-session dependencies by constructing session graphs on the whole corpus to enrich the representations of queries and documents.
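As a rough illustration of the attention-based units mentioned in this abstract, the sketch below applies plain scaled dot-product self-attention to a toy session. It is not the author's model: the function names, the two-dimensional embeddings, and the choice of using the same vectors as queries, keys, and values are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Return one context-aware vector per session element.

    Hypothetical simplification: queries, keys, and values are the raw
    embeddings themselves (no learned projection matrices).
    """
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in vectors:
        # Similarity of this element to every element in the session.
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in vectors]
        weights = softmax(scores)
        # Weighted average of all element embeddings.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]
        )
    return outputs


# Toy session: three elements (for example, two queries and one click).
session = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(session)
```

Each output row depends only on dot products with the other rows, so all positions can be computed independently, which is the parallelism that a step-by-step recurrent encoder lacks.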
Citations: 2
NLP4REC: The WSDM 2020 Workshop on Natural Language Processing for Recommendations
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371884
Pengjie Ren, Z. Ren, Fei Sun, Xiangnan He, Dawei Yin, M. de Rijke
Natural language processing is becoming increasingly important in recommender systems. This half-day workshop explores challenges and potential research directions in combining Recommender Systems (RSs) with Natural Language Processing (NLP). The focus will be on stimulating discussions around how to combine natural language processing technologies with recommendation. We welcome theoretical, experimental, and methodological studies that leverage NLP technologies to advance recommender systems and that emphasize applicability in practice. The workshop aims to bring together a diverse set of researchers and practitioners interested in investigating the interaction between NLP and RSs to develop more intelligent RSs.
Citations: 4
ADMM SLIM: Sparse Recommendations for Many Users
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371774
H. Steck, Maria Dimakopoulou, Nickolai Riabov, T. Jebara
The Sparse Linear Method (SLIM) is a well-established approach for top-N recommendations. This article proposes several improvements that are enabled by the Alternating Directions Method of Multipliers (ADMM), a well-known optimization method with many application areas. First, we show that optimizing the original SLIM objective by ADMM results in an approach where the training time is independent of the number of users in the training data, and hence trivially scales to large numbers of users. Second, the flexibility of ADMM allows us to switch the various constraints and regularization terms in the original SLIM objective on and off, in order to empirically assess their contributions to ranking accuracy on given data. Third, we also propose two extensions to the original SLIM training objective in order to improve recommendation accuracy further without increasing the computational cost. In our experiments on three well-known datasets, we first compare to the original SLIM implementation and find that ADMM not only reduces training time considerably, but also achieves an improvement in recommendation accuracy due to better optimization. We then compare to various state-of-the-art approaches and observe up to 25% improvement in recommendation accuracy in our experiments. Finally, we evaluate the importance of sparsity and the non-negativity constraint in the original SLIM objective with sub-sampling experiments that simulate scenarios of cold-starting and of large catalog sizes relative to a comparatively small user base, which often occur in practice.
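The claim that training time does not depend on the number of users can be made concrete with a small sketch. The code below is a generic textbook ADMM splitting for a SLIM-style objective (squared error, L2 and L1 penalties, zero diagonal, non-negative weights), written in pure Python for a toy matrix. It is an assumption-laden illustration, not necessarily the authors' exact update rules; the function names, the hyperparameter values, and the toy data are all hypothetical.

```python
def gram(X):
    """Item-item Gram matrix X^T X; its shape depends only on the item count."""
    n_items = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(n_items)]
            for i in range(n_items)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def invert(A):
    """Gauss-Jordan inverse with partial pivoting (fine for tiny matrices)."""
    n = len(A)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def admm_slim(X, l1=0.1, l2=1.0, rho=1.0, iters=100):
    """Sketch: ADMM with the split B = C and a scaled dual variable U."""
    G = gram(X)  # the only step that touches the user dimension
    n = len(G)
    P = invert([[G[i][j] + (l2 + rho if i == j else 0.0) for j in range(n)]
                for i in range(n)])
    C = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        rhs = [[G[i][j] + rho * (C[i][j] - U[i][j]) for j in range(n)]
               for i in range(n)]
        T = matmul(P, rhs)
        # Ridge-like step; the correction term enforces diag(B) = 0.
        B = [[T[i][j] - P[i][j] * T[j][j] / P[j][j] for j in range(n)]
             for i in range(n)]
        # Soft-thresholding (L1) combined with the non-negativity constraint.
        C = [[max(B[i][j] + U[i][j] - l1 / rho, 0.0) for j in range(n)]
             for i in range(n)]
        U = [[U[i][j] + B[i][j] - C[i][j] for j in range(n)] for i in range(n)]
    return C


# Toy interaction matrix: 4 users (rows) by 3 items (columns).
X = [[1, 1, 0], [1, 1, 0], [0, 1, 1], [1, 0, 0]]
weights = admm_slim(X)
```

In this sketch the user dimension appears only in `gram`; every matrix inside the loop is items by items, so the per-iteration cost is unchanged when more users are added. Dropping the `max(..., 0.0)` clamp or setting `l1=0.0` is one way to switch individual constraints off, in the spirit of the second point in the abstract.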
Citations: 25
Overlapping Community Detection in Static and Dynamic Networks
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3372185
Renny Márquez
Studying the behavior of systems through networks is important because it allows us to understand them and make decisions based on this knowledge. Community detection is one of the tools used for this purpose, to detect groups in graphs. This can be done by considering not only the connections between nodes, but also their attributes. In addition, objects can belong to different groups to varying degrees, so overlapping fuzzy assignment is relevant in this context. Furthermore, most networks change over time, so including this aspect also enhances the benefits of using community detection. Hence, in this doctoral thesis we propose to design models for overlapping community detection in static and dynamic networks with node attributes. First, we design an approach based on a nonnegative matrix factorization generative model that automatically detects the number of communities in the network. Second, tensor factorization is used to overcome some of the challenges faced by the first model.
Citations: 5
MRAEA
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371804
Xin Mao, Wenting Wang, Huimin Xu, Man Lan, Yuanbin Wu
Entity alignment, which finds equivalent entities in cross-lingual Knowledge Graphs (KGs), plays a vital role in automatically integrating multiple KGs. Existing translation-based entity alignment methods jointly model the cross-lingual knowledge and monolingual knowledge in one unified optimization problem. On the other hand, the Graph Neural Network (GNN) based methods either ignore the node differentiations, or represent relations through entity or triple instances. They all fail to model either the meta semantics embedded in relations or complex relations such as n-to-n and multi-graphs. To tackle these challenges, we propose a novel Meta Relation Aware Entity Alignment (MRAEA) to directly model cross-lingual entity embeddings by attending over the node's incoming and outgoing neighbors and its connected relations' meta semantics. In addition, we also propose a simple and effective bi-directional iterative strategy to add new aligned seeds during training. Our experiments on all three benchmark entity alignment datasets show that our approach consistently outperforms the state-of-the-art methods, exceeding them by 15%-58% on Hit@1. Through an extensive ablation study, we validate that the proposed meta relation aware representations, relation aware self-attention, and bi-directional iterative strategy of new seed selection all contribute to the significant performance improvement. The code is available at https://github.com/MaoXinn/MRAEA.
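For readers unfamiliar with the Hit@1 metric quoted above, the sketch below shows one common way to compute Hits@k for entity alignment: rank all target entities by embedding distance and check whether the gold counterpart is among the top k. The L1 distance, the function name, and the toy embeddings are assumptions for illustration and are not taken from the MRAEA code.

```python
def hits_at_k(source, target, gold, k=1):
    """Fraction of gold pairs whose target entity ranks in the top k.

    source, target: lists of embedding vectors (lists of floats).
    gold: list of (source_index, target_index) alignment pairs.
    Distance is L1 here, which is an assumption of this sketch.
    """
    hits = 0
    for s_idx, t_idx in gold:
        dists = [sum(abs(a - b) for a, b in zip(source[s_idx], t)) for t in target]
        ranked = sorted(range(len(target)), key=lambda i: dists[i])
        if t_idx in ranked[:k]:
            hits += 1
    return hits / len(gold)


# Hypothetical 2-d embeddings for three entities in each knowledge graph.
source = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
target = [[0.1, 0.0], [4.0, 4.0], [1.2, 0.9]]

correct_pairs = [(0, 0), (1, 2), (2, 1)]
score = hits_at_k(source, target, correct_pairs, k=1)
```

With these toy vectors every source entity's nearest target is its gold counterpart, so `score` is 1.0; a pair whose gold target is not the nearest neighbor would lower Hits@1 but could still count toward Hits@k for a larger k.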
Citations: 2
Hierarchical User Profiling for E-commerce Recommender Systems
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371827
Yulong Gu, Zhuoye Ding, Shuaiqiang Wang, Dawei Yin
Hierarchical user profiling, which aims to model users' real-time interests at different granularities, is an essential issue for personalized recommendations in E-commerce. On one hand, items (i.e., products) are usually organized hierarchically in categories, and correspondingly users' interests are naturally hierarchical across different granularities of items and categories. On the other hand, multiple-granularity oriented recommendations have become very popular on E-commerce sites, and they require hierarchical user profiling at different granularities as well. In this paper, we propose HUP, a Hierarchical User Profiling framework to solve the hierarchical user profiling problem in E-commerce recommender systems. In HUP, we provide Pyramid Recurrent Neural Networks, equipped with Behavior-LSTM, to formulate users' hierarchical real-time interests at multiple scales. Furthermore, instead of simply utilizing users' item-level behaviors (e.g., ratings or clicks) as in conventional methods, HUP harvests the sequential information of users' temporal fine-grained interactions (micro-behaviors, e.g., clicks on components of items like pictures or comments, or browsing via search engine navigation or recommendations) for modeling. Extensive experiments on two real-world E-commerce datasets demonstrate the significant performance gains of HUP against state-of-the-art methods for the hierarchical user profiling and recommendation problems. We release the code and datasets at https://github.com/guyulongcs/WSDM2020_HUP.
Citations: 65
Entities with Quantities: Extraction, Search, and Ranking
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371860
Vinh Thinh Ho, K. Pal, Niko Kleer, K. Berberich, G. Weikum
Quantities are more than numeric values. They represent measures for entities, expressed in numbers with associated units. Search queries often include quantities, such as athletes who ran 200m under 20 seconds or companies with quarterly revenue above $2 Billion. Processing such queries requires understanding the quantities, and capturing the surrounding context is an essential part of that. Although modern search engines and QA systems handle entity-centric queries well, they treat numbers and units as simple keywords, and therefore fail to understand the condition (less than, above, etc.), the unit of interest (seconds, dollar, etc.), and the context of the quantity (200m race, quarterly revenue, etc.). As a result, they cannot generate the correct candidate answers. In this work, we demonstrate a prototype QA system, called Qsearch, that can handle advanced queries with quantity constraints using the common cues present in both the query and the text sources.
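The three ingredients named in this abstract (condition, unit, and context) can be sketched as a small filter over already-extracted quantity facts. This is only an illustration of the query semantics, not the Qsearch system: the data classes, the comparator vocabulary, the overlap-based ranking, and the placeholder entities and values are all hypothetical.

```python
import operator
from dataclasses import dataclass

# Hypothetical mapping from query cue words to comparison operators.
COMPARATORS = {"under": operator.lt, "above": operator.gt}


@dataclass(frozen=True)
class QuantityFact:
    entity: str
    value: float
    unit: str
    context: frozenset  # cue words found near the quantity in the text


def answer(facts, comparator, value, unit, context):
    """Keep facts with the right unit, a satisfied condition, and shared context."""
    compare = COMPARATORS[comparator]
    matches = [
        f for f in facts
        if f.unit == unit and compare(f.value, value) and context & f.context
    ]
    # Rank by how many context cues the fact shares with the query.
    return sorted(matches, key=lambda f: len(context & f.context), reverse=True)


# Placeholder facts; entities and values are invented for the example.
facts = [
    QuantityFact("athlete_a", 19.5, "s", frozenset({"200m", "race"})),
    QuantityFact("athlete_b", 20.4, "s", frozenset({"200m", "race"})),
    QuantityFact("athlete_c", 9.9, "s", frozenset({"100m", "race"})),
    QuantityFact("company_x", 3.1e9, "USD", frozenset({"quarterly", "revenue"})),
]

results = answer(facts, "under", 20.0, "s", frozenset({"200m"}))
```

A pure keyword match on "20" and "seconds" would not exclude `athlete_b` (condition not met) or `athlete_c` (wrong context), which is the failure mode the abstract describes.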
Citations: 10
SPread: Automated Financial Metric Extraction and Spreading Tool from Earnings Reports
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371869
Armineh Nourbakhsh, M. Ghassemi, Steven Pomerville
In this paper, we present SPread, an automated financial metric extraction and spreading tool from earnings reports. The tool is created in a document-agnostic fashion, and uses an interpolation of tagging methods to capture arbitrarily complicated expressions. SPread can handle single-line items as well as metrics broken down into sub-items. A validation layer further improves the performance of upstream modules and enables the tool to reach an F1 performance of more than 87% for metrics expressed in tabular format, and 76% for metrics in free-form text. The results are displayed to end-users in an interactive web interface, which allows them to locate, compare, validate, adjust, and export the values.
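The F1 figures quoted above combine precision and recall of the extracted metrics. A minimal sketch of that computation follows, assuming extractions are compared to gold annotations as exact (metric name, value) pairs; the matching rule, the function name, and the sample values are assumptions for illustration and are not taken from the paper.

```python
def f1_score(predicted, gold):
    """F1 over sets of (metric_name, value) pairs, using exact matching."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented values for a single hypothetical earnings report.
predicted = {("revenue", 100.0), ("net income", 20.0), ("ebitda", 30.0)}
gold = {("revenue", 100.0), ("net income", 25.0), ("ebitda", 30.0), ("eps", 1.2)}

score = f1_score(predicted, gold)
```

Here two of three extractions are correct (precision 2/3) and two of four gold metrics are found (recall 1/2), giving an F1 of 4/7, or roughly 0.57.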
Citations: 4
Workshop on Privacy in NLP (PrivateNLP 2020)
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371881
Oluwaseyi Feyisetan, S. Ghanavati, Patricia Thaine
Privacy-preserving data analysis has become essential in Machine Learning (ML), where access to vast amounts of data can provide large gains in the accuracies of tuned models. A large proportion of user-contributed data comes from natural language, e.g., text transcriptions from voice assistants. It is therefore important for curated natural language datasets to preserve the privacy of the users whose data is collected and for the models trained on sensitive data to only retain non-identifying (i.e., generalizable) information. The workshop aims to bring together researchers and practitioners from academia and industry to discuss the challenges and approaches to designing, building, verifying, and testing privacy-preserving systems in the context of Natural Language Processing (NLP).
Citations: 7
Predicting Human Mobility via Attentive Convolutional Network
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371846
Congcong Miao, Ziyan Luo, Fengzhu Zeng, Jilong Wang
Predicting human mobility is an important trajectory mining task for various applications, ranging from smart city planning to personalized recommendation systems. While most previous works adopt GPS tracking data to model human mobility, the recent fast-growing geo-tagged social media (GTSM) data brings new opportunities to this task. However, predicting human mobility on GTSM data is not trivial because of three challenges: 1) extreme data sparsity; 2) high-order sequential patterns of human mobility; and 3) evolving user preferences for tagging. In this paper, we propose ACN, an attentive convolutional network model for predicting human mobility from sparse and complex GTSM data. In ACN, we first design a multi-dimensional embedding layer that jointly embeds the key features (i.e., spatial, temporal, and user features) that govern human mobility. Then, we regard the embedded trajectory as an "image" and learn short-term sequential patterns as local features of the image using convolution filters. Instead of directly using conventional filters, we design hybrid dilated and separable convolution filters to effectively capture high-order sequential patterns from lengthy trajectories. In addition, we propose an attention mechanism that learns the user's long-term preference to augment the convolutional network for mobility prediction. We conduct extensive experiments on three publicly available GTSM datasets to evaluate the effectiveness of our model. The results demonstrate that ACN consistently outperforms existing state-of-the-art mobility prediction approaches on a variety of common evaluation metrics.
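To show why dilated filters help with lengthy trajectories, the sketch below implements a plain one-dimensional dilated convolution and compares receptive fields. It covers only the dilation idea (not the separable part or the authors' hybrid design), and the function names, the kernel, and the toy sequence are assumptions for the example.

```python
def receptive_field(kernel_size, dilation):
    """Number of consecutive input positions one output value can see."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(sequence, kernel, dilation=1):
    """Valid (unpadded) 1-d convolution with gaps of `dilation` between taps."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[k] * sequence[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(sequence) - span)
    ]


# Toy "trajectory": one scalar feature per visited location.
trajectory = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 0, -1]  # difference between the first and last tap

dense = dilated_conv1d(trajectory, kernel, dilation=1)
dilated = dilated_conv1d(trajectory, kernel, dilation=3)
```

With the same three weights, the dilation-1 filter spans 3 positions while the dilation-3 filter spans 7, so longer-range patterns are reached without adding parameters.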
Citations: 13
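The ACN abstract mentions dilated and separable convolution filters for capturing high-order sequential patterns in long trajectories. The abstract does not give the filter design, so what follows is only a generic sketch of the two ideas, not the authors' implementation. It assumes a trajectory already embedded as a list of time steps with a few channels each, treats the convolution as 1D over time and causal (zero-padded on the left), and uses hypothetical function names and hand-picked weights throughout.

```python
from typing import List

# Assumed layout: Matrix[t][c] is channel c of the embedding at time step t.
Matrix = List[List[float]]


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Time steps visible to one output after stacking dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_depthwise_conv(x: Matrix, kernels: Matrix, dilation: int) -> Matrix:
    """Depthwise step: each channel is filtered over time by its own kernel.

    kernels[c][k] is tap k for channel c. Assumption: the filter is causal,
    so step t reads t, t - dilation, t - 2 * dilation, ... and treats
    positions before the start of the trajectory as zero.
    """
    steps, channels = len(x), len(x[0])
    out = [[0.0] * channels for _ in range(steps)]
    for t in range(steps):
        for c in range(channels):
            for k, w in enumerate(kernels[c]):
                src = t - k * dilation
                if src >= 0:
                    out[t][c] += w * x[src][c]
    return out


def pointwise_conv(x: Matrix, weights: Matrix) -> Matrix:
    """Pointwise step: mix channels at each time step (a 1x1 convolution).

    weights[o][c] maps input channel c to output channel o.
    """
    return [[sum(w * v for w, v in zip(row_w, row_x)) for row_w in weights]
            for row_x in x]


def separable_dilated_conv(x: Matrix, kernels: Matrix, weights: Matrix,
                           dilation: int) -> Matrix:
    """Depthwise dilated filtering followed by pointwise channel mixing."""
    return pointwise_conv(dilated_depthwise_conv(x, kernels, dilation), weights)


# Toy input: 4 check-ins, 2 embedding channels (values are illustrative only).
trajectory = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
kernels = [[1.0, 1.0], [1.0, -1.0]]   # one 2-tap kernel per channel
mix = [[1.0, 1.0]]                    # sum both channels into one output
features = separable_dilated_conv(trajectory, kernels, mix, dilation=2)
print(features)
print(receptive_field(3, [1, 2, 4]))
```

Splitting the filter into a per-channel temporal pass and a channel-mixing pass is what makes it "separable", and it needs fewer weights than a full filter of the same size. Dilation widens the span of check-ins each output sees without adding taps: three stacked layers of kernel size 3 with dilations 1, 2, and 4 cover 15 consecutive time steps.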