
Companion Proceedings of the Web Conference 2021: Latest Publications

Deep Learning meets Knowledge Graphs for Scholarly Data Classification
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451361
Fabian Hoppe, D. Dessí, Harald Sack
The amount of scientific literature continuously grows, which poses an increasing challenge for researchers to manage, find, and explore research results. Therefore, the classification of scientific work is widely applied to enable retrieval, to support the search for suitable reviewers during the reviewing process, and in general to organize the existing literature according to a given schema. The automation of this classification process not only simplifies the submission process for authors, but also ensures the coherent assignment of classes. However, fine-grained classes and new research fields in particular do not provide sufficient training data to automate the process. Additionally, given the large number of not mutually exclusive classes, it is often difficult and computationally expensive to train models able to deal with multi-class multi-label settings. To overcome these issues, this work presents a preliminary Deep Learning framework as a solution for multi-label text classification of scholarly papers about Computer Science. The proposed model addresses the issue of insufficient data by utilizing the semantics of classes, which is explicitly provided by latent representations of class labels. This study uses Knowledge Graphs as a source of these required external class definitions by identifying corresponding entities in DBpedia to improve the overall classification.
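As an illustration of the label-embedding idea, multi-label prediction can be sketched as scoring a document embedding against each class-label embedding and keeping the classes above a threshold. The vectors and class names below are toy stand-ins, not the authors' actual DBpedia-derived embeddings:

```python
import math

def cosine(u, v):
    # cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_labels(doc_vec, class_vecs, threshold=0.5):
    # multi-label prediction: keep every class whose label embedding
    # is close enough to the document embedding
    return [c for c, vec in class_vecs.items() if cosine(doc_vec, vec) >= threshold]

# toy 3-d embeddings standing in for KG-derived class-label vectors
class_vecs = {
    "Machine Learning": [1.0, 0.1, 0.0],
    "Databases":        [0.0, 1.0, 0.1],
    "Networks":         [0.0, 0.0, 1.0],
}
doc_vec = [0.9, 0.8, 0.05]
print(predict_labels(doc_vec, class_vecs))  # both ML and Databases exceed 0.5
```

Because the class semantics live in the label embeddings rather than in per-class trained weights, this scoring scheme needs no training examples for a new class, which is the point of the paper's approach to sparse classes.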
Citations: 9
What Happens Behind the Scene? Towards Fraud Community Detection in E-Commerce from Online to Offline
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451147
Zhao Li, Pengrui Hui, Peng Zhang, Jiaming Huang, Biao Wang, Ling Tian, Ji Zhang, Jianliang Gao, Xing Tang
Fraud behavior poses a severe threat to e-commerce platforms, and anti-fraud systems have become indispensable infrastructure for these platforms. Recently, a large number of fraud detection models have been proposed to monitor online purchasing transactions and extract hidden fraud patterns. Thanks to these fraud detection models, we have observed a significant reduction in committed fraud over the last several years. However, according to our recent statistics, an increasing number of malicious sellers on e-commerce platforms purposely circumvent these online fraud detection systems by transferring their fake purchasing behaviors from online to offline. This way, the effectiveness of our existing fraud detection system, built upon online transactions, is compromised. To solve this problem, we study in this paper a new problem, called offline fraud community detection, which can greatly strengthen our existing fraud detection systems. We propose a new FRaud COmmunity Detection from Online to Offline (FRODO) framework which combines the strengths of both online and offline data views, especially the offline spatial-temporal data, for fraud community discovery. Moreover, a new Multi-view Heterogeneous Graph Neural Network model is proposed within our new FRODO framework which can find anomalous graph patterns such as biclique communities through only a small number of black seeds, i.e., a small number of labeled fraud users. The seeds are processed by a streamlined pipeline of three components: label propagation for high coverage, multi-view heterogeneous graph neural networks for high-risk fraud user recognition, and spatial-temporal network reconstruction and mining for offline fraud community detection. The extensive experimental results on a large real-life Taobao network, with 20 million users, 5 million product items, and 30 million transactions, demonstrate the effectiveness of the proposed methods.
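The label-propagation stage of such a pipeline can be sketched as follows. The single-neighbor flagging rule and the toy graph are deliberate simplifications for illustration, not the paper's actual propagation model:

```python
def propagate_labels(edges, seeds, rounds=2):
    # expand a small set of labeled fraud users ("black seeds") over the
    # transaction graph: here a user is flagged once any neighbor is flagged
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    flagged = set(seeds)
    for _ in range(rounds):
        flagged |= {n for u in list(flagged) for n in graph.get(u, ())}
    return flagged

# toy user-user transaction edges; "s1" is the only black seed
edges = [("s1", "a"), ("a", "b"), ("b", "c"), ("x", "y")]
print(sorted(propagate_labels(edges, {"s1"}, rounds=2)))  # x, y stay clean
```

In practice the propagation would use weighted scores and stopping criteria, but the coverage effect is the same: a handful of seeds suffices to surface a candidate community for the downstream GNN to score.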
Citations: 13
C-Rex: A Comprehensive System for Recommending In-Text Citations with Explanations
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451366
Michael Färber, Vinzenz Zinecker, Isabela Bragaglia Cartus, S. Celis, Maria Duma
Finding suitable citations for scientific publications can be challenging and time-consuming. To this end, context-aware citation recommendation approaches that recommend publications as candidates for in-text citations have been developed. In this paper, we present C-Rex, a web-based demonstration system available at http://c-rex.org for context-aware citation recommendation based on the Neural Citation Network [5] and millions of publications from the Microsoft Academic Graph. Our system is one of the first online context-aware citation recommendation systems and the first to incorporate not only a deep learning recommendation approach, but also explanation components to help users better understand why papers were recommended. In our offline evaluation, our model performs similarly to the one presented in the original paper and can serve as a basic framework for further implementations. In our online evaluation, we found that the explanations of recommendations increased users’ satisfaction.
Citations: 1
Negative Knowledge for Open-world Wikidata
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3452339
Hiba Arnaout, S. Razniewski, G. Weikum, Jeff Z. Pan
The Wikidata knowledge base (KB) is one of the most popular structured data repositories on the web, containing more than 1 billion statements for over 90 million entities. Like most major KBs, it is nonetheless incomplete and therefore operates under the open-world assumption (OWA): statements not contained in Wikidata should be assumed to have an unknown truth value. However, the OWA ignores that a significant part of interesting knowledge is negative, which cannot be readily expressed in this data model. In this paper, we review the challenges arising from the OWA, as well as some specific attempts Wikidata has made to overcome them. We review a statistical inference method for negative statements, called peer-based inference, and present Wikinegata, a platform that implements this inference over Wikidata. We discuss lessons learned from the development of this platform, as well as how the platform can be used to learn both about interesting negations and about modelling challenges inside Wikidata. Wikinegata is available at https://d5demos.mpi-inf.mpg.de/negation.
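The core of peer-based inference can be sketched in a few lines: statements that similar peer entities commonly hold, but the target entity lacks, become ranked candidates for negative statements. The entities and statements below are hypothetical, and real peer selection and scoring are more involved:

```python
from collections import Counter

def peer_based_negations(entity_stmts, peer_stmts_list, top_k=2):
    # rank statements that similar peers commonly have but the target
    # entity lacks -- candidates for interesting negative statements,
    # scored by the fraction of peers holding them
    counts = Counter(s for peer in peer_stmts_list for s in peer)
    candidates = [(s, c / len(peer_stmts_list))
                  for s, c in counts.items() if s not in entity_stmts]
    return sorted(candidates, key=lambda x: -x[1])[:top_k]

# hypothetical (property, value) statements; peers are similar entities
target = {("occupation", "politician")}
peers = [
    {("occupation", "politician"), ("award", "Nobel Peace Prize")},
    {("occupation", "politician"), ("award", "Nobel Peace Prize"), ("spouse", "yes")},
    {("occupation", "politician"), ("spouse", "yes")},
]
print(peer_based_negations(target, peers))
```

Statements the target already has are filtered out, so what remains is exactly the "salient absence" signal the paper exploits.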
Citations: 14
A Deep End-to-end Hand Detection Application On Mobile Device Based On Web Of Things
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451141
Linjuan Ma, Fuquan Zhang
In this paper, a novel end-to-end hand detection method, YOLObile-KCF, for mobile devices based on the Web of Things (WoT) is presented, which can also be applied in practice. While hand detection has become a hot topic in recent years, little attention has been paid to its practical use on mobile devices. We demonstrate that our hand detection system can detect and track hands with high accuracy and speed, which not only enables us to communicate with each other on mobile devices, but also allows us to assist and guide the person on the other side of the mobile device in real time. The method used in our study is object detection, based on deep learning. A lightweight neural network suitable for mobile devices, with few parameters and easy deployment, is adopted in our model. In addition, the KCF algorithm is added to our model. Several experiments were carried out to test the validity of the hand detection system. The experiments show that the WoT-based YOLObile-KCF hand detection system is effective, and is more efficient and convenient for smart-life applications. Our work on hand detection for smart life proves to be encouraging.
Citations: 0
Counterfactual-Augmented Data for Multi-Hop Knowledge Base Question Answering
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3453706
Yingting Li
The rise of the counterfactual concept has promoted the study of reasoning, and we apply it to Knowledge Base Question Answering (KBQA) multi-hop reasoning as a means of data augmentation for the first time. Intuitively, we propose a model-agnostic Counterfactual Samples Synthesizing (CSS) training scheme. CSS uses two augmentation methods, Q-CSS and T-CSS, to augment the training set. That is, for each training instance, we create two augmented instances, one per augmentation method. Furthermore, we perform the Dynamic Answer Equipment (DAE) algorithm to dynamically assign ground-truth answers for the expanded questions, constructing counterfactual examples. After training with the supplemented examples, the KBQA model can focus on all key entities and words, which significantly improves the model's sensitivity. Experiments verified the effectiveness of CSS, which achieved consistent improvements across settings with different extents of KB incompleteness.
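The two-augmentations-per-instance scheme can be sketched as below. The masking rules are hypothetical stand-ins for what Q-CSS and T-CSS actually do, and the DAE step that reassigns ground-truth answers for the masked questions is omitted:

```python
def q_css(question, answer, critical_word):
    # question-side augmentation: mask a critical word in the question
    return question.replace(critical_word, "[MASK]"), answer

def t_css(question, answer, topic_entity):
    # topic-entity-side augmentation: mask the topic entity instead
    return question.replace(topic_entity, "[MASK]"), answer

def augment(dataset):
    # per the CSS scheme, every training instance yields two extra
    # instances, one per augmentation method
    out = []
    for q, a, crit, topic in dataset:
        out.append((q, a))
        out.append(q_css(q, a, crit))
        out.append(t_css(q, a, topic))
    return out

data = [("who directed Inception", "Christopher Nolan", "directed", "Inception")]
print(augment(data))  # original plus one Q-CSS and one T-CSS variant
```

Training on both variants forces the model to attend to the relation word and the topic entity rather than to dataset shortcuts, which is the sensitivity gain the abstract describes.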
Citations: 1
TCS_WITM_2021 @FinSim-2: Transformer based Models for Automatic Classification of Financial Terms
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451386
Tushar Goel, Vipul Chauhan, Ishan Verma, Tirthankar Dasgupta, Lipika Dey
Recent advancements in neural network architectures have provided several opportunities to develop systems that automatically extract and represent information from domain-specific unstructured text sources. The FinSim-2021 shared task, co-located with the FinNLP workshop, offered the challenge of automatically learning effective and precise semantic models of financial domain concepts. Building such semantic representations of domain concepts requires knowledge about the specific domain, which can be obtained through the contextual information available in raw text documents from those domains. In this paper, we propose a transformer-based BERT architecture that captures such contextual information from a set of domain-specific raw documents and then performs a classification task to segregate domain terms into a fixed number of class labels. The proposed model not only considers contextual BERT embeddings but also incorporates a TF-IDF vectorizer that provides word-level importance to the model. The performance of the model has been evaluated against several baseline architectures.
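The TF-IDF weighting idea can be sketched by combining per-token vectors with tf-idf weights computed over a small corpus. The toy vectors below merely stand in for contextual BERT embeddings, and the corpus and terms are invented for illustration:

```python
import math
from collections import Counter

def tfidf_weights(term_tokens, corpus):
    # tf-idf per token of one term; the corpus supplies document frequencies
    tf = Counter(term_tokens)
    n = len(corpus)
    weights = {}
    for tok, c in tf.items():
        df = sum(1 for doc in corpus if tok in doc)
        idf = math.log((1 + n) / (1 + df)) + 1  # smoothed idf
        weights[tok] = (c / len(term_tokens)) * idf
    return weights

def term_vector(term_tokens, token_vecs, corpus):
    # tf-idf-weighted average of token embeddings: rarer, more
    # discriminative tokens contribute more to the term representation
    w = tfidf_weights(term_tokens, corpus)
    dims = len(next(iter(token_vecs.values())))
    vec = [0.0] * dims
    total = 0.0
    for tok in term_tokens:
        for i, x in enumerate(token_vecs[tok]):
            vec[i] += w[tok] * x
        total += w[tok]
    return [x / total for x in vec]

corpus = [["interest", "rate", "swap"], ["credit", "default", "swap"], ["equity", "option"]]
token_vecs = {"credit": [1.0, 0.0], "default": [0.0, 1.0], "swap": [0.5, 0.5]}
print(term_vector(["credit", "default", "swap"], token_vecs, corpus))
```

The common token "swap" appears in two documents and thus gets a lower weight than "credit" or "default", which is exactly the word-level importance signal the abstract describes feeding into the classifier.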
Citations: 4
Plumber: A Modular Framework to Create Information Extraction Pipelines
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3458603
M. Y. Jaradeh, Kuldeep Singh, M. Stocker, S. Auer
Information Extraction (IE) tasks are commonly studied topics in various domains of research. Hence, the community continuously produces multiple techniques, solutions, and tools to perform such tasks. However, running those tools and integrating them within existing infrastructure requires time, expertise, and resources. One pertinent task here is triple extraction and linking, where structured triples are extracted from a text and aligned to an existing Knowledge Graph (KG). In this paper, we present Plumber, the first framework that allows users to manually and automatically create suitable IE pipelines from a community-created pool of tools to perform triple extraction and alignment on unstructured text. Our approach provides an interactive medium to alter the pipelines and perform IE tasks. A short video showing the working of the framework for different use cases is available online1.
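A pipeline framework of this kind can be sketched as a chain of interchangeable components. The coreference, extraction, and linking stages below are naive stand-ins for the community-contributed tools Plumber would compose, and the DBpedia resource names are illustrative:

```python
class Pipeline:
    # chain interchangeable IE components; each stage transforms the
    # running payload, mirroring pipeline assembly from a tool pool
    def __init__(self, *stages):
        self.stages = stages

    def run(self, text):
        data = text
        for stage in self.stages:
            data = stage(data)
        return data

# hypothetical stand-in components
def resolve_coref(text):
    # pretend coreference resolver: fix a known pronoun
    return text.replace("He", "Albert Einstein")

def extract_triples(text):
    # naive extractor for the pattern "<X> developed <Y>."
    left, right = text.rstrip(".").split(" developed ")
    return [(left, "developed", right)]

def link_to_kg(triples):
    # toy entity linker mapping surface forms to KG resources
    kg_ids = {"Albert Einstein": "dbr:Albert_Einstein",
              "relativity": "dbr:Theory_of_relativity"}
    return [(kg_ids.get(s, s), p, kg_ids.get(o, o)) for s, p, o in triples]

pipe = Pipeline(resolve_coref, extract_triples, link_to_kg)
print(pipe.run("He developed relativity."))
```

Because each stage is just a callable with a compatible payload, swapping one extractor for another is a one-line change, which is the modularity the framework is built around.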
Citations: 3
L3i_LBPAM at the FinSim-2 task: Learning Financial Semantic Similarities with Siamese Transformers
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451384
Nhu Khoa Nguyen, Emanuela Boros, Gaël Lejeune, A. Doucet, Thierry Delahaut
In this paper, we present the different methods proposed for the FinSim-2 Shared Task 2021 on Learning Semantic Similarities for the Financial Domain. The main focus of this task is to evaluate the classification of financial terms into corresponding top-level concepts (also known as hypernyms) extracted from an external ontology. We approached the task as a semantic textual similarity problem. By relying on a siamese network with pre-trained language model encoders, we derived semantically meaningful term embeddings and computed similarity scores between them in a ranked manner. Additionally, we present the results of different baselines in which the task is tackled as a multi-class classification problem. The proposed methods outperformed our baselines and demonstrated the robustness of models based on a textual-similarity siamese network.
{"title":"L3i_LBPAM at the FinSim-2 task: Learning Financial Semantic Similarities with Siamese Transformers","authors":"Nhu Khoa Nguyen, Emanuela Boros, Gaël Lejeune, A. Doucet, Thierry Delahaut","doi":"10.1145/3442442.3451384","DOIUrl":"https://doi.org/10.1145/3442442.3451384","url":null,"abstract":"In this paper, we present the different methods proposed for the FinSIM-2 Shared Task 2021 on Learning Semantic Similarities for the Financial domain. The main focus of this task is to evaluate the classification of financial terms into corresponding top-level concepts (also known as hypernyms) that were extracted from an external ontology. We approached the task as a semantic textual similarity problem. By relying on a siamese network with pre-trained language model encoders, we derived semantically meaningful term embeddings and computed similarity scores between them in a ranked manner. Additionally, we exhibit the results of different baselines in which the task is tackled as a multi-class classification problem. The proposed methods outperformed our baselines and proved the robustness of the models based on textual similarity siamese network.","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124230673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
WikiShark: An Online Tool for Analyzing Wikipedia Traffic and Trends
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3452341
Elad Vardi, Lev Muchnik, Alex Conway, Micha Breakstone
Wikipedia is a major source of information utilized by internet users around the globe for fact-checking and access to general, encyclopedic information. For researchers, it offers an unprecedented opportunity to measure how societies respond to events and how our collective perception of the world evolves over time and in response to events. Wikipedia use and the reading patterns of its users reflect our collective interests and the way they are expressed in our search for information – whether as part of fleeting, zeitgeist-fed trends or long-term – on most every topic, from personal to business, through political, health-related, academic and scientific. In a very real sense, events are defined by how we interpret them and how they affect our perception of the context in which they occurred, rendering Wikipedia invaluable for understanding events and their context. This paper introduces WikiShark (www.wikishark.com) – an online tool that allows researchers to analyze Wikipedia traffic and trends quickly and effectively, by (1) instantly querying pageview traffic data; (2) comparing traffic across articles; (3) surfacing and analyzing trending topics; and (4) easily leveraging findings for use in their own research.
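The pageview data WikiShark queries is also publicly available through the Wikimedia REST API's per-article pageviews endpoint. A minimal sketch of building such a query, independent of WikiShark's own backend (the article title and date range are illustrative):

```python
from urllib.parse import quote

PAGEVIEWS_API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def pageviews_url(article, start, end, project="en.wikipedia.org",
                  access="all-access", agent="user", granularity="daily"):
    """Build a Wikimedia REST API URL for per-article pageview counts.

    start/end are YYYYMMDD date strings; the response is JSON with an
    "items" list holding one view count per day.
    """
    title = quote(article.replace(" ", "_"), safe="")
    return (f"{PAGEVIEWS_API}/{project}/{access}/{agent}/"
            f"{title}/{granularity}/{start}/{end}")

url = pageviews_url("Web conference", "20210401", "20210430")
# Fetch with any HTTP client, e.g. urllib.request.urlopen(url),
# then read the "items" list from the returned JSON.
```

Comparing two articles' traffic, as WikiShark's interface does, amounts to issuing one such query per article and plotting the daily counts side by side.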
{"title":"WikiShark: An Online Tool for Analyzing Wikipedia Traffic and Trends","authors":"Elad Vardi, Lev Muchnik, Alex Conway, Micha Breakstone","doi":"10.1145/3442442.3452341","DOIUrl":"https://doi.org/10.1145/3442442.3452341","url":null,"abstract":"Wikipedia is a major source of information utilized by internet users around the globe for fact-checking and access to general, encyclopedic information. For researchers, it offers an unprecedented opportunity to measure how societies respond to events and how our collective perception of the world evolves over time and in response to events. Wikipedia use and the reading patterns of its users reflect our collective interests and the way they are expressed in our search for information – whether as part of fleeting, zeitgeist-fed trends or long-term – on most every topic, from personal to business, through political, health-related, academic and scientific. In a very real sense, events are defined by how we interpret them and how they affect our perception of the context in which they occurred, rendering Wikipedia invaluable for understanding events and their context. 
This paper introduces WikiShark (www.wikishark.com) – an online tool that allows researchers to analyze Wikipedia traffic and trends quickly and effectively, by (1) instantly querying pageview traffic data; (2) comparing traffic across articles; (3) surfacing and analyzing trending topics; and (4) easily leveraging findings for use in their own research.","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115420589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3