Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation
Fine-Tuning Textrank for Legal Document Summarization: A Bayesian Optimization Based Approach
Deepali Jain, M. Borah, A. Biswas
Automatic text summarization techniques have very high applicability in the legal domain, due to the complex and lengthy nature of legal documents. Most of the classical text summarization algorithms, which are also used in the legal domain, have certain hyperparameters which, if optimized properly, can further improve these algorithms. The choice of these hyperparameters has a large effect on the performance of such algorithms, yet the hyperparameter tuning step is often overlooked when applying these algorithms in practice. In this work, a Bayesian Optimization based approach is proposed to optimize one of the classical summarization algorithms, Textrank, over this space of choices, by optimizing an objective function based on a mixture of ROUGE scores. Fine-tuning and further evaluation are performed with the help of a publicly available dataset. The experimental evaluation shows that the hyperparameter-tuned Textrank outperforms the baseline one-hot vector based Textrank and word2vec based Textrank models with respect to the ROUGE-1, ROUGE-2 and ROUGE-L metrics. The experimental analysis suggests that, with proper hyperparameter tuning, even a simple algorithm like Textrank can perform strongly on the legal document summarization task.
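As a hedged illustration of the approach the abstract describes, the sketch below tunes two assumed TextRank hyperparameters (PageRank damping and a sentence-similarity edge threshold; the paper's actual search space is not reproduced here) with Gaussian-process Bayesian optimization from scikit-optimize, against an equal-weight ROUGE-1/2/L mixture.

```python
import networkx as nx
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from rouge_score import rouge_scorer
from skopt import gp_minimize
from skopt.space import Real

def textrank_summarize(sentences, damping, similarity_threshold, top_k=3):
    """Basic TextRank: rank sentences by PageRank over a TF-IDF similarity graph."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = (tfidf @ tfidf.T).toarray()
    np.fill_diagonal(sim, 0.0)
    sim[sim < similarity_threshold] = 0.0  # drop weak edges before ranking
    ranks = nx.pagerank(nx.from_numpy_array(sim), alpha=damping)
    chosen = sorted(sorted(ranks, key=ranks.get, reverse=True)[:top_k])
    return " ".join(sentences[i] for i in chosen)

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

def objective(params, docs, refs):
    """docs: list of sentence lists; refs: list of reference summary strings."""
    damping, threshold = params
    mixture = 0.0
    for sents, ref in zip(docs, refs):
        s = scorer.score(ref, textrank_summarize(sents, damping, threshold))
        # Equal-weight ROUGE mixture; the paper's exact weighting is an assumption.
        mixture += (s["rouge1"].fmeasure + s["rouge2"].fmeasure + s["rougeL"].fmeasure) / 3
    return -mixture / len(docs)  # negate: gp_minimize minimizes

# result = gp_minimize(lambda p: objective(p, dev_docs, dev_refs),
#                      dimensions=[Real(0.5, 0.99), Real(0.0, 0.9)],
#                      n_calls=30, random_state=42)
```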
Citations: 21
Overview of the HASOC Track at FIRE 2020: Hate Speech and Offensive Language Identification in Tamil, Malayalam, Hindi, English and German
Thomas Mandl, Sandip J Modha, M. Anandkumar, Bharathi Raja Chakravarthi
This paper presents the HASOC track and its two parts. HASOC is dedicated to evaluating technology for finding Offensive Language and Hate Speech. HASOC creates test collections for languages with few resources, with English included for comparison. The first track within HASOC continued work from 2019 and provided a testbed of Twitter posts for Hindi, German and English. The second track within HASOC created test resources for Tamil and Malayalam in native and Latin script. Posts were extracted mainly from YouTube and Twitter. Both tracks attracted much interest: over 40 research groups participated and described their approaches in papers. In this overview, we present the tasks, the data and the main results.
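For context only, a minimal baseline of the kind often used as a starting point for HASOC-style coarse classification might look like the sketch below. It is a generic TF-IDF character n-gram pipeline, not any participant's actual system, and the data variables are placeholders.

```python
# Sketch of a generic offensive-language baseline (an assumption, not a
# HASOC participant system). Character n-grams cope reasonably with the
# noisy, code-mixed social media text the track targets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

def train_baseline(train_texts, train_labels):
    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5), min_df=2),
        LogisticRegression(max_iter=1000),
    )
    model.fit(train_texts, train_labels)
    return model

# model = train_baseline(train_texts, train_labels)       # placeholder data
# print(f1_score(test_labels, model.predict(test_texts), average="macro"))
```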
Citations: 167
On the Evaluation of Data Fusion for Information Retrieval
David Lillis
Data Fusion combines document rankings from multiple systems into one in order to improve retrieval effectiveness. Many approaches to this task have been proposed in the literature and evaluated in various ways. This paper examines a number of such evaluations to extract commonalities between approaches. Some drawbacks of the prevailing evaluation strategies are then identified, and suggestions are made for more appropriate evaluation of data fusion.
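For concreteness, two classical fusion rules that recur in the literature this paper surveys, CombSUM and CombMNZ, can be sketched as below. Scores are assumed to be min-max normalized per system beforehand; this is a textbook illustration, not code from the paper.

```python
# Minimal sketch of CombSUM / CombMNZ fusion over per-system document scores.
from collections import defaultdict

def comb_fuse(rankings, mnz=False):
    """rankings: list of {doc_id: normalized_score} dicts, one per system."""
    fused, hits = defaultdict(float), defaultdict(int)
    for run in rankings:
        for doc, score in run.items():
            fused[doc] += score   # CombSUM: sum of normalized scores
            hits[doc] += 1
    if mnz:
        # CombMNZ additionally multiplies by the number of systems
        # that retrieved the document at all.
        for doc in fused:
            fused[doc] *= hits[doc]
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# comb_fuse([{"d1": 1.0, "d2": 0.4}, {"d2": 0.9, "d3": 0.2}], mnz=True)
```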
Citations: 6
Overview of the PAN@FIRE 2020 Task on the Authorship Identification of SOurce COde
A. Fadel, Husam Musleh, Ibraheem Tuffaha, M. Al-Ayyoub, Y. Jararweh, E. Benkhelifa, Paolo Rosso
Authorship identification is essential for detecting the deceptive misuse of others’ content and for exposing the owners of anonymous malicious content. While it is widely studied for natural languages, it is rarely considered for programming languages. Accordingly, a PAN@FIRE task, named Authorship Identification of SOurce COde (AI-SOCO), is proposed with a focus on identifying source code authors. The dataset consists of source code crawled from the CodeForces online judge platform, submitted by the top 1,000 human users with 100 or more correct C++ submissions. Participating systems are asked to predict the author of a given source code from the predefined list of code authors. In total, 60 teams registered on the task’s CodaLab page; of these, 14 teams submitted 94 runs. The results are surprisingly high, with many teams and baselines breaking the 90% accuracy barrier. These systems used a wide range of models and techniques, from pretrained word embeddings (especially ones tweaked to handle source code) to stylometric features.
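A hedged sketch of a simple authorship baseline in the spirit of the task is shown below: character n-gram TF-IDF over raw source code plus a linear classifier. It mirrors common stylometric baselines rather than any specific submitted run, and the variable names are placeholders.

```python
# Sketch of a code-authorship baseline (an illustration, not a submitted
# AI-SOCO system). Whitespace and punctuation habits carry authorial style
# in code, so we keep casing and use raw character n-grams.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

def build_authorship_model():
    return make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(2, 4),
                        lowercase=False, max_features=200_000),
        LinearSVC(),
    )

# model = build_authorship_model()
# model.fit(train_sources, train_author_ids)  # ids from the predefined author list
# predictions = model.predict(test_sources)
```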
Citations: 3
FIRE 2020 AILA Track: Artificial Intelligence for Legal Assistance
Paheli Bhattacharya, Parth Mehta, Kripabandhu Ghosh, Saptarshi Ghosh, Arindam Pal, A. Bhattacharya, Prasenjit Majumder
The FIRE 2020 AILA track aimed at developing datasets and frameworks for the following two tasks: (i) Precedent and Statute Retrieval, where the task was to identify relevant prior cases and statutes (written laws) given a factual scenario, and (ii) Rhetorical Role Labelling for legal judgements, where, given a case document, sentences were to be classified into 7 rhetorical roles: Fact, Ruling by Lower Court, Argument, Precedent, Statute, Ratio of the Decision and Ruling by Present Court. For both tasks, we used publicly available Indian Supreme Court case documents.
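A minimal sketch of the Task (i) retrieval setup might look as follows, assuming the rank_bm25 package and naive whitespace tokenization; participant systems were typically more sophisticated, so treat this purely as a starting point.

```python
# Sketch: rank prior-case documents against a factual scenario with BM25.
# The tokenization is deliberately naive (lowercase + whitespace split).
from rank_bm25 import BM25Okapi

def rank_precedents(query_scenario, prior_cases, top_n=10):
    tokenized = [case.lower().split() for case in prior_cases]
    bm25 = BM25Okapi(tokenized)
    scores = bm25.get_scores(query_scenario.lower().split())
    order = sorted(range(len(prior_cases)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_n]]

# ranked = rank_precedents(fact_scenario_text, supreme_court_case_texts)
```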
Citations: 29
Anaphora Resolution from Social Media Text in Indian Languages (SocAnaRes-IL) - Overview
S. L. Devi
Resolution of anaphors is required for any application that requires Natural Language Understanding (NLU), such as Information Extraction, Conversation Analysis, Opinion Mining and Machine Translation. The growth of social media platforms such as Twitter and Facebook for communication between people has led to the creation of huge amounts of user-generated data that differ from normal text data, opening up a new challenge and perspective in language technology research. There is thus a great need to develop applications such as anaphora resolution and co-reference resolution that can be used in building NLU systems. This shared task concerns anaphora resolution in microblog text from Twitter for Hindi, Tamil and Malayalam (Indian languages). We also provided English data, which can serve as a resource-rich language when the Indian languages are treated as resource-poor. Six registered groups took the data for development and testing, but only one group submitted a run; they used deep learning for their analysis.
Citations: 1
Evaluating Professional Search: A German Construction Law Use Case
Wei Li, G. Jones
We present a real-world case study for the evaluation of professional search, focusing on German construction law. Reliable identification of relevant previous cases is an important part of many legal disputes, and currently relies on domain expertise acquired over a lengthy professional career. We describe our experiences developing a Cranfield-type test collection for a German construction law dataset, to enable research into search technologies for new tools that are less dependent on expert knowledge. We describe an examination of the search needs of lawyers, the development of a set of search queries created by lawyers, and our experiences in collecting expert relevance data to complete a test collection for legal search. Important findings of this latter process are the need for individuals with expert legal training to assess relevance, and the context dependence of relevance judgments. While the cost of developing this test collection was found to be very high, we demonstrate its value in identifying the effectiveness of legal search methods and in identifying research directions for legal case search.
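Since the collection is Cranfield-type, scoring a search system against the expert judgments reduces to standard measures. The sketch below computes mean Average Precision from assumed in-memory representations of a run and qrels; the data structures are illustrative, not the paper's file formats.

```python
# Sketch: evaluating a run against expert relevance judgments with
# (mean) Average Precision, the standard Cranfield-style workflow.
def average_precision(ranked_docs, relevant):
    """ranked_docs: doc ids in rank order; relevant: set of relevant doc ids."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(run, qrels):
    """run: {query_id: [ranked doc ids]}; qrels: {query_id: set of relevant ids}."""
    aps = [average_precision(run[q], qrels.get(q, set())) for q in run]
    return sum(aps) / len(aps) if aps else 0.0
```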
Citations: 0
Unsupervised Legal Concept Extraction from Indian Case Documents using Statutes
Riya Sanjay Podder, Paheli Bhattacharya
Finding legal concepts pertaining to court case judgement documents is an important task in the field of legal data mining. These concepts are also popularly termed catchwords or keywords. Existing methods for the task lack the ability to extract legal concepts that may not be explicitly mentioned in the document but are present only abstractly, because these methods do not incorporate legal domain-specific information. In this paper, we propose the use of Statutes to solve this task. Evaluation on a set of 1,200 Indian Supreme Court case documents suggests the effectiveness of our approach and opens up possibilities for further exploration in this direction.
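The abstract does not detail the unsupervised method, so as one hedged illustration of using statutes as side information, the sketch below scores statute texts against a case document with TF-IDF cosine similarity and proposes the best-matching statutes' titles as candidate concepts. The paper's actual approach may differ.

```python
# Illustrative sketch only: surface abstract legal concepts by matching a
# case document against statute texts and returning the top statute titles.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def statute_concepts(case_doc, statute_titles, statute_texts, top_k=5):
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(statute_texts + [case_doc])
    # Last row is the case document; compare it to every statute.
    sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    best = sims.argsort()[::-1][:top_k]
    return [(statute_titles[i], float(sims[i])) for i in best]
```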
Citations: 0
Performance Gain in Low Resource MT with Transfer Learning: An Analysis concerning Language Families
S. Mahata, Subhabrata Dutta, Dipankar Das, Sivaji Bandyopadhyay
Translation systems require a huge amount of parallel data to produce quality translations, but acquiring such data for low-resource languages is difficult. To counter this, recent research has combined related languages and used them to augment low-resource data through transfer learning. While the performance gain from transfer learning is apparent, we investigate the correlation between the performance gain and the position of the languages concerned within a language family. We further probe the relationship between the performance gain and the degree of vocabulary sharing between the languages concerned.
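As an illustrative proxy for the "degree of vocabulary sharing" mentioned above (the paper's actual measure is not specified in this abstract), the sketch below computes the Jaccard overlap between the token-type vocabularies of two corpora.

```python
# Sketch: Jaccard overlap of token-type vocabularies as a simple
# vocabulary-sharing proxy between two languages' corpora.
def vocabulary_overlap(corpus_a, corpus_b):
    """corpus_a/b: iterables of sentence strings; returns Jaccard overlap in [0, 1]."""
    vocab_a = {tok for sent in corpus_a for tok in sent.lower().split()}
    vocab_b = {tok for sent in corpus_b for tok in sent.lower().split()}
    if not vocab_a or not vocab_b:
        return 0.0
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)

# vocabulary_overlap(lang1_sentences, lang2_sentences)  # placeholder corpora
```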
Citations: 1
CEREX@FIRE-2020: Overview of the Shared Task on Cause-effect Relation Extraction
Manjira Sinha, Tirthankar Dasgupta, Lipika Dey
Extraction of causal relations from text is an important problem in Natural Language Processing (NLP). The extracted relations play important roles in several downstream analytical and predictive tasks, such as identification of actionable items, question answering and isolation of predictor variables for a predictive system. Curating causal relations from text documents can also help in automatically building causal networks, which are useful for reasoning tasks. The proposed CEREX track aims to find a suitable model for automatic detection of causal sentences and extraction of the exact cause, effect and causal connectives from textual mentions.
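As a hedged, minimal illustration of the three outputs the track asks for (cause, effect, connective), the sketch below splits a sentence around a matched connective from a small illustrative list. Real systems are far richer; the connective inventory and splitting heuristic are assumptions.

```python
# Pattern-based sketch of cause-effect extraction around a causal connective.
import re

CONNECTIVES = ["because of", "because", "due to", "leads to", "results in",
               "caused by", "therefore"]
# Alternation is ordered, so multi-word connectives are listed first.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, CONNECTIVES)) + r")\b",
                     re.IGNORECASE)

def extract_cause_effect(sentence):
    match = PATTERN.search(sentence)
    if not match:
        return None  # not detected as a causal sentence
    left = sentence[:match.start()].strip(" ,.")
    right = sentence[match.end():].strip(" ,.")
    connective = match.group(1).lower()
    # For "because"/"due to"/"caused by", the cause follows the connective.
    if connective in {"because", "because of", "due to", "caused by"}:
        return {"cause": right, "effect": left, "connective": connective}
    return {"cause": left, "effect": right, "connective": connective}

# extract_cause_effect("The flight was delayed because of heavy snowfall.")
```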
Citations: 2