
EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020: Latest Articles

UNIGE_SE @ PRELEARN: Utility for Automatic Prerequisite Learning from Italian Wikipedia (short paper)
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7553
Alessio Moggio, A. Parizzi
This paper describes the approach proposed by the UNIGE SE team for the EVALITA 2020 shared task on Prerequisite Relation Learning (PRELEARN). We developed a neural network classifier that exploits features extracted both from the raw text and from the structure of the Wikipedia pages provided by the task organisers as training sets. We participated in all four subtasks proposed by the organisers: the neural network was trained on a different set of features for each of the two training settings (i.e., raw and structured features) and evaluated in all proposed scenarios (i.e., in-domain and cross-domain). When evaluated on the official test sets, the system improved on the provided baselines, even though it ranked third out of three participants. This contribution also describes the interface we developed to compare multiple runs of our models.
Citations: 5
KIPoS @ EVALITA2020: Overview of the Task on KIParla Part of Speech Tagging
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7743
C. Bosco, Silvia Ballarè, Massimo Cerruti, E. Goria, Caterina Mauri
The paper describes KIPoS, the first task on Part of Speech tagging of spoken language held at the Evalita evaluation campaign. Benefiting from the availability of a resource of transcribed spoken Italian (i.e. the KIParla corpus), which has been newly annotated and released for KIPoS, the task includes three evaluation exercises that compare the results achieved on formal speech with those achieved on informal speech. The datasets and the results achieved by participants are presented, and the insights gained from the experience are discussed.
Citations: 8
UNIMIB @ DIACR-Ita: Aligning Distributional Embeddings with a Compass for Semantic Change Detection in the Italian Language (short paper)
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7688
F. Belotti, Federico Bianchi, M. Palmonari
In this paper, we present our results for DIACR-Ita, the EVALITA 2020 challenge on semantic change detection for the Italian language. Our approach measures the semantic distance between time-specific word vectors generated with Compass-aligned Distributional Embeddings (CADE). We first generate temporal embeddings with CADE, a strategy for aligning word embeddings that are specific to each time period; the quality of this alignment is the main asset of our proposal. We then measure the semantic shift of each word by combining two different semantic shift measures. Finally, we classify a word's meaning as changed or unchanged by applying a threshold to the semantic distance across time.
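The thresholding step described in the DIACR-Ita abstract can be illustrated with a minimal sketch. It assumes the embeddings of the two time periods are already aligned (the CADE training itself is not shown), and it uses cosine distance as the only shift measure, because the abstract does not name the two measures that the authors combine. The function names, the toy vectors and the threshold value are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_u * norm_v)


def detect_changes(emb_t1, emb_t2, threshold):
    """Label every word found in both periods: True means 'changed'."""
    labels = {}
    for word in emb_t1.keys() & emb_t2.keys():
        distance = cosine_distance(emb_t1[word], emb_t2[word])
        labels[word] = distance > threshold
    return labels


# Toy, hand-made vectors standing in for aligned embeddings (hypothetical data).
emb_t1 = {"rete": [1.0, 0.0], "casa": [0.0, 1.0]}
emb_t2 = {"rete": [0.0, 1.0], "casa": [0.0, 2.0]}

# The threshold of 0.5 is an arbitrary value chosen for the illustration.
labels = detect_changes(emb_t1, emb_t2, threshold=0.5)
```

In this toy setup "rete" rotates by 90 degrees between periods and is labelled as changed, while "casa" only changes length and is labelled as unchanged, since cosine distance ignores vector magnitude.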
Citations: 2
rmassidda @ DaDoEval: Document Dating Using Sentence Embeddings at EVALITA 2020
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7603
Riccardo Massidda
This report describes an approach to the DaDoEval document dating subtasks of the EVALITA 2020 competition. Dating is tackled as a classification problem, and the considerable length of the documents in the provided dataset is handled by using sentence embeddings in a hierarchical architecture. Three pre-trained models for generating sentence embeddings were evaluated and compared: USE, LaBSE and SBERT. In addition to sentence embeddings, the classifier exploits a bag-of-entities representation of the document, generated with a pre-trained named entity recognizer. The final model produces the required date for every subtask simultaneously.
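To make the document representation in the DaDoEval abstract more concrete, the sketch below builds one feature vector per document from sentence embeddings and a bag of entities. It rests on several assumptions: mean pooling stands in for the hierarchical architecture, whose details the abstract does not give; sentence vectors and entity mentions are assumed to come from external pre-trained models that are not called here; and all function names and toy values are hypothetical.

```python
from collections import Counter


def mean_pool(sentence_vectors):
    """Average a non-empty list of equal-length sentence vectors."""
    if not sentence_vectors:
        raise ValueError("a document needs at least one sentence vector")
    count = len(sentence_vectors)
    dim = len(sentence_vectors[0])
    return [sum(vec[i] for vec in sentence_vectors) / count for i in range(dim)]


def bag_of_entities(entities, vocabulary):
    """Count how often each entity of a fixed vocabulary occurs."""
    counts = Counter(entities)
    return [float(counts[entity]) for entity in vocabulary]


def document_features(sentence_vectors, entities, vocabulary):
    """Concatenate the pooled sentence vector with the entity counts."""
    return mean_pool(sentence_vectors) + bag_of_entities(entities, vocabulary)


# Toy inputs (hypothetical): two 2-dimensional sentence vectors and the
# entity mentions a named entity recognizer might return for the document.
sentence_vectors = [[1.0, 3.0], [3.0, 5.0]]
entities = ["Roma", "Roma", "Garibaldi"]
vocabulary = ["Garibaldi", "Roma", "Mussolini"]

features = document_features(sentence_vectors, entities, vocabulary)
```

The resulting vector would then be passed to a classifier over the date classes of each subtask; that classifier is outside the scope of this sketch.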
Citations: 6
UninaStudents @ SardiStance: Stance Detection in Italian Tweets - Task A (short paper)
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7189
Maurizio Moraca, G. Sabella, Simone Morra
This document describes a classification system for the SardiStance task at EVALITA 2020. The task consists of classifying the stance of the author of a series of tweets towards a specific discussion topic. The system was developed by the authors as the final project for the Natural Language Processing class of the Master's in Computer Science at the University of Naples Federico II. It is based on an SVM classifier with a radial basis function kernel and uses features such as character 2-grams, hashtag unigrams and the Afinn weight computed on automatically translated tweets. The results are promising in that the system's performance is on average higher than that of the baseline proposed by the task organizers. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
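Two of the ingredients named in the SardiStance abstract, character 2-grams and hashtag unigrams, together with the radial basis function kernel, can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' pipeline: the Afinn weight is left out because it needs an external lexicon and a translation step, the SVM training itself is not shown, and the function names, feature prefixes and the `gamma` value are hypothetical.

```python
import math
import re
from collections import Counter


def char_bigrams(text):
    """Count overlapping character 2-grams of the lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def hashtag_unigrams(text):
    """Count each lower-cased hashtag as a single token."""
    return Counter(tag.lower() for tag in re.findall(r"#\w+", text))


def features(text):
    """Merge both feature families into one sparse count vector."""
    feats = Counter()
    for gram, count in char_bigrams(text).items():
        feats["c2:" + gram] = count
    for tag, count in hashtag_unigrams(text).items():
        feats["tag:" + tag] = count
    return feats


def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) on sparse count vectors."""
    keys = x.keys() | y.keys()
    squared_distance = sum((x[k] - y[k]) ** 2 for k in keys)
    return math.exp(-gamma * squared_distance)


example = features("ok #ok")
similarity = rbf_kernel(features("ab"), features("ac"), gamma=0.5)
```

An SVM with this kernel scores a tweet by comparing it with the support vectors through `rbf_kernel`; identical feature vectors give a kernel value of 1, and the value decays towards 0 as the vectors move apart.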
Citations: 1
UOBIT @ TAG-it: Exploring a Multi-faceted Representation for Profiling Age, Topic and Gender in Italian Texts
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7285
Roberto Labadie Tamayo, Daniel C. Castro, Reynier Ortega Bueno
This paper describes our system for participating in the TAG-it Author Profiling task at EVALITA 2020. The task aims to predict the age and gender of blog users from their posts, as well as the topic they wrote about. Our proposal combines representations learned by an RNN at the word and sentence levels, Transformer neural networks and hand-crafted stylistic features. All these representations are mixed and fed into a fully connected layer of a feed-forward neural network in order to make predictions for the addressed subtasks. Experimental results show that our model achieves encouraging performance.
The growing integration of social media into people's daily lives has made this medium a common environment for deploying technologies that retrieve information useful for business activities, social outreach processes, forensic tasks, etc. This is because people frequently upload and share content on these media for various purposes, such as sharing points of view about some topic or promoting a personal business. The analysis of textual information from such data is one of the main reasons why this research has become a trend in the Natural Language Processing (NLP) field. However, this information varies greatly in format, even when it comes from the same person, and textual sequences are unstructured, which makes analyzing it automatically challenging.
The Author Profiling (AP) task aims at discovering marks or patterns (linguistic or not) in texts that allow a user to be characterized in terms of their age, gender, personality or any other demographic attribute. Because of the applicability of AP, many forums share tasks aimed at mining features that, in general, predict that valuable information. Those tasks commonly focus on popular languages such as English and Spanish. Nevertheless, other languages are explored in important forums too; this is the case of EVALITA [1], which promotes the analysis of NLP tasks in the Italian language. Among the challenges of its previous campaign, EVALITA 2018, was the AP (in terms of gender) task GxG (Dell'Orletta and Nissim, 2018), which explored gender prediction. The analysis of age, gender and the topic a text is related to are well-explored tasks, and most approaches employ data representations based on stylistic features, n-gram representations and/or word embeddings combined with Machine Learning (ML) methods such as Support Vector Machine (SVM) and Random Forest (Pizarro, 2019). Some authors have also achieved encouraging performance by using Deep Learning (DL) models such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) combined with stylistic features (Aragón and López-Monroy, 2018; Bayot and Gonçalves, 2018). In this work we address the automatic detection of the gender and age of authors, as well as the identification of the prevalent topic of the textual information in blogs. In addition, we describe the model we developed to participate in the TAG-it: Topic, Age and Gender prediction for Italian [2] task (Cimino A., 2020) at EVALITA 2020 (Basile et al., 2020). Considering the demonstrated capabilities of DL
[1] http://www.evalita.it/
[2] https://sites.google.com/view/
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Citations: 1
Svandiela @ HaSpeeDe: Detecting Hate Speech in Italian Twitter Data with BERT (short paper)
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7037
Svea Klaus, Anna-Sophie Bartle, Daniela Rossmann
This paper explains the system developed for the Hate Speech Detection (HaSpeeDe) shared task within the 7th evaluation campaign EVALITA 2020 (Basile et al., 2020). The solution proposed in this work is based on a fine-tuned BERT model. In cross-corpus evaluation, our model reached an F1 score of 77.56% on the tweets test set and 60.31% on the news headlines test set.
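For readers unfamiliar with the F1 scores reported in the HaSpeeDe abstract, the following sketch computes an F1 score from gold and predicted labels. The abstract does not say which F1 variant was used, so macro-averaging over the classes is an assumption of this example, and the label lists are made-up toy data rather than task results.

```python
def f1_for_class(gold, pred, cls):
    """F1 of one class: harmonic mean of its precision and recall."""
    pairs = list(zip(gold, pred))
    true_pos = sum(1 for g, p in pairs if g == cls and p == cls)
    false_pos = sum(1 for g, p in pairs if g != cls and p == cls)
    false_neg = sum(1 for g, p in pairs if g == cls and p != cls)
    if true_pos == 0:
        return 0.0  # no correct prediction for this class
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(gold) | set(pred))
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)


# Toy labels (hypothetical): 1 = hateful, 0 = not hateful.
gold = [1, 1, 0, 0]
pred = [1, 0, 0, 0]
score = macro_f1(gold, pred)
```

Here the hateful class has precision 1 and recall 1/2 (F1 = 2/3), the other class has precision 2/3 and recall 1 (F1 = 4/5), so the macro average is 11/15, roughly 73.33%.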
Citations: 1
SentNA @ ATE_ABSITA: Sentiment Analysis of Customer Reviews Using Boosted Trees with Lexical and Lexicon-based Features (short paper)
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.6874
F. Mele, A. Sorgente, Giuseppe Vettigli
This paper describes our submission to the Sentiment Analysis tasks of ATE ABSITA (Aspect Term Extraction and Aspect-Based Sentiment Analysis). In particular, we focused on Task 3, using an approach that combines word frequencies with lexicon-based polarities and uses Boosted Trees to predict the sentiment score. This approach achieved a competitive error and, thanks to the interpretability of its building blocks, allows us to show which elements are considered when making a prediction. We also took part in Task 1, proposing a hybrid model that joins rule-based and machine learning methodologies in order to combine the advantages of both. The model proposed for Task 1 is only preliminary. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
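The feature construction mentioned in the ATE ABSITA abstract, word frequencies combined with lexicon-based polarities, might look roughly like the sketch below. The exact features of the SentNA system are not given in the abstract, so the layout chosen here (relative frequencies over a fixed vocabulary, followed by the summed positive and summed negative polarity) is an assumption, as are the function name, the tiny lexicon and the example review. The boosted-tree regressor that would consume the vector is not shown.

```python
from collections import Counter


def review_features(tokens, vocabulary, polarity_lexicon):
    """Relative word frequencies plus summed positive/negative polarity."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty reviews
    frequencies = [counts[word] / total for word in vocabulary]
    scores = [polarity_lexicon[t] for t in tokens if t in polarity_lexicon]
    positive = sum(s for s in scores if s > 0)
    negative = sum(s for s in scores if s < 0)
    return frequencies + [positive, negative]


# Toy inputs (hypothetical): a tokenised review, a two-word vocabulary and
# a miniature polarity lexicon.
tokens = ["ottimo", "prodotto", "pessimo", "imballaggio"]
vocabulary = ["prodotto", "spedizione"]
polarity_lexicon = {"ottimo": 1.0, "pessimo": -1.0}

vector = review_features(tokens, vocabulary, polarity_lexicon)
```

Because every position of the vector has a direct meaning (a word or a polarity total), the splits learned by a tree ensemble on such features can be read back in terms of words and sentiment, which is in line with the interpretability the abstract highlights.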
Citations: 1
ANDI @ CONcreTEXT: Predicting Concreteness in Context for English and Italian using Distributional Models and Behavioural Norms (short paper)
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.7465
A. Rotaru
In this paper we describe our participation in the CONcreTEXT task of EVALITA 2020, which involved predicting subjective ratings of concreteness for words presented in context. Our approach, which ranked first in both the English and Italian subtasks, relies on a combination of context-dependent and context-independent distributional models, together with behavioural norms. We show that good results can be obtained for Italian, by first automatically translating the Italian stimuli into English, and then using existing resources for both Italian and English.
Citations: 3
PRELEARN @ EVALITA 2020: Overview of the Prerequisite Relation Learning Task for Italian
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7518
Chiara Alzetta, Alessio Miaschi, F. Dell’Orletta, Frosina Koceva, Ilaria Torre
The Prerequisite Relation Learning (PRELEARN) task is the EVALITA 2020 shared task on concept prerequisite learning, which consists of classifying prerequisite relations between pairs of concepts distinguishing between prerequisite pairs and non-prerequisite pairs. Four sub-tasks were defined: two of them define different types of features that participants are allowed to use when training their model, while the other two define the classification scenarios where the proposed models would be tested. In total, 14 runs were submitted by 3 teams comprising 9 total individual participants.
Citations: 7