
EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020: Latest Articles

UniBA @ KIPoS: A Hybrid Approach for Part-of-Speech Tagging (short paper)
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7773
Giovanni Luca Izzi, S. Ferilli
The Part-of-Speech (POS) tagging operation is becoming increasingly important because it is the starting point for other high-level operations such as Speech Recognition, Machine Translation, Parsing and Information Retrieval. Although state-of-the-art POS taggers reach a high level of accuracy (around 96-97%), POS tagging cannot yet be considered a solved problem because there are many variables to take into account. For example, most of these systems use lexical knowledge to assign a tag to unknown words. The solution proposed in this work is based on a hybrid tagger that does not use any prior lexical knowledge and consists of two different types of POS taggers used sequentially: an HMM tagger and RDRPOSTagger (Nguyen et al., 2014; Nguyen et al., 2016). We trained the hybrid model using the Development set and the combination of the Development and Silver sets. The results show an accuracy of 0.8114 and 0.8100 respectively for the main task.
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
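The first stage of the hybrid tagger described above is an HMM tagger. As a rough illustration only, the following sketch trains a bigram HMM from tagged sentences and decodes with the Viterbi algorithm. The add-one smoothing, the toy corpus, and all function names are assumptions made for this example; the abstract does not describe the authors' HMM configuration, and the second stage (RDRPOSTagger) is not reproduced here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, for illustration only (not KIPoS data).
TOY_CORPUS = [
    [("il", "DET"), ("cane", "NOUN"), ("dorme", "VERB")],
    [("la", "DET"), ("gatta", "NOUN"), ("mangia", "VERB")],
    [("il", "DET"), ("gatto", "NOUN"), ("dorme", "VERB")],
]

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted words
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo tag
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (the smoothing is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, transitions, emissions, vocab):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    tags = list(emissions)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unknown words
    # best[i][t] = (log score of the best path ending in tag t, backpointer)
    best = [{}]
    for t in tags:
        score = (log_prob(transitions["<s>"], t, n_tags)
                 + log_prob(emissions[t], words[0], n_words))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            emit = log_prob(emissions[t], words[i], n_words)
            prev_tag, score = max(
                ((p, best[i - 1][p][0] + log_prob(transitions[p], t, n_tags))
                 for p in tags),
                key=lambda item: item[1],
            )
            best[i][t] = (score + emit, prev_tag)
    # Follow backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))

transitions, emissions, vocab = train_hmm(TOY_CORPUS)
print(viterbi(["il", "cane", "mangia"], transitions, emissions, vocab))
```

In a sequential hybrid setup, the tags produced by a first-stage tagger like this one would serve as the initial annotation that a rule-based second stage then corrects.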
Citations: 0
CAPISCO @ CONcreTEXT 2020: (Un)supervised Systems to Contextualize Concreteness with Norming Data
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7475
Alessandro Bondielli, Gianluca E. Lebani, Lucia C. Passaro, Alessandro Lenci
This paper describes several approaches to the automatic rating of the concreteness of concepts in context, developed for the EVALITA 2020 “CONcreTEXT” task. Our systems focus on the interplay between words and their surrounding context by (i) exploiting annotated resources, (ii) using BERT masking to find potential substitutes of the target in specific contexts and measuring their average similarity with concrete and abstract centroids, and (iii) automatically generating labelled datasets to fine-tune transformer models for regression. All the approaches were tested on both English and Italian data. The best system for each language ranked second in the task.
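Approach (ii) above compares substitute words with concrete and abstract centroids. The sketch below shows one way such a comparison could be computed once embedding vectors are available. The substitute vectors are assumed to come from a separate masking step that is not shown, and scoring by the difference of the two average similarities is an assumption of this example, not something the abstract states.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def concreteness_score(substitute_vectors, concrete_seeds, abstract_seeds):
    """Average similarity to the concrete centroid minus the average
    similarity to the abstract centroid (hypothetical scoring rule)."""
    concrete_centroid = centroid(concrete_seeds)
    abstract_centroid = centroid(abstract_seeds)
    n = len(substitute_vectors)
    sim_concrete = sum(cosine(v, concrete_centroid) for v in substitute_vectors) / n
    sim_abstract = sum(cosine(v, abstract_centroid) for v in substitute_vectors) / n
    return sim_concrete - sim_abstract

# Toy 2-dimensional vectors, for illustration only.
concrete = [[1.0, 0.0], [0.9, 0.1]]
abstract = [[0.0, 1.0], [0.1, 0.9]]
print(concreteness_score([[1.0, 0.1]], concrete, abstract) > 0)
```

A positive score means the substitutes sit closer to the concrete centroid than to the abstract one; a real system would map such a score onto the rating scale used by the task.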
Citations: 1
App2Check @ ATE_ABSITA 2020: Aspect Term Extraction and Aspect-based Sentiment Analysis (short paper)
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.6892
E. Rosa, A. Durante
In this paper we describe and present the results of the system we developed and submitted for our participation in the ATE ABSITA 2020 evaluation campaign on the Aspect Term Extraction (ATE), Aspect-based Sentiment Analysis (ABSA), and Sentiment Analysis (SA) tasks. The official results show that App2Check ranks first in all three tasks, reaching an F1 score that is 0.14236 higher than the second-best system in the ATE task and 0.11943 higher in the ABSA task; it shows a Root-Mean-Square Error (RMSE) that is 0.13075 lower than the second-ranked system in the SA
Citations: 2
UO @ HaSpeeDe2: Ensemble Model for Italian Hate Speech Detection (short paper)
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7014
Mariano Jason Rodriguez Cisnero, Reynier Ortega Bueno
This document describes our participation in the Hate Speech Detection task at EVALITA 2020. Our system is based on deep learning techniques, specifically RNNs and an attention mechanism, mixed with transformer representations and linguistic features. In the training process, multi-task learning was used to increase the system's effectiveness. The results show that some of the selected features were not a good combination within the model. Nevertheless, the generalization level achieved yields encouraging results.
Citations: 1
SSN NLP @ SardiStance: Stance Detection from Italian Tweets using RNN and Transformers (short paper)
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7207
S. Kayalvizhi, D. Thenmozhi, Aravindan Chandrabose
Stance detection refers to the detection of a person's opinion about a target from their statements. The aim of the SardiStance task is to classify Italian tweets as in favor of, against, or having no feeling towards the target. The task has two sub-tasks: in Task A, the classification has to be done by considering only the textual meaning, whereas in Task B the tweets must be classified by considering the contextual information along with the textual meaning. We present our solution for detecting stance using only the textual meaning (Task A), based on an encoder-decoder model and on transformers. Of these two approaches, simple transformers performed better than the encoder-decoder model, with an average F1-score of 0.4707.
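The result above is reported as an average F1-score. As a reminder of how such a figure is computed, the sketch below derives per-class F1 from gold and predicted labels and averages it over a chosen set of classes. The abstract does not say which classes enter the average, so the `labels` argument and the toy label names are assumptions of this example.

```python
def f1_for_label(gold, pred, label):
    """F1 of one class, computed from true/false positives and false negatives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # also covers the case where precision or recall is undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def average_f1(gold, pred, labels):
    """Unweighted mean of the per-class F1 scores over `labels`."""
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)

# Toy labels, for illustration only.
gold = ["favor", "against", "none", "against"]
pred = ["favor", "none", "none", "against"]
print(round(average_f1(gold, pred, ["favor", "against"]), 4))
```

Averaging over a subset of classes (for instance only the two opinionated ones) gives a different number than averaging over all three, which is why the set of labels is passed explicitly.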
Citations: 1
UNITOR @ Sardistance2020: Combining Transformer-based Architectures and Transfer Learning for Robust Stance Detection
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7092
Simone Giorgioni, Marcello Politi, Samir Salman, R. Basili, D. Croce
This paper describes the UNITOR system that participated in the Stance Detection in Italian tweets (SardiStance) task within the context of EVALITA 2020. UNITOR implements a transformer-based architecture whose accuracy is improved by adopting a Transfer Learning technique. In particular, this work investigates the possible contribution of three auxiliary tasks related to Stance Detection, i.e., Sentiment Detection, Hate Speech Detection and Irony Detection. Moreover, UNITOR relies on an additional dataset automatically downloaded and labeled through distant supervision. The UNITOR system ranked first in Task A within the competition. This confirms the effectiveness of Transformer-based architectures and the beneficial impact of the adopted strategies.
Citations: 14
DeepReading @ SardiStance 2020: Combining Textual, Social and Emotional Features
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7129
María S. Espinosa, Rodrigo Agerri, Álvaro Rodrigo, Roberto Centeno
In this paper we describe our participation in the SardiStance shared task held at EVALITA 2020. We developed a set of classifiers that combine text features, such as those of the best-performing systems based on large pre-trained language models, with user profile features, such as psychological traits and social media user interactions. The classification algorithms chosen for our models were various monolingual and multilingual Transformer models for text-only classification, and XGBoost for the non-textual features. The textual and contextual models were combined by a weighted voting ensemble learning system. Our approach obtained the best score for Task B, on Contextual Stance Detection.
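The weighted voting step mentioned above can be sketched as follows. This is a minimal illustration that assumes each model outputs a probability per stance label and that each model has a fixed weight; the model names, the weights, and the use of soft (probability-based) voting are assumptions of this example, since the abstract gives no details of the ensemble.

```python
def weighted_vote(model_probs, weights):
    """Combine per-model class probabilities with per-model weights
    and return the label with the highest weighted total."""
    totals = {}
    for model_name, probs in model_probs.items():
        for label, prob in probs.items():
            totals[label] = totals.get(label, 0.0) + weights[model_name] * prob
    return max(totals, key=totals.get)

# Hypothetical outputs of a text model and a contextual (non-textual) model.
predictions = {
    "text_model": {"against": 0.6, "favor": 0.3, "none": 0.1},
    "context_model": {"against": 0.2, "favor": 0.7, "none": 0.1},
}
print(weighted_vote(predictions, {"text_model": 0.7, "context_model": 0.3}))
print(weighted_vote(predictions, {"text_model": 0.4, "context_model": 0.6}))
```

With the first set of weights the text model dominates and the ensemble returns `against`; with the second the contextual model tips the decision to `favor`. In practice the weights would be tuned on held-out data.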
Citations: 6
UniBO @ KIPoS: Fine-tuning the Italian "BERTology" for PoS-tagging Spoken Data (short paper)
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7768
F. Tamburini
The use of contextualised word embeddings has brought a significant performance increase to almost all Natural Language Processing (NLP) applications. Recently, some new models developed specifically for Italian became available to scholars. This work aims at applying simple fine-tuning methods to produce high-performance solutions for the EVALITA KIPOS PoS-tagging task (Bosco et al., 2020).
Citations: 2
SNK @ DANKMEMES: Leveraging Pretrained Embeddings for Multimodal Meme Detection (short paper)
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7352
S. Fiorucci
In this paper, we describe and present the results of a meme detection system, specifically developed and submitted for our participation in the first subtask of DANKMEMES (EVALITA 2020). We built simple classifiers consisting of feed-forward neural networks. They leverage existing pretrained embeddings for both text and image representation. Our best system (SNK1) achieves good results in meme detection (F1 = 0.8473), ranking 2nd in the competition, at a distance of 0.0028 from the first-ranked system.
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
1 System description
1.1 General approach and tools
DANKMEMES (Miliani et al., 2020) is a task for meme recognition and hate speech/event identification in memes and is part of the EVALITA 2020 evaluation campaign (Basile et al., 2020). For our participation in the first subtask of DANKMEMES, we built simple classification models for meme detection. The main challenge is to effectively combine textual and image inputs. We tried to exploit the ability of pretrained embeddings to represent the information present in text and images at a limited computational cost.
To quickly build various prototypes of neural networks, we used Uber's Ludwig framework (Molino et al., 2019), a toolbox built on top of TensorFlow that facilitates and speeds up the training and testing of various models. We trained our models using Google Colaboratory, a hosted Jupyter notebook service that provides free access to GPUs, with some resource and time limitations.
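The kind of classifier described above, a feed-forward network over pretrained text and image embeddings, can be illustrated with a plain-Python forward pass. This is only a sketch under stated assumptions: the two embeddings are fused by concatenation, there is a single ReLU hidden layer, the weights are random and untrained, and every function name and dimension is hypothetical. It does not use or reproduce the Ludwig configuration of the paper.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer):
    """Affine transform: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def meme_probability(text_emb, image_emb, hidden_layer, output_layer):
    """Forward pass: concatenate embeddings, apply ReLU hidden layer,
    then a sigmoid output giving the probability of the 'meme' class."""
    x = list(text_emb) + list(image_emb)  # fusion by concatenation (assumption)
    hidden = [max(0.0, z) for z in dense(x, hidden_layer)]
    logit = dense(hidden, output_layer)[0]
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)
text_emb = [0.1, 0.2, 0.3]   # stand-in for a pretrained text embedding
image_emb = [0.4, 0.5]       # stand-in for a pretrained image embedding
hidden_layer = make_layer(len(text_emb) + len(image_emb), 4, rng)
output_layer = make_layer(4, 1, rng)
print(meme_probability(text_emb, image_emb, hidden_layer, output_layer))
```

Because the embeddings are precomputed and only the small feed-forward layers would be trained, this design keeps the computational cost low, which matches the motivation given in the text.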
Citations: 2
CONcreTEXT @ EVALITA2020: The Concreteness in Context Task
Pub Date : 1900-01-01 DOI: 10.4000/BOOKS.AACCADEMIA.7445
Lorenzo Gregori, Maria Montefinese, D. Radicioni, Andrea Amelio Ravelli, Rossella Varvara
Focus of the CONCRETEXT task is conceptual concreteness: systems were solicited to compute a value expressing to what extent target concepts are concrete (i.e., more or less perceptually salient) within a given context of occurrence. To these ends, we have developed a new dataset which was annotated with concreteness ratings and used as gold standard in the evaluation of systems. Four teams participated in this first edition of the task, with a total of 15 runs submitted. Interestingly, these works extend information on conceptual concreteness available in existing (non contextual) norms derived from human judgments with new knowledge from recently developed neural architectures, in much the same multidisciplinary spirit whereby the CONCRETEXT task was organized.
Citations: 10