Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020: Latest Articles


Natural Language Generation in Dialogue Systems for Customer Care

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8450

Mirko Di Lascio, M. Sanguinetti, Luca Anselma, Dario Mana, A. Mazzei, V. Patti, R. Simeoni

In this paper we discuss the role of natural language generation (NLG) in modern dialogue systems (DSs). In particular, we study the role that a linguistically sound NLG architecture can play in a DS. Using real examples from a new corpus of dialogues in the customer-care domain, we study how non-linguistic contextual data can be exploited by NLG.


Citations: 4

Predicting Social Exclusion: A Study of Linguistic Ostracism in Social Networks

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8353

Greta Gandolfi, C. Strapparava

Ostracism is a community-level phenomenon, shared by most social animals, including humans. Its detection plays a crucial role for the individual, with possible evolutionary consequences for the species. Considering (1) its bond with communication and (2) its social nature, we hypothesise that combining (a) linguistic and (b) community-level features has a positive impact on the automatic recognition of ostracism in human online communities. We model an English linguistic community through Reddit data and we analyse the performance of simple classification algorithms. We show that models based on the combination of (a) and (b) generally outperform the same architectures when fed (a) or (b) in isolation.


Citations: 0

Monitoring Social Media to Identify Environmental Crimes through NLP. A preliminary study

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8675

Raffaele Manna, A. Pascucci, Wanda Punzi Zarino, Vincenzo Simoniello, J. Monti

This paper presents the results of research carried out on the UNIOR Eye corpus, a corpus which has been built by downloading tweets related to environmental crimes. The corpus is made up of 228,412 tweets organized into four different sub-sections, each one concerning a specific environmental crime. For the current study we focused on the subsection of waste crimes, composed of 86,206 tweets which were tagged according to the two labels alert and no alert. The aim is to build a model able to detect which class a tweet belongs to.


Citations: 1

Is Neural Language Model Perplexity Related to Readability?

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8743

Alessio Miaschi, Chiara Alzetta, D. Brunato, F. Dell’Orletta, Giulia Venturi

This paper explores the relationship between Neural Language Model (NLM) perplexity and sentence readability. Starting from the evidence that NLMs implicitly acquire sophisticated linguistic knowledge from a huge amount of training data, our goal is to investigate whether perplexity is affected by linguistic features used to automatically assess sentence readability and if there is a correlation between the two metrics. Our findings suggest that this correlation is actually quite weak and the two metrics are affected by different linguistic phenomena.
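
For reference, the standard definition of sentence-level perplexity under an autoregressive language model is given below. This is the textbook formulation only; the abstract does not state which variant the authors computed, and the symbols N (sentence length in tokens) and p (the model's conditional probability) are notation introduced here for illustration.

```latex
\[
\mathrm{PPL}(w_1,\dots,w_N)
  = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\,(w_i \mid w_1,\dots,w_{i-1})\right)
\]
```

A lower value means the model finds the sentence more predictable, which is why it is a natural candidate to compare against a readability score.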


Citations: 1

A Resource for Detecting Misspellings and Denoising Medical Text Data

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8728

Enrico Mensa, G. Marino, Davide Colla, Matteo Delsanto, Daniele P. Radicioni

English. In this paper we propose a method for collecting a dictionary to deal with noisy medical text documents. The quality of such Italian Emergency Room Reports is so poor that in most cases they can hardly be processed automatically; this also holds for other languages (e.g., English), with the notable difference that no Italian dictionary has been proposed to deal with this jargon. In this work we introduce and evaluate a resource designed to fill this gap. Italian. In this work we illustrate a method for building a dictionary dedicated to processing medical documents, specifically the portion of clinical records annotated in emergency rooms. Documents of this kind are so noisy that clinical records can generally hardly be processed directly by automatic means. Although cleaning up such documents is a relevant and widespread problem, no complete dictionary existed for handling this domain-specific language. In this work we propose and evaluate a resource aimed at carrying out this kind of processing on clinical records.
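
As a rough illustration of how a domain dictionary can be used to flag and correct misspellings, the sketch below checks each token against a word list and proposes the closest entry. It is not the authors' method: the four-word `DICTIONARY`, the `check_token` helper, and the 0.8 similarity cutoff are all assumptions made up for this example, and the matching relies on Python's standard `difflib` module.

```python
import difflib

# Hypothetical mini-dictionary of Italian medical terms (illustration only,
# not the resource described in the abstract).
DICTIONARY = ["paziente", "dolore", "frattura", "torace"]


def check_token(token, dictionary=DICTIONARY, cutoff=0.8):
    """Return (is_known, suggestion) for a single token.

    is_known is True when the lowercased token is in the dictionary.
    Otherwise the suggestion is the closest dictionary entry whose
    similarity ratio is at least `cutoff`, or None if there is none.
    """
    word = token.lower()
    if word in dictionary:
        return True, word
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return False, (matches[0] if matches else None)


if __name__ == "__main__":
    for tok in ["dolore", "pazinte", "xyz"]:
        print(tok, check_token(tok))
```

A real emergency-room corpus would also need handling for abbreviations and tokenisation noise, which a plain string-similarity lookup like this one does not address.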


Citations: 3

Clustering verbal Objects: Manual and Automatic Procedures Compared

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8403

Ilaria Colucci, Elisabetta Jezek, V. Baisa

As highlighted by Pustejovsky (1995, 2002), the semantics of each verb is determined by the totality of its complementation patterns. Arguments in fact play a fundamental role in verb meaning and verbal polysemy, thanks to the sense co-composition principle between verb and argument. For this reason, clustering of lexical items filling the Object slot of a verb is believed to bring to the surface relevant information about verbal meaning and the verb-Objects relation. The paper presents the results of an experiment comparing the automatic clustering of direct Objects performed by the agglomerative hierarchical algorithm of the Sketch Engine corpus tool with the manual clustering of direct Objects carried out in the T-PAS resource. Cluster analysis is here used to improve the semantic quality of automatic clusters against expert human intuition and as an investigation tool of phenomena intrinsic to semantic selection of verbs and the construction of verb senses in context.


Citations: 0

Gender Bias in Italian Word Embeddings

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8280

Davide Biasion, Alessandro Fabris, Gianmaria Silvello, Gian Antonio Susto

In this work we study gender bias in Italian word embeddings (WEs), evaluating whether they encode gender stereotypes studied in social psychology or present in the labor market. We find strong associations with gender in job-related WEs. Weaker gender stereotypes are present in other domains where grammatical gender plays a significant role.
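
One common way to quantify a gender association in word embeddings is to compare a word's cosine similarity to a male anchor and to a female anchor. The sketch below shows that idea only; the abstract does not say which measure the authors used, and the 3-dimensional vectors, the anchor words, and the `gender_association` helper are invented for illustration rather than taken from any real Italian embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_association(word_vec, male_vec, female_vec):
    """Positive: closer to the male anchor. Negative: closer to the female anchor."""
    return cosine(word_vec, male_vec) - cosine(word_vec, female_vec)


# Toy vectors, made up for this example (NOT real embeddings).
toy = {
    "uomo": [1.0, 0.0, 0.2],
    "donna": [0.0, 1.0, 0.2],
    "ingegnere": [0.8, 0.1, 0.5],
    "infermiera": [0.1, 0.9, 0.4],
}

scores = {
    word: gender_association(toy[word], toy["uomo"], toy["donna"])
    for word in ("ingegnere", "infermiera")
}

if __name__ == "__main__":
    for word, score in scores.items():
        print(f"{word}: {score:+.3f}")
```

In Italian, a noun's grammatical gender can itself pull a vector towards one anchor, so a score like this mixes grammatical and stereotypical signal unless the two are separated.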


Citations: 2

ItaGLAM: A corpus of Cultural Communication on Twitter during the Pandemic

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8760

Gennaro Nolano, Carola Carlino, Maria Pia di Buono, J. Monti

This paper describes the compilation and annotation of ItaGLAM, a corpus of tweets written by Italian Galleries, Libraries, Archives and Museums (GLAMs) during the lockdown period in Italy due to the COVID-19 pandemic. ItaGLAM has been annotated with a set of labels which may be useful to identify different types of communication. Furthermore, the collected data have been used to train a set of classifiers. The results are analyzed to evaluate the information flow between GLAMs and users and to analyze cultural communication on the Web.

Citations: 0

The Style of a Successful Story: a Computational Study on the Fanfiction Genre

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8718

Andrea Mattei, D. Brunato, F. Dell’Orletta

This paper presents a new corpus for the Italian language representative of the fanfiction genre. It comprises about 55k user-generated stories inspired by the original fantasy saga “Harry Potter” and published on a popular website. The corpus is large enough to support data-driven investigations in many directions, from more traditional studies on language variation aimed at characterizing this genre with respect to more traditional ones, to emerging topics in computational social science such as the identification of factors involved in the success of a story. The latter is the focus of the presented case-study, in which a wide set of multi-level linguistic features has been automatically extracted from a subset of the corpus and analysed in order to detect the ones which significantly discriminate successful from unsuccessful


Citations: 2

You Don’t Say… Linguistic Features in Sarcasm Detection

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8485

Martina Ducret, Lauren Kruse, Carlos Martinez, Anna Feldman, Jing Peng

We explore linguistic features that contribute to sarcasm detection. The linguistic features that we investigate are a combination of text and word complexity, stylistic and psychological features. We experiment with sarcastic tweets with and without context. The results of our experiments indicate that contextual information is crucial for sarcasm prediction. One important observation is that sarcastic tweets are typically incongruent with their context in terms of sentiment or emotional load.


Citations: 0
