Recent Advances in Natural Language Processing: Latest Papers

Towards Adaptive Text Summarization: How Does Compression Rate Affect Summary Readability of L2 Texts?
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_145
Tatiana Vodolazova, Elena Lloret
This paper addresses the problem of readability of automatically generated summaries in the context of second language learning. For this we experimented with a new corpus of level-annotated simplified English texts. The texts were summarized using a total of 7 extractive and abstractive summarization systems with compression rates of 20%, 40%, 60% and 80%. We analyzed the generated summaries in terms of lexical, syntactic and length-based features of readability, and concluded that summary complexity depends on the compression rate, summarization technique and the nature of the summarized corpus. Our experiments demonstrate the importance of choosing appropriate summarization techniques that align with users’ needs and language proficiency.
Citations: 3
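To make the notion of a compression rate in the abstract above concrete, here is a minimal sketch of an extractive summarizer that keeps a fixed fraction of sentences. It is not one of the seven systems evaluated in the paper. The function name `extractive_summary`, the word-frequency scoring heuristic, and the reading of the rate as "fraction of sentences kept" are all assumptions made for illustration.

```python
import re
from collections import Counter


def extractive_summary(text, rate):
    """Keep roughly `rate` of the sentences, ranked by average word frequency.

    Assumption: `rate` is the fraction of sentences kept (0.2 keeps about 20%).
    """
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []

    def words(sentence):
        return re.findall(r"[a-z']+", sentence.lower())

    # Document-level word frequencies serve as a crude importance signal.
    freq = Counter(w for s in sentences for w in words(s))

    def score(sentence):
        ws = words(sentence)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    keep = max(1, round(rate * len(sentences)))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Restore the original order so the summary reads naturally.
    return [sentences[i] for i in sorted(ranked[:keep])]


if __name__ == "__main__":
    doc = ("Cats sleep a lot. Cats like fish. Dogs bark. "
           "Birds sing at dawn. Cats and dogs play.")
    for r in (0.2, 0.4, 0.8):
        print(r, extractive_summary(doc, r))
```

Readability features such as mean sentence length could then be computed on each output to compare rates, which is the kind of comparison the abstract describes.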
Neural Feature Extraction for Contextual Emotion Detection
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_091
E. Mohammadi, Hessam Amini, Leila Kosseim
This paper describes a new approach for the task of contextual emotion detection. The approach is based on a neural feature extractor, composed of a recurrent neural network with an attention mechanism, followed by a classifier that can be neural or SVM-based. We evaluated the model on the dataset of Task 3 of SemEval 2019 (EmoContext), which includes short 3-turn conversations tagged with 4 emotion classes. The best-performing setup used ELMo word embeddings and POS tags as input, bidirectional GRUs as hidden units, and an SVM as the final classifier. This configuration reached a micro-average F1 score of 69.93% on the 3 main emotion classes, outperforming the baseline system by 11.25%.
Citations: 7
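The evaluation measure named in the abstract above, micro-average F1 restricted to the three main emotion classes, can be sketched as follows. This is an illustrative implementation, not the official scorer. The label strings (`happy`, `sad`, `angry`, `others`) and the function name `micro_f1` are assumptions for the example.

```python
def micro_f1(gold, pred, classes):
    """Micro-averaged F1 over `classes` only; other labels add no true positives."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p and p in classes:
            tp += 1
        elif g != p:
            if p in classes:
                fp += 1  # predicted a target class that was wrong
            if g in classes:
                fn += 1  # missed a gold target class
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["happy", "sad", "angry", "others", "others", "happy"]
    pred = ["happy", "others", "angry", "happy", "others", "sad"]
    # Assumed label set: three target emotions, with "others" excluded.
    print(micro_f1(gold, pred, {"happy", "sad", "angry"}))
```

Excluding the fourth class from the average means a system cannot score well simply by predicting the majority label.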
Whom to Learn From? Graph- vs. Text-based Word Embeddings
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_120
M. Salawa, A. Branco, Ruben Branco, J. Rodrigues, Chakaveh Saedi
Vectorial representations of meaning can be supported by empirical data from diverse sources and obtained with diverse embedding approaches. This paper aims at screening this experimental space and reports on an assessment of word embeddings supported (i) by data in raw texts vs. in lexical graphs, (ii) by lexical information encoded in association- vs. inference-based graphs, and obtained (iii) by edge reconstruction- vs. matrix factorisation vs. random walk-based graph embedding methods. The results observed with these experiments indicate that the best solutions with graph-based word embeddings are very competitive, consistently outperforming mainstream text-based ones.
Citations: 3
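Random walk-based graph embedding, one of the three method families compared in the abstract above, typically begins by sampling node sequences from the graph and then treating them like sentences for a text-style embedding model. The sketch below covers only that sampling step on a toy lexical graph. The graph contents, the function name `random_walks`, and the uniform choice of neighbours are assumptions for illustration, and they do not reproduce the specific methods assessed in the paper.

```python
import random


def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Sample uniform random walks from every node of an adjacency-list graph."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph.get(walk[-1], [])
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


if __name__ == "__main__":
    # Toy lexical graph (hypothetical): edges stand for lexical relations.
    toy_graph = {
        "dog": ["animal", "cat"],
        "cat": ["animal", "dog"],
        "animal": ["dog", "cat"],
    }
    for walk in random_walks(toy_graph, walk_length=4, walks_per_node=2):
        print(" ".join(walk))
```

The resulting walks can be fed to any skip-gram style trainer, so that words close in the graph receive similar vectors.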
Building a Morphological Analyser for Laz
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_101
Esra Onal, Francis M. Tyers
This study is an attempt to contribute to the documentation and revitalization of the endangered Laz language, a member of the South Caucasian language family spoken mainly on the northeastern coastline of Turkey. It constitutes the first steps towards a general computational model for word form recognition and production for Laz, by building a rule-based morphological analyser using the Helsinki Finite-State Toolkit (HFST). The evaluation results show that the analyser has 64.9% coverage over a corpus of 111,365 tokens collected for this study. We have also performed an error analysis on 100 randomly selected tokens from the corpus that are not covered by the analyser; the results show that the errors mostly result from Turkish words in the corpus and missing stems in our lexicon.
Citations: 1
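Coverage, as reported in the abstract above, is commonly computed as the share of corpus tokens for which the analyser returns at least one analysis. The sketch below assumes that definition. The `analyse` callable is a hypothetical stand-in for a real HFST lookup, and the toy tokens and tag strings are placeholders rather than actual Laz data.

```python
def coverage(tokens, analyse):
    """Fraction of tokens that receive at least one analysis."""
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if analyse(token))
    return covered / len(tokens)


def uncovered(tokens, analyse):
    """Tokens with no analysis, the pool an error analysis would sample from."""
    return [token for token in tokens if not analyse(token)]


if __name__ == "__main__":
    # Hypothetical lexicon: maps surface forms to lists of analyses.
    toy_lexicon = {"w1": ["w1<n><sg>"], "w3": ["w3<v><pres>"]}
    toy_corpus = ["w1", "w2", "w3", "w4"]

    def lookup(token):
        return toy_lexicon.get(token, [])

    print(f"coverage: {coverage(toy_corpus, lookup):.1%}")
    print("uncovered:", uncovered(toy_corpus, lookup))
```

This measure says nothing about whether the returned analyses are correct, which is why the authors complement it with a manual error analysis.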
Summarizing Legal Rulings: Comparative Experiments
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_036
Diego de Vargas Feijó, V. Moreira
In the context of text summarization, texts in the legal domain have peculiarities related to their length and to their specialized vocabulary. Recent neural network-based approaches can achieve high-quality scores for text summarization. However, these approaches have been used mostly for generating very short abstracts for news articles. Thus, their applicability to the legal domain remains an open issue. In this work, we experimented with ten extractive and four abstractive models in a real dataset of legal rulings. These models were compared with an extractive baseline based on heuristics to select the most relevant parts of the text. Our results show that abstractive approaches significantly outperform extractive methods in terms of ROUGE scores.
Citations: 10
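The ROUGE scores used for the comparison in the abstract above measure n-gram overlap between a system summary and a reference summary. The following is a simplified sketch of ROUGE-1 (unigram overlap) with whitespace tokenization. It omits the stemming and other options of the standard toolkit. The function name `rouge_1` and the example sentences are invented for illustration.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Return (precision, recall, f1) of clipped unigram overlap."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Each reference word can be matched at most as often as it occurs.
    overlap = sum(min(count, ref_counts[word])
                  for word, count in cand_counts.items())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    reference = "the court denied the appeal"
    candidate = "the court denied appeal"
    p, r, f = rouge_1(candidate, reference)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Because the measure rewards lexical overlap only, a higher ROUGE score does not by itself guarantee that a legal summary is faithful to the ruling.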
Turning silver into gold: error-focused corpus reannotation with active learning
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_088
P. Ménard, A. Mougeot
While high-quality gold-standard annotated corpora are crucial for most tasks in natural language processing, many annotated corpora published in recent years, created by annotators or tools, contain noisy annotations. These corpora can be viewed as silver rather than gold standards, even if they are used in evaluation campaigns or to compare systems’ performance. As upgrading a silver corpus to gold level is still a challenge, we explore the application of active learning techniques to detect errors using four datasets designed for document classification and part-of-speech tagging. Our results show that the proposed method for the seeding step improves the chance of finding incorrect annotations by a factor of 2.73 when compared to random selection, a 14.71% increase from the baseline methods. Our query method increases error detection precision on average by a factor of 1.78 against random selection, an increase of 61.82% compared to other query approaches.
Citations: 6
Risk Factors Extraction from Clinical Texts based on Linked Open Data
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_019
S. Boytcheva, G. Angelova, Zhivko Angelov
This paper presents experiments in risk factor analysis based on clinical texts enhanced with Linked Open Data (LOD). The idea is to determine whether a patient has risk factors for a specific disease by analyzing only his/her outpatient records. A semantic graph of “meta-knowledge” about a disease of interest is constructed, with integrated multilingual terms (labels) of symptoms, risk factors etc. coming from Wikidata, PubMed, Wikipedia and MeSH, and linked to clinical records of individual patients via ICD-10 codes. Then a predictive model is trained to predict whether patients are at risk of developing the disease of interest. The testing was done using outpatient records from a nation-wide repository available for the period 2011-2016. The results show that the overall performance of all tested algorithms (kNN, Naive Bayes, Tree, Logistic regression, ANN) improves when the clinical texts are enriched with LOD resources.
Citations: 0
Porting Multilingual Morphological Resources to OntoLex-Lemon
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_027
Thierry Declerck, Stefania Racioppa
We describe work on porting various morphological resources to the OntoLex-Lemon model. A main objective of this work is to offer a uniform representation of different morphological data sets in order to be able to compare and interlink multilingual resources and to cross-check and interlink or merge the content of morphological resources of one and the same language. The results of our work will be published on the Linguistic Linked Open Data cloud.
Citations: 5
A Survey of the Perceived Text Adaptation Needs of Adults with Autism
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_155
Victoria Yaneva, Constantin Orasan, L. Ha, N. Ponomareva
NLP approaches to automatic text adaptation often rely on user-need guidelines which are generic and do not account for the differences between various types of target groups. One such group is adults with high-functioning autism, who are usually able to read long sentences and comprehend difficult words but whose comprehension may be impeded by other linguistic constructions. This is especially challenging for real-world user-generated texts such as product reviews, which cannot be controlled editorially and are thus a particularly good application for automatic text adaptation systems. In this paper we present a mixed-methods survey conducted with 24 adult web users diagnosed with autism and an age-matched control group of 33 neurotypical participants. The aim of the survey was to identify whether the group with autism experienced any barriers when reading online reviews, what these potential barriers were, and what NLP methods would be best suited to improve the accessibility of online reviews for people with autism. The group with autism consistently reported significantly greater difficulties with understanding online product reviews compared to the control group and identified issues related to text length, poor topic organisation, and the use of irony and sarcasm.
Citations: 4
A Classification-Based Approach to Cognate Detection Combining Orthographic and Semantic Similarity Information
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_071
Sofie Labat, Els Lefever
This paper presents proof-of-concept experiments for combining orthographic and semantic information to distinguish cognates from non-cognates. To this end, a context-independent gold standard is developed by manually labelling English-Dutch pairs of cognates and false friends in bilingual term lists. These annotated cognate pairs are then used to train and evaluate a supervised binary classification system for the automatic detection of cognates. Two types of information sources are incorporated in the classifier: fifteen string similarity metrics capture form similarity between source and target words, while word embeddings model semantic similarity between the words. The experimental results show that even though the system already achieves good results by only incorporating orthographic information, the performance further improves by including semantic information in the form of embeddings.
Citations: 7
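The orthographic side of the cognate classifier described in the abstract above rests on string similarity metrics between a source word and a target word. The abstract does not list the fifteen metrics used, so the two below (normalized Levenshtein similarity and the Dice coefficient over character bigrams) are common examples chosen for illustration, and the function names are assumptions.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Edit distance rescaled to [0, 1], where 1 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def dice_bigrams(a, b):
    """Dice coefficient over multisets of character bigrams."""
    grams_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    grams_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    shared = sum((grams_a & grams_b).values())
    return 2.0 * shared / total


if __name__ == "__main__":
    # English-Dutch example pair; the scores become classifier features.
    pair = ("night", "nacht")
    features = [levenshtein_similarity(*pair), dice_bigrams(*pair)]
    print(pair, features)
```

In a setup like the one described, such scores would be concatenated with an embedding-based semantic similarity score and passed to a binary classifier.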