自动可读性评估的BERT嵌入

Recent Advances in Natural Language Processing Pub Date : 2021-06-15 DOI:10.26615/978-954-452-072-4_069

Joseph Marvin Imperial

{"title":"自动可读性评估的BERT嵌入","authors":"Joseph Marvin Imperial","doi":"10.26615/978-954-452-072-4_069","DOIUrl":null,"url":null,"abstract":"Automatic readability assessment (ARA) is the task of evaluating the level of ease or difficulty of text documents for a target audience. For researchers, one of the many open problems in the field is to make such models trained for the task show efficacy even for low-resource languages. In this study, we propose an alternative way of utilizing the information-rich embeddings of BERT models with handcrafted linguistic features through a combined method for readability assessment. Results show that the proposed method outperforms classical approaches in readability assessment using English and Filipino datasets, obtaining as high as 12.4% increase in F1 performance. We also show that the general information encoded in BERT embeddings can be used as a substitute feature set for low-resource languages like Filipino with limited semantic and syntactic NLP tools to explicitly extract feature values for the task.","PeriodicalId":284493,"journal":{"name":"Recent Advances in Natural Language Processing","volume":"72 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"BERT Embeddings for Automatic Readability Assessment\",\"authors\":\"Joseph Marvin Imperial\",\"doi\":\"10.26615/978-954-452-072-4_069\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic readability assessment (ARA) is the task of evaluating the level of ease or difficulty of text documents for a target audience. For researchers, one of the many open problems in the field is to make such models trained for the task show efficacy even for low-resource languages. In this study, we propose an alternative way of utilizing the information-rich embeddings of BERT models with handcrafted linguistic features through a combined method for readability assessment. Results show that the proposed method outperforms classical approaches in readability assessment using English and Filipino datasets, obtaining as high as 12.4% increase in F1 performance. We also show that the general information encoded in BERT embeddings can be used as a substitute feature set for low-resource languages like Filipino with limited semantic and syntactic NLP tools to explicitly extract feature values for the task.\",\"PeriodicalId\":284493,\"journal\":{\"name\":\"Recent Advances in Natural Language Processing\",\"volume\":\"72 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Recent Advances in Natural Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.26615/978-954-452-072-4_069\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Recent Advances in Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.26615/978-954-452-072-4_069","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 19

摘要

自动可读性评估(ARA)是为目标受众评估文本文档的难易程度的任务。对于研究人员来说，该领域的许多开放问题之一是使这些为任务训练的模型即使对低资源语言也有效。在这项研究中，我们提出了一种替代方法，通过一种结合可读性评估的方法，利用BERT模型与手工语言特征的丰富信息嵌入。结果表明，该方法在使用英语和菲律宾语数据集的可读性评估方面优于经典方法，F1性能提高高达12.4%。我们还表明，在BERT嵌入中编码的一般信息可以用作低资源语言(如菲律宾语)的替代特征集，这些语言具有有限的语义和句法NLP工具，可以显式地提取任务的特征值。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

BERT Embeddings for Automatic Readability Assessment

Automatic readability assessment (ARA) is the task of evaluating the level of ease or difficulty of text documents for a target audience. For researchers, one of the many open problems in the field is to make such models trained for the task show efficacy even for low-resource languages. In this study, we propose an alternative way of utilizing the information-rich embeddings of BERT models with handcrafted linguistic features through a combined method for readability assessment. Results show that the proposed method outperforms classical approaches in readability assessment using English and Filipino datasets, obtaining as high as 12.4% increase in F1 performance. We also show that the general information encoded in BERT embeddings can be used as a substitute feature set for low-resource languages like Filipino with limited semantic and syntactic NLP tools to explicitly extract feature values for the task.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Recent Advances in Natural Language Processing

自引率

0.00%

发文量