Evaluating NLP models with written and spoken L2 samples
Kristopher Kyle, Masaki Eguchi. Research Methods in Applied Linguistics, 3(2), Article 100120. Published 2024-05-17. DOI: 10.1016/j.rmal.2024.100120. https://www.sciencedirect.com/science/article/pii/S2772766124000260
Natural language processing tools such as part-of-speech taggers and syntactic parsers are increasingly being used in studies of second language (L2) proficiency and development. However, relatively little work has focused on reporting the accuracy of these tools or optimizing their performance in L2 contexts. While some studies reference the published overall accuracy of a particular tool or include a small-scale accuracy analysis, very few (if any) studies provide a comprehensive account of the performance of taggers and parsers across a range of written and spoken registers. In this study, we provide a large-scale accuracy analysis of popular taggers and parsers across L1 and L2 written and spoken texts, both when default and L2-optimized models are used. Accuracy is examined both at the feature level (e.g., identifying adjective-noun relationships) and the text level (e.g., mean mutual information scores). The results highlight the strengths and weaknesses of these tools.
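To make the idea of feature-level accuracy concrete, the following is a minimal sketch of how a parser's output for one dependency relation (here an adjective-noun relation labeled "amod") might be scored against gold annotations. The abstract does not state which metrics the study reports, so the use of precision, recall, and F1 is an assumption; the `Dep` type, the `feature_scores` function, and the toy data are hypothetical and are not taken from the article.

```python
from typing import Iterable, NamedTuple, Tuple


class Dep(NamedTuple):
    """One dependency arc (hypothetical representation for illustration)."""
    dependent: int  # token index of the dependent, e.g. the adjective
    head: int       # token index of the head, e.g. the noun
    relation: str   # dependency label, e.g. "amod"


def feature_scores(
    gold: Iterable[Dep], predicted: Iterable[Dep], relation: str
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one relation label.

    Assumption: an arc counts as correct only if dependent, head,
    and label all match the gold annotation.
    """
    gold_set = {d for d in gold if d.relation == relation}
    pred_set = {d for d in predicted if d.relation == relation}
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy sentence (invented): "the big red car stopped", tokens indexed 0-4.
gold = [Dep(0, 3, "det"), Dep(1, 3, "amod"), Dep(2, 3, "amod"), Dep(3, 4, "nsubj")]
# Hypothetical parser output that attaches "red" to the verb by mistake.
predicted = [Dep(0, 3, "det"), Dep(1, 3, "amod"), Dep(2, 4, "amod"), Dep(3, 4, "nsubj")]

precision, recall, f1 = feature_scores(gold, predicted, "amod")
print(f"amod precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Scoring each relation separately, rather than reporting a single overall attachment score, is one way to expose the kind of per-feature strengths and weaknesses the abstract refers to.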