Generative AI for corpus approaches to discourse studies: A critical evaluation of ChatGPT

Applied Corpus Linguistics, Pub Date: 2023-12-19, DOI: 10.1016/j.acorp.2023.100082

Niall Curry, Paul Baker, Gavin Brookes

Volume 4, Issue 1, Article 100082. Publication type: Journal Article.

Article page: https://www.sciencedirect.com/science/article/pii/S2666799123000424

Citations: 0

Generative AI for corpus approaches to discourse studies: A critical evaluation of ChatGPT

This paper explores the potential of generative artificial intelligence technology, specifically ChatGPT, for advancing corpus approaches to discourse studies. The contribution of artificial intelligence technologies to linguistics research has been transformational, both in the contexts of corpus linguistics and discourse analysis. However, shortcomings in the efficacy of such technologies for conducting automated qualitative analysis have limited their utility for corpus approaches to discourse studies. Acknowledging that new technologies in data analysis can replace and supplement existing approaches, and in view of the potential affordances of ChatGPT for automated qualitative analysis, this paper presents three replication case studies designed to investigate the applicability of ChatGPT for supporting automated qualitative analysis within studies using corpus approaches to discourse analysis.

The findings indicate that, generally, ChatGPT performs reasonably well when semantically categorising keywords; however, as the categorisation is based on decontextualised keywords, the categories can appear quite generic, limiting the value of such an approach for analysing corpora representing specialised genres and/or contexts. For concordance analysis, ChatGPT performs poorly, as the results include false inferences about the concordance lines and, at times, modifications of the input data. Finally, for function-to-form analysis, ChatGPT also performs poorly, as it fails to identify and analyse direct and indirect questions. Overall, the results raise questions about the affordances of ChatGPT for supporting automated qualitative analysis within corpus approaches to discourse studies, signalling issues of repeatability and replicability, ethical challenges surrounding data integrity, and the challenges associated with using non-deterministic technology for empirical linguistic research.

Source journal