How humans and machines identify discourse topics: A methodological triangulation

Applied Corpus Linguistics (IF 2.1) | Pub Date: 2025-01-16 | DOI: 10.1016/j.acorp.2025.100121

Mathew Gillings, Sylvia Jaworska

Volume 5, Issue 1, Article 100121 | Full text: https://www.sciencedirect.com/science/article/pii/S2666799125000048

Abstract

Identifying and exploring discursive topics in texts is of interest not only to linguists but also to researchers working across the full breadth of the social sciences. This paper reports on an exploratory study assessing how the choice of analytical method influences the identification and labelling of topics, and how this might in turn lead to varying interpretations of texts. Using a corpus of corporate sustainability reports totalling 98,277 words, we asked six researchers to interrogate the corpus and decide on its main ‘topics’ via four different methods: LLM-assisted analysis, topic modelling, concordance analysis, and close reading. These methods differ in the amount of data that can be analysed at once, the amount of textual context available to the researcher, and the focus of the analysis (i.e., micro to macro). The paper explores how the identified topics differed both between analysts using the same method and between methods. We conclude with a series of tentative observations regarding the benefits and limitations of each method, and offer recommendations to help researchers choose an analytical technique.
