How humans and machines identify discourse topics: A methodological triangulation

Mathew Gillings , Sylvia Jaworska
{"title":"How humans and machines identify discourse topics: A methodological triangulation","authors":"Mathew Gillings ,&nbsp;Sylvia Jaworska","doi":"10.1016/j.acorp.2025.100121","DOIUrl":null,"url":null,"abstract":"<div><div>Identifying and exploring discursive topics in texts is of interest to not only linguists, but to researchers working across the full breadth of the social sciences. This paper reports on an exploratory study assessing the influence that analytical method has on the identification and labelling of topics, which might lead to varying interpretations of texts. Using a corpus of corporate sustainability reports, totalling 98,277 words, we asked 6 different researchers to interrogate the corpus and decide on its main ‘topics’ via four different methods: LLM-assisted analyses; topic modelling; concordance analysis; and close reading. These methods differ according to the amount of data that can be analysed at once, the amount of textual context available to the researcher, and the focus of the analysis (i.e., micro to macro). The paper explores how the identified topics differed both between analysts using the same method, and between methods. We conclude with a series of tentative observations regarding the benefits and limitations of each method, and offer recommendations for researchers in choosing which analytical technique to select.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"5 1","pages":"Article 100121"},"PeriodicalIF":2.1000,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Corpus Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666799125000048","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Identifying and exploring discursive topics in texts is of interest to not only linguists, but to researchers working across the full breadth of the social sciences. This paper reports on an exploratory study assessing the influence that analytical method has on the identification and labelling of topics, which might lead to varying interpretations of texts. Using a corpus of corporate sustainability reports, totalling 98,277 words, we asked 6 different researchers to interrogate the corpus and decide on its main ‘topics’ via four different methods: LLM-assisted analyses; topic modelling; concordance analysis; and close reading. These methods differ according to the amount of data that can be analysed at once, the amount of textual context available to the researcher, and the focus of the analysis (i.e., micro to macro). The paper explores how the identified topics differed both between analysts using the same method, and between methods. We conclude with a series of tentative observations regarding the benefits and limitations of each method, and offer recommendations for researchers in choosing which analytical technique to select.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
人类和机器如何识别话语主题:方法论三角测量
识别和探索文本中的话语主题不仅是语言学家感兴趣的,而且是跨社会科学全广度工作的研究人员感兴趣的。本文报告了一项探索性研究,评估了分析方法对主题识别和标签的影响,这可能导致对文本的不同解释。使用公司可持续发展报告的语料库,总共98,277个单词,我们请6位不同的研究人员对语料库进行查询,并通过四种不同的方法决定其主要“主题”:法学硕士辅助分析;主题造型;一致性分析;还有细读。这些方法根据可以一次分析的数据量、研究人员可用的文本上下文的数量以及分析的重点(即从微观到宏观)而有所不同。本文探讨了使用相同方法的分析师之间以及方法之间确定的主题是如何不同的。我们总结了一系列关于每种方法的优点和局限性的试探性观察,并为研究人员提供了选择哪种分析技术的建议。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Applied Corpus Linguistics
Applied Corpus Linguistics Linguistics and Language
CiteScore
1.30
自引率
0.00%
发文量
0
审稿时长
70 days
期刊最新文献
The grammar-lexicon trade-off in lexicography: Corpus-based categorization of STONE’s modifying uses The Language of Hate and Offense: Understanding Linguistic Variation in Turkish Tweets SCILIC and SCAWL: Developing a smart cities corpus and academic word list Linguistic stratification in academic publishing: A corpus-based analysis of lexicogrammatical variation across journal tiers Vocabulary in academic texts across disciplinary fields
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1