Applied Corpus Linguistics: Latest Articles

‘I am still unsure…’ – Spontaneous expressions of vaccine indecision on Mumsnet
Pub Date : 2025-01-29 DOI: 10.1016/j.acorp.2025.100122
Zsófia Demjén , Vaclav Brezina , Tara Coltman-Patel , William Dance , Richard Gleave , Claire Hardaker , Elena Semino
Vaccination programmes in 90% of countries in the world have been affected by ‘vaccine hesitancy’. Childhood vaccinations are particularly important. Internationally, childhood vaccination rates have been declining, resulting in the resurfacing of communicable diseases previously considered eliminated. In this context, our paper examines parents’ unelicited expressions of vaccine indecision – dilemmas, hesitations and concerns related to vaccinating at the point of decision-making.
Our corpus-assisted discourse study combines discourse analysis and the qualitative and quantitative tools of corpus linguistics (text dispersion keywords and concordancing) to compare 422 Original Posts from the forum of the UK-based parenting website Mumsnet that outline vaccine indecision to vaccination discussions that do not involve decision-making difficulties. We examine what characterises authentic vaccine indecision in the localised context of Mumsnet users. Specifically, we analyse which vaccines Mumsnet users are undecided about; what concerns are linked to indecision specifically; and how such concerns are raised in a generally pro-vaccination online space. Our method combines the advantages of analysing large datasets with those of nuanced and localised qualitative analysis.
The vaccine that most consistently concerns parents is MMR. Concerns about other vaccines fluctuate with disease outbreaks and the introduction of new vaccines. The concerns linked to indecision specifically are mostly individual and vaccine-specific, and include the mode and timing of vaccinations, particular personal and family circumstances and a rather unspecific notion of side effects. Mumsnet users invite details about others' personal experiences to fill a need left by widely available general vaccine information. The implication is that health services may need to redirect resources from mass population level campaigns to more personalised and tailored approaches for parents who are hesitant about specific vaccines at particular points in time.
Citations: 0
How humans and machines identify discourse topics: A methodological triangulation
Pub Date : 2025-01-16 DOI: 10.1016/j.acorp.2025.100121
Mathew Gillings , Sylvia Jaworska
Identifying and exploring discursive topics in texts is of interest not only to linguists but also to researchers working across the full breadth of the social sciences. This paper reports on an exploratory study assessing the influence that analytical method has on the identification and labelling of topics, which might lead to varying interpretations of texts. Using a corpus of corporate sustainability reports, totalling 98,277 words, we asked six researchers to interrogate the corpus and decide on its main ‘topics’ via four different methods: LLM-assisted analyses; topic modelling; concordance analysis; and close reading. These methods differ according to the amount of data that can be analysed at once, the amount of textual context available to the researcher, and the focus of the analysis (i.e., micro to macro). The paper explores how the identified topics differed both between analysts using the same method and between methods. We conclude with a series of tentative observations regarding the benefits and limitations of each method, and offer recommendations for researchers choosing an analytical technique.
Citations: 0
Anywhere but here: Discourses and representations surrounding same-sex marriage in Japanese newspapers
Pub Date : 2025-01-11 DOI: 10.1016/j.acorp.2025.100120
Keisuke Yoshimoto
Although support for same-sex marriage has grown in Japan, discussions on its legalisation have been slow in the Japanese parliament. To contribute to a more meaningful discussion on this issue, this study uses corpus-driven and corpus-based methods to explore how issues surrounding same-sex marriage are represented in media discourse. It compares two corpora comprising articles from two Japanese national newspapers: the more conservative Yomiuri Shimbun (608,305 words) and the more liberal Asahi Shimbun (1,681,133 words). The data span 1 April 2015, when Tokyo's Shibuya Ward started certifying same-sex couples, to 15 March 2024, the day after the Sapporo High Court ruled on same-sex marriage. Keyword, collocation, and concordance analyses are used to identify differences in discourses and representations, exploring how each newspaper's opinions on same-sex marriage are explicitly and implicitly delivered. The findings reveal that the Yomiuri Shimbun mostly depicts gay and lesbian people as fictional characters or foreigners, and argues against same-sex marriage in terms of child welfare. In contrast, the Asahi Shimbun considers the issues surrounding LGBTQ+ people to be related to human rights and criticises traditional heteropatriarchal family values as obstacles to advancing same-sex marriage movements and women's rights alike.
Citations: 0
Is LIWC reliable, efficient, and effective for the analysis of large online datasets in forensic and security contexts?
Pub Date : 2025-01-09 DOI: 10.1016/j.acorp.2025.100118
Madison Hunter, Tim Grant
This article evaluates the reliability, efficiency, and effectiveness of Linguistic Inquiry and Word Count (LIWC; Boyd et al., 2022) for the analysis of a white nationalist forum. This is important because LIWC has been the computational tool of choice for scores of studies generally, and for many examining extremist content in a forensic or security context. Our purpose, therefore, is to understand whether LIWC can be depended upon for large-scale analyses; we initially examine this here using a small sample of posts from a set of just eight users and manually checking the program's automated codings of a subset of categories. Our results show that the LIWC coding cannot be relied upon – precision falls to as low as 49.6% and recall as low as 41.7% for some categories. It would be possible to engage in considerable manual correction of these results, but this undermines the tool's purported efficiency for large datasets.
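The precision and recall figures reported above come from comparing automated codings with manual checks. A minimal Python sketch of that comparison follows; the function name, the token identifiers, and the toy data are hypothetical and are not taken from the study.

```python
def precision_recall(auto_codes, manual_codes):
    """Compare automated codings of one category against manual judgements.

    Both arguments are sets of token identifiers assigned to the category.
    """
    true_pos = len(auto_codes & manual_codes)
    # Precision: share of automated assignments that the human coder confirmed.
    precision = true_pos / len(auto_codes) if auto_codes else 0.0
    # Recall: share of human-confirmed tokens that the tool actually found.
    recall = true_pos / len(manual_codes) if manual_codes else 0.0
    return precision, recall


# Hypothetical toy data: token positions tagged for a single category.
auto = {1, 4, 7, 9}        # tokens the tool assigned to the category
manual = {1, 4, 5, 8, 9}   # tokens a human coder judged to belong to it

p, r = precision_recall(auto, manual)
print(f"precision={p:.2f} recall={r:.2f}")
```

Low precision means many automated assignments are wrong; low recall means many relevant tokens are missed. Either one implies manual correction before the counts can be trusted.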
Citations: 0
The personal_relationship frame in love fraud
Pub Date : 2025-01-09 DOI: 10.1016/j.acorp.2025.100119
Pamela Faber
This research analyzed the love fraud event within the context of the Personal_Relation and Intentional_Deception frames in FrameNet. Of the concepts that characterize this event, the focus was on Relationship, namely, its stages, participants, and dimensions. The data consisted of extended conversations between 83 scammers and the author, which were recorded from January 2021 to June 2024. When the corpus was analyzed on the SketchEngine platform, the collocates of relationship with the highest LogDice scores were identified and structured. The results show that fraudsters use scripts to construct a romantic relationship with victims, which begins with friendship, progresses to ‘soulmateship’ and engagement, and finally ends in an online ‘marriage’. This is accomplished through the strategic use and repetition of terms that belong to the Personal_Relation frame in FrameNet. The objective is to extract as much money as possible from the victim.
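For readers unfamiliar with the LogDice score mentioned above, the standard definition is 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency and f_x, f_y are the frequencies of the node and the collocate. The Python sketch below ranks collocates by that formula; the counts are invented for illustration and do not come from the study's corpus.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y))."""
    if f_xy <= 0:
        raise ValueError("co-occurrence frequency must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts for collocates of "relationship":
# (co-occurrence frequency, frequency of node, frequency of collocate)
candidates = {
    "serious": (40, 500, 300),
    "good": (10, 500, 4000),
}

# Rank collocates from strongest to weakest association.
ranked = sorted(candidates, key=lambda w: log_dice(*candidates[w]), reverse=True)
for word in ranked:
    print(word, round(log_dice(*candidates[word]), 2))
```

The score has a theoretical maximum of 14 and does not depend on corpus size, which makes it convenient for comparing collocates across corpora of different sizes.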
Citations: 0
Introductory editorial synthesis paper: Corpus linguistics and the language of COVID-19: Applications and outcomes
Pub Date : 2024-12-01 DOI: 10.1016/j.acorp.2024.100110
David Oakey , Benet Vincent
This article provides an overview of the papers in the special issue of Applied Corpus Linguistics on “Corpus Linguistics and the Language of COVID-19: Applications and Outcomes”. As noted in our original call for contributions, we believe that, while traditional corpus linguistic work can reveal valuable insights into the emerging language around COVID-19, it should be complemented by more applied corpus linguistics work. The pandemic posed a real-world problem which applied corpus linguists were well equipped to address using linguistic evidence from a range of sources. This article presents an introduction to the papers in this special issue which will be of interest to applied corpus linguists due to the variety of perspectives they present in relation to a number of key issues of importance to the field: the data they draw on, the various theoretical frameworks which inform the research, the methods they use to collect and analyse the data, and the discussion of how their findings may be applicable to citizens, decision makers, consumers and other stakeholders in public and private contexts.
Citations: 0
Corrigendum to “here-, there-, and every where-: Exploring the role of pronominal adverbs in legal language” [Applied Corpus Linguistics Volume 4, Issue 1 (2024) 100087]
Pub Date : 2024-12-01 DOI: 10.1016/j.acorp.2024.100112
David Chandler, Brett Hashimoto
Citations: 0
Lexical complexity in academic lectures: Comparative analysis of EMI and Non-EMI settings and influential factors
Pub Date : 2024-11-15 DOI: 10.1016/j.acorp.2024.100115
Chen Chen , Philip Durrant
Despite the substantial body of research on vocabulary in English Medium Instruction (EMI), there is a noticeable dearth of corpus-based studies examining lexical complexity of EMI lectures, particularly in specific disciplines. To fill this gap, this study developed an EMI spoken academic corpus in Business (EMIB) with 120 lectures collected from 54 lecturers with nine different first languages (L1), reaching 1.12 million tokens. The study compared the lexical complexity of EMI Business lectures in China with academic lectures in Anglophone and non-Anglophone settings, represented by teachers’ speech in the British Academic Spoken English Corpus (BASE) and the Corpus of English as a Lingua Franca in Academic Settings (ELFA), respectively. Lexical complexity was conceptualised by lexical sophistication (operationalised by lexical frequency profile and mean frequency band score) and lexical diversity (operationalised by the VOCD-D). Results show that ELFA has significantly higher lexical sophistication than BASE, and significantly lower lexical diversity than BASE and EMIB. This study further explored whether speaker L1, speaker gender, and discipline contributed to the lexical complexity of lectures using multiple linear regression with interaction terms. Results show that speaker L1 and discipline significantly impacted the lexical complexity of lectures. Pedagogical implications are discussed.
Citations: 0
Getting into bed with embeddings? A comparison of collocations and word embeddings for corpus-assisted discourse analysis
Pub Date : 2024-11-13 DOI: 10.1016/j.acorp.2024.100117
Jordan Batchelor
This paper discusses two approaches for identifying lexical patterns in discourse, namely the corpus linguistic method of collocation analysis and the natural language processing method of word embeddings. While both approaches can identify lexical patterns, they approach the task with different underlying frameworks, and the extent to which their results resemble one another has not been directly compared. This study uses two corpora, five collocation measures, and two word embedding algorithms to generate such comparisons. Results generally support the notion that many word pairs with similar embeddings are collocates, and that, to a lesser extent, many collocates have similar word embeddings. However, a major difference is that word pairs with similar embeddings do not need to co-occur often, or at all. Moreover, systematic differences in the kinds of words highlighted between the two word embedding algorithms were found and are discussed.
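The point that words with similar vectors need not co-occur can be illustrated with a toy Python sketch. It uses raw co-occurrence count vectors as a stand-in for learned embeddings, so it is not either of the two embedding algorithms used in the study, and the sentences are invented.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def context_vectors(sentences):
    """Build a count vector of sentence-level neighbours for every word."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for word, neighbour in permutations(sent, 2):
            vectors[word][neighbour] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


# Invented toy corpus: "coffee" and "tea" never share a sentence.
sentences = [
    ["she", "drinks", "hot", "coffee"],
    ["he", "drinks", "hot", "tea"],
    ["strong", "coffee", "wakes", "people"],
    ["strong", "tea", "calms", "people"],
]

vectors = context_vectors(sentences)
print("co-occurrences of coffee and tea:", vectors["coffee"]["tea"])
print("cosine similarity:", round(cosine(vectors["coffee"], vectors["tea"]), 3))
```

Here "coffee" and "tea" have zero co-occurrences, so no collocation measure would link them, yet their context vectors are similar because they share the neighbours "drinks", "hot", "strong", and "people".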
Citations: 0
Examining in-service senior high school English teachers’ perspectives on corpus use and the effects of corpus training
Pub Date : 2024-11-06 DOI: 10.1016/j.acorp.2024.100116
Hsiao-Ling Hsu , Shu-Li Lai , Hao-Jan Howard Chen
Given the benefits of incorporating corpora into language learning, particularly in developing students’ abilities to observe and analyze language data, this study investigated Taiwanese in-service senior high school English teachers’ corpus literacy, their application of corpus tools in teaching, and the effects of an online corpus workshop. The study was conducted in two stages. The first involved collecting 151 teachers’ perceptions of corpus literacy and its applications from 141 schools across Taiwan. The second invited teachers across Taiwan to participate in an online corpus workshop, where corpus-based teaching and two tools (SKELL and Sketch Engine) were introduced, along with hands-on activities. Following the workshop, the participants completed a post-survey. The analysis of the pre-survey responses revealed that, before attending the workshop, teachers had a positive attitude toward corpus use but a limited understanding of it. The Wilcoxon Signed Rank test, used to analyze the pre- and post-survey responses, showed significant improvements in the teachers’ corpus literacy and application skills after the workshop. The findings of this study offer valuable insights into corpus use among in-service teachers in various contexts. Future research should explore the further integration of corpus tools into classrooms and include in-depth interviews for more comprehensive insights.
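The Wilcoxon Signed Rank test compares paired pre- and post-survey scores by ranking the absolute differences and summing the ranks of the positive and negative changes. The Python sketch below computes only those rank sums, not a p-value; the ratings are hypothetical, and the study's actual data and software are not described in the abstract.

```python
def wilcoxon_signed_rank(pre, post):
    """Return (W+, W-): rank sums of positive and negative paired differences."""
    # Zero differences carry no information about direction and are dropped.
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j+1.
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical 5-point ratings from six teachers before and after a workshop.
pre = [2, 3, 3, 4, 2, 3]
post = [4, 4, 3, 5, 1, 5]

w_plus, w_minus = wilcoxon_signed_rank(pre, post)
print("W+ =", w_plus, "W- =", w_minus)
```

A W+ much larger than W- indicates that most ranked changes are improvements. Whether the difference is significant depends on the sample size and requires the test's reference distribution, which this sketch leaves out.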
Citations: 0