
Applied Corpus Linguistics: Latest Articles

Lexical epistemic markers in Ghanaian parliamentary discourse: A corpus-based diachronic analysis (2005–2024)
IF 2.1 Pub Date: 2025-10-27 DOI: 10.1016/j.acorp.2025.100161
Emmanuel Mensah Bonsu
Despite growing scholarly attention to parliamentary communication in established democracies, African legislative contexts remain underexplored. This study, therefore, examined lexical epistemic modality markers in Ghanaian parliamentary discourse using a corpus-based diachronic analysis (2005–2024). The corpus comprised 1,729 parliamentary Hansards (41.7 million words), processed with Python 3.x and AntConc. Analysis revealed that cognitive verbs dominated epistemic expression. Diachronic analysis found statistically significant changes across consecutive electoral periods. Standardised residual analysis showed a redistribution from personalised cognitive claims toward markers that frame propositions as objective assessments. The findings provide the first diachronic quantitative results for epistemic modality in Ghanaian and wider West African parliamentary discourse. The results suggest potential applications for parliamentary communication training.
Applied Corpus Linguistics, 5(3), Article 100161
Citations: 0
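The Bonsu abstract reports significant change between consecutive electoral periods and a standardised residual analysis, but gives no formulas. As a minimal sketch, assuming a Pearson chi-square test of independence on a marker-type × period contingency table, the residual for each cell is (O − E) / √E, where E is the count expected if marker choice did not depend on period. The function names, category labels and counts below are hypothetical and invented for illustration; they are not the study's data, and the paper may use adjusted residuals or a different test.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: row_total * col_total / grand_total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square statistic and its degrees of freedom."""
    exp = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def standardised_residuals(table):
    """Pearson residuals (O - E) / sqrt(E); |value| above about 2 flags a notable cell."""
    exp = expected_counts(table)
    return [[(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(table, exp)]


# Hypothetical toy data: rows = marker type, columns = two consecutive periods.
markers = ["personal_cognitive_verb", "impersonal_marker"]
counts = [[120, 80],
          [30, 70]]

stat, df = chi_square(counts)
# With df = 1, the 5% critical value of chi-square is 3.84.
print(f"chi2 = {stat:.2f}, df = {df}")
for label, row in zip(markers, standardised_residuals(counts)):
    print(label, [round(r, 2) for r in row])
```

Positive residuals mark cells that occur more often than independence predicts and negative ones mark cells that occur less often, which is how a "redistribution" between marker types across periods would show up in such a table.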
Fighting fraud: Corpus-assisted approaches to understanding and disrupting fraud activity on the dark web
IF 2.1 Pub Date: 2025-10-23 DOI: 10.1016/j.acorp.2025.100159
Emily Chiang, Krzysztof Kredens, John Thornton
Financial fraud has risen steeply over the last decade and, according to data from the National Crime Agency, is currently recognised as the most commonly experienced crime in the UK, accounting for over 40% of all crimes committed against individuals aged over 16 in England and Wales. Much of this increase is attributed to the rise and evolution of online technologies, which have ushered in a wave of new methods and opportunities for perpetrators, as well as an era of unprecedented personal self-disclosure on social media by potential victims whose details can be readily exploited.
A key affordance for perpetrators is the rise of illicit marketplaces and crime-focused discussion fora on the dark web, i.e. a portion of the internet not indexed by mainstream search engines. Such spaces give users a level of anonymity that makes policing them very difficult, yet they are fruitful sites for linguistic exploration of the behaviours and activities of the relevant communities of practice. We demonstrate the application of corpus methods to online fraud in two ways. First, we show how a linguistically informed understanding of online fraud communities’ interactions can assist the undercover policing of dark-web fraud fora, specifically the task of community infiltration. Second, we address the problem from a commercial perspective, demonstrating how corpus analytic methods can inform online tools designed to help commercial entities monitor dark-web spaces for fraud activity related to their products, and how popular corpus tools can be adapted for use by non-linguist audiences for this purpose.
Applied Corpus Linguistics, 5(3), Article 100159
Citations: 0
Addressing imagined future attackers: A corpus analysis of shared agency in the online manifestos of perpetrators of mass harm
IF 2.1 Pub Date: 2025-10-23 DOI: 10.1016/j.acorp.2025.100160
Emily Powell
The manifestos that frequently accompany mass shootings are usually freely available online within minutes of an attack taking place and remain so for several years afterwards. It is well documented that the writers of such texts reference past shooters or copy elements of previous attacks (e.g., Langman 2017; Kupper et al. 2022). Studies predominantly focus on shared linguistic markers of psychological variables in the texts (e.g., Shrestha et al. 2020) in an attempt to predict attacks ahead of time. However, because the manifestos do not appear far enough in advance of the attacks for such approaches to be effective, this study instead examines how their language inspires others who may carry out similar attacks in the future. This paper uses a corpus analysis of keywords to identify the ways in which 15 perpetrators actively address imagined future attackers and anticipate them as an audience. Findings demonstrate that, rather than exerting a passive ‘contagion’ effect (Kupper et al. 2022), writers of such texts use second-person pronouns ambiguously to share agency with future readers, connect with them, and instruct them, and that this varies with the ideology of the perpetrator. These findings have implications for how the availability of such texts is viewed and suggest that those responsible for disseminating them should take the role of these texts in perpetuating violence more seriously.
Applied Corpus Linguistics, 5(3), Article 100160
Citations: 0
Analysing child writing in multilingual contexts: Combining corpora, computational tools, and methods for crossing the borders of monolingual studies on communicative competence
IF 2.1 Pub Date: 2025-10-17 DOI: 10.1016/j.acorp.2025.100158
Jennifer-Carmen Frey
Working as a computational corpus linguist in a multilingual area, my research aims to analyse communicative competence as shown in writing not only across different languages but also within multilingual students. I have been working with corpora that contain comparable and/or multilingual – partly longitudinal, partly cross-sectional – data for L1 and L2 students of German and Italian with additional data for English as a foreign language. My research investigates plurilingual competences and questions traditional concepts of L1 and L2 categories when researching students from multilingual areas. In my work, I combine data-driven analysis frameworks, quantitative corpus linguistic methods and qualitative investigations in collaboration with my colleagues, relating language features with detailed sociolinguistic metadata on students’ language backgrounds.
This article brings together some of my work on non-adult writing, presenting the various corpora I have worked on and how they have been used to analyse communicative competence in German and Italian children’s writing, moving from the assumption of clearly separated L1 and L2 contexts towards observing multicompetence in young writers. The studies presented here attempt to uncover the complexity of different learning contexts in a multilingual society by combining various resources with quantitative and qualitative research methods; the article also discusses the challenges, potential and limitations of combining data, methods and tools borrowed from different disciplines, and offers an outlook for future research in the field.
Applied Corpus Linguistics, 5(3), Article 100158
Citations: 0
Lexical choices of sharers and non-sharers on child sexual abuse material forums
IF 2.1 Pub Date: 2025-10-11 DOI: 10.1016/j.acorp.2025.100157
Meike de Boer, Willemijn Heeren, Anton Daser, Colm Gannon, Frederic Gnielka, Salla Huikuri, Robert Lehmann, Rebecca Reichel, Thomas Schäfer, Alexander F. Schmidt, Katarzyna Staciwa, Arjan Blokland
On the dark web, there are forums dedicated to the distribution and discussion of child sexual abuse material (CSAM). Although exchanging material is one of the major purposes of such forums, only a small portion of the users share CSAM themselves. Using keyness analysis, we analyzed word frequencies to identify words that were unusually frequent for either CSAM sharers or non-sharers. The language of non-sharing members shows more positivity and rapport-building, which could be a way to compensate for not meeting the expectation to contribute material to the forum. In addition, they use more sexually explicit language, potentially to prove that they are a genuine part of the community. Sharers, on the other hand, talk more about the forum and about the world outside it, where their practices are considered illegal. Hence, many words typical of sharing members relate to the law and law enforcement. Before members start sharing, their language use falls between that of non-sharers and sharers: they use positive, rapport-building, and explicit language, although less markedly than non-sharers, and they refer to the forum community but not yet to the world outside the forum. The findings can be used in covert law enforcement operations, where officers might mimic these strategies to compensate for not being able to share CSAM. In addition, the results show that keyness analysis could potentially help differentiate between groups of users on dark web CSAM forums, which could help law enforcement prioritize target members in large-scale CSAM forums.
Applied Corpus Linguistics, 5(3), Article 100157
Citations: 0
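The de Boer et al. abstract relies on keyness analysis but does not name the statistic. A common choice in corpus linguistics is the log-likelihood (G²) measure, so the sketch below assumes it: for each word it compares the observed frequencies in two subcorpora with the frequencies expected from the subcorpus sizes. The function names, the 3.84 cut-off (the chi-square critical value for p < 0.05 at one degree of freedom) and the toy counts are illustrative assumptions, not details or data from the study.

```python
import math


def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Log-likelihood (G2) keyness score for one word in corpus A versus corpus B.

    freq_a, freq_b: occurrences of the word in each corpus.
    size_a, size_b: total number of tokens in each corpus.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the word were spread evenly by corpus size.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # the 0 * ln(0) term is taken as 0
            ll += observed * math.log(observed / expected)
    return 2 * ll


def keywords(counts_a, counts_b, size_a, size_b, threshold=3.84):
    """Return (word, G2) pairs over-represented in corpus A, highest score first."""
    results = []
    for word in set(counts_a) | set(counts_b):
        fa, fb = counts_a.get(word, 0), counts_b.get(word, 0)
        # Keep only positive keywords: relatively more frequent in A than in B.
        if fa / size_a <= fb / size_b:
            continue
        ll = log_likelihood(fa, fb, size_a, size_b)
        if ll >= threshold:
            results.append((word, round(ll, 2)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Invented toy counts for two hypothetical user groups; not data from the study.
group_a = {"law": 40, "thanks": 10, "forum": 60}
group_b = {"law": 5, "thanks": 50, "forum": 30}
print(keywords(group_a, group_b, size_a=10_000, size_b=10_000))
```

Swapping the two corpora yields the keywords of the other group. A real analysis would usually also report an effect size, because G² grows with corpus size even when relative frequencies stay the same.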
Predicting verb forms in reporting and reported clauses: A corpus-based study of academic citations
IF 2.1 Pub Date: 2025-10-09 DOI: 10.1016/j.acorp.2025.100155
Atikhom Thienthong
Verb forms are crucial time-reference expressions in academic citations and have been observed to be affected by citational and linguistic features, such as citation forms, reporting subjects, and reporting verbs (i.e., citation-internal features). Using a corpus of 852 journal articles, this study, the first corpus-based analysis of its kind, investigates a range of citation-internal factors in 3,694 academic citations to determine their main and interaction effects on the choice of verb forms through multinomial logistic regression modeling. The original and bootstrapped results show that most of the main factors significantly predict the selection of verb forms in reporting and reported clauses. The occurrence of reporting and reported verb forms is affected by the number of sources, citation forms, subject animacy, meaning-based verbs, and activity verbs. However, while subject definiteness strongly affects reporting verb forms but not reported ones, the reverse is true for evaluation verbs. In addition, two significant interaction terms are observed for reported verb forms: general subjects and tentative verbs interact to favor the present tense, while multiple sources interact with non-integral citations to influence the choice of modal verbs. The results underscore the importance of citation-internal features in shaping and contextualizing the use of verb forms to express temporal reference in academic citations.
Applied Corpus Linguistics, 5(3), Article 100155
Citations: 0
The role of adverbial clauses as a feature of clausal complexity in L2 academic writing: A usage-based, discourse perspective
IF 2.1 Pub Date: 2025-09-21 DOI: 10.1016/j.acorp.2025.100154
Liming Liu
Research on clausal complexity in L2 writing has traditionally employed a reductionist approach by encapsulating all types of finite dependent clauses under the rubric of subordination, without distinguishing between their syntactic functions and with participle adverbial clauses excluded from clausal features. Taking a functional, usage-based approach to clausal complexity, this study sets out to investigate the frequency of finite adverbial clauses of three semantic relations and participle adverbial clauses of certain structural types in L2 academic writing, in a corpus-assisted comparison with published research articles. Results show that students use both finite and participle adverbial clauses less frequently than published writers overall. The study then tries to provide a rich textual analysis to functionally interpret the low representation of adverbial clauses in student writing. Implications for L2 writing pedagogy and L2 syntactic complexity research are discussed.
Applied Corpus Linguistics, 5(3), Article 100154
Citations: 0
The multisemiotic dimension of 5G news: A corpus-based discursive news values analysis
IF 2.1 Pub Date: 2025-09-08 DOI: 10.1016/j.acorp.2025.100153
Youqi Kong, Wei Lin
Chinese technology has emerged as a highly debated and newsworthy topic in recent years. While much scholarly attention has been devoted to analyzing news texts, the role of news photographs in shaping perceptions of newsworthiness remains underexplored. This study bridges this gap by examining the interplay between textual and visual news values in Chinese and US media coverage of 5G networks. Drawing on a corpus of 275 news articles published between 2017 and 2021 in China Daily, The Washington Post, and The New York Times, we employ the discursive news values analysis (DNVA) framework, augmented by corpus linguistic techniques and AI-driven image annotation tools. The findings reveal distinct patterns: Chinese media emphasize Positivity, Personalization, and Proximity, whereas US media prioritize Negativity, Eliteness, and Proximity. The differences in the multisemiotic construction of news values reflect underlying sociocultural ideologies and geopolitical dynamics, offering fresh insights into the media’s role in shaping global technological narratives.
{"title":"The multisemiotic dimension of 5G news: A corpus-based discursive news values analysis","authors":"Youqi Kong ,&nbsp;Wei Lin","doi":"10.1016/j.acorp.2025.100153","DOIUrl":"10.1016/j.acorp.2025.100153","url":null,"abstract":"<div><div>Chinese technology has emerged as a highly debated and newsworthy topic in recent years. While much scholarly attention has been devoted to analyzing news texts, the role of news photographs in shaping perceptions of newsworthiness remains underexplored. This study bridges this gap by examining the interplay between textual and visual news values in Chinese and US media coverage of 5G networks. Drawing on a corpus of 275 news articles published between 2017 and 2021 in China Daily, The Washington Post, and The New York Times, we employ the discursive news values analysis (DNVA) framework, augmented by corpus linguistic techniques and AI-driven image annotation tools. The findings reveal distinct patterns: Chinese media emphasizes Positivity, Personalization, and Proximity, whereas US media prioritizes Negativity, Eliteness, and Proximity. The differences in the multisemiotic construction of news values reflect underlying sociocultural ideologies and geopolitical dynamics, offering fresh insights into the media’s role in shaping global technological narratives.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"5 3","pages":"Article 100153"},"PeriodicalIF":2.1,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145104875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CHEU-lex: a parallel multilingual corpus of Swiss and EU legislation
IF 2.1 Pub Date : 2025-09-04 DOI: 10.1016/j.acorp.2025.100151
Annarita Felici
This paper describes the design and construction of CHEU-lex, a parallel and comparable corpus of Swiss and European Union (EU) legislation. The data are available in three languages of the Swiss Confederation (French, German and Italian) and include bilateral agreements between Switzerland and the EU as well as their reception in Swiss law. The corpus is a richly annotated multilingual resource that supports the analysis of legal language at several levels (macro-textual, lexical, morphosyntactic) and from different perspectives (monolingual, cross-lingual, cross-textual, diachronic). The paper aims to highlight key properties of CHEU-lex, discuss issues in legal corpus compilation and, finally, outline some applications for translation and legal linguistic research.
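One way to picture what "parallel" means here is a record that holds the same legal unit in all three languages, so that a search in one language returns its aligned counterparts. The sketch below assumes such a layout; the `AlignedSegment` fields, the `doc_type` labels and `parallel_concordance` are hypothetical and do not describe CHEU-lex's actual schema or query interface, and the sample sentences are invented toy text.

```python
from dataclasses import dataclass


@dataclass
class AlignedSegment:
    """One aligned legal unit stored in several languages (hypothetical schema)."""
    doc_id: str
    unit_id: str
    doc_type: str          # hypothetical labels, e.g. "bilateral_agreement", "swiss_law"
    text: dict[str, str]   # language code -> text of this unit in that language


def parallel_concordance(segments, lang, term, doc_type=None, show=("fr", "de", "it")):
    """Return the aligned versions of every unit whose `lang` text contains `term`.

    Matching is a case-insensitive substring test; `doc_type` optionally
    restricts the search to one text type.
    """
    needle = term.lower()
    hits = []
    for seg in segments:
        if doc_type is not None and seg.doc_type != doc_type:
            continue
        if needle in seg.text.get(lang, "").lower():
            hits.append({code: seg.text.get(code, "") for code in show})
    return hits


# Invented toy data for illustration; not taken from CHEU-lex.
SEGMENTS = [
    AlignedSegment(
        doc_id="toy-agreement",
        unit_id="art-1",
        doc_type="bilateral_agreement",
        text={
            "fr": "Le présent accord entre en vigueur.",
            "de": "Dieses Abkommen tritt in Kraft.",
            "it": "Il presente accordo entra in vigore.",
        },
    ),
]

if __name__ == "__main__":
    for row in parallel_concordance(SEGMENTS, "de", "abkommen"):
        print(row["fr"], "|", row["de"], "|", row["it"])
```

Keeping all language versions in one record makes cross-lingual lookup a single pass, and a type field like the assumed `doc_type` is one way to support the comparable dimension, for instance contrasting agreement text with its reception in Swiss law.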
{"title":"CHEU-lex: a parallel multilingual corpus of Swiss and EU legislation","authors":"Annarita Felici","doi":"10.1016/j.acorp.2025.100151","DOIUrl":"10.1016/j.acorp.2025.100151","url":null,"abstract":"<div><div>This paper describes the design and construction of CHEU-lex, a parallel and comparable corpus of Swiss and European Union (EU) legislation. Data are available in the three languages of the Swiss Confederation (French, German and Italian) and include bilateral agreements between Switzerland and the EU and their reception in Swiss law. The corpus is a richly annotated multilingual resource and allows the analysis of legal language at several levels (macro-textual, lexical, morphosyntactic) and according to different perspectives (monolingual, cross-lingual, cross-textual, diachronic). The goal is to highlight key properties of CHEU-lex, discuss issues of legal corpus compilation and, finally, outline some applications for translation and legal linguistic research.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"5 3","pages":"Article 100151"},"PeriodicalIF":2.1,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145104876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Corpus linguistics for safeguarding children online
IF 2.1 Pub Date : 2025-08-26 DOI: 10.1016/j.acorp.2025.100149
Mark McGlashan, Charlotte-Rose Kennedy
Safeguarding children in schools broadly refers to the actions taken to protect children from abuse, prevent damage to their health and development, and promote conditions that improve their life chances. To safeguard children, UK schools must implement filtering and monitoring software to “block harmful and inappropriate content without unreasonably impacting teaching and learning” (Department for Education, 2024: 40). The industry-standard method for monitoring online language use in schools is ‘keyword monitoring’, which identifies the use or presence of specific words or phrases (e.g. ‘bomb’) that correlate with a specific form of risk (e.g. violence). However, this approach typically depends on lists of words isolated from their context(s) of use and tends to raise concerns only when there is a direct match to a ‘keyword’. This can lead to ‘false positives’, whereby a ‘keyword’ match (e.g. ‘bomb’) raises an automatic safeguarding concern even though the keyword was used innocuously (e.g. ‘bath bomb’). This paper introduces corpus linguistics as a set of methods and approaches for enhancing the effectiveness of filtering and monitoring, through a case study of a 1,094,914-word corpus of online testimonies relating to suicide. In doing so, we demonstrate how corpus methods and the analysis of authentic language data can be used to identify and contextualise safeguarding concerns. The practical applications of this research are intended to help schools better protect children from the illegal and legal (but harmful) online materials that currently threaten their safety and wellbeing.
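The contrast between keyword matching and contextualised analysis can be sketched in a few lines. The example assumes a one-word watch-list (`bomb`) and a hand-written list of benign bigrams (`bath bomb`), both taken from the abstract's own example rather than from any real monitoring product; the function names are hypothetical. `keyword_hits` mimics naive exact matching, `concordance` produces the key-word-in-context (KWIC) view a corpus analyst would inspect, and `contextual_hits` applies one simple context rule. It is a minimal illustration, not the authors' method, and a single left-neighbour rule is far cruder than the corpus analysis the paper describes.

```python
import re

# Hypothetical watch-list and benign-bigram list, taken from the abstract's example
# ('bomb' vs 'bath bomb'); not drawn from any real monitoring product.
WATCH_LIST = {"bomb"}
BENIGN_BIGRAMS = {("bath", "bomb")}


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_hits(text: str) -> list[int]:
    """Naive keyword monitoring: token positions of every exact match, context ignored."""
    return [i for i, tok in enumerate(tokenize(text)) if tok in WATCH_LIST]


def concordance(text: str, span: int = 3) -> list[str]:
    """KWIC lines showing up to `span` tokens either side of each watch-list hit."""
    tokens = tokenize(text)
    lines = []
    for i in keyword_hits(text):
        left = " ".join(tokens[max(0, i - span):i])
        right = " ".join(tokens[i + 1:i + 1 + span])
        lines.append(f"{left} [{tokens[i]}] {right}".strip())
    return lines


def contextual_hits(text: str) -> list[int]:
    """Keep only hits whose left neighbour does not form a known benign bigram."""
    tokens = tokenize(text)
    return [
        i
        for i in keyword_hits(text)
        if i == 0 or (tokens[i - 1], tokens[i]) not in BENIGN_BIGRAMS
    ]


if __name__ == "__main__":
    message = "I bought a bath bomb for my sister"
    print(keyword_hits(message))     # naive matching flags the innocuous use
    print(concordance(message))      # the KWIC line shows the surrounding context
    print(contextual_hits(message))  # the benign bigram is filtered out
```

In practice, such context rules would be derived from inspecting concordance lines across a large corpus rather than written by hand.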
{"title":"Corpus linguistics for safeguarding children online","authors":"Mark McGlashan ,&nbsp;Charlotte-Rose Kennedy","doi":"10.1016/j.acorp.2025.100149","DOIUrl":"10.1016/j.acorp.2025.100149","url":null,"abstract":"<div><div>Safeguarding children in schools broadly refers to the actions taken to protect children from abuse, prevent damage to health and development, and promote conditions that would improve the life chances of children. To safeguard children, UK schools must implement filtering and monitoring software to “block harmful and inappropriate content without unreasonably impacting teaching and learning” (Department for Education, 2024: 40). The industry standard method for monitoring online language use in schools is ‘keyword monitoring’, which identifies the use or presence of specific words or phrases (e.g. ‘bomb’) that correlate with a specific form of risk (e.g. violence). However, this approach typically depends on lists of words isolated from their context(s) of use and tends only to raise concerns if there is a direct match to a ‘keyword’. This can lead to ‘false positives’ whereby a 'keyword' match raises an automatic safeguarding concern (e.g. ‘bomb’) even if the use of the keyword was innocuous (e.g. ‘bath bomb’). This paper introduces corpus linguistics as a set of methods and approaches to enhance the effectiveness of filtering and monitoring through a case study based on a 1094,914-word corpus of online testimonies relating to suicide. In doing so, we demonstrate how corpus methods and analysis of authentic language data can be used to identify and contextualise safeguarding concerns. The practical applications of this research are intended to help schools to better protect children from the illegal and legal (but harmful) online materials that currently pose a threat to their safety and wellbeing.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"5 3","pages":"Article 100149"},"PeriodicalIF":2.1,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144925094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Journal: Applied Corpus Linguistics
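Contact us: info@booksci.cn
Book学术 provides a free academic search service that helps scholars in China and abroad find Chinese- and English-language literature, and aims to offer the most convenient, high-quality service experience.
Copyright © 2023 Book学术 All rights reserved.
Beijing Public Security Bureau Internet Filing No. 11010802042870 · Beijing ICP Filing No. 2023020795-1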