Pub Date: 2025-10-27
DOI: 10.1016/j.acorp.2025.100161
Emmanuel Mensah Bonsu
Despite growing scholarly attention to parliamentary communication in established democracies, African legislative contexts remain underexplored. This study, therefore, examined lexical epistemic modality markers in Ghanaian parliamentary discourse using a corpus-based diachronic analysis (2005–2024). The corpus comprised 1,729 parliamentary Hansards (41.7 million words), processed with Python 3.x and AntConc. Analysis revealed that cognitive verbs dominated epistemic expression. Diachronic analysis found statistically significant changes across consecutive electoral periods. Standardised residual analysis showed a redistribution from personalised cognitive claims towards markers that frame propositions as objective assessments. The findings provide the first diachronic quantitative results for epistemic modality in Ghanaian and wider West African parliamentary discourse, and suggest potential applications for parliamentary communication training.
Title: Lexical epistemic markers in Ghanaian parliamentary discourse: A corpus-based diachronic analysis (2005–2024)
Journal: Applied Corpus Linguistics, 5(3), Article 100161
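The abstract reports statistically significant change across electoral periods, read through standardised residuals, but it names neither the test nor the counts. The following is only a minimal sketch of that kind of analysis, assuming a chi-square test of independence on a contingency table of epistemic marker categories (rows) by electoral period (columns). The category labels and all counts are invented and are not the study's data. The residual computed here is the Pearson residual (O − E)/√E, which is one common sense of "standardised residual"; adjusted residuals, which also correct for the marginal totals, are another.

```python
import math

def chi_square_residuals(table):
    """Return the chi-square statistic and Pearson residuals (O - E) / sqrt(E)
    for a contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # Expected count if marker category and period were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return chi2, residuals

# Hypothetical counts: rows = marker categories, columns = two electoral periods
categories = ["cognitive verbs", "adjectives", "adverbs"]
counts = [
    [500, 400],
    [100, 150],
    [200, 250],
]
chi2, residuals = chi_square_residuals(counts)
df = (len(counts) - 1) * (len(counts[0]) - 1)
print(f"chi2 = {chi2:.2f}, df = {df}")
for name, res in zip(categories, residuals):
    # A common rule of thumb treats |residual| > 1.96 as a notable cell
    print(name, [round(r, 2) for r in res])
```

In this invented table the cognitive-verb row has a positive residual in the first period and a negative one in the second. A sign change of this kind is what would be read as a redistribution away from that category.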
Pub Date: 2025-10-23
DOI: 10.1016/j.acorp.2025.100159
Emily Chiang, Krzysztof Kredens, John Thornton
Financial fraud has risen steeply over the last decade and, according to data from the National Crime Agency, is currently recognised as the most commonly experienced crime in the UK, accounting for over 40 % of all crimes in England and Wales committed against individuals over 16. Much of this increase is attributed to the rise and evolution of online technologies which have ushered in a wave of new methods and opportunities for perpetrators as well as an era of unprecedented personal self-disclosure via social media by potential victims whose details can be readily exploited.
A key affordance for perpetrators is the rise of illicit marketplaces and crime-focused discussion fora on the dark web, i.e. a portion of the internet unindexed by mainstream search engines. Such spaces provide users with a level of anonymity that makes policing them very difficult, yet they are fruitful sites for linguistic exploration of the behaviours and activities of the relevant communities of practice. We demonstrate the application of corpus methods to tackling online fraud in two ways. Firstly, we show how a linguistically informed understanding of online fraud communities’ interactions can assist undercover policing of dark-web fraud fora, specifically the task of community infiltration. Secondly, we address the problem from a commercial perspective, demonstrating how corpus analytic methods can inform online tools designed to help commercial entities monitor dark-web spaces for fraud activity related to their products, and how popular corpus tools can be adapted for use by non-linguist audiences for this purpose.
Title: Fighting fraud: Corpus-assisted approaches to understanding and disrupting fraud activity on the dark web
Journal: Applied Corpus Linguistics, 5(3), Article 100159
Pub Date: 2025-10-23
DOI: 10.1016/j.acorp.2025.100160
Emily Powell
The manifestos that frequently accompany mass shootings are usually freely available online within minutes of an attack and remain so for several years afterwards. It is well documented that the writers of such texts reference past shooters or copy elements of previous attacks (e.g., Langman 2017; Kupper et al. 2022). Studies predominantly focus on shared linguistic markers of psychological variables in the texts (e.g., Shrestha et al. 2020) in an attempt to predict attacks ahead of time. However, because the manifestos do not appear far enough in advance of the attacks for such approaches to be effective, this study instead examines how the language used in them inspires others who may carry out similar attacks in the future. This paper uses a corpus analysis of keywords to identify the ways in which 15 perpetrators actively address imagined future attackers and anticipate them as an audience. The findings demonstrate an active process rather than a passive ‘contagion’ effect (Kupper et al. 2022): writers of such texts use second-person pronouns ambiguously to share agency with future readers, connect with them and instruct them, and this varies depending on the ideology of the perpetrator. These findings have implications for how the availability of such texts is viewed, and suggest that those responsible for disseminating them should take the role of these texts in the perpetuation of violence more seriously.
Title: Addressing imagined future attackers: A corpus analysis of shared agency in the online manifestos of perpetrators of mass harm
Journal: Applied Corpus Linguistics, 5(3), Article 100160
Pub Date: 2025-10-17
DOI: 10.1016/j.acorp.2025.100158
Jennifer-Carmen Frey
Working as a computational corpus linguist in a multilingual area, I aim in my research to analyse communicative competence as shown in writing, not only across different languages but also within individual multilingual students. I have been working with corpora that contain comparable and/or multilingual data (partly longitudinal, partly cross-sectional) for L1 and L2 students of German and Italian, with additional data for English as a foreign language. My research investigates plurilingual competences and questions the traditional L1 and L2 categories when researching students from multilingual areas. In my work, I combine data-driven analysis frameworks, quantitative corpus linguistic methods and qualitative investigations in collaboration with my colleagues, relating language features to detailed sociolinguistic metadata on students’ language backgrounds.
This article brings together some of my work in the area of non-adult writing, presenting the various corpora I have worked on and showing how they have been used to analyse communicative competence in German and Italian children’s writing, moving from the assumption of clearly separated L1 and L2 contexts towards observing multicompetence in young writers. The studies presented here attempt to uncover the complexity of different learning contexts in a multilingual society by combining various resources as well as quantitative and qualitative research methods. The article also discusses the challenges, potential and limitations of combining data, methods and tools borrowed from different disciplines, and offers an outlook for future research in the field.
Title: Analysing child writing in multilingual contexts: Combining corpora, computational tools, and methods for crossing the borders of monolingual studies on communicative competence
Journal: Applied Corpus Linguistics, 5(3), Article 100158
Pub Date: 2025-10-11
DOI: 10.1016/j.acorp.2025.100157
Meike de Boer, Willemijn Heeren, Anton Daser, Colm Gannon, Frederic Gnielka, Salla Huikuri, Robert Lehmann, Rebecca Reichel, Thomas Schäfer, Alexander F. Schmidt, Katarzyna Staciwa, Arjan Blokland
On the dark web, there are forums dedicated to the distribution and discussion of child sexual abuse material (CSAM). Although exchanging material is one of the major purposes of such forums, only a small proportion of users share CSAM themselves. Using keyness analysis, we compared word frequencies to identify words that were unusually frequent in the language of either CSAM sharers or non-sharers. The language of non-sharing members shows more positivity and rapport-building, which could be a way to compensate for not meeting the expectation to contribute material to the forum. They also use more sexually explicit language, potentially to prove that they are a genuine part of the community. Sharers, on the other hand, talk more about the forum and about the world outside it, where their practices are considered illegal. Hence, many words typical of sharing members relate to the law and law enforcement. Before members start sharing, their language use falls between that of non-sharers and sharers: they use positive, rapport-building and explicit language, although less markedly than non-sharers, and they refer to the forum community but not yet to the world outside the forum. The findings can inform covert law enforcement operations, in which officers may want to mimic these strategies to compensate for not being able to share CSAM. In addition, the results show that keyness analysis could help differentiate between groups of users on dark web CSAM forums, which could help law enforcement prioritize target members in large-scale CSAM forums.
Title: Lexical choices of sharers and non-sharers on child sexual abuse material forums
Journal: Applied Corpus Linguistics, 5(3), Article 100157
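The abstract does not say which keyness statistic was used. A common choice in corpus linguistics is the log-likelihood (G2) measure, so the sketch below assumes it: for each word it compares the observed frequencies in two subcorpora with the frequencies expected if the word were equally common in both, and keeps only words that are relatively more frequent in the target subcorpus. The word list and counts are hypothetical; the 3.84 cut-off is the chi-square critical value for p < 0.05 with one degree of freedom.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) keyness of one word in corpus A versus corpus B.

    freq_*: raw frequency of the word; size_*: total tokens in each corpus.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0)
    if freq_a > 0:
        g2 += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        g2 += freq_b * math.log(freq_b / expected_b)
    return 2 * g2

def keywords(freqs_a, size_a, freqs_b, size_b, threshold=3.84):
    """Words overused in A relative to B, sorted by descending G2."""
    results = []
    for word in set(freqs_a) | set(freqs_b):
        fa, fb = freqs_a.get(word, 0), freqs_b.get(word, 0)
        g2 = log_likelihood(fa, size_a, fb, size_b)
        # Keep only words whose relative frequency is higher in A
        if fa / size_a > fb / size_b and g2 >= threshold:
            results.append((word, round(g2, 2)))
    return sorted(results, key=lambda item: item[1], reverse=True)

# Hypothetical word counts for two subcorpora of different sizes
sharers = {"law": 40, "forum": 60, "thanks": 20}
non_sharers = {"law": 10, "forum": 50, "thanks": 80}
sharer_keys = keywords(sharers, 10_000, non_sharers, 20_000)
print(sharer_keys)
```

Swapping the arguments yields the keywords of the non-sharer subcorpus instead; with these invented counts, "thanks" would then be key.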
Pub Date: 2025-10-09
DOI: 10.1016/j.acorp.2025.100155
Atikhom Thienthong
Verb forms are crucial time-reference expressions in academic citations and have been observed to be affected by citational and linguistic features such as citation forms, reporting subjects and reporting verbs (i.e., citation-internal features). Using a corpus of 852 journal articles, this study, the first corpus-based investigation of its kind, examines a range of citation-internal factors in 3,694 academic citations to determine their main and interaction effects on the choice of verb forms through multinomial logistic regression modeling. The original and bootstrapped results show that most of the main factors significantly predict the selection of verb forms in reporting and reported clauses. The occurrence of reporting and reported verb forms is affected by the number of sources, citation forms, subject animacy, meaning-based verbs and activity verbs. However, subject definiteness strongly affects reporting verb forms but not reported ones, whereas the reverse is true for evaluation verbs. In addition, two significant interaction terms are observed for reported verb forms: general subjects and tentative verbs interact to favor the present tense, while multiple sources interact with non-integral citations to influence the choice of modal verbs. The results underscore the importance of citation-internal features in influencing and contextualizing the use of verb forms to express temporal reference in academic citations.
Title: Predicting verb forms in reporting and reported clauses: A corpus-based study of academic citations
Journal: Applied Corpus Linguistics, 5(3), Article 100155
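Multinomial logistic regression, as named in the abstract, models the probability of each verb form relative to a reference form: P(y = k | x) = exp(x·β_k) / Σ_j exp(x·β_j), with β fixed at zero for the reference category. The sketch below does not fit a model; it only shows how a fitted one turns predictor values, including an interaction term, into predicted probabilities. The predictors (non-integral citation, multiple sources), the outcome categories and every coefficient are hypothetical and do not come from the study.

```python
import math

# Hypothetical coefficients [intercept, non_integral, multiple_sources, interaction]
# for each non-reference verb form; "past" is the reference (all coefficients zero).
COEFS = {
    "present": [0.5, -0.2, 0.3, 0.0],
    "modal": [-1.0, 0.1, 0.2, 0.8],
}
REFERENCE = "past"

def predict_probs(non_integral, multiple_sources):
    """Return P(verb form | predictors) under a multinomial logit (softmax) model."""
    # The interaction term is the product of the two binary predictors
    x = [1.0, non_integral, multiple_sources, non_integral * multiple_sources]
    scores = {REFERENCE: 0.0}
    for form, beta in COEFS.items():
        scores[form] = sum(b * xi for b, xi in zip(beta, x))
    denom = sum(math.exp(s) for s in scores.values())
    return {form: math.exp(s) / denom for form, s in scores.items()}

p00 = predict_probs(0, 0)  # integral citation, single source
p11 = predict_probs(1, 1)  # non-integral citation, multiple sources
print({k: round(v, 3) for k, v in p00.items()})
print({k: round(v, 3) for k, v in p11.items()})
```

With these made-up coefficients the predicted share of modal forms rises when both predictors are present, which is how an interaction effect of the kind reported in the abstract would surface in the model's predictions.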
Pub Date: 2025-09-21
DOI: 10.1016/j.acorp.2025.100154
Liming Liu
Research on clausal complexity in L2 writing has traditionally taken a reductionist approach, grouping all types of finite dependent clauses under the rubric of subordination without distinguishing between their syntactic functions, and excluding participle adverbial clauses from clausal features altogether. Taking a functional, usage-based approach to clausal complexity, this study investigates the frequency of finite adverbial clauses expressing three semantic relations, and of participle adverbial clauses of certain structural types, in L2 academic writing, in a corpus-assisted comparison with published research articles. Results show that, overall, students use both finite and participle adverbial clauses less frequently than published writers. The study then provides a rich textual analysis to interpret, in functional terms, the low representation of adverbial clauses in student writing. Implications for L2 writing pedagogy and L2 syntactic complexity research are discussed.
Title: The role of adverbial clauses as a feature of clausal complexity in L2 academic writing: A usage-based, discourse perspective
Journal: Applied Corpus Linguistics, 5(3), Article 100154
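Comparing how often students and published writers use adverbial clauses only makes sense once raw counts are normalised for corpus size, because two corpora rarely contain the same number of words. A minimal sketch, assuming normalisation per 1,000 words and invented counts; the abstract gives neither the study's figures nor its normalisation base.

```python
def per_thousand(count, tokens):
    """Normalise a raw count to a rate per 1,000 words."""
    return count / tokens * 1000

# Hypothetical raw counts of adverbial clauses and corpus sizes (in tokens)
corpora = {
    "student": {"finite": 300, "participle": 90, "tokens": 150_000},
    "published": {"finite": 700, "participle": 400, "tokens": 200_000},
}

# Rate per 1,000 words for each clause type in each corpus
rates = {
    name: {
        clause_type: round(per_thousand(count, data["tokens"]), 2)
        for clause_type, count in data.items()
        if clause_type != "tokens"
    }
    for name, data in corpora.items()
}
print(rates)
```

Normalised rates only show that a difference exists; whether it is significant would still need a test such as log-likelihood or chi-square on the raw counts.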
Pub Date: 2025-09-08
DOI: 10.1016/j.acorp.2025.100153
Youqi Kong, Wei Lin
Chinese technology has emerged as a highly debated and newsworthy topic in recent years. While much scholarly attention has been devoted to analyzing news texts, the role of news photographs in shaping perceptions of newsworthiness remains underexplored. This study bridges this gap by examining the interplay between textual and visual news values in Chinese and US media coverage of 5G networks. Drawing on a corpus of 275 news articles published between 2017 and 2021 in China Daily, The Washington Post, and The New York Times, we employ the discursive news values analysis (DNVA) framework, augmented by corpus linguistic techniques and AI-driven image annotation tools. The findings reveal distinct patterns: Chinese media emphasizes Positivity, Personalization, and Proximity, whereas US media prioritizes Negativity, Eliteness, and Proximity. The differences in the multisemiotic construction of news values reflect underlying sociocultural ideologies and geopolitical dynamics, offering fresh insights into the media’s role in shaping global technological narratives.
Title: The multisemiotic dimension of 5G news: A corpus-based discursive news values analysis
Journal: Applied Corpus Linguistics, 5(3), Article 100153
Pub Date: 2025-09-04
DOI: 10.1016/j.acorp.2025.100151
Annarita Felici
This paper describes the design and construction of CHEU-lex, a parallel and comparable corpus of Swiss and European Union (EU) legislation. Data are available in the three languages of the Swiss Confederation (French, German and Italian) and include bilateral agreements between Switzerland and the EU and their reception in Swiss law. The corpus is a richly annotated multilingual resource and allows the analysis of legal language at several levels (macro-textual, lexical, morphosyntactic) and according to different perspectives (monolingual, cross-lingual, cross-textual, diachronic). The goal is to highlight key properties of CHEU-lex, discuss issues of legal corpus compilation and, finally, outline some applications for translation and legal linguistic research.
Title: CHEU-lex: a parallel multilingual corpus of Swiss and EU legislation
Journal: Applied Corpus Linguistics, 5(3), Article 100151
Pub Date: 2025-08-26; DOI: 10.1016/j.acorp.2025.100149
Mark McGlashan, Charlotte-Rose Kennedy
Safeguarding children in schools broadly refers to the actions taken to protect children from abuse, prevent damage to their health and development, and promote conditions that improve their life chances. To safeguard children, UK schools must implement filtering and monitoring software to “block harmful and inappropriate content without unreasonably impacting teaching and learning” (Department for Education, 2024: 40). The industry-standard method for monitoring online language use in schools is ‘keyword monitoring’, which identifies the use or presence of specific words or phrases (e.g. ‘bomb’) that correlate with a specific form of risk (e.g. violence). However, this approach typically depends on lists of words isolated from their context(s) of use and tends to raise concerns only when there is a direct match to a ‘keyword’. This can lead to ‘false positives’, whereby a ‘keyword’ match (e.g. ‘bomb’) raises an automatic safeguarding concern even though the keyword was used innocuously (e.g. ‘bath bomb’). Through a case study based on a 1,094,914-word corpus of online testimonies relating to suicide, this paper introduces corpus linguistics as a set of methods and approaches for enhancing the effectiveness of filtering and monitoring. In doing so, we demonstrate how corpus methods and the analysis of authentic language data can be used to identify and contextualise safeguarding concerns. The practical applications of this research are intended to help schools better protect children from the illegal and legal (but harmful) online materials that currently threaten their safety and wellbeing.
Corpus linguistics for safeguarding children online. Mark McGlashan, Charlotte-Rose Kennedy. Applied Corpus Linguistics, 5(3), Article 100149. Published 2025-08-26. DOI: 10.1016/j.acorp.2025.100149
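To make the keyword-monitoring problem in the McGlashan and Kennedy abstract concrete, the Python sketch below contrasts naive exact-match flagging with a keyword-in-context (KWIC) concordance view, the display corpus tools use to show each match with its surrounding words. It is a minimal illustration under stated assumptions, not the authors' method: the keyword list, the benign-phrase list and the function names (`tokenize`, `in_benign_phrase`, `concordance`) are hypothetical, and a fixed phrase list is only one simple way of using context.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list: term -> risk category (illustrative only,
# not taken from the paper or from any monitoring product).
KEYWORDS = {"bomb": "violence"}

# Hypothetical benign multi-word expressions containing a keyword. In a corpus
# workflow such expressions might be found by inspecting concordance lines.
BENIGN_PHRASES = [("bath", "bomb"), ("photo", "bomb")]


@dataclass
class Hit:
    category: str  # risk category attached to the keyword
    left: str      # words to the left of the node (KWIC context)
    node: str      # the matched keyword
    right: str     # words to the right of the node
    benign: bool   # True if the match sits inside a listed benign phrase


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a deliberately simple tokeniser."""
    return re.findall(r"[a-z']+", text.lower())


def in_benign_phrase(tokens: list[str], i: int) -> bool:
    """Check whether tokens[i] forms part of a benign phrase at this position."""
    for phrase in BENIGN_PHRASES:
        for j, word in enumerate(phrase):
            start = i - j  # where the phrase would begin if tokens[i] is its j-th word
            if word == tokens[i] and start >= 0 and tuple(tokens[start:start + len(phrase)]) == phrase:
                return True
    return False


def concordance(text: str, span: int = 3) -> list[Hit]:
    """Return one KWIC line for every exact keyword match (what naive monitoring flags)."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok in KEYWORDS:
            hits.append(Hit(
                category=KEYWORDS[tok],
                left=" ".join(tokens[max(0, i - span):i]),
                node=tok,
                right=" ".join(tokens[i + 1:i + 1 + span]),
                benign=in_benign_phrase(tokens, i),
            ))
    return hits


if __name__ == "__main__":
    text = "I bought a lavender bath bomb for my sister. He said he would bomb the school."
    for hit in concordance(text):
        status = "suppressed" if hit.benign else f"flag: {hit.category}"
        print(f"{hit.left:>20} [{hit.node}] {hit.right:<20} {status}")
```

A naive monitor would raise both matches of ‘bomb’; the concordance lines show that the first occurs in ‘bath bomb’ and the second does not. The sketch still relies on exact matches, so its benign list would need to be informed by analysis of authentic language data, which is the role the abstract assigns to corpus methods.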