
Applied Corpus Linguistics: Latest Articles

Legal cynicism in Men’s Rights discourses: Using corpus linguistics to investigate how distrust in the legal system excuses and perpetuates sexual violence against women
IF 2.1 Pub Date : 2025-08-26 DOI: 10.1016/j.acorp.2025.100148
Kate Barber
The term legal cynicism refers to a type of legal disengagement associated with a lack of internal commitment to follow legal rules and a failure to acknowledge legal authority, typically stemming from perceived ongoing injustices and rights deprivations. This perception of the criminal justice system enables individuals in extremist communities to rationalise criminal actions, increasing their propensity for violent behaviour. Effectively identifying such content within online discourses has been argued to be the first step in mitigating this propensity for violence, and corpus linguistic methods, employed as entry points into these discourses, offer effective tools for such analysis.
Using a 122,000-word corpus of online discourses produced by Men’s Rights Activists (MRAs) on blogs and the subreddit r/MensRights, this corpus-assisted discourse analysis combines quantitative and qualitative approaches to determine how legal cynicism is indexed and generated. It explores how the criminal justice systems of both the United States and the United Kingdom are contextualised and reframed to embed legal cynicism in MRA discourses, and which evidential and legal processes MRAs highlight as problematic. The paper discusses how this reframing of the criminal justice system, through conspiracy theories and legal disengagement, affects the potential for violence. It concludes with suggestions for addressing legal cynicism through prebunking and educational strategies designed to challenge misconceptions of criminal justice processes.
Applied Corpus Linguistics, 5(3), Article 100148.
Citations: 0
Adjectives and deception: A view from linguistic theory
IF 2.1 Pub Date : 2025-08-26 DOI: 10.1016/j.acorp.2025.100150
Willem B. Hollmann, Mathew Gillings
This study addresses the challenge of deceptive opinion spam, a growing concern for e-commerce and consumer trust. Building on established psychological theories of deception and focusing on hotel reviews, we extend current approaches by incorporating a Radical Construction Grammar (RCG; Croft, 1990, 1991, 2001, 2022) perspective on adjectives. Traditional part-of-speech taggers define adjectives largely through morphological and syntactic criteria, lumping property modifiers together with property predicates. Based on Croft’s more refined framework, we suggest that the cognitive load associated with property words used attributively (e.g., the white door) is higher than that associated with property words in predicative position (e.g., the door is white). We analyse a subset of the Deceptive Opinion Spam Corpus (DOSC) and find attributive property words to be significantly more frequent in truthful reviews, whereas predicative forms show no such difference. This distinction proved more effective than a traditional POS-tagger-based definition of adjectives in separating authentic from fake reviews. The manual coding required for the RCG-based approach was resource-intensive, but even modest accuracy gains could be crucial in high-stakes scenarios. Future work should investigate whether a Croftian approach can be operationalised through automated taggers and whether these findings extend to other deceptive contexts. The paper highlights the benefit of a more theoretically grounded view of linguistic categories in forensic settings. A truly interdisciplinary effort that draws on advanced linguistic theory as much as on psychological theories of deception, and that operationalises the approach computationally, thus promises to yield more efficient and effective deception detection systems.
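The attributive/predicative distinction at the heart of this study can be approximated, crudely, on POS-tagged text. The Python sketch below is a hypothetical positional heuristic, not the manual RCG-based coding the authors performed: it assumes Penn Treebank-style tags (JJ* adjectives, NN* nouns, RB* adverbs) and a small, illustrative set of copular verbs, and it will misclassify many cases that a Croftian analysis would handle.

```python
# Hypothetical heuristic assuming Penn Treebank tags; not the study's method.
COPULAS = {"am", "is", "are", "was", "were", "be", "been", "being",
           "seem", "seems", "seemed", "look", "looks", "looked"}


def classify_adjectives(tagged):
    """Split adjectives in a list of (word, tag) pairs into attributive,
    predicative and unresolved lists using simple positional rules."""
    attributive, predicative, other = [], [], []
    for i, (word, tag) in enumerate(tagged):
        if not tag.startswith("JJ"):
            continue
        # Attributive if a noun follows, skipping stacked adjectives/adverbs.
        j = i + 1
        while j < len(tagged) and tagged[j][1].startswith(("JJ", "RB")):
            j += 1
        if j < len(tagged) and tagged[j][1].startswith("NN"):
            attributive.append(word)
            continue
        # Predicative if a copula precedes, skipping intervening adverbs.
        k = i - 1
        while k >= 0 and tagged[k][1].startswith("RB"):
            k -= 1
        if k >= 0 and tagged[k][0].lower() in COPULAS:
            predicative.append(word)
        else:
            other.append(word)
    return attributive, predicative, other


print(classify_adjectives([("the", "DT"), ("white", "JJ"), ("door", "NN")]))
print(classify_adjectives([("the", "DT"), ("door", "NN"), ("is", "VBZ"), ("white", "JJ")]))
```

Adjectives matching neither pattern, such as the second conjunct in "rooms were clean and quiet", fall into the unresolved list; gaps like this are one reason why automating a Croftian approach, as the authors propose, remains an open question.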
Applied Corpus Linguistics, 5(3), Article 100150.
Citations: 0
A corpus-assisted discourse analysis of children’s and groomers’ talk in online grooming interactions
IF 2.1 Pub Date : 2025-08-25 DOI: 10.1016/j.acorp.2025.100147
Craig Evans, Nuria Lorenzo-Dus
Harmful communication may not always be recognisable as such, especially when it is manipulative and deceptive and appears indistinguishable from innocuous communication. This is the case with online child sexual grooming, where talk between groomers and children may resemble a chat between friends or consenting adults. However, recognising that online grooming may be taking place is not simply a matter of spotting tell-tale words or phrases. It requires engaging with the ways in which online grooming is discursive: groomers and children use language to perform particular functions as they pursue different goals through a dynamic exchange. We address this need by providing the first-ever complete account of online grooming discourse, one that identifies features not only of groomers’ talk but also of children’s, using collocates of the most frequent content words in a corpus of each. Comparing the two sets of findings highlights distinctive features that help make online grooming communication more identifiable. It also reveals strong similarity, perhaps reflecting groomers’ efforts to minimise perpetrator/victim contrast for the purpose of deception. As our study shows, an advantage of a corpus-assisted discourse studies approach is that it can uncover subtle, non-obvious patterns that may serve as indicators of online grooming despite such deception.
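Collocation analysis of this kind counts which words co-occur with a node word within a fixed window and scores each pairing with an association measure. A minimal Python sketch follows; the window size, the frequency cut-off and the choice of logDice as the measure are assumptions for illustration, since the abstract does not say which statistic the authors used, and the toy tokens are invented.

```python
import math
from collections import Counter


def log_dice_collocates(tokens, node, span=4, min_cooc=2):
    """Return (collocate, logDice) pairs for `node`, highest score first.

    Co-occurrence is counted in a window of `span` tokens on each side of
    every occurrence of `node`; words seen fewer than `min_cooc` times in
    those windows are dropped.
    """
    freq = Counter(tokens)
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start, end = max(0, i - span), min(len(tokens), i + span + 1)
        for j in range(start, end):
            if j != i:
                cooc[tokens[j]] += 1
    scores = {}
    for word, f_xy in cooc.items():
        if word == node or f_xy < min_cooc:
            continue
        # logDice = 14 + log2(2 * f(x,y) / (f(x) + f(y)))
        scores[word] = 14 + math.log2(2 * f_xy / (freq[node] + freq[word]))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = "we can talk later ok we can talk tomorrow ok".split()
print(log_dice_collocates(tokens, "talk", span=2))
```

Running the same function separately on a children’s subcorpus and a groomers’ subcorpus and comparing the ranked lists mirrors, in miniature, the kind of comparison the abstract describes.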
Applied Corpus Linguistics, 5(3), Article 100147.
Citations: 0
Forensic authorship profiling using geolocated social media data: A corpus linguistic and cartographic approach
IF 2.1 Pub Date : 2025-08-24 DOI: 10.1016/j.acorp.2025.100146
Dana Roemling
This paper explores the use of corpus-based methods for regional authorship profiling in forensic linguistics. Traditional approaches depend on linguistic expertise to identify regional markers, but this has limitations: it relies on an analyst’s intuition and potentially outdated dialect resources. Furthermore, traditional dialectology typically does not support word frequency analysis.
This study argues for the use of large, geolocated datasets to modernise regional authorship profiling. Unlike traditional dialect atlases, corpora provide access to contemporary, naturally occurring data, allowing for nuanced frequency analyses. Spatial statistics, such as Moran’s I, and tools like R allow for the rapid visualisation of regional linguistic patterns, enhancing both analysis and communication in legal contexts.
Using a case study based on a corpus of 15 million social media posts, this paper demonstrates the advantages of corpus-based methods in regional authorship profiling. It finds that for the 10,000 most frequent words in the dataset, Moran’s I values ranged from 0.071 to 0.768 (mean = 0.329), with strongly regional terms such as etz (“now”; I = 0.739) and guad (“good”; I = 0.511) showing clear spatial clustering. This data-driven, spatial statistical approach enables the extraction of regional markers without relying on expert intuition. Consequently, the approach provides a more objective and scalable method for identifying regional language patterns, enhancing forensic casework while also reducing the reliance on potentially outdated dialect resources.
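Global Moran’s I compares each region’s value with those of its neighbours: I = (N / W) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all spatial weights. Positive values indicate clustering of similar values, values near the null expectation of −1/(N − 1) indicate no spatial pattern, and negative values indicate dispersion. The study worked in R; the dependency-free Python sketch below is only an illustration, and the four-region adjacency matrix and 0/1 values are hypothetical stand-ins for, say, the relative frequency of etz in each region.

```python
def morans_i(values, weights):
    """Global Moran's I for one variable over n regions.

    values  -- n numbers (e.g. a word's relative frequency per region)
    weights -- n x n spatial weights matrix (list of lists); w[i][j] > 0
               when regions i and j are neighbours, diagonal set to 0
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    # Sum of squared deviations (variance term).
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)


# Hypothetical layout: four regions in a row, each adjacent to the next.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1, 1, 0, 0], W))  # clustered pattern: positive I
print(morans_i([1, 0, 1, 0], W))  # alternating pattern: negative I
```

For real data, spatial weights are usually built from region boundaries or distances rather than typed by hand, and significance is assessed against the null expectation, for example by permutation.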
Applied Corpus Linguistics, 5(3), Article 100146.
Citations: 0
Corpus sense: A comprehensive tool for advanced text and discourse exploration
IF 2.1 Pub Date : 2025-08-13 DOI: 10.1016/j.acorp.2025.100145
Antonio Moreno-Ortiz
Corpus Sense is a web application focused on content and discourse analysis. It is designed to facilitate the exploration, analysis and visualization of linguistic corpora and incorporates some advanced functionalities not available in existing software. By combining quantitative, qualitative and AI-powered features, the tool enables users to obtain useful insights with minimal effort. It is designed for small to medium-sized corpora (currently up to 2.5 million tokens), permits online corpus sharing, and offers unique functionalities such as NLP-based keyword extraction, named entity recognition, semantic search and advanced topic modelling with LLM-generated interpretable labels. The interface is simple and intuitive, to make the application accessible to a wide range of users. This paper provides a comprehensive overview of the application’s development and architecture and of its applications in corpus linguistics and discourse analysis research. It also discusses how novel NLP-based and AI-assisted tools can be integrated with traditional corpus analysis methods.
Applied Corpus Linguistics, 5(3), Article 100145.
Citations: 0
A Comparative Study of Graded Vocabulary Features in HSK Level 6 Listening Materials and Media Audio, and an Analysis of the Graded Word List
IF 2.1 Pub Date : 2025-08-09 DOI: 10.1016/j.acorp.2025.100143
Nan Xue (PhD student), Jimin Wang (PhD, Professor)
Vocabulary familiarity plays a critical role in Chinese language learners’ listening comprehension. This study compares HSK Level 6 listening materials (∼50,000 tokens) with transcribed media audio texts (∼100,000 tokens), using the graded word lists from the Standards for Chinese Language Proficiency in International Chinese Education. Using Python and the Language Technology Platform (LTP) for word segmentation and automated processing, the study calculates the proportion of vocabulary at each level. Results reveal no significant differences in graded word coverage between the two corpora, but both contain a substantial proportion of unclassified words, indicating that the current word lists have limited coverage. Frequency analysis also shows that many listed words are underused. These findings highlight the need to enhance graded word lists through corpus-based NLP techniques and suggest that topic type may influence vocabulary distribution in listening texts.
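The coverage calculation described above amounts to looking up each segmented token in a graded word list and tallying the results by level. The Python sketch below assumes segmentation (done with LTP in the study) has already produced a token list; the tiny word list and its level assignments are invented placeholders, not the actual levels in the Standards.

```python
from collections import Counter


def level_coverage(tokens, word_levels):
    """Share of tokens at each level, plus listed words that never occur.

    tokens      -- segmented words from one corpus
    word_levels -- mapping from word to its graded level
    """
    # Tokens missing from the graded list are grouped as "unclassified".
    counts = Counter(word_levels.get(tok, "unclassified") for tok in tokens)
    total = len(tokens)
    coverage = {level: n / total for level, n in counts.items()}
    # Listed words absent from the corpus: a crude signal of underuse.
    unused = sorted(set(word_levels) - set(tokens))
    return coverage, unused


# Hypothetical word list: levels are placeholders, not real Standards levels.
levels = {"我们": 1, "发展": 2, "文化": 2, "经济": 3}
tokens = ["我们", "讨论", "经济", "发展", "我们"]
coverage, unused = level_coverage(tokens, levels)
print(coverage)
print(unused)
```

Computing the same dictionary for both corpora and comparing the per-level shares (and the "unclassified" share) corresponds to the comparison the abstract reports.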
Applied Corpus Linguistics, 5(3), Article 100143.
Citations: 0
Epistemic lexical verbs in academic writing: A corpus-based comparative study of L1 English and L1 Chinese writers of English
IF 2.1 Pub Date : 2025-08-08 DOI: 10.1016/j.acorp.2025.100144
Fei Xie, Amanda Patten
Epistemic lexical verbs (ELVs) are critical to English academic writing, in which writers need to deliver their statements with an appropriate level of modesty and certainty. Drawing on Hyland’s (1998) taxonomy of ELVs, the study combines quantitative and qualitative methods to examine how L1 Chinese writers’ use of ELVs to express modality differs from that of L1 English writers in terms of frequency, range, and sentence and grammatical construction. The investigation was conducted on two specialised corpora comprising academic texts written by L1 Chinese and L1 English postgraduate students, respectively. The findings indicate that L1 Chinese writers tend to rely on a different range of devices and to express stronger commitment. Moreover, L1 Chinese students’ use of ELVs is less balanced in terms of grammatical patterns and sentence constructions, and some misuses can be identified in their writing. The authors also discuss potential reasons for these findings and propose pedagogical suggestions to improve learners’ pragmatic competence in this important area.
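Comparing how often a verb occurs in two corpora of different sizes usually means normalising the counts and then testing whether the difference is larger than chance. The sketch below uses normalisation per 10,000 words and the log-likelihood (G2) statistic, which is common in corpus linguistics; the abstract does not state which test the authors applied, and the counts in the example are invented.

```python
import math


def per_10k(freq, corpus_size):
    """Normalised frequency per 10,000 words."""
    return freq * 10_000 / corpus_size


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for an item seen a times in corpus 1 (size c)
    and b times in corpus 2 (size d)."""
    e1 = c * (a + b) / (c + d)  # expected count in corpus 1
    e2 = d * (a + b) / (c + d)  # expected count in corpus 2
    ll = 0.0
    # Zero observed counts contribute nothing (0 * log 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


# Invented counts for one ELV in two equally sized corpora.
print(per_10k(30, 10_000), per_10k(10, 10_000))
print(log_likelihood(30, 10, 10_000, 10_000))
```

With one degree of freedom, a G2 value above 3.84 corresponds to p < 0.05; in practice, per-verb results would be reported alongside the normalised frequencies so that the direction of the difference is clear.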
Applied Corpus Linguistics, 5(3), Article 100144.
Citations: 0
Corporate buzzword or genuine commitment? A corpus-assisted analysis of corporate ‘net-zero’ pledges by major global corporations
IF 2.1 Pub Date : 2025-08-08 DOI: 10.1016/j.acorp.2025.100142
Matteo Fuoli, Annika Beelitz
In recent years, corporations have faced growing pressure to address their environmental impact, leading many to pledge ‘net-zero’ emissions. This study employs corpus-assisted discourse analysis to examine how Fortune Global 500 companies communicate their net-zero commitments in their sustainability disclosures. Specifically, we conduct frequency, collocate and concordance analyses to examine how the term net zero is discursively constructed and what solutions are proposed to achieve this goal. Our findings support media observations that net zero has rapidly become a central theme in corporate discourse. However, corporate disclosures often frame net zero as a “journey” or an “ambition” and focus more on setting targets than on concrete strategies to reduce emissions. These results raise questions about the credibility of corporate net-zero commitments.
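Concordance analysis displays each hit of a search term with its left and right context (keyword in context, or KWIC). A minimal Python sketch follows; the regular expression that treats "net zero" and "net-zero" as the same term is an assumption about how spelling variants might be grouped, and the sample sentences are invented.

```python
import re


def kwic(text, phrase, width=30):
    """Keyword-in-context lines for each case-insensitive match of `phrase`.

    Words in `phrase` may be separated by whitespace or hyphens in the
    text, so "net zero" also matches "net-zero" and "Net Zero".
    """
    words = phrase.split()
    pattern = re.compile(
        r"\b" + r"[\s-]+".join(re.escape(w) for w in words) + r"\b",
        re.IGNORECASE,
    )
    flat = " ".join(text.split())  # collapse newlines and repeated spaces
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the hits line up in one column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


sample = ("Our journey to net zero starts today. "
          "We have set an ambitious net-zero target for 2050.")
for line in kwic(sample, "net zero"):
    print(line)
```

Sorting such lines by the first word to the left or right of the hit is a common next step, since it groups recurrent framings (such as "journey to" or "ambition") together for qualitative reading.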
Applied Corpus Linguistics, 5(3), Article 100142.
Citations: 0
The social representation of ‘women’ on X platform before and after the launch of Saudi vision 2030
IF 2.1 Pub Date : 2025-07-11 DOI: 10.1016/j.acorp.2025.100140
Amal Alharbi, Areej Albawardi
This study investigates how Saudi males and females represent “women” on X (formerly Twitter), focusing on two distinct timeframes: 2015 (before Vision 2030) and 2022 (after Vision 2030). By integrating applied Corpus Linguistics (CL) and Critical Discourse Analysis (CDA), the research examines a corpus of 10,000 Arabic tweets (equally divided between male and female authors), thereby illuminating how broader social reforms correspond with shifts in online discourse. Specifically, we apply frequency counts, collocation analysis, and semantic prosody techniques to compare lexical choice, thematic focus, and evaluative stances towards Saudi women in both phases.
The findings reveal a discernible positive shift in attitudes after the official publication of Vision 2030. In 2015, the discourse more often focused on “spinsterhood,” boycotts, and guardianship, reflecting predominantly negative or restrictive portrayals of women. By 2022, tweets more often addressed empowerment, achievements, and national pride, suggesting changing social attitudes that increasingly legitimize women’s roles in workplaces, education, and public life. Although pockets of negativity persist, particularly in certain domains such as sports, they are outweighed by the overall trend towards more inclusive and celebratory discourses.
These results highlight how top-down reforms, such as the lifting of the driving ban and the promotion of women’s employment, have reshaped discourse about Saudi women. Beyond its contribution to sociolinguistics and critical discourse studies, this research underscores the capacity of large-scale policy changes to shift everyday language and attitudes in conservative societies.
Applied Corpus Linguistics, Volume 5, Issue 3, Article 100140
Citations: 0
L2 acquisition of unaccusative verbs in pre-intermediate middle school learners through indirect DDL: A study on change-of-state verbs and verbs of occurrence
Pub Date : 2025-07-09 DOI: 10.1016/j.acorp.2025.100139
Jina Son, Seyeon Park, Youjung Park, Hyebin Seo
Applied Corpus Linguistics, Volume 5, Issue 3, Article 100139
Citations: 0
Journal: Applied Corpus Linguistics