How do I know this Law corpus is reliable and valid? Using a representativeness argument for corpus validation

Jenny Kemp
DOI: 10.1016/j.acorp.2024.100099
Journal: Applied Corpus Linguistics
Published: 2024-07-07
Publication type: Journal Article
Full text: https://www.sciencedirect.com/science/article/pii/S2666799124000169
Citations: 0

Abstract

Corpus findings are only useful if the corpus adequately represents the content and language of the target domain; yet few studies evaluate or report representativeness. This paper argues that corpus linguists should focus explicitly on the validation process. It introduces the innovative concept of a Representativeness Argument, which is an explicit statement of reliability and validity to enable defensible applications of a corpus for a specifically defined purpose and audience. Adapted from Toulmin's (1958/2003) argument model, its originality lies in its attention to both target domain and linguistic representativeness, and in the critical role played by expert judgements. To illustrate this approach, I present a representativeness argument for the 1.98-million-word ‘DSVC-IL’ corpus, which was compiled to investigate the discipline-specific vocabulary required for reading postgraduate International Law texts. The corpus is demonstrated to adequately represent target domain content, established by analysing modules and reading lists, and confirmed by experts. The language is shown to adequately reflect the domain through analysis of a 1026-flemma Single Word List, extracted using measures of frequency, keyness, range and evenness of distribution. List items are evenly-distributed in randomly-split corpus halves (r_s = .98, p < .00). The list provides similar coverage of the DSVC-IL (26.37%) and other texts from the domain (23.87%). Moreover, Law experts confirmed the majority of list items were Law words. Together, the evidence supports the usefulness of the corpus and list for its explicitly defined purpose.
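
The two quantitative checks named in the abstract, word-list coverage and a split-half rank correlation, can be illustrated with a minimal Python sketch. It is not the paper's procedure: the abstract does not say how the corpus was split or what was correlated, so the sketch assumes the split is made at document level and that Spearman's rho is computed over the frequencies of list items in each half. All function names (`coverage`, `average_ranks`, `spearman`, `split_half_rho`) are hypothetical, and texts are assumed to be pre-tokenised lists of lowercase strings.

```python
import random
from collections import Counter


def coverage(tokens, word_list):
    """Percentage of running tokens that belong to word_list."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in word_list)
    return 100.0 * hits / len(tokens)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


def split_half_rho(documents, word_list, seed=0):
    """Randomly split documents into two halves (assumed document-level
    split) and correlate the frequencies of the list items across them."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    mid = len(docs) // 2
    halves = (docs[:mid], docs[mid:])
    counts = [Counter(t for doc in half for t in doc) for half in halves]
    items = sorted(word_list)
    return spearman([counts[0][w] for w in items],
                    [counts[1][w] for w in items])


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    word_list = {"treaty", "state", "court"}
    doc = ["the", "treaty", "binds", "the", "state", "and", "the", "court"]
    print(f"coverage: {coverage(doc, word_list):.2f}%")
```

Applied to a real corpus, `coverage` would be run once on the corpus itself and once on held-out texts from the same domain, which is the comparison behind the 26.37% and 23.87% figures; `split_half_rho` gives a single rho for one random split, so repeating it with several seeds would show how stable the value is.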

Source journal: Applied Corpus Linguistics
Subject area: Linguistics and Language
CiteScore: 1.30
Self-citation rate: 0.00%
Articles published: 0
Review time: 70 days