Author: Peter Wingrove
Journal: Applied Corpus Linguistics
Publication date: 2022-04-01
Publication type: Journal Article
DOI: 10.1016/j.acorp.2021.100012
URL: https://www.sciencedirect.com/science/article/pii/S2666799121000125
Citations: 1
Measuring the frequency of the academic formulas list across corpora: A case study based in TED talks and Yale lectures
Measuring lists of lexis across corpora is a well-established method in corpus linguistics. This article takes a novel approach and measures the frequency of occurrence of the Academic Formulas List (AFL; Simpson-Vlach and Ellis, 2010) across academic lectures (OYCLC) and an academic-adjacent corpus of TED talks (TTC). Frequency of occurrence is measured at three levels: overall inter- and intra-corpus variation; the composition of representation, to see which formulas are represented; and an investigation of the behaviour of formulas within texts. The corpora were found to be significantly different from each other in terms of overall representation with a medium effect size. The greatest difference concerned referential expressions and the smallest difference concerned stance expressions. In terms of intra-corpus variation the AFL was found to occur less often in the humanities and most often in the natural sciences for both corpora. The composition of coverage revealed Zipfian distributions for the AFL, with both corpora presenting a similar set of high frequency formulas within each group category. A combined ratio and minimum frequency measure identified salient formulas to each corpus. Concerning formula behaviour, differences were found between the corpora concerning the use of the same formulas. Pedagogic and methodological implications are discussed in the conclusion.
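The abstract mentions normalised frequency comparisons and a "combined ratio and minimum frequency measure" but does not give its definition. The following is only a minimal sketch of what such a measure might look like, assuming per-million-token normalisation, a naive regex tokenizer, and placeholder thresholds (`min_freq`, `min_ratio`); all function names and values are hypothetical and are not taken from the article.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def formula_counts(tokens, formulas):
    """Count how often each multi-word formula occurs in a token list."""
    counts = Counter()
    for formula in formulas:
        target = formula.split()
        n = len(target)
        counts[formula] = sum(
            1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target
        )
    return counts


def per_million(count, total_tokens):
    """Normalise a raw count to occurrences per million tokens."""
    return count * 1_000_000 / total_tokens if total_tokens else 0.0


def salient_formulas(corpus_a, corpus_b, formulas, min_freq=1.0, min_ratio=2.0):
    """Return formulas that are salient to corpus_a relative to corpus_b.

    A formula qualifies when its normalised frequency in corpus_a is at
    least min_freq AND at least min_ratio times its frequency in corpus_b.
    Both thresholds are illustrative assumptions, not the article's values.
    """
    tok_a, tok_b = tokenize(corpus_a), tokenize(corpus_b)
    cnt_a = formula_counts(tok_a, formulas)
    cnt_b = formula_counts(tok_b, formulas)
    result = {}
    for formula in formulas:
        freq_a = per_million(cnt_a[formula], len(tok_a))
        freq_b = per_million(cnt_b[formula], len(tok_b))
        if freq_a >= min_freq and freq_a >= min_ratio * freq_b:
            result[formula] = (freq_a, freq_b)
    return result
```

Calling `salient_formulas(lecture_text, talk_text, formulas)` returns the formulas over-represented in the first corpus; swapping the two corpus arguments gives the formulas salient to the second. Requiring a minimum frequency alongside the ratio keeps very rare formulas from looking salient merely because they are absent from the other corpus.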