International Journal of Corpus Linguistics: latest articles

Innovation on screen

IF 1 · Zone 2, Literature · 0 LANGUAGE & LINGUISTICS

International Journal of Corpus Linguistics

Pub Date : 2020-12-08 DOI: 10.1075/IJCL.00038.REI

Susan A. Reichelt

This study explores marked affixation as a possible cue for characterization in scripted television dialogue. The data used here is the newly compiled TV Corpus, which encompasses over 265 million words in its North American English context. An initial corpus-based analysis quantifies the innovative use of affixes in word-formation processes across the corpus to allow for comparison with a following character analysis, which investigates how derivational word-formation supports characterization patterns within a specific series, Buffy the Vampire Slayer. For this, a list of productive prefixes (e.g. de-, un-) and suffixes (e.g. -y, -ish) is used to elicit relevant contexts. The study thus combines two approaches to word-formation processes in scripted contexts. On a large scale, it shows how derivational neologisms are spread across TV dialogue and on a much smaller scale, it highlights particular instances where these neologisms are used to aid character construction.

Citations: 5

Subcategorization frame identification for learner English

Pub Date : 2020-12-08 DOI: 10.1075/ijcl.18097.hua

Yi-Feng Huang, Akira Murakami, T. Alexopoulou, A. Korhonen

As large-scale learner corpora become increasingly available, it is vital that natural language processing (NLP) technology is developed to provide rich linguistic annotations necessary for second language (L2) research. We present a system for automatically analyzing subcategorization frames (SCFs) for learner English. SCFs link lexis with morphosyntax, shedding light on the interplay between lexical and structural information in learner language. Meanwhile, SCFs are crucial to the study of a wide range of phenomena including individual verbs, verb classes and varying syntactic structures. To illustrate the usefulness of our system for learner corpus research and second language acquisition (SLA), we investigate how L2 learners diversify their use of SCFs in text and how this diversity changes with L2 proficiency.

Citations: 0

Speech acts in corpus pragmatics

Pub Date : 2020-11-23 DOI: 10.1075/IJCL.19023.WEI

M. Weisser

In corpus pragmatics, most of the research into speech acts still tends to be limited to working with the original, highly abstract, speech-act taxonomies devised by ordinary language philosophers like Austin and Searle. The aim of this article is to illustrate how the use of such restricted taxonomies may lead to oversimplified or potentially misleading impressions regarding the communicative functions expressed in spoken interaction, and to demonstrate how a more elaborate taxonomy, the DART taxonomy (Weisser, 2018), may help us gain better insights into the pragmatic strategies that occur in dialogues. To this end, I will draw on a small sample of dialogues, both from a task-oriented domain and unconstrained interaction, and contrast selected speech-act categorisations on the basis of Searle’s and the DART taxonomy, demonstrating the advantages that arise from using a more fine-grained taxonomy to describe complex verbal exchanges.

Citations: 3

Keyword analysis and the indexing of Aboriginal and Torres Strait Islander identity

Pub Date : 2020-11-23 DOI: 10.1075/ijcl.00031.bed

M. Bednarek

This article presents a corpus-driven sociolinguistic study of Redfern Now – the first major television drama series commissioned, written, acted, directed and produced by Indigenous industry professionals in Australia. The study examines whether corpus linguistic keyword analysis can identify evidence for type indexicality (social demographics, personae) and trait indexicality (stance, personality), with particular attention paid to the potential indexing of Aboriginal and Torres Strait Islander identity. More specifically, the study’s goal is to retrieve and analyse words that are associated with varieties of English in Australia, and with Australian Aboriginal Englishes in particular. To this end, a corpus with dialogue from Redfern Now is compared to a reference corpus of US television dialogue. Results show that Redfern Now features the use of easily recognisable and familiar words (e.g. blackfella[s], deadly; kinship terms), but also shows clear variation among characters. The case study concludes by evaluating the use of keyword analysis for identifying indexicality in telecinematic discourse.

Citations: 2

Classifying heuristic textual practices in academic discourse

Pub Date : 2020-11-23 DOI: 10.1075/ijcl.19097.bec

Maria Becker, M. Bender, Marcus Müller

In this paper, we investigate how deep learning techniques can be applied to discourse pragmatics. As a testcase we analyse heuristic textual practices, defined as linguistic implementations of decision routines in research processes in academic discourse. We develop a complex annotation scheme of pragmalinguistic categories on different levels of granularity and manually annotate a corpus of texts across various scientific disciplines. This is the basis for training recurrent neural networks to classify heuristic textual practices. Our experiments show that the annotation categories are robust enough to be recognised by our models which learn similarities of the sentence-surfaces represented as word embeddings. Our study aims at an iterative human-in-the-loop process in which manual-hermeneutic and algorithmic procedures mutually advance the insight process. It underlines the fact that the interaction between manual and automated methods opens up a promising field for further research, allowing interpretative analyses of complex pragmatic phenomena in large corpora.

Citations: 2

Love, R. (2020). Overcoming Challenges in Corpus Construction: The spoken British National Corpus 2014

Pub Date : 2020-11-23 DOI: 10.1075/ijcl.00032.wan

Jiawei Wang

This article reviews Overcoming Challenges in Corpus Construction: The Spoken British National Corpus 2014.

Citations: 4

Lima or cima?

Pub Date : 2020-11-23 DOI: 10.1075/IJCL.19094.POS

C. Posch, Gerhard Rampl

This paper outlines the construction of the corpus Alpenwort, a large, genre-based corpus of German texts on alpinism. We report on issues related to building the corpus from the Austrian Alpine Club Journal (1869–2010). First, a general description of our data and the project phases from digitization and annotation to publication is given. We focus on the most interesting challenges that the diverse layouts and the extensive use of Fraktur typefacing posed for optical layout recognition and optical character recognition (OCR) as well as post correction. The corrected data was lemmatized and annotated with part-of-speech information including named entities as well as TEI-conformant metadata. The resulting 19.9-million-word corpus is designed to be queried using CQPweb and Hyperbase and can be accessed freely online. Lastly, we give a short roadmap of current and future expansions and improvements as corpus data has been and is being enhanced in follow-up projects.

Citations: 2

A linguistic typology of American television

Pub Date : 2020-11-12 DOI: 10.1075/IJCL.00039.BER

Tony Berber Sardinha, M. Pinto

This paper presents the first entirely linguistic typology of contemporary American television, derived from a multi-dimensional (MD) analysis of the USTV corpus. The USTV corpus comprises 930 texts from 191 different TV programs, classified into 31 different registers (including nine telecinematic ones: drama series, miniseries, movies, sitcoms, soap operas, general animation, children’s animation, short-feature animation, and children’s and teens’ shows). The linguistic typology we present in this study is based on the linguistic characteristics present in the individual programs, with no a priori textual categorizations. A cluster analysis grouped the individual programs into clusters that shared similar dimensional profiles. The resulting typology comprises nine different text types – namely Presentation of information, Opinion and discussion, Analysis and debate, Description, Interactive recount, Engaging demonstration, Playful discourse, Simplified interaction, and Simulated conversation. The paper discusses and illustrates each text type and considers how telecinematic discourse relates to each of them.

Citations: 2

A diachronic perspective on telecinematic language

Pub Date : 2020-11-02 DOI: 10.1075/IJCL.00036.WER

Valentin Werner

Previous corpus-based studies, which have mostly focused on a particular film or series, have identified various key characteristics of telecinematic language. However, a restriction on those results applies as regards the stability of findings across time and across individual productions. To address this gap, and following calls for more nuanced perspectives on telecinematic language as a whole, this study re-assesses a number of claims pertaining to lexical and lexicogrammatical aspects through a diachronic lens. To this end, it uses the Northern American sections of the new Movie and TV Corpora, multi-million word corpora compiled from subtitles of a wide range of film and series genres in the English-speaking world from the 20th and 21st century. Overall, the diachronic view of the data is suggestive of a highly complex nature of telecinematic language, with levels of emotionality and informality increasing over time for most items tested.

Citations: 6

Language use in pop culture over three decades

Pub Date : 2020-11-02 DOI: 10.1075/IJCL.00037.CSO

Enikó Csomay, Ryan Young

Analyzing variation in language features in literature and telecinematic discourse provides valuable insights into society’s shifting values and perspectives. In this study, we carry out a keyword analysis on the language of three series of Star Trek television dialogues, broadcast in the 1960s, 1980s, and 1990s, from two perspectives: (i) keywords across the three series highlighting words that are unique to one series in contrast to the other two, providing insights about changes of foci across time; (ii) keywords in relation to gender depicting potential differences in gender roles and how these may change through time across the series.

Citations: 5
