
Acta Linguistica Academica: Latest Articles

The (non-)finiteness of subordination correlates with basic word order: Evidence from Uralic
IF 0.5 Zone 3 (Literature) Q1 Arts and Humanities Pub Date: 2023-06-12 DOI: 10.1556/2062.2023.00647
K. Kiss
This paper aims to answer why the Uralic languages use, or used until intensive contacts with Indo-European languages, only non-finite subordination. It argues against regarding the evolution of finite subordination as language development, showing that languages with non-finite subordination and parataxis have the same expressive power as languages with finite subordination. It claims that non-finite subordination is a concomitant of SOV word order, and that the growing proportion of finite subordination in the Uralic languages from east to west, and in the history of Hungarian, is a consequence of the loosening of the SOV order and the emergence of SVO. The paper examines two hypotheses about the correlations between SOV and non-finite subordination, and SVO and finite subordination: the Final-Over-Final Condition of Biberauer, Holmberg & Roberts (2014, etc.), a formal principle constraining clausal architecture, and the Minimize Domains Principle of Hawkins (2004, etc.), a functional principle of processing efficiency. The two theories make largely overlapping correct predictions for the Uralic languages, which suggests that the Final-Over-Final Condition may be the syntacticization of the condition that ensures processing efficiency in SOV and SVO languages.
Citations: 0
SVO – Attractor in the declarative-to-procedural shift in grammar evolution
IF 0.5 Zone 3 (Literature) Q1 Arts and Humanities Pub Date: 2023-06-12 DOI: 10.1556/2062.2023.00642
H. Haider
Diachronic changes in phrase or clause structure are vectored rather than oscillating. A century ago, E. Sapir identified a drift towards fixed word order and another one towards the invariant word (including the levelling of the forms for subject and object marking). What is still missing is a theory that predicts such drifts. As will be argued, the theory that explains Sapir's observations and, in passing, makes the concept of Universal Grammar dispensable is the theory that grammars are targets and products of cognitive evolution. Sapir's drifts are shifts from systems based primarily on the consciously accessible declarative network to systems based on the consciously inaccessible procedural network. This also explains why the [S[VO]] clause-structure is a point of no return and why languages do not change in the reverse direction, starting from a grammar like English and eventually moving to a grammar like Sanskrit.
Citations: 0
Some notes on negated and quantified objects in Middle English and Early Modern English
IF 0.5 Zone 3 (Literature) Q1 Arts and Humanities Pub Date: 2023-05-24 DOI: 10.1556/2062.2023.00650
Chiara De Bastiani
In this paper, I present a novel corpus investigation of quantified and negated objects in the Middle English and Early Modern English period, which is embedded within the wider language change scenario from linear OV to linear VO in the history of English. It will be shown that evidence for preverbal positioning of such objects is mostly limited to translated texts in Middle English in the PPCME2 corpus, and that by late Middle English, most of the hits consist of negated elements, as shown in the PCEEC corpus, which consists of native texts. The different constraints governing spell out of positive objects in Old English and Middle English are discussed and compared to the licensing of negated and quantified objects. The data provided in this paper constitute further evidence for Ingham's (2000, 2002, 2007) analysis of preposed negated objects in late ME and their correlation with the Negative Cycle, and complement previous investigations on negated and quantified objects in Middle English and Early Modern English.
Citations: 0
On the argument structure realization of result verbs: A syntactic approach
IF 0.5 Zone 3 (Literature) Q1 Arts and Humanities Pub Date: 2023-03-15 DOI: 10.1556/2062.2023.00567
Josep Ausensi, Alessandro Bigolin
Manner/Result Complementarity (Rappaport Hovav & Levin 2010) has been argued to have consequences for argument realization: only manner verbs permit object deletion and non-selected objects. In contrast, result verbs always co-appear with their object, because they are required to express the undergoer of the change that they entail. We discuss new data involving result verbs in constructions where the undergoer of the change encoded by the result verb is not realized as the object of the predicate. We argue these data display result verbs whose root is integrated into the argument structure of the predicate in such a way that it is interpreted as specifying a co-event of the main event denoted by the predicate, whereby the result entailed by the root is not necessarily intended to hold of the direct object. This follows if verb roots do not come with a syntactically relevant specification for manner or result from the lexicon, but acquire it on the basis of their association with the syntactic structure.
Citations: 0
Linguistic intuitions
IF 0.5 Zone 3 (Literature) Q1 Arts and Humanities Pub Date: 2023-03-15 DOI: 10.1556/2062.2023.00657
Xiangyu Chang, Tiaoyuan Mao
Citations: 0
The development and functions of the inferential marker chog‘i in Uzbek
IF 0.5 Zone 3 (Literature) Q1 Arts and Humanities Pub Date: 2023-03-15 DOI: 10.1556/2062.2023.00568
Melike Üzüm
In the evidential system of Uzbek, the speaker has different grammatical options in marking the source of information, such as -ibdi, ekan, emish, etc., although it is not compulsory to mark this category in the utterance. In addition to these established markers, new markers have developed into evidentials, and they encode specific sub-categories of evidentiality. In this study, after a brief overview of grammatical markers of evidentiality in Uzbek, the marker chog‘i is examined with a syntactic and semantic approach based on a corpus of selected texts. Its development into an inferential marker is evaluated with special attention to sources of evidentials.
Citations: 1
Guest Editor's Foreword
IF 0.5 Zone 3 (Literature) Q1 Arts and Humanities Pub Date: 2022-12-12 DOI: 10.1556/2062.2022.00623
Gábor Prószéky
Citations: 0
A proof-of-concept meaning discrimination experiment to compile a word-in-context dataset for adjectives – A graph-based distributional approach
IF 0.5 Zone 3 (Literature) Q1 Arts and Humanities Pub Date: 2022-12-12 DOI: 10.1556/2062.2022.00579
Enikő Héja, Noémi Ligeti-Nagy
The Word-in-Context corpus, which forms part of the SuperGLUE benchmark dataset, focuses on a specific sense disambiguation task: it has to be decided whether two occurrences of a given target word in two different contexts convey the same meaning or not. Unfortunately, the WiC database exhibits relatively low inter-annotator agreement, which implies that the meaning discrimination task is not well defined even for humans. The present paper aims at tackling this problem by anchoring semantic information to observable surface data. To do so, we have experimented with a graph-based distributional approach, where both sparse and dense adjectival vector representations served as input. In line with our expectations, the algorithm is able to anchor the semantic information to contextual data, and therefore it is able to provide clear and explicit criteria as to when the same meaning should be assigned to the occurrences. Moreover, since this method does not rely on any external knowledge base, it should be suitable for any low- or medium-resourced language.
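To make the word-in-context task described in the abstract above concrete, the following is a minimal Python sketch of a same-sense decision based on sparse context vectors. It is a deliberately simplified stand-in and not the graph-based algorithm of the paper: the function names (`context_vector`, `cosine`, `same_sense`), the whitespace tokenizer, and the `threshold` value of 0.3 are all assumptions made for illustration.

```python
import math
from collections import Counter


def context_vector(sentence: str, target: str) -> Counter:
    """Build a sparse bag-of-words vector from the words around the target.

    Hypothetical helper: tokenization is a naive whitespace split with
    surrounding punctuation stripped, and the target word itself is dropped.
    """
    tokens = [t.strip(".,;:!?").lower() for t in sentence.split()]
    return Counter(t for t in tokens if t and t != target.lower())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def same_sense(s1: str, s2: str, target: str, threshold: float = 0.3) -> bool:
    """Decide whether the target word is used in the same sense in s1 and s2.

    The threshold is an arbitrary illustrative value, not a tuned parameter.
    """
    return cosine(context_vector(s1, target), context_vector(s2, target)) >= threshold
```

The point of the sketch is only that the decision rests on observable surface data (the surrounding words) rather than on an external sense inventory; a realistic system would need far richer context representations than raw word counts.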
Citations: 0
BiVaSE: A bilingual variational sentence encoder with randomly initialized Transformer layers
IF 0.5 Zone 3 (Literature) Q1 Arts and Humanities Pub Date: 2022-12-12 DOI: 10.1556/2062.2022.00584
Bence Nyéki
Transformer-based NLP models have achieved state-of-the-art results in many NLP tasks including text classification and text generation. However, the layers of these models do not output any explicit representations for text units larger than tokens (e.g. sentences), although such representations are required to perform text classification. Sentence encodings are usually obtained by applying a pooling technique during fine-tuning on a specific task. In this paper, a new sentence encoder is introduced. Relying on an autoencoder architecture, it was trained to learn sentence representations from the very beginning of its training. The model was trained on bilingual data with variational Bayesian inference. Sentence representations were evaluated in downstream and linguistic probing tasks. Although the newly introduced encoder generally performs worse than well-known Transformer-based encoders, the experiments show that it was able to learn to incorporate linguistic information in the sentence representations.
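The pooling step mentioned in the abstract above can be illustrated with a minimal Python sketch of masked mean pooling, one common way to turn per-token vectors into a single sentence vector. This shows the conventional baseline that the abstract contrasts with, not the BiVaSE encoder itself; the function name `mean_pool` and the list-based representation of vectors are assumptions chosen to keep the example dependency-free.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is non-zero.

    token_vectors: one vector per token position (all of equal length).
    mask: 1 for real tokens, 0 for padding positions.
    """
    if len(token_vectors) != len(mask):
        raise ValueError("token_vectors and mask must have the same length")
    # Keep only the vectors of real (non-padding) tokens.
    kept = [vec for vec, m in zip(token_vectors, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    # Average each dimension over the kept tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Two real tokens and one padding position; the padding vector is ignored.
sentence_vector = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
```

Pooling of this kind is applied after the fact to token-level outputs, which is the limitation the abstract points to when it motivates an encoder trained to produce sentence representations directly.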
Citations: 0
Neural machine translation for Hungarian
IF 0.5 Zone 3 (Literature) Q1 Arts and Humanities Pub Date: 2022-11-30 DOI: 10.1556/2062.2022.00576
L. Laki, Zijian Győző Yang
In this research, we aim to give an overview of the currently existing solutions for machine translation, and we assess their performance on the English-Hungarian language pair. Hungarian is considered to be a challenging language for machine translation because its grammatical structure and word order differ greatly from those of English. We probed various machine translation systems from both academic and industrial applications. One key highlight of our work is that our models (Marian NMT, BART) performed significantly better than the solutions offered by most of the market-leading multinational companies. Finally, we fine-tuned different pre-finetuned models (mT5, mBART, M2M100) for English-Hungarian translation, which achieved state-of-the-art results in our test corpora.
Citations: 1