Linguistic Issues in Language Technology: Latest Articles

Improving Multilingual Frame Identification by Estimating Frame Transferability
Pub Date : 2022-07-18 DOI: 10.33011/lilt.v19i.939
Jennifer Sikos, Michael Roth, Sebastian Padó
A recent research direction in computational linguistics involves efforts to make the field, which used to focus primarily on English, more multilingual and inclusive. However, resource creation often remains a bottleneck for many languages, in particular at the semantic level. In this article, we consider the case of frame-semantic annotation. We investigate how to perform frame selection for annotation in a target language by taking advantage of existing annotations in different, supplementary languages, with the goal of reducing the required annotation effort in the target language. We measure success by training and testing frame identification models for the target language. We base our selection methods on measuring frame transferability in the supplementary language, where we estimate which frames will transfer poorly, and therefore should receive more annotation, in the target language. We apply our approach to English, German, and French – three languages which have annotations that are similar in size as well as frames with overlapping lexicographic definitions. We find that transferability is indeed a useful indicator and supports a setup where a limited amount of target language data is sufficient to train frame identification systems.
Citations: 0
Parsed Corpus as a Source for Testing Generalizations in Japanese Syntax
Pub Date : 2019-08-13 DOI: 10.33011/lilt.v18i.1431
H. Kishimoto, Prashant Pardeshi
In this paper, we discuss constituent ordering generalizations in Japanese. Japanese has SOV as its basic order, but a significant range of argument order variations brought about by ‘scrambling’ is permitted. Although scrambling does not induce much in the way of semantic effects, it is conceivable that marked orders are derived from the unmarked order under some pragmatic or other motivations. The difference in the effect of basic and derived order is not reflected in native speakers’ grammaticality judgments, but we suggest that the intuition about the ordering of arguments may be attested in corpus data. Using the Keyaki treebank (a proper subset of which is the NINJAL Parsed Corpus of Modern Japanese (NPCMJ)), we show that naturally occurring corpus data confirm that marked orderings of arguments are less frequent than their unmarked counterparts. We suggest some possible motivations behind the argument order variations.
Citations: 0
Exploiting Parsed Corpora: Applications in Research, Pedagogy, and Processing
Pub Date : 2019-08-13 DOI: 10.33011/lilt.v18i.1427
Prashant Pardeshi, Alistair Butler, S. Horn, K. Yoshimoto, Iku Nagasaki
The articles in this special issue are based on presentations made at the international symposium 'Exploiting Parsed Corpora: Applications in Research, Pedagogy, and Processing', held at the National Institute for Japanese Language and Linguistics (NINJAL) on December 9-10, 2017. The symposium was organized by the NINJAL collaborative research project 'Development of and Linguistic Research with a Parsed Corpus of Japanese', with which all the guest editors are associated.
Citations: 0
Exploiting parsed corpora in grammar teaching
Pub Date : 2019-08-13 DOI: 10.33011/lilt.v18i.1437
S. Wallis, I. Cushing, B. Aarts
The principal barrier to the uptake of technologies in schools is not technological, but social and political. Teachers must be convinced of the pedagogical benefits of a particular curriculum before they will agree to learn the means to teach it. The teaching of formal grammar to first language students in schools is no exception to this rule. Over the last three decades, most schools in England have been legally required to teach grammatical subject knowledge, i.e. linguistic knowledge of grammar terms and structure, to children aged five and upwards as part of the national curriculum in English. A mandatory set of curriculum specifications for England and Wales was published in 2014, and elsewhere similar requirements were imposed. However, few current English school teachers were taught grammar themselves, and the dominant view has long been in favour of ‘real books’ rather than the teaching of a formal grammar. English grammar teaching thus faces multiple challenges: to convince teachers of the value of grammar in their own teaching, to teach the teachers the knowledge they need, and to develop relevant resources to use in the classroom. Alongside subject knowledge, teachers need pedagogical knowledge – how to teach grammar effectively and how to integrate this teaching into other kinds of language learning. The paper introduces the Englicious web platform for schools, and summarises its development and impact since publication. Englicious draws data from the fully-parsed British Component of the International Corpus of English, ICE-GB. The corpus offers plentiful examples of genuine natural language, speech and writing, with context and potentially audio playback. However, corpus examples may be age-inappropriate or over-complex, and without grammar training, teachers are insufficiently equipped to use them. In the absence of grammatical knowledge among teachers, it is insufficient simply to give teachers and children access to a corpus. Whereas so-called ‘classroom concordancing’ approaches offer access to tools and encourage bottom-up learning, Englicious approaches the question of grammar teaching in a concept-driven, top-down way. It contains a modular series of professional development resources, lessons and exercises focused on each concept in turn, in which corpus examples are used extensively. Teachers must be able to discuss with a class why, for instance, work is a noun in a particular sentence, rather than merely report that it is. The paper describes the development of Englicious from secondary to primary, and outlines some of the practical challenges facing the design of this type of teaching resource. A key question, the ‘selection problem’, concerns how tools parameterise the selection of relevant examples for teaching purposes. Finally, we discuss curricula for teaching teachers and the evaluation of the effectiveness of the intervention.
Citations: 1
Building a Chinese AMR Bank with Concept and Relation Alignments
Pub Date : 2019-08-13 DOI: 10.33011/lilt.v18i.1429
Bin Li, Y. Wen, Li Song, Weiguang Qu, Nianwen Xue
Abstract Meaning Representation (AMR) is a meaning representation framework in which the meaning of a full sentence is represented as a single-rooted, acyclic, directed graph. In this article, we describe an on-going project to build a Chinese AMR (CAMR) corpus, which currently includes 10,149 sentences from the newsgroup and weblog portion of the Chinese TreeBank (CTB). We describe the annotation specifications for the CAMR corpus, which follow the annotation principles of English AMR but make adaptations where needed to accommodate the linguistic facts of Chinese. The CAMR specifications also include a systematic treatment of sentence-internal discourse relations. One significant change we have made to the AMR annotation methodology is the inclusion of the alignment between word tokens in the sentence and the concepts/relations in the CAMR annotation, to make it easier for automatic parsers to model the correspondence between a sentence and its meaning representation. We develop an annotation tool for CAMR, and the inter-annotator agreement, measured by the Smatch score between the two annotators, is 0.83, indicating reliable annotation. We also present some quantitative analysis of the CAMR corpus. 46.71% of the AMRs of the sentences are non-tree graphs. Moreover, the AMRs of 88.95% of the sentences contain concepts that are inferred from the context of the sentence but do not correspond to a specific word.
Citations: 11
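The graph properties named in the AMR abstract above (single-rooted, acyclic, directed, and possibly non-tree) can be checked mechanically. The following is a minimal sketch, not the CAMR project's tooling: the function name `analyze_graph`, the triple format, and the node labels are assumptions made for illustration, and the example graph is the familiar English illustration "The boy wants to go" rather than a sentence from the CAMR corpus.

```python
from collections import defaultdict


def analyze_graph(edges):
    """Inspect a directed graph given as (source, relation, target) triples.

    Returns (roots, is_acyclic, is_tree). This is a hypothetical helper,
    not part of any AMR library.
    """
    nodes = set()
    children = defaultdict(list)
    indegree = defaultdict(int)
    for src, _rel, tgt in edges:
        nodes.update((src, tgt))
        children[src].append(tgt)
        indegree[tgt] += 1

    # A root is a node with no incoming edge.
    roots = sorted(n for n in nodes if indegree[n] == 0)

    # Depth-first search with three colours to detect directed cycles.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in nodes}

    def has_cycle(node):
        colour[node] = GREY
        for child in children[node]:
            if colour[child] == GREY:
                return True  # back edge: a directed cycle exists
            if colour[child] == WHITE and has_cycle(child):
                return True
        colour[node] = BLACK
        return False

    is_acyclic = not any(
        colour[n] == WHITE and has_cycle(n) for n in sorted(nodes)
    )
    # A tree additionally requires that no node has two incoming edges.
    is_tree = (
        is_acyclic
        and len(roots) == 1
        and all(indegree[n] <= 1 for n in nodes)
    )
    return roots, is_acyclic, is_tree


# "The boy wants to go": `b/boy` is an argument of both `want-01` and `go-01`.
boy_wants_to_go = [
    ("w/want-01", ":ARG0", "b/boy"),
    ("w/want-01", ":ARG1", "g/go-01"),
    ("g/go-01", ":ARG0", "b/boy"),
]
print(analyze_graph(boy_wants_to_go))
```

Because `b/boy` has two incoming edges, the graph is single-rooted and acyclic but not a tree. This kind of re-entrancy is one way a graph ends up in the non-tree share that the abstract reports.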
Complex predicates: Structure, potential structure and underspecification
Pub Date : 2019-03-11 DOI: 10.33011/lilt.v17i.1423
Stephan Müller
This paper compares a recent TAG-based analysis of complex predicates in Hindi/Urdu with its HPSG analog. It points out that TAG combines actual structure, while HPSG (like Categorial Grammar and other valence-based frameworks) specifies the valence of lexical items and hence potential structure. This makes it possible to have light verbs decide which arguments of embedded heads get realized, something that is not possible in TAG. TAG has to retreat to disjunctions instead. While this allows straightforward analyses of active/passive alternations based on the light verb in valence-based frameworks, such an option does not exist for TAG, and it has to be assumed that preverbs come with different sets of arguments.
Citations: 1
Syntactic composition and selectional preferences in Hindi Light Verb Constructions
Pub Date : 2019-03-11 DOI: 10.33011/lilt.v17i.1419
Ashwini Vaidya, Owen Rambow, Martha Palmer
Previous work on light verb constructions (e.g. chorii kar ‘theft do; steal’) in Hindi describes their syntactic formation via co-predication (Ahmed et al., 2012; Butt, 2014). This implies that both noun and light verb contribute their arguments, and these overlapping argument structures must be composed in the syntax. In this paper, we present a co-predication analysis using Tree-Adjoining Grammar, which models syntactic composition and semantic selectional preferences without transformations (deletion or argument identification). The analysis has two key components: (i) an underspecified category for the nominal and (ii) combinatorial constraints on the noun and light verb to specify selectional preferences. The former has the advantage of syntactic composition without argument identification, and the latter prevents over-generalization while recognizing the semantic contribution of both predicates. This work additionally accounts for the agreement facts for the Hindi LVC.
Citations: 5
Argument alternations in complex predicates: an LFG+glue perspective
Pub Date : 2019-02-11 DOI: 10.33011/lilt.v17i.1421
J. Lowe
Vaidya et al. (2019) discuss argument alternations in Hindi complex predicates, and propose an analysis within an LTAG framework, comparing this with an LFG analysis of complex predicates. In this paper I clarify the inadequacies in existing LFG analyses of complex predicates, and show how the LFG+glue approach proposed by Lowe (2015) can both address these inadequacies and provide a relatively simple treatment of the phenomena discussed by Vaidya et al. (2019).
Citations: 4
Complex Predicates and Multidimensionality in Grammar
Pub Date : 2019-02-11 DOI: 10.33011/lilt.v17i.1425
M. Butt
This paper contributes to the on-going discussion of how best to analyze and handle complex predicate formations, commenting in particular on the properties of Hindi N-V complex predicates as set out by Vaidya et al. (2019). I highlight features of existing LFG analyses and focus in particular on the modular architecture of LFG, its attendant multidimensional lexicon and the analytic consequences which follow from this. I point out where the previously existing LFG proposals have been misunderstood when viewed through the lens of theories such as LTAG and HPSG, which assume a very different architectural set-up, and I provide a comparative discussion of the issues.
Citations: 0
Can Recurrent Neural Networks Learn Nested Recursion?
Pub Date : 2018-07-01 DOI: 10.33011/lilt.v16i.1417
Jean-Philippe Bernardy
Context-free grammars (CFG) were one of the first formal tools used to model natural languages, and they remain relevant today as the basis of several frameworks. A key ingredient of CFG is the presence of nested recursion. In this paper, we investigate experimentally the capability of several recurrent neural networks (RNNs) to learn nested recursion. More precisely, we measure an upper bound of their capability to do so, by simplifying the task to learning a generalized Dyck language, namely one composed of matching parentheses of various kinds. To do so, we present the RNNs with a set of random strings having a given maximum nesting depth and test their ability to predict the kind of closing parenthesis when facing more deeply nested strings. We report mixed results: when generalizing to deeper nesting levels, the accuracy of standard RNNs is significantly higher than random, but still far from perfect. Additionally, we propose some non-standard stack-based models which can approach perfect accuracy, at the cost of robustness.
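The experimental setup in the abstract above (random strings from a generalized Dyck language with a bounded nesting depth, and the task of predicting which closing bracket comes next) can be illustrated with a short sketch. It is not the paper's data pipeline: the three bracket kinds, the sampling scheme, and the names `random_dyck`, `max_depth_of` and `expected_closers` are assumptions made for illustration. The stack-based function plays the role of an oracle that produces the target labels a model would be scored against.

```python
import random

# Bracket inventory for a generalized Dyck language (assumed: three kinds).
PAIRS = {"(": ")", "[": "]", "{": "}"}


def random_dyck(length, max_depth, rng):
    """Sample a balanced string of exactly `length` symbols whose nesting
    depth never exceeds `max_depth`. `length` must be even."""
    if length % 2 != 0 or max_depth < 1:
        raise ValueError("length must be even and max_depth at least 1")
    out, stack = [], []
    while len(out) < length:
        remaining = length - len(out)
        # Opening is allowed only if depth permits and there is still room
        # to close every bracket that would then be open.
        can_open = len(stack) < max_depth and remaining > len(stack)
        can_close = bool(stack)
        if can_open and (not can_close or rng.random() < 0.5):
            opener = rng.choice(list(PAIRS))
            stack.append(opener)
            out.append(opener)
        else:
            out.append(PAIRS[stack.pop()])
    return "".join(out)


def max_depth_of(s):
    """Maximum nesting depth reached in a balanced bracket string."""
    depth = best = 0
    for ch in s:
        if ch in PAIRS:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best


def expected_closers(s):
    """Stack-based oracle: for each closing position in a balanced string,
    the kind of closing bracket that must appear there."""
    stack, targets = [], []
    for ch in s:
        if ch in PAIRS:
            stack.append(ch)
        else:
            targets.append(PAIRS[stack.pop()])
    return targets


rng = random.Random(0)
train = [random_dyck(20, 3, rng) for _ in range(5)]  # shallow training strings
test = [random_dyck(20, 6, rng) for _ in range(5)]   # may nest more deeply
print(train[0], max_depth_of(train[0]))
print(test[0], max_depth_of(test[0]))
```

The oracle needs an unbounded stack to label arbitrarily deep strings, which is the capability the abstract asks whether an RNN can approximate when test strings nest more deeply than anything seen in training.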
Citations: 24
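The generalized Dyck task described in the Bernardy abstract above can be illustrated with a minimal Python sketch. It is not the paper's code: the bracket inventory `PAIRS`, the function names, and the string length and depth values are all assumptions chosen for illustration. The sketch generates random balanced strings with a bounded nesting depth and uses an explicit stack to compute the closing symbol a model would be asked to predict at each closing position.

```python
import random

# Hypothetical bracket inventory; the abstract only says "parentheses of various kinds".
PAIRS = {"(": ")", "[": "]", "{": "}"}


def random_dyck(length, max_depth, rng):
    """Return a random balanced string of `length` symbols whose nesting
    depth never exceeds `max_depth`. `length` must be even."""
    if length % 2 or max_depth < 1:
        raise ValueError("length must be even and max_depth at least 1")
    stack, out = [], []
    while len(out) < length:
        remaining = length - len(out)
        # Opening is allowed only if depth permits and there is still
        # room to close everything that would then be open.
        can_open = len(stack) < max_depth and remaining - len(stack) >= 2
        if can_open and (not stack or rng.random() < 0.5):
            opener = rng.choice(sorted(PAIRS))
            stack.append(opener)
            out.append(opener)
        else:
            out.append(PAIRS[stack.pop()])
    return "".join(out)


def nesting_depth(s):
    """Maximum number of simultaneously open brackets in a balanced string."""
    depth = best = 0
    for ch in s:
        if ch in PAIRS:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best


def closing_targets(s):
    """Return (position, expected closing symbol) for every closing position,
    computed with an explicit stack (an oracle, not a learned model)."""
    stack, targets = [], []
    for i, ch in enumerate(s):
        if ch in PAIRS:
            stack.append(ch)
        else:
            targets.append((i, PAIRS[stack.pop()]))
    return targets


# Assumed setup: shallow strings for training, deeper ones for testing.
rng = random.Random(0)
train = [random_dyck(20, 3, rng) for _ in range(5)]
test = [random_dyck(20, 6, rng) for _ in range(5)]
```

A learned model would be scored on how often its prediction at each closing position matches `closing_targets`; the abstract's generalization setting corresponds to evaluating on strings whose `nesting_depth` exceeds the maximum depth seen in training.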