This article explores the lexicographic codification of botanical knowledge in two general dictionaries: Diccionario de la lengua española (2014 [2021]), produced by the Real Academia Española in Spain, and Diccionario del español de México (2019), published by El Colegio de México. It begins with a historical overview of the inclusion of botanical terminology in general dictionaries by the Real Academia Española and by other authors. Then, it analyses the degree to which the botanical terms included in each dictionary meet the inclusion criteria established. Finally, it focuses on diatechnical labelling and the differences between the two dictionaries in terms of the disciplines to which terms are assigned. This allows us to draw conclusions regarding the representation of this field of knowledge in the two dictionaries and the lexicographic techniques used to produce them.
{"title":"Two Ways of Representing Specialist Knowledge: Analysing the Botanical Lexicon in Diccionario de la Lengua Española and Diccionario del Español de México","authors":"Jesús Camacho-Niño","doi":"10.1093/ijl/ecad014","DOIUrl":"https://doi.org/10.1093/ijl/ecad014","url":null,"abstract":"\u0000 This article explores the lexicographic codification of botanical knowledge in two general dictionaries: Diccionario de la lengua española (2014 [2021]), produced by the Real Academia Española in Spain, and Diccionario del español de México (2019), published by El Colegio de México. It begins with a historical overview of the inclusion of botanical terminology in general dictionaries by the Real Academia Española and by other authors. Then, it analyses the degree to which the botanical terms included in each dictionary meet the inclusion criteria established. Finally, it focuses on diatechnical labelling and the differences between the two dictionaries in terms of the disciplines to which terms are assigned. This allows us to draw conclusions regarding the representation of this field of knowledge in the two dictionaries and the lexicographic techniques used to produce them.","PeriodicalId":45657,"journal":{"name":"International Journal of Lexicography","volume":" ","pages":""},"PeriodicalIF":0.5,"publicationDate":"2023-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48898426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The global era has led to fairly rapid changes in language. Many words have become obsolete. There are also many words whose meanings have become irrelevant nowadays. Unfortunately, in Indonesian dictionaries, especially in the Kamus Besar Bahasa Indonesia (KBBI; Comprehensive dictionary of Indonesian), there is no label for obsolete words. There is only an “archaic” label to mark all outdated words and a “classic” label to mark classical words. Another labeling problem in the KBBI is that there are no clear guidelines or criteria to determine when a word is considered archaic, obsolete, or classic. The absence of clear criteria causes some entries that have been labeled “archaic” in the KBBI to seem obsolete, and sometimes words labeled as classic get confused with archaic words. The aim of this article is to investigate ways of categorizing archaic, obsolete, and classic words in the KBBI. This research was conducted by comparing several forms and entry criteria labeled “archaic,” “obsolete,” and “classic” in several dictionaries, in particular dictionaries of foreign languages whose lexicographic traditions are well established. Each dictionary has its own criteria for classifying a word as archaic, obsolete, or classic, and we can learn from them. The findings suggest that checking the corpus data set is the easiest way to categorize words according to their labels.
{"title":"Categorizing obsolete, archaic, and classic words in an Indonesian dictionary","authors":"Dewi Puspita, K. Yusuf","doi":"10.1558/lexi.24757","DOIUrl":"https://doi.org/10.1558/lexi.24757","url":null,"abstract":"The global era has led to fairly rapid changes in language. Many words have become obsolete. There are also many words whose meanings have become irrelevant nowadays. Unfortunately, in Indonesian dictionaries, especially in the Kamus Besar Bahasa Indonesia (KBBI; Comprehensive dictionary of Indonesian), there is no label for obsolete words. There is only an “archaic” label to mark all outdated words and a “classic” label to mark classical words. Another labeling problem in the KBBI is that there are no clear guidelines or criteria to determine when a word is considered archaic, obsolete, or classic. The absence of clear criteria causes some entries that have been labeled “archaic” in the KBBI to seem obsolete, and sometimes words labeled as classic get confused with archaic words. The aim of this article is to investigate ways of categorizing archaic, obsolete, and classic words in the KBBI. This research was conducted by comparing several forms and entry criteria labeled “archaic,” “obsolete,” and “classic” in several dictionaries, in particular dictionaries of foreign languages whose lexicographic traditions are well established. Each dictionary has its own criteria for classifying a word as archaic, obsolete, or classic, and we can learn from them. 
The findings suggest that checking the corpus data set is the easiest way to categorize words according to their labels.","PeriodicalId":45657,"journal":{"name":"International Journal of Lexicography","volume":"16 1","pages":""},"PeriodicalIF":0.5,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75430982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The verb is one of the most perplexing features for learners of BIPA (Bahasa Indonesia bagi Penutur Asing “Indonesian for foreign speakers”). Indonesian verbs are particularly rich in affixes that correspond to their numerous senses. The learners find it challenging to use verbs with the appropriate affixes in a sentence structure. To function effectively as a learning tool, dictionaries must provide morphological and grammatical information. Frame semantics theory is used in this research to determine a verb’s meaning based on its semantic context and frame. The verb is described by identifying the grammatical constructions in which it participates, and by characterizing all of the obligatory and optional types of companions. By doing this, we can obtain the verb valency pattern to add to a dictionary’s morphological and grammatical information. This study aims to create entry models of affixed verbs with the valency pattern. The six transitive verbs for discussion comprise a variety of affixes with various senses: mempersembahkan “to present/dedicate”; membersihkan “to clean”; mencintai “to love”; memperlancar “to expedite”; memperbaiki “to fix”; and memberlakukan “to apply”. These entries serve as models for the BIPA learner’s dictionary.
{"title":"use of verb valency patterns in the Indonesian monolingual learner’s dictionary","authors":"Dora Amalía","doi":"10.1558/lexi.24995","DOIUrl":"https://doi.org/10.1558/lexi.24995","url":null,"abstract":"The verb is one of the most perplexing features for learners of BIPA (Bahasa Indonesia bagi Penutur Asing “Indonesian for foreign speakers”). Indonesian verbs are particularly rich in affixes that correspond to their numerous senses. The learners find it challenging to use verbs with the appropriate affixes in a sentence structure. To function effectively as a learning tool, dictionaries must provide morphological and grammatical information. Frame semantics theory is used in this research to determine a verb’s meaning based on its semantic context and frame. The verb is described by identifying the grammatical constructions in which it participates, and by characterizing all of the obligatory and optional types of companions. By doing this, we can obtain the verb valency pattern to add to a dictionary’s morphological and grammatical information. This study aims to create entry models of affixed verbs with the valency pattern. The six transitive verbs for discussion comprise a variety of affixes with various senses: mempersembahkan “to present/dedicate”; membersihkan “to clean”; mencintai “to love”; memperlancar “to expedite”; memperbaiki “to fix”; and memberlakukan “to apply”. 
These entries serve as models for the BIPA learner’s dictionary.","PeriodicalId":45657,"journal":{"name":"International Journal of Lexicography","volume":"48 1","pages":""},"PeriodicalIF":0.5,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81701567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The National Agency for Language Development and Cultivation (henceforth, Badan Bahasa), a government agency under the Ministry of Education and Culture of Indonesia, has published more than 100 dictionaries since 1977. Some dictionaries have been revised by adding new entries and senses. In alignment with technological developments, Badan Bahasa has started an integration project that aims to provide an online application for its language products. In 2015, it started Program Pengayaan Kosakata (Word proposal application program), and this was followed by the launch of the online version of the Kamus Besar Bahasa Indonesia (KBBI; Comprehensive dictionary of Indonesian), Tesaurus Tematis Bahasa Indonesia (Thematic thesaurus of Indonesian), and Ensiklopedia Sastra Indonesia (Encyclopedia of Indonesian literature) in 2016. In 2020, Badan Bahasa started the development of Aplikasi Pangkalan Data Kamus, also called Aplikasi Kompilasi Kamus (AKK; Dictionary compilation application). This online application accommodates at least three kinds of dictionary – a local language dictionary, specialized dictionary, and bilingual dictionary – published by Badan Bahasa. The process has continued with a project to digitalize the print versions of specialized dictionaries, Indonesian-local language dictionaries, and local language-Indonesian dictionaries. This article discusses some challenges in the digitalization of dictionaries that arise from the print versions, the dictionary structure, and the dictionary interface, and puts forward some solutions to these issues. The study uses qualitative methods to observe dictionary files and examine microstructure issues throughout the whole process. The results of this study are expected to support the digitalization process and dictionary development in Indonesia.
{"title":"Digitalizing a local language dictionary","authors":"Winda Luthfita, Selly Rizki Yanita","doi":"10.1558/lexi.25076","DOIUrl":"https://doi.org/10.1558/lexi.25076","url":null,"abstract":"The National Agency for Language Development and Cultivation (henceforth, Badan Bahasa) has published many dictionaries as a government agency under the Ministry of Education and Culture of Indonesia. More than 100 dictionaries have been published since 1977. Some dictionaries have been revised by adding new entries and senses. In alignment with technological developments, Badan Bahasa has started an integration project that aims to provide an online application for their language products. In 2015, it started Program Pengayaan Kosakata (Word proposal application program), and this was followed by the launch of the online version of the Kamus Besar Bahasa Indonesia (KBBI; Comprehensive dictionary of Indonesian), Tesaurus Tematis Bahasa Indonesia (Thematic thesaurus of Indonesian), and Ensiklopedia Sastra Indonesia (Encyclopedia of Indonesian literature) in 2016. In 2020, Badan Bahasa started the development of Aplikasi Pangkalan Data Kamus, also called Aplikasi Kompilasi Kamus (AKK; Dictionary compilation application). This online application accommodates at least three kinds of dictionary – a local language dictionary, specialized dictionary, and bilingual dictionary – published by Badan Bahasa. The process has continued by developing a digitalization project targeting the digitalization of print versions of specialized dictionaries, Indonesian-local language dictionaries, and local language-Indonesian dictionaries. This article aims to discuss some challenges regarding the digitalization of dictionaries arising from the print versions, the dictionary structure, and the dictionary interface, and puts forward some solutions to deal with the issues. 
The research method uses qualitative methods for observing dictionary files and examining microstructure issues throughout the whole process. The results of this study are expected to support the digitalization process and dictionary development in Indonesia.","PeriodicalId":45657,"journal":{"name":"International Journal of Lexicography","volume":"7 1","pages":""},"PeriodicalIF":0.5,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87690298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research on lexical bundles in the last few decades has focused mostly on written registers, especially academic writing. In this study, I investigate the use of lexical bundles in a different genre – dictionaries. As a lexical bundle is formulaic language specific to a particular register, I hypothesize that particular lexical bundles are used in dictionaries. The research focus of this study is the extent to which lexical bundles are used in the online version of the Kamus Besar Bahasa Indonesia (KBBI Daring; Comprehensive dictionary of Indonesian online), especially in the lemma, definition, and example sections. In addition, the study examines the characteristics of lexical bundles in dictionaries. The approach used is corpus-based. As reference bundles, a total of 517 lexical bundles were extracted from the Indonesian Web Corpus (available from Sketch Engine). The bundles were then analyzed for their use in KBBI Daring in terms of their frequency, structure, and function. The results showed that lexical bundles in KBBI Daring were mostly found in the definition section. The bundles found were generally in the form of phrases rather than clauses. In terms of structure, the lexical bundles were dominated by incomplete structures. The bundles, either in the definition or example sections, were mostly in the form of yang-clause fragments, such as yang digunakan untuk “that is used for,” yang terdiri atas “that consists of,” yang terbuat dari “that is made of/from,” yang berasal dari “that comes from,” and yang berhubungan dengan “that relates to.” In terms of function, presenting content is the dominant function in the KBBI, while organizing text is the least prevalent function. In a nutshell, each section in the dictionary, especially in the KBBI, has its own character.
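The general idea of extracting lexical bundles from a corpus can be sketched as follows. This is a minimal illustration and not the procedure used in the study: the abstract does not state the bundle length or the frequency and dispersion cut-offs, so `n`, `min_freq`, and `min_texts` are hypothetical parameters, and the three sample strings are invented toy definitions, not KBBI entries.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-word sequences in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_bundles(texts, n=3, min_freq=2, min_texts=2):
    """Keep n-grams that recur often enough and in enough different texts.

    The thresholds are illustrative assumptions, not values from the study.
    """
    freq = Counter()    # total occurrences of each n-gram
    spread = Counter()  # number of texts each n-gram appears in
    for text in texts:
        grams = ngrams(text.lower().split(), n)
        freq.update(grams)
        spread.update(set(grams))
    return {
        gram: count
        for gram, count in freq.items()
        if count >= min_freq and spread[gram] >= min_texts
    }


# Toy definition-style strings (invented for illustration).
sample_texts = [
    "alat yang digunakan untuk memotong kayu",
    "bahan yang digunakan untuk membuat roti",
    "kain yang terbuat dari kapas",
]

bundles = extract_bundles(sample_texts)
print(bundles)
```

With these toy inputs, only the yang-clause fragment shared by the first two strings passes both thresholds; real bundle studies typically work with much larger corpora and normalized frequencies.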
{"title":"use of lexical bundles in an online comprehensive dictionary of Indonesian (KBBI Daring)","authors":"Adi Budiwiyanto","doi":"10.1558/lexi.25177","DOIUrl":"https://doi.org/10.1558/lexi.25177","url":null,"abstract":"Research on lexical bundles in the last few decades has focused mostly on written registers, especially academic writing. In this study, I investigate the use of lexical bundles in a different genre – dictionaries. As a lexical bundle is formulaic language specific to a particular register, I hypothesize that particular lexical bundles are used in dictionaries. The research focus of this study is the extent to which lexical bundles are used in the online version of the Kamus Besar Bahasa Indonesia (KBBI Daring; Comprehensive dictionary of Indonesian online), especially in the lemma, definition, and example sections. In addition, the study examines the characteristics of lexical bundles in dictionaries. The approach used is corpus based. As reference bundles, a total of 517 lexical bundles were extracted from the IndonesianWeb Corpus (available from SketchEngine). The bundles were then analyzed for their use in KBBI Daring in terms of their frequency, structure, and function. The results showed that the use of lexical bundles in KBBI Daring was mostly found in the definition section. The bundles found were generally in the form of phrases rather than clauses. In terms of structure, the lexical bundles were dominated by incomplete structures. The bundles, either in the definition or example sections, were mostly in the form of yang-clause fragments, such as yang digunakan untuk “that is used for,” yang terdiri atas “that consists of,” yang terbuat dari “that is made of/from,” yang berasal dari “that comes from,” and yang berhubungan dengan “that relates to.” In terms of function, presenting content is the dominant function in the KBBI, while organizing text is the least prevalent function. 
In a nutshell, each section in the dictionary, especially in the KBBI, has its own character.","PeriodicalId":45657,"journal":{"name":"International Journal of Lexicography","volume":"35 1","pages":""},"PeriodicalIF":0.5,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75706202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recursion, and recursion-like design patterns, are used in the entry schemas of dictionaries to model subsenses and subentries. Recursion occurs when elements of a given type, such as sense, are allowed to contain elements of the same or similar type, such as sense or subsense. This article argues that recursion unnecessarily increases the computational complexity of entries, making dictionaries less easily processable by machines. The article will show how entry schemas can be simplified by re-engineering subsenses and subentries as relations (like in a relational database) such that we only have flat lists of senses and entries, while the is-subsense-of and is-subentry-of relations are encoded using pairs of unique identifiers. This design pattern losslessly records the same information as recursion (including – importantly – the listing order of items inside an entry) but decreases the complexity of the entry structure and makes dictionary entries more easily machine-processable.
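The design pattern described above can be illustrated with a small sketch. The field names (`id`, `definition`, `subsenses`), the `Sense` class, and the helper functions are assumptions made for this example, not the schema proposed in the article. The sketch converts nested senses into a flat list plus `(child_id, parent_id)` pairs, then rebuilds the nesting to show that no information, including listing order, is lost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """A sense with no nested children; hierarchy lives in the relations."""
    id: str
    definition: str


def flatten(nested_senses):
    """Return (flat sense list, list of (child_id, parent_id) pairs).

    Senses are emitted in document order, so the position in the flat
    list records the listing order inside the entry.
    """
    senses, relations = [], []
    stack = [(node, None) for node in reversed(nested_senses)]
    while stack:
        node, parent_id = stack.pop()
        senses.append(Sense(node["id"], node["definition"]))
        if parent_id is not None:
            relations.append((node["id"], parent_id))
        for child in reversed(node["subsenses"]):
            stack.append((child, node["id"]))
    return senses, relations


def rebuild(senses, relations):
    """Reconstruct the nested structure from the flat representation."""
    parent_of = dict(relations)
    nodes = {
        s.id: {"id": s.id, "definition": s.definition, "subsenses": []}
        for s in senses
    }
    roots = []
    for s in senses:  # flat-list order restores sibling order
        parent_id = parent_of.get(s.id)
        target = roots if parent_id is None else nodes[parent_id]["subsenses"]
        target.append(nodes[s.id])
    return roots


# Hypothetical entry with one level of subsenses.
nested = [
    {"id": "s1", "definition": "a financial institution", "subsenses": [
        {"id": "s1a", "definition": "the building housing it", "subsenses": []},
        {"id": "s1b", "definition": "a reserve of something", "subsenses": []},
    ]},
    {"id": "s2", "definition": "the land alongside a river", "subsenses": []},
]

senses, relations = flatten(nested)
print([s.id for s in senses])  # flat list in listing order
print(relations)               # is-subsense-of pairs
```

A consumer that only needs the senses can iterate over the flat list without any tree traversal, while `rebuild` shows that the nested view remains recoverable when it is needed.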
{"title":"Avoiding Recursion in the Representation of Subsenses and Subentries in Dictionaries","authors":"M. Mechura","doi":"10.1093/ijl/ecad012","DOIUrl":"https://doi.org/10.1093/ijl/ecad012","url":null,"abstract":"\u0000 Recursion, and recursion-like design patterns, are used in the entry schemas of dictionaries to model subsenses and subentries. Recursion occurs when elements of a given type, such as sense, are allowed to contain elements of the same or similar type, such as sense or subsense. This article argues that recursion unnecessarily increases the computational complexity of entries, making dictionaries less easily processable by machines. The article will show how entry schemas can be simplified by re-engineering subsenses and subentries as relations (like in a relational database) such that we only have flat lists of senses and entries, while the is-subsense-of and is-subentry-of relations are encoded using pairs of unique identifiers. This design pattern losslessly records the same information as recursion (including – importantly – the listing order of items inside an entry) but decreases the complexity of the entry structure and makes dictionary entries more easily machine-processable.","PeriodicalId":45657,"journal":{"name":"International Journal of Lexicography","volume":" ","pages":""},"PeriodicalIF":0.5,"publicationDate":"2023-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43354780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sebastian Przybyszewski, Iwona Kosek, Monika Czerepowicka
This article presents Verbel: The Electronic Dictionary of Paradigms of Polish Verbal Multiword Expressions (MWEs) and discusses theoretical problems connected with compiling such a dictionary for inflectionally complex languages such as Polish. The dictionary includes over 5,000 Polish verbal MWEs and explicitly presents their forms and constraints in inflection. It also provides grammatical, semantic, pragmatic and prescriptive commentaries. The first part of the article covers the theoretical and methodological basis used in the compilation of the dictionary. Generally, a verbal MWE is inflected according to the paradigm of the verb which is its main component. However, MWEs may have some specific inflectional constraints connected with other factors (e.g. semantic, pragmatic), which result in different paradigms for verbal MWEs and for the verbs that are their main components. In the second part, the conception and content of the dictionary are discussed. Finally, the natural language processing tools that underlie the work on the dictionary are described.
{"title":"Not Only Meaning… Verbel: The Electronic Dictionary of Paradigms of Polish Verbal Multiword Expressions","authors":"Sebastian Przybyszewski, Iwona Kosek, Monika Czerepowicka","doi":"10.1093/ijl/ecad008","DOIUrl":"https://doi.org/10.1093/ijl/ecad008","url":null,"abstract":"\u0000 This article presents Verbel: The Electronic Dictionary of Paradigms of Polish Verbal Multiword Expressions (MWEs) and discusses theoretical problems connected with compiling such a dictionary for inflectionally complex languages such as Polish. The dictionary includes over 5,000 Polish verbal MWEs and explicitly presents their forms and constraints in inflection. It also provides grammatical, semantic, pragmatic and prescriptive commentaries. The first part of the article covers the theoretical and methodological basis used in the compilation of the dictionary. Generally, a verbal MWE is inflected according to the paradigm of the verb which is its main component. However, MWEs may have some specific inflectional constraints connected with other factors (e.g. semantic, pragmatic), which result in different paradigms for verbal MWEs and for the verbs that are their main components. In the second part, the conception and content of the dictionary are discussed. Finally, the natural language processing tools that underlie the work on the dictionary are described.","PeriodicalId":45657,"journal":{"name":"International Journal of Lexicography","volume":" ","pages":""},"PeriodicalIF":0.5,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49103709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Which compounds should be included in general-purpose dictionaries is often an open question that is answered with a case-by-case consideration of all compounds above a certain corpus frequency threshold. Another way to determine which compounds should be listed is to examine which compounds, or rather which compound properties, are in demand by the users. This study uses look-up data from the two officially sanctioned, general-purpose dictionaries of Norwegian (Bokmålsordboka and Nynorskordboka) to derive an explicit compound selection model whose sensitivity and specificity are comparable to those of the traditional procedure. These findings demonstrate that it is indeed possible to arrive at a fully operational and explicit compound selection model that meets the needs of users. With such a tool at their disposal, lexicographers would be able to separate the wheat from the chaff in the boundless field that is the compound lexicon of North Germanic languages.
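For readers unfamiliar with the two evaluation measures named above, the sketch below shows their standard definitions. It says nothing about how the study's model was built or evaluated; the function name and the toy labels are assumptions made for illustration only.

```python
def sensitivity_specificity(predicted, actual):
    """Compute (sensitivity, specificity) from parallel boolean lists.

    predicted[i] is True if a selection procedure would list compound i;
    actual[i] is True if compound i really should be listed.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # correctly selected
    fn = sum(1 for p, a in pairs if not p and a)      # wrongly left out
    tn = sum(1 for p, a in pairs if not p and not a)  # correctly left out
    fp = sum(1 for p, a in pairs if p and not a)      # wrongly selected
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for eight hypothetical compounds (not data from the study).
actual = [True, True, True, False, False, False, False, True]
predicted = [True, True, False, False, False, True, False, True]

sens, spec = sensitivity_specificity(predicted, actual)
print(sens, spec)
```

Sensitivity is the share of wanted compounds that the procedure selects; specificity is the share of unwanted compounds that it rejects. Two procedures are comparable when both figures are close.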
{"title":"Wheat or Chaff? A Compound Selection Model Based on Look-Up Data","authors":"Mikkel Ekeland Paulsen","doi":"10.1093/ijl/ecad013","DOIUrl":"https://doi.org/10.1093/ijl/ecad013","url":null,"abstract":"Abstract Which compounds should be included in general-purpose dictionaries is often an open question that is answered with a case-by-case consideration of all compounds above a certain corpus frequency threshold. Another way to determine which compounds should be listed, is to examine which compounds, or rather which compound properties, are in demand by the users. This study uses look-up data from the two officially sanctioned, general-purpose dictionaries of Norwegian (Bokmålsordboka and Nynorskordboka) to derive an explicit compound selection model that performs with comparable sensitivity and specificity as the traditional procedure. These findings demonstrate that it is indeed possible to arrive at a fully operational and explicit compound selection model that meets the needs of users. With such a tool at their disposal, lexicographers would be able to separate the wheat from the chaff in the boundless field that is the compound lexicon of North Germanic Languages.","PeriodicalId":45657,"journal":{"name":"International Journal of Lexicography","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135693008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hausa Dictionary for Everyday Use: Hausa - English/ English - Hausa. Ƙamusun Hausa na yau da kullum: Hausa - Ingilishi/ Ingilishi - Hausa. Paul Newman and Roxana Ma Newman","authors":"C. Schmaling","doi":"10.1093/ijl/ecad010","DOIUrl":"https://doi.org/10.1093/ijl/ecad010","url":null,"abstract":"","PeriodicalId":45657,"journal":{"name":"International Journal of Lexicography","volume":" ","pages":""},"PeriodicalIF":0.5,"publicationDate":"2023-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45152399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper deals with several aspects of context in lexicography. Section 1 briefly mentions some different approaches to the concept of context in various fields. Section 2 focuses on different uses and perceptions of the concept of context in lexicography, contrasting it with related concepts, such as cotext, contextualization and contextual information. A more comprehensive discussion also covers different aspects of the occurrence of the concept of context in dictionary research, with specific reference to central aspects of the so-called inner and outer context. Various portals, dictionaries and dictionary entries illustrate the above-mentioned approaches. Section 3 approaches the subject from a user perspective. Section 4 addresses the question: how can contextual data be extracted or generated? To answer this question, some methods and tools for the (automatic) acquisition and analysis of contextual data, in particular of local contextual data in the sense of Faber and León-Araúz (2016), are introduced. Examples of these are lexical databases or semantic networks, like WordNet; corpora, like Sketch Engine; and predictive methods, like Word2vec and similar ones. Some advantages and disadvantages of specific data acquisition tools used for the analysis of local contextual data are indicated. This section also contributes to a more detailed discussion of the automatic generation of the so-called local syntactic-semantic context or word environment, specifically of the building of syntactic-semantic argument patterns and their examples.
{"title":"The Definition, Presentation and Automatic Generation of Contextual Data in Lexicography","authors":"M. J. Domínguez, R. Gouws","doi":"10.1093/ijl/ecac020","DOIUrl":"https://doi.org/10.1093/ijl/ecac020","url":null,"abstract":"\u0000 This paper deals with several aspects of context in lexicography. Section 1 briefly mentions some different approaches to the concept context in various fields. Section 2 puts the focus on different uses and perceptions of the concept context in lexicography, contrasting it with related concepts, such as cotext, contextualization and contextual information. A more comprehensive discussion also covers different aspects of the occurrence of the concept context in dictionary research, with specific reference to central aspects of the so-called inner and outer context. Various portals, dictionaries and dictionary entries will illustrate the above-mentioned approaches. Section 3 approaches the subject from a user perspective. Section 4 addresses the question How can contextual data be extracted or generated? To answer this question, some methods and tools for (automatic) acquisition and analysis of contextual data, – in particular of the local contextual data in terms of Faber and León-Araúz (2016) – are introduced. Examples of these are lexical databases or semantic networks, like WordNet, and corpora, like Sketch Engine, or predictive methods, like Word2vec and similar ones. Some advantages and disadvantages of specific data acquisition tools used for the analysis of local contextual data are indicated. 
This section also contributes to a more detailed discussion of the automatic generation of the so-called local syntactic-semantic context or word environment, specifically of the building of syntactic-semantic argument patterns and their examples.","PeriodicalId":45657,"journal":{"name":"International Journal of Lexicography","volume":"1 1","pages":""},"PeriodicalIF":0.5,"publicationDate":"2023-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41887432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}