Author: Jack Halpern
Published in: Proceedings of the Third International Conference, Europhras 2019, Computational and Corpus-Based Phraseology (2019-09-25)
DOI: https://doi.org/10.26615/978-2-9701095-6-3_022
Lexicographic Criteria for Selecting Multiword Units for MT Lexicons
A basic assumption in bilingual lexicography and machine translation (MT) is that the linguistic units of one language correspond to those of another language. But even in close language pairs, such as Spanish and English, there are numerous exceptions, while in some language pairs, such as English and Japanese, cross-linguistic lexical anisomorphism is so great that it becomes literally impossible to map certain words and phrases across these languages. This is especially true of linguistic units that consist of multiple components, or multiword units (MWUs). The recognition and accurate translation of MWUs play a critical role in enhancing the quality of machine translation [9]. In spite of recent advances in neural machine translation (NMT), MWUs still present major challenges to MT technology. This paper discusses the fundamental principles for identifying and selecting MWUs for inclusion in bilingual dictionaries, both for humans and for MT systems (MT lexicons). It attempts to define the various subtypes of MWU based on lexicographic principles derived from extensive experience in bilingual lexicography, especially the compilation of a large-scale full-form lexicon for Spanish-English MT. It also introduces some large-scale resources designed to significantly enhance the translation accuracy of multiword proper nouns.