{"title":"基于嵌入语言模型(ELMos)的印尼语依赖分析器","authors":"","doi":"10.15849/ijasca.211128.01","DOIUrl":null,"url":null,"abstract":"The goal of dependency parsing is to seek a functional relationship among words. For instance, it tells the subject-object relation in a sentence. Parsing the Indonesian language requires information about the morphology of a word. Indonesian grammar relies heavily on affixation to combine root words with affixes to form another word. Thus, morphology information should be incorporated. Fortunately, it can be encoded implicitly by word representation. Embeddings from Language Models (ELMo) is a word representation which be able to capture morphology information. Unlike most widely used word representations such as word2vec or Global Vectors (GloVe), ELMo utilizes a Convolutional Neural Network (CNN) over characters. With it, the affixation process could ideally encoded in a word representation. We did an analysis using nearest neighbor words and T-distributed Stochastic Neighbor Embedding (t-SNE) word visualization to compare word2vec and ELMo. Our result showed that ELMo representation is richer in encoding the morphology information than it's counterpart. We trained our parser using word2vec and ELMo. To no surprise, the parser which uses ELMo gets a higher accuracy than word2vec. We obtain Unlabeled Attachment Score (UAS) at 83.08 for ELMo and 81.35 for word2vec. Hence, we confirmed that morphology information is necessary, especially in a morphologically rich language like Indonesian. Keywords: ELMo, Dependency Parser, Natural Language Processing, word2vec","PeriodicalId":38638,"journal":{"name":"International Journal of Advances in Soft Computing and its Applications","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Embedding from Language Models (ELMos)- based Dependency Parser for Indonesian Language\",\"authors\":\"\",\"doi\":\"10.15849/ijasca.211128.01\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The goal of dependency parsing is to seek a functional relationship among words. For instance, it tells the subject-object relation in a sentence. Parsing the Indonesian language requires information about the morphology of a word. Indonesian grammar relies heavily on affixation to combine root words with affixes to form another word. Thus, morphology information should be incorporated. Fortunately, it can be encoded implicitly by word representation. Embeddings from Language Models (ELMo) is a word representation which be able to capture morphology information. Unlike most widely used word representations such as word2vec or Global Vectors (GloVe), ELMo utilizes a Convolutional Neural Network (CNN) over characters. With it, the affixation process could ideally encoded in a word representation. We did an analysis using nearest neighbor words and T-distributed Stochastic Neighbor Embedding (t-SNE) word visualization to compare word2vec and ELMo. Our result showed that ELMo representation is richer in encoding the morphology information than it's counterpart. We trained our parser using word2vec and ELMo. To no surprise, the parser which uses ELMo gets a higher accuracy than word2vec. We obtain Unlabeled Attachment Score (UAS) at 83.08 for ELMo and 81.35 for word2vec. Hence, we confirmed that morphology information is necessary, especially in a morphologically rich language like Indonesian. 
Keywords: ELMo, Dependency Parser, Natural Language Processing, word2vec\",\"PeriodicalId\":38638,\"journal\":{\"name\":\"International Journal of Advances in Soft Computing and its Applications\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Advances in Soft Computing and its Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.15849/ijasca.211128.01\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Advances in Soft Computing and its Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.15849/ijasca.211128.01","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Computer Science","Score":null,"Total":0}
Embeddings from Language Models (ELMo)-based Dependency Parser for the Indonesian Language
The goal of dependency parsing is to find the functional relationships among the words of a sentence; for instance, it identifies the subject-object relation. Parsing Indonesian requires information about the morphology of a word: Indonesian grammar relies heavily on affixation, combining root words with affixes to form new words. Morphological information should therefore be incorporated, and fortunately it can be encoded implicitly in a word representation. Embeddings from Language Models (ELMo) is a word representation that is able to capture morphological information. Unlike the most widely used word representations, such as word2vec or Global Vectors (GloVe), ELMo applies a Convolutional Neural Network (CNN) over characters, so the affixation process can ideally be encoded in the word representation. We compared word2vec and ELMo using nearest-neighbor words and T-distributed Stochastic Neighbor Embedding (t-SNE) word visualizations. Our results show that the ELMo representation encodes morphological information more richly than its counterpart. We then trained our parser with both word2vec and ELMo; unsurprisingly, the parser that uses ELMo achieves higher accuracy, with an Unlabeled Attachment Score (UAS) of 83.08 for ELMo versus 81.35 for word2vec. Hence, we confirm that morphological information is necessary, especially in a morphologically rich language like Indonesian. Keywords: ELMo, Dependency Parser, Natural Language Processing, word2vec
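As a rough illustration of the analysis the abstract describes, the minimal sketch below (not the authors' code; the toy vocabulary, random vectors, and head indices are all hypothetical) shows the three ingredients in miniature: nearest-neighbor lookup over word vectors, a t-SNE projection for visual comparison of embeddings, and the UAS metric used to score the parser.

```python
# Minimal sketch of the embedding comparison and UAS evaluation mentioned
# in the abstract. Toy data only; not the authors' implementation.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt


def nearest_neighbors(query, vocab, vectors, k=5):
    """Return the k words whose vectors are closest (cosine) to `query`."""
    q = vectors[vocab.index(query)]
    sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q) + 1e-9)
    order = np.argsort(-sims)
    return [vocab[i] for i in order if vocab[i] != query][:k]


def plot_tsne(vocab, vectors, title):
    """Project word vectors to 2-D with t-SNE and scatter-plot them."""
    coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(vectors)
    plt.figure()
    plt.scatter(coords[:, 0], coords[:, 1])
    for word, (x, y) in zip(vocab, coords):
        plt.annotate(word, (x, y))
    plt.title(title)
    plt.show()


def uas(gold_heads, pred_heads):
    """Unlabeled Attachment Score: fraction of tokens whose predicted
    head index matches the gold head index."""
    gold = np.asarray(gold_heads)
    pred = np.asarray(pred_heads)
    return float((gold == pred).mean())


if __name__ == "__main__":
    # Hypothetical Indonesian word forms sharing the root "ajar", plus two
    # unrelated verbs; random vectors stand in for word2vec / ELMo outputs.
    vocab = ["ajar", "belajar", "mengajar", "pelajar", "pelajaran", "makan", "minum"]
    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(len(vocab), 50))

    print(nearest_neighbors("belajar", vocab, vectors, k=3))
    plot_tsne(vocab, vectors, "t-SNE of toy word vectors")

    # UAS over one 5-token sentence (head index 0 = artificial root).
    print(uas([2, 0, 2, 5, 3], [2, 0, 2, 4, 3]))  # -> 0.8
```

On the toy random vectors the neighbors are of course meaningless; in the paper's setting the vectors would come from the trained word2vec model or from ELMo's character-level CNN, and words derived from the same root would be expected to cluster together under the richer representation.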
About the journal:
The aim of this journal is to provide a lively forum for the communication of original research papers and timely review articles on Advances in Soft Computing and Its Applications. IJASCA will publish only articles of the highest quality. Submissions will be evaluated on their originality and significance. IJASCA invites submissions in all areas of Soft Computing and Its Applications. The scope of the journal includes, but is not limited to:
√ Soft Computing Fundamentals and Optimization
√ Soft Computing for the Big Data Era
√ GPU Computing for Machine Learning
√ Soft Computing Modeling for Perception and Spiritual Intelligence
√ Soft Computing and Agents Technology
√ Soft Computing in Computer Graphics
√ Soft Computing and Pattern Recognition
√ Soft Computing in Biomimetic Pattern Recognition
√ Data Mining for Social Network Data
√ Spatial Data Mining & Information Retrieval
√ Intelligent Software Agent Systems and Architectures
√ Advanced Soft Computing and Multi-Objective Evolutionary Computation
√ Perception-Based Intelligent Decision Systems
√ Spiritual-Based Intelligent Systems
√ Soft Computing in Industry Applications
Other issues related to advances in Soft Computing and its various applications are also within scope.