Bilingual Auto-Categorization Comparison of Two LSTM Text Classifiers

Johannes Lindén, Xutao Wang, Stefan Forsström, Tingting Zhang
Published in: 2019 8th International Congress on Advanced Applied Informatics (IIAI-AAI)
Publication date: 2019-07-01
DOI: 10.1109/IIAI-AAI.2019.00127 (https://doi.org/10.1109/IIAI-AAI.2019.00127)

Abstract

Multilingual problems such as auto-categorization are not easy tasks. One option is to train a separate model for each language; another is to build the model in one base language and automatically translate text from other languages into that base language. Each language is biased toward its own grammar and syntax, so content written in one language can be hard to express in another. Translating from a language into a non-verbal language could potentially have a positive impact on the categorization results. A non-verbal language could, for example, be pure information in the form of knowledge graph relations extracted from the text. In this article, a comparison is conducted between Chinese and Swedish. Two categorization models are developed and validated on each dataset. The purpose is to build an auto-categorization model that works for any language. One model is built on LSTM and optimized for Swedish; the other is an improved Bidirectional-LSTM Convolution model optimized for Chinese. The improved algorithm is trained on both languages and compared with the LSTM algorithm. The Bidirectional-LSTM algorithm performs approximately 20 percentage points better than the LSTM algorithm, which is significant.
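The two deployment strategies named in the abstract (one model per language, or one base-language model fed by automatic translation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the language codes, and the keyword-based toy classifiers and lookup-table translator are all hypothetical stand-ins for trained LSTM models and a real machine translation system.

```python
from typing import Callable, Dict

Classifier = Callable[[str], str]   # text -> category label
Translator = Callable[[str], str]   # text -> text in the base language


def classify_per_language(text: str, lang: str,
                          models: Dict[str, Classifier]) -> str:
    """Strategy 1: route the text to a model trained for its own language."""
    if lang not in models:
        raise KeyError(f"no model trained for language {lang!r}")
    return models[lang](text)


def classify_via_base_language(text: str, lang: str, base_lang: str,
                               base_model: Classifier,
                               translators: Dict[str, Translator]) -> str:
    """Strategy 2: translate into the base language, then use a single model."""
    if lang != base_lang:
        text = translators[lang](text)  # automatic translation step
    return base_model(text)


# Hypothetical toy stand-ins: keyword rules instead of trained LSTM classifiers.
def toy_swedish_model(text: str) -> str:
    return "sports" if "fotboll" in text.lower() else "other"


def toy_chinese_model(text: str) -> str:
    return "sports" if "足球" in text else "other"


def toy_zh_to_sv(text: str) -> str:
    # A one-word lookup standing in for a machine translation system.
    return text.replace("足球", "fotboll")
```

In the second strategy, any translation error propagates directly into the classifier input, which is the language-bias concern the abstract raises.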