Anomaly Detection in Lexical Definitions via One-Class Classification Techniques

Sawittree Jumpathong, Kanyanut Kriengket, P. Boonkwan, T. Supnithi
Published in: 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
Publication date: 2021-12-21
DOI: https://doi.org/10.1109/iSAI-NLP54397.2021.9678166
Citations: 0

Abstract

Building a vocabulary and its definitions takes a long time: each definition must be approved by experts in a vocabulary-building meeting, and the definitions themselves are unstructured. To save time, we applied three one-class classification techniques (one-class SVMs, isolation forests, and local outlier factors) and measured, by accuracy, how well each can suggest the status of a word definition. Local outlier factors achieved the highest accuracy when used with vectors produced by USE. They recognize the boundary of the approved class better; the approved definitions form several clusters, with outliers scattered among them. We also found that the detected status of a definition sometimes matches the reference status and sometimes contradicts it. As for writing patterns, approved definitions are always written in a logical order: they start with broad or general information, followed by specific details, examples, and references to English terms or examples. Rejected definitions are not always written in a logical order, and their patterns vary: only a Thai translation; a Thai translation with related entries; parts of speech (POS), a Thai translation, related entries, and English term references followed by the definition; and so on.
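To make the local outlier factor idea concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes that random 2-D points can stand in for definition embeddings (the paper uses USE vectors), that the model is fitted on approved examples only, and that a score above 1.5 means "flag for review". The class `SimpleLOF`, the helpers `dist` and `knn`, the neighbour count `k=5`, and the 1.5 threshold are all hypothetical choices made for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(point, data, k, skip=None):
    """Return the k nearest (distance, index) pairs of `point` within `data`."""
    pairs = [(dist(point, other), j) for j, other in enumerate(data) if j != skip]
    return sorted(pairs)[:k]


class SimpleLOF:
    """Toy local outlier factor, fitted on the approved class only."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, data):
        self.data = data
        neighbours = [knn(p, data, self.k, skip=i) for i, p in enumerate(data)]
        # k-distance: distance from each training point to its k-th neighbour.
        self.k_dist = [n[-1][0] for n in neighbours]
        self.lrd = [self._lrd(n) for n in neighbours]
        return self

    def _lrd(self, neighbours):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(self.k_dist[j], d) for d, j in neighbours]
        return len(reach) / max(sum(reach), 1e-12)

    def score(self, point):
        # LOF: mean density of the neighbours divided by the point's own density.
        neighbours = knn(point, self.data, self.k)
        mean_neighbour_lrd = sum(self.lrd[j] for _, j in neighbours) / len(neighbours)
        return mean_neighbour_lrd / self._lrd(neighbours)


# Assumption: two synthetic clusters stand in for "several approved clusters".
rng = random.Random(0)
approved = [
    (cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
    for cx, cy in [(0.0, 0.0), (5.0, 5.0)]
    for _ in range(40)
]
model = SimpleLOF(k=5).fit(approved)

queries = {"near a cluster": (0.1, -0.1), "between clusters": (2.5, 2.5)}
scores = {name: model.score(vec) for name, vec in queries.items()}
for name, s in scores.items():
    status = "flag for review" if s > 1.5 else "looks approved"  # hypothetical threshold
    print(f"{name}: LOF={s:.2f} -> {status}")
```

Because the score compares a point's density with that of its own neighbours, each cluster is judged on its own local scale, which is one plausible reason the method copes with several approved clusters. In practice a library implementation such as scikit-learn's `LocalOutlierFactor` would replace this toy class.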