Anomaly Detection in Lexical Definitions via One-Class Classification Techniques

2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP). Pub Date: 2021-12-21. DOI: 10.1109/iSAI-NLP54397.2021.9678166

Sawittree Jumpathong, Kanyanut Kriengket, P. Boonkwan, T. Supnithi



Abstract

Building a vocabulary and its definitions takes a long time: every entry must be approved by experts in vocabulary-building meetings, and the definitions themselves are unstructured. To save time, we experimented with three one-class classification techniques (one-class SVMs, isolation forests, and local outlier factors) and measured, in terms of accuracy, how well each can suggest the approval status of a word definition. Local outlier factors obtained the highest accuracy when applied to vectors produced by USE. They recognize the boundary of the approved class better; the approved class forms several clusters, with outliers scattered among them. The detected status of a definition agrees with the reference status in some cases and contradicts it in others. As for writing patterns, approved definitions are always written in a logical order: they start with broad or general information, followed by specific details, examples, and references to English terms or examples. Rejected definitions are not always written in a logical order, and their patterns vary: only a Thai translation, a Thai translation with related entries, parts of speech (POS), Thai translation, related entries, and English term references followed by definitions, and so on.
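To make the local outlier factor (LOF) idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes 2-D toy points standing in for definition embeddings (the abstract's USE vectors are not reproduced here), two synthetic "approved" clusters, three hand-placed "rejected" points, and a hypothetical score threshold of 2.0. Ties in neighbour distance are ignored for brevity.

```python
import math
import random


def local_outlier_factor(points, k=5):
    """Return one LOF score per point; scores well above 1 suggest an outlier."""
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (excluding itself) and its k-distance.
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach(i, j) = max(k-distance of j, distance from i to j).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


def make_cluster(cx, cy, size, spread=0.2):
    """Synthetic stand-in for a cluster of approved-definition vectors."""
    return [(random.gauss(cx, spread), random.gauss(cy, spread)) for _ in range(size)]


random.seed(0)
# Assumption: two approved clusters, with rejected points scattered around them.
approved = make_cluster(0.0, 0.0, 30) + make_cluster(5.0, 5.0, 30)
rejected = [(2.5, 2.5), (0.0, 5.0), (5.0, 0.0)]
points = approved + rejected

scores = local_outlier_factor(points, k=5)
THRESHOLD = 2.0  # hypothetical cut-off, chosen for this toy data only
flagged = [i for i, score in enumerate(scores) if score > THRESHOLD]
print("indices flagged as anomalous:", flagged)
```

Because LOF compares each point's density with that of its own neighbours, it can handle an approved class made of several separate clusters, which matches the explanation the abstract gives for its result. In practice, scikit-learn provides `LocalOutlierFactor`, `OneClassSVM`, and `IsolationForest` for the three techniques named above.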


