Neural Embedding & Hybrid ML Models for Text Classification

2020 1st International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET). Pub Date: 2020-04-01. DOI: 10.1109/IRASET48871.2020.9092230

Mariem Bounabi, K. E. Moutaouakil, K. Satori


Citations: 2

Abstract



Knowledge representation remains a challenge for machine learning (ML) models. The Paragraph Vector is one of the current methods for embedding text, and many parameters govern the usefulness of the resulting representation. In this context, we study the effect on text classification of Paragraph Vector-Distributed Memory (PV-DM), a variant of doc2vec. For comparison, we also evaluate classification systems based on other doc2vec forms, together with a collection of commonly used classifiers. We then incorporate hybrid ML methods to improve classification quality. Experiments on a benchmark dataset show excellent results: the system based on PV-DM with the average method, using majority voting as the classifier, reaches 99% accuracy.
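The two ingredients named in the abstract, PV-DM with the average method and majority voting, can be illustrated with a minimal sketch. It rests on assumptions: the abstract does not give the authors' implementation, so "average method" is read here as averaging the paragraph vector with the context word vectors, and majority voting is read as plain hard voting over the labels predicted by several classifiers. The function names `pv_dm_mean_input` and `majority_vote` are hypothetical and come from no library.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def pv_dm_mean_input(doc_vec: Sequence[float],
                     context_vecs: Sequence[Sequence[float]]) -> List[float]:
    """Average the paragraph vector with the context word vectors.

    Assumption: this is the "average" (as opposed to concatenation) way of
    combining vectors in PV-DM before predicting the target word.
    """
    rows = [list(doc_vec)] + [list(v) for v in context_vecs]
    dim = len(doc_vec)
    if any(len(row) != dim for row in rows):
        raise ValueError("all vectors must have the same dimension")
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Hard voting: predictions[k][i] is classifier k's label for sample i.

    On a tie, the label that appears first in classifier order wins.
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must label every sample")
    voted = []
    for labels_for_sample in zip(*predictions):
        # Count the labels proposed for this sample and keep the most frequent.
        voted.append(Counter(labels_for_sample).most_common(1)[0][0])
    return voted


if __name__ == "__main__":
    # Toy vectors and labels, made up for illustration only.
    print(pv_dm_mean_input([1.0, 2.0], [[3.0, 4.0], [5.0, 6.0]]))
    print(majority_vote([["a", "b", "c"], ["a", "c", "c"], ["b", "b", "c"]]))
```

In practice the document vectors would come from a trained doc2vec model and the per-classifier labels from fitted classifiers; the toy inputs above only show the shape of the data each step expects.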

