An innovative approach to classify hierarchical remarks with multi-class using BERT and customized naïve bayes classifier

International Journal of Engineering Science and Technology. Published: 2022-05-30. DOI: 10.4314/ijest.v13i4.4

M. Dhina, S. Sumathi


Citations: 0

Abstract




Text classification is the process of grouping text into distinct categories. A text classifier automatically analyzes input text and assigns one or more predefined tags or categories based on its content, using Natural Language Processing (NLP) techniques, which today rely heavily on Machine Learning (ML), or a pre-trained model. Text categorization is becoming increasingly important in enterprises because it helps firms derive insights from data and automate business operations, reducing manual labor and cost. Common industrial applications include language detection (determining the language of a given document), sentiment analysis (determining whether a text is favorable or unfavorable toward a given subject), and topic detection (identifying the theme or topic of a group of texts). The dataset studied here is multi-class and hierarchical: its labels are organized in multiple levels, and each level is itself a multi-class problem. Supervised learning, one of the most successful ML paradigms, builds a model that generalizes from labeled examples; accordingly, a customized naïve Bayes model is built to fit this problem. Deep learning (DL), a branch of Artificial Intelligence (AI), uses multi-layer neural networks, loosely inspired by how the brain processes information, to recognize text or objects, translate languages, recognize speech, and draw inferences. Bidirectional Encoder Representations from Transformers (BERT), a deep learning model, performs exceptionally well on NLP text classification and achieves high accuracy. Therefore, BERT is combined with the custom model, and the combination is compared with the native algorithm to confirm the gain in accuracy.
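The abstract does not describe how the authors customize naïve Bayes or how BERT is combined with it. The Python below is therefore only a minimal sketch under stated assumptions. It shows a standard multinomial naïve Bayes classifier with additive smoothing, applied top-down to a hierarchical multi-class label set. One classifier is trained per internal node of the label tree, and prediction descends one level at a time.

The class names, the whitespace tokenizer, and the toy remarks and label paths are all hypothetical. The BERT component is omitted entirely.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial naive Bayes on word counts with additive (Laplace when alpha=1) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # constant added to every word count

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()  # hypothetical tokenizer: lowercase + whitespace split
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        tokens = [t for t in doc.lower().split() if t in self.vocab]  # drop unseen words
        v = len(self.vocab)

        def log_posterior(c):
            denom = self.total_words[c] + self.alpha * v
            return self.log_prior[c] + sum(
                math.log((self.word_counts[c][t] + self.alpha) / denom) for t in tokens
            )

        return max(self.classes, key=log_posterior)


class TopDownHierarchicalClassifier:
    """One MultinomialNB per internal node; labels are tuples such as ("billing", "refund")."""

    def fit(self, docs, paths):
        self.models = {}  # label prefix (tuple) -> classifier that picks the next level
        self._fit_node((), docs, paths)
        return self

    def _fit_node(self, prefix, docs, paths):
        depth = len(prefix)
        # Documents under this node that still carry a label at the next level.
        pairs = [(d, p) for d, p in zip(docs, paths) if p[:depth] == prefix and len(p) > depth]
        if not pairs:
            return  # leaf node: nothing left to classify
        sub_docs = [d for d, _ in pairs]
        sub_paths = [p for _, p in pairs]
        children = [p[depth] for p in sub_paths]
        self.models[prefix] = MultinomialNB().fit(sub_docs, children)
        for child in set(children):
            self._fit_node(prefix + (child,), sub_docs, sub_paths)

    def predict(self, doc):
        path = ()
        while path in self.models:  # descend until no classifier exists for the prefix
            path += (self.models[path].predict(doc),)
        return path


# Hypothetical toy remarks with two-level labels.
docs = [
    "refund not received for my order",
    "charged twice on my card",
    "app crashes on login",
    "password reset email never arrives",
]
paths = [
    ("billing", "refund"),
    ("billing", "payment"),
    ("technical", "crash"),
    ("technical", "account"),
]
clf = TopDownHierarchicalClassifier().fit(docs, paths)
print(clf.predict("my refund was not received"))  # → ('billing', 'refund')
```

Top-down decomposition keeps each per-node classifier small and lets each level be its own multi-class problem. The trade-off is that a mistake at an upper level cannot be corrected at the levels below it.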

