An innovative approach to classify hierarchical remarks with multi-class using BERT and customized naïve Bayes classifier

M. Dhina, S. Sumathi
Journal: International Journal of Engineering Science and Technology
Publication date: 2022-05-30
Publication type: Journal Article
DOI: 10.4314/ijest.v13i4.4 (https://doi.org/10.4314/ijest.v13i4.4)
Citations: 0

Abstract

Text classification is the process of grouping text into distinct categories. Text classifiers can automatically assess input text and assign it a set of predefined tags or categories, based on its content or on a pre-trained model, using Natural Language Processing (NLP) techniques, many of which rely on Machine Learning (ML). Text classification is becoming increasingly important in enterprises because it helps firms extract insights from data and automate business operations, reducing manual labor and cost. Common industrial applications include language detection (determining the language of a given document), sentiment analysis (identifying whether a text is favorable or unfavorable toward a given subject), and topic detection (determining the theme of a group of texts). The dataset is multi-class and multi-hierarchical: labels are organized in multiple levels, and each level is itself a multi-class problem. Supervised learning, one of ML's most successful paradigms, makes it possible to build a model that generalizes from labeled examples; a custom model is therefore built to fit the problem. Deep learning (DL), a part of Artificial Intelligence (AI), uses models loosely inspired by the human brain's data processing to identify text or artifacts, translate languages, recognize speech, draw conclusions, and so on. Bidirectional Encoder Representations from Transformers (BERT), a deep learning model, performs exceptionally well in NLP text classification and achieves high accuracy. BERT is therefore combined with the custom model and compared with the native algorithm to verify the increase in accuracy.
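To make the "multi-class, multi-hierarchical" setting concrete, the following is a minimal sketch of a two-level classifier built from multinomial naïve Bayes with Laplace smoothing. It is not the authors' custom model: the abstract does not describe how that model works or how BERT is combined with it, so BERT is left out entirely. The class names `NaiveBayes` and `HierarchicalNB`, the whitespace tokenization, and the toy remarks and labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with Laplace smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in text.lower().split():
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class HierarchicalNB:
    """Level 1 picks a parent class; a per-parent model picks the level-2 class."""

    def fit(self, texts, parents, children):
        self.top = NaiveBayes().fit(texts, parents)
        grouped = defaultdict(lambda: ([], []))
        for t, p, c in zip(texts, parents, children):
            grouped[p][0].append(t)
            grouped[p][1].append(c)
        # One level-2 classifier per parent, trained only on that parent's rows.
        self.sub = {p: NaiveBayes().fit(ts, cs) for p, (ts, cs) in grouped.items()}
        return self

    def predict(self, text):
        parent = self.top.predict(text)
        return parent, self.sub[parent].predict(text)


# Toy, made-up remarks and labels (hypothetical data, not from the paper).
texts = [
    "invoice amount is wrong",
    "refund not received yet",
    "app crashes on login",
    "page loads very slowly",
]
parents = ["billing", "billing", "technical", "technical"]
children = ["invoice", "refund", "crash", "performance"]

model = HierarchicalNB().fit(texts, parents, children)
prediction = model.predict("refund amount not received")
print(prediction)
```

Each level here is an ordinary multi-class problem, and the level-2 model is chosen by the level-1 prediction, so an error at the top level propagates downward. In a BERT-based variant, one plausible option (again an assumption) would be to replace the bag-of-words counts with features derived from BERT.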