Waqas Arshad, Muhammad Ali, Muhammad Mumtaz Ali, A. Javed, S. Hussain
2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE), 2021-06-12. DOI: 10.1109/ICECCE52056.2021.9514108
Multi-Class Text Classification: Model Comparison and Selection
The objective of text classification is to assign documents to one of a fixed number of predefined categories. Documents need not be sorted only by topic. They can also be classified by overall sentiment, e.g. deciding whether the sentiment of a document is positive or negative. For a supervised machine learning problem with a well-defined dataset, many classifiers can be applied to text classification. Using a dataset of Stack Overflow questions, answers, and tags, we find that standard machine learning techniques clearly outperform human-produced baselines. The methods compared are a multinomial Naive Bayes classifier, a linear Support Vector Machine, Logistic Regression, Word2vec (word-to-vector) embeddings with Logistic Regression, Doc2vec (document-to-vector) embeddings with Logistic Regression, and a Bag of Words (BOW) model built with Keras. This paper presents a detailed examination and comparison of the accuracies of these algorithms.
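To illustrate the first method in the list, here is a minimal sketch of a multinomial Naive Bayes text classifier written from scratch in plain Python. It uses bag-of-words counts, additive (Laplace) smoothing, and a simple accuracy function of the kind used to compare models. The paper does not state its implementation. The whitespace tokenizer, the `alpha` default, the class name `MultinomialNaiveBayes`, and the toy Stack Overflow-style titles and tags are all hypothetical placeholders, not the authors' preprocessing or data. In practice an established library implementation (e.g. scikit-learn's `MultinomialNB`) would more likely be used.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant; alpha=1.0 is Laplace smoothing

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # log P(c): fraction of training documents that carry label c
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        # Total number of tokens seen per class (likelihood denominator).
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, word, c):
        # log P(word | c), smoothed over the training vocabulary
        numerator = self.word_counts[c][word] + self.alpha
        denominator = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict_one(self, doc):
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # drop unseen words
        scores = {
            c: self.log_prior[c] + sum(self.log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)  # class with highest log-posterior

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: question titles labeled with a single tag.
train_docs = [
    "how to merge two dict in python",
    "python list comprehension syntax error",
    "java nullpointerexception when calling method",
    "java arraylist sort comparator",
    "sql join two tables on key",
    "sql group by count rows",
]
train_labels = ["python", "python", "java", "java", "sql", "sql"]
test_docs = ["python dict comprehension", "java method comparator", "sql count join"]
test_labels = ["python", "java", "sql"]

model = MultinomialNaiveBayes(alpha=1.0).fit(train_docs, train_labels)
predictions = model.predict(test_docs)
print(predictions, accuracy(test_labels, predictions))
```

The same `accuracy` helper could be applied to each of the other listed classifiers. If every model is trained and evaluated on an identical train/test split, this yields an accuracy comparison of the kind the abstract describes.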