{"title":"A Comparative Study of Parametric Versus Non-Parametric Text Classification Algorithms","authors":"Mihaela Chistol","doi":"10.1109/DAS49615.2020.9108968","DOIUrl":null,"url":null,"abstract":"Evolution of modern technologies allowed to store the text in various digital formats such as e-mails, e-documents, libraries, etc. The amount of text data that is produced daily is increasing dramatically. Discovering useful patterns in text that can be represented in unstructured, semi-structured or structured format is a difficult task that requires a good understanding of machine learning algorithms. Finding a suitable algorithm for text mining tasks such as classification, clustering or natural language processing is a demanding situation that tests researchers’ abilities. This paper provides an overview of the text mining process also, presents a comparison of the performance and limitations of two predictive models generated using the parametric Naïve Bayes algorithm and nonparametric Deep Learning neural network. RapidMiner data science software platform has been used for models’ implementations and e-mail classification.","PeriodicalId":103267,"journal":{"name":"2020 International Conference on Development and Application Systems (DAS)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on Development and Application Systems (DAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DAS49615.2020.9108968","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Evolution of modern technologies allowed to store the text in various digital formats such as e-mails, e-documents, libraries, etc. The amount of text data that is produced daily is increasing dramatically. Discovering useful patterns in text that can be represented in unstructured, semi-structured or structured format is a difficult task that requires a good understanding of machine learning algorithms. Finding a suitable algorithm for text mining tasks such as classification, clustering or natural language processing is a demanding situation that tests researchers’ abilities. This paper provides an overview of the text mining process also, presents a comparison of the performance and limitations of two predictive models generated using the parametric Naïve Bayes algorithm and nonparametric Deep Learning neural network. RapidMiner data science software platform has been used for models’ implementations and e-mail classification.