R. Mittal, Varun Malik, Vikram Singh, Jaiteg Singh, Amandeep Kaur
2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC), published 2020-11-06. DOI: 10.1109/PDGC50313.2020.9315807
Integrating Genetic Algorithm with Random Forest for Improving the Classification Performance of Web Log Data
Web mining is an important approach for retrieving and analysing information from web server log data. In the internet-driven information age, a large amount of data is available on the web in many forms, and analysing it with web mining methods can yield novel insights. Such data can be extracted from server log files and preprocessed for use in various web mining tasks. In this paper, the authors took data from web server log files, preprocessed it, applied several classification algorithms (Naïve Bayes, KNN, decision tree, and random forest), and analysed the results. The best-performing approach was then chosen, and its performance was further improved by integrating it with a genetic algorithm. In this context, a hybrid approach named RFGA, which integrates random forest and a genetic algorithm, was applied to the dataset, and the results of the other machine learning classifiers were compared with those of RFGA in terms of predictive accuracy.
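The abstract does not say how the genetic algorithm is coupled to the random forest. One common coupling, assumed here purely for illustration, is to let the genetic algorithm search over binary feature masks and score each mask by the classifier's accuracy on the selected features. The sketch below shows only the genetic search loop with a pluggable fitness function; `evolve_mask`, `toy_fitness`, `INFORMATIVE`, and all parameter values are hypothetical names and settings, not taken from the paper, and the toy fitness stands in for a real classifier score.

```python
import random
from typing import Callable, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


def evolve_mask(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Search for a high-fitness feature mask with a simple generational GA."""
    rng = random.Random(seed)

    def tournament(scored):
        # Binary tournament: return the fitter of two randomly drawn masks.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    def crossover(p1: Mask, p2: Mask) -> Mask:
        # Single-point crossover; a cut point needs at least two genes.
        if n_features < 2:
            return p1
        cut = rng.randint(1, n_features - 1)
        return p1[:cut] + p2[cut:]

    def mutate(mask: Mask) -> Mask:
        # Flip each bit independently with probability mutation_rate.
        return tuple(1 - g if rng.random() < mutation_rate else g for g in mask)

    population = [
        tuple(rng.randint(0, 1) for _ in range(n_features))
        for _ in range(pop_size)
    ]
    scored = [(fitness(m), m) for m in population]
    best_score, best_mask = max(scored)

    for _ in range(generations):
        # Elitism: the best mask so far always survives into the next generation.
        children = [best_mask]
        while len(children) < pop_size:
            child = crossover(tournament(scored), tournament(scored))
            children.append(mutate(child))
        scored = [(fitness(m), m) for m in children]
        best_score, best_mask = max(scored)

    return best_mask, best_score


# Hypothetical stand-in for a classifier score: features 0, 2 and 5 are
# treated as informative, and every kept feature costs a small penalty.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    kept = [i for i, g in enumerate(mask) if g]
    hits = sum(1 for i in kept if i in INFORMATIVE)
    return hits - 0.1 * len(kept)


if __name__ == "__main__":
    mask, score = evolve_mask(8, toy_fitness)
    print(mask, score)
```

To move from this toy setting toward something like the hybrid described above, `toy_fitness` could be replaced by a function that trains a random forest on the columns selected by the mask and returns its cross-validated accuracy. Whether RFGA works this way, or instead tunes the forest's hyperparameters, is not stated in the abstract.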