Ashadullah Shawon, S. T. Zuhori, F. Mahmud, Md. Jamil-Ur Rahman
2018 21st International Conference of Computer and Information Technology (ICCIT), published 2018-12-01. DOI: 10.1109/ICCITECHN.2018.8631907 (https://doi.org/10.1109/ICCITECHN.2018.8631907)
Website Classification Using Word Based Multiple N-Gram Models and Random Search Oriented Feature Parameters
Website classification is a convenient starting point for building intelligent web browsers and social networking sites that can learn a user's favorite categories and reliably detect adult or harmful websites. Classifying websites using only the information in the Uniform Resource Locator (URL) is an important and fast technique. URL classification must be highly accurate to be usable in real-world applications, so we propose an improved approach that yields better results. We introduce word-based multiple n-gram models for efficient feature extraction and a multinomial Naive Bayes classifier, both placed in a Random Search pipeline for hyperparameter optimization that finds the best parameters for the URL features. We compare our experimental results with those of previous work and show an improvement over the existing results. Our approach achieves 88.77% recall and an 87.63% F1-score, which is the best performance reported so far.
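The approach summarized in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the tokenizer, the candidate n-gram ranges, the range of the smoothing parameter `alpha`, the accuracy-based selection on a held-out set, and the toy URLs are all assumptions made for illustration, and the helper names (`url_ngrams`, `MultinomialNB`, `random_search`) are hypothetical. The sketch extracts word n-grams from each URL, fits a multinomial Naive Bayes model with additive smoothing, and samples hyperparameters at random rather than trying every combination.

```python
import math
import random
import re
from collections import Counter, defaultdict


def url_ngrams(url, ngram_range):
    """Split a URL into lowercase word tokens and return its word n-grams."""
    words = [w for w in re.split(r"[^a-z0-9]+", url.lower()) if w]
    lo, hi = ngram_range
    return [" ".join(words[i:i + n])
            for n in range(lo, hi + 1)
            for i in range(len(words) - n + 1)]


class MultinomialNB:
    """Multinomial Naive Bayes with additive smoothing (alpha > 0)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, feats):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(self.class_counts[label] / n_docs)
            for f in feats:
                if f in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def random_search(train, valid, n_iter=10, seed=0):
    """Sample (ngram_range, alpha) at random; keep the best validation accuracy."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        params = {
            # Assumed search space, not taken from the paper.
            "ngram_range": rng.choice([(1, 1), (1, 2), (1, 3), (2, 3)]),
            "alpha": 10 ** rng.uniform(-2, 0),  # roughly log-uniform in [0.01, 1]
        }
        model = MultinomialNB(params["alpha"]).fit(
            [url_ngrams(u, params["ngram_range"]) for u, _ in train],
            [y for _, y in train])
        hits = sum(model.predict(url_ngrams(u, params["ngram_range"])) == y
                   for u, y in valid)
        score = hits / len(valid)
        if best is None or score > best[0]:
            best = (score, params, model)
    return best


# Toy data, invented for illustration only.
train = [
    ("http://sports.example.com/football/live-scores", "sports"),
    ("http://www.example.org/basketball/league/table", "sports"),
    ("http://www.example.com/football/scores", "sports"),
    ("http://news.example.com/world/politics", "news"),
    ("http://www.example.net/breaking-news/world", "news"),
    ("http://daily.example.com/politics/election-news", "news"),
]
valid = [
    ("http://example.com/football/league", "sports"),
    ("http://example.com/world/news/today", "news"),
]

score, params, model = random_search(train, valid)
print("validation accuracy:", score, "best parameters:", params)
```

In practice the same structure is usually built from library components, for example a word-level n-gram vectorizer feeding a multinomial Naive Bayes estimator inside a randomized hyperparameter search with cross-validation, and scored with recall and F1 rather than the plain accuracy used here.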