An Enhanced PageRank Algorithm based on Optimized Normalized Technique and Content-based Approach
Koo Kwong Ze, Fares Hasan, R. Razali, A. Buhari, Elisha Tadiwa
Open Access Journal of Science and Technology, Journal Article, published 2020-08-04
DOI: 10.31580/OJST.V3I2.1468
Citation count: 1
Abstract
PageRank is a link-analysis algorithm used to rank webpages in response to search queries over the Internet. It scores webpages by their link structure: a page's rank is derived from the pages that link to it, each contributing its own rank divided by its number of outgoing links. Although useful, the algorithm is time-consuming because it must iterate over all available webpages, whose number keeps growing over time. Moreover, its results are biased towards old webpages, which have accumulated more links over their lifetime. As a result, newly created webpages receive lower page ranks even when they contain more relevant and useful information. To overcome these issues, this paper proposes an alternative hybrid PageRank algorithm based on an optimized normalization technique and a content-based approach. The proposed algorithm reduces the number of iterations required to compute page ranks, and hence improves efficiency, by calculating the mean of all page rank values and normalizing each value by that mean. Through this approach, the algorithm is also able to determine the relevancy of webpages based on the validity of links rather than their popularity. These claims are demonstrated by an experiment on the proposed algorithm using a dummy web structure of 12 webpages. The results showed that the traditional PageRank algorithm required 74% more iterations than the proposed algorithm, and that the proposed algorithm returned a mean page rank value of 1.00 compared to 1.32 for the traditional algorithm. These results confirm that the proposed algorithm saves a substantial amount of computing power while being more precise and unbiased.
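The abstract contrasts classic PageRank with a variant that normalizes page ranks by their mean. The following is a minimal sketch of both, not the authors' implementation. The classic update follows the Brin and Page form PR(p) = (1 - d) + d * sum(PR(q) / L(q)) over pages q linking to p, where L(q) is the number of outgoing links of q. The following details are assumptions for illustration: the helper name `pagerank`, the damping factor d = 0.85, the tolerance, the 4-page example graph, and the choice to divide by the mean after every iteration. The abstract does not specify where its normalization is applied. The content-based part of the proposed algorithm is not modeled here.

```python
# Hypothetical sketch: classic PageRank (Brin & Page form) with an optional
# mean-normalization step. The damping factor, tolerance, example graph and
# where the normalization is applied are assumptions, not the paper's method.


def pagerank(links, d=0.85, tol=1e-6, max_iter=1000, normalize_by_mean=False):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / L(q)) until ranks stabilize.

    links maps every page to the list of pages it links to; L(q) is the
    number of outgoing links of page q. Returns (ranks, iterations_used).
    """
    pages = list(links)
    ranks = {p: 1.0 for p in pages}  # every page starts with rank 1

    # Invert the outgoing-link lists so each page knows who links to it.
    incoming = {p: [] for p in pages}
    for q, targets in links.items():
        for p in targets:
            incoming[p].append(q)

    for iteration in range(1, max_iter + 1):
        # Each linking page q passes on an equal share of its current rank.
        new = {
            p: (1 - d) + d * sum(ranks[q] / len(links[q]) for q in incoming[p])
            for p in pages
        }
        if normalize_by_mean:
            # Assumed normalization: divide every rank by the mean rank,
            # so the ranks average exactly 1.0 after each iteration.
            mean = sum(new.values()) / len(new)
            new = {p: r / mean for p, r in new.items()}

        # Stop once no rank moved by more than the tolerance.
        delta = max(abs(new[p] - ranks[p]) for p in pages)
        ranks = new
        if delta < tol:
            return ranks, iteration
    return ranks, max_iter


# Hypothetical 4-page web; page D has no outgoing links (a dangling page).
links = {"A": ["B", "C"], "B": ["C"], "C": ["A", "D"], "D": []}

classic, classic_iters = pagerank(links)
normalized, norm_iters = pagerank(links, normalize_by_mean=True)

for name, ranks, iters in (("classic", classic, classic_iters),
                           ("mean-normalized", normalized, norm_iters)):
    mean = sum(ranks.values()) / len(ranks)
    print(f"{name}: {iters} iterations, mean rank {mean:.3f}")
    print({p: round(r, 3) for p, r in ranks.items()})
```

The dangling page D never passes its rank on, so the classic ranks in this toy graph average below 1. The mean-normalized ranks average exactly 1.0 by construction. The iteration counts printed for this toy graph are not a reproduction of the paper's 12-page experiment.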