Finding High Quality Documents through Link and Click Graphs
Linfeng Yu, M. Iwaihara
2018 7th International Congress on Advanced Applied Informatics (IIAI-AAI), published 2018-07-08
DOI: 10.1109/IIAI-AAI.2018.00020 (https://doi.org/10.1109/IIAI-AAI.2018.00020)
Citations: 1
Abstract
Link graphs of web pages have long been used to evaluate the importance of each page. Existing link analysis algorithms, including HITS and PageRank, exploit the static link connectivity between pages. Service providers, however, often log HTTP requests that record the requested resource and the referrer of each request. From these logs we can construct a click graph whose edge weights represent the number of clicks on each link, i.e., the link traffic. Click graphs reflect users' choices of interesting links, so they are useful for evaluating the importance of pages. However, clicks are often skewed toward highly popular links, so click graphs alone cannot properly evaluate less-clicked pages. In this paper, we propose the click count-weighted HITS algorithm, which integrates the HITS algorithm with click graphs to find high-quality documents. Our evaluation on finding featured articles in English Wikipedia shows that the click count-weighted HITS algorithm performs better on a large Wikipedia corpus than algorithms that use only link graphs or only click graphs.
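To make the idea concrete, the following is a minimal sketch of how a click graph might be built from (referrer, resource) pairs and combined with a static link graph in a HITS-style iteration. The abstract does not give the paper's actual weighting formula, so everything specific here is an assumption for illustration: the function names `build_click_graph` and `click_weighted_hits`, the input format, and in particular the edge weight `1 + log(1 + clicks)`, which is chosen only so that unclicked links still count and heavily clicked links do not dominate.

```python
import math
from collections import Counter


def build_click_graph(requests):
    """Count clicks per (referrer, resource) edge from logged requests."""
    clicks = Counter()
    for referrer, resource in requests:
        # Skip requests without a referrer and self-referrals.
        if referrer and referrer != resource:
            clicks[(referrer, resource)] += 1
    return clicks


def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all scores are zero)."""
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {node: s / norm for node, s in scores.items()}


def click_weighted_hits(links, clicks, iterations=50):
    """HITS over static links, with each link weighted by its click count.

    ASSUMPTION (not from the paper): weight = 1 + log(1 + clicks), so every
    static link contributes and skewed click counts are damped.
    """
    weights = {edge: 1.0 + math.log1p(clicks.get(edge, 0)) for edge in links}
    nodes = {node for edge in links for node in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority update: weighted sum of hub scores over incoming links.
        new_auth = dict.fromkeys(nodes, 0.0)
        for (src, dst), w in weights.items():
            new_auth[dst] += w * hub[src]
        auth = _normalize(new_auth)
        # Hub update: weighted sum of authority scores over outgoing links.
        new_hub = dict.fromkeys(nodes, 0.0)
        for (src, dst), w in weights.items():
            new_hub[src] += w * auth[dst]
        hub = _normalize(new_hub)
    return hub, auth


if __name__ == "__main__":
    # Hypothetical toy data: three pages and a handful of logged clicks.
    links = {("A", "B"), ("A", "C"), ("B", "C")}
    requests = [("A", "C")] * 5 + [("A", "B")] + [("B", "C")] * 2
    hub, auth = click_weighted_hits(links, build_click_graph(requests))
    for page in sorted(auth, key=auth.get, reverse=True):
        print(page, round(auth[page], 3))
```

Pages would then be ranked by authority score as candidate high-quality documents. Setting every weight to 1 recovers plain HITS on the link graph, which is one way to compare the two signals on the same data.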