Analyzing and Comparing On-Line News Sources via (Two-Layer) Incremental Clustering
Francesco Cambi, P. Crescenzi, L. Pagli
Fun with Algorithms, 2016-06-08
DOI: 10.4230/LIPIcs.FUN.2016.9
In this paper, we analyse the contents of the websites of two Italian news agencies and of four
of the most popular Italian newspapers, in order to answer questions such as: which news stories
are the most relevant, what is the average lifetime of a news story, and how much do the sites
differ from one another? To this end, we have developed a web-based application which collects,
every hour, the articles in the main column of the six websites, applies an incremental clustering
algorithm to group the articles into news stories, and finally lets the user see the answers to
the above questions. We have also designed and implemented a two-layer modification of the
incremental clustering algorithm and carried out a preliminary experimental evaluation of this
modification: it turns out that the two-layer clustering is extremely efficient in terms of
running time, and that it performs quite well in terms of precision and recall.
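
To make the idea of grouping articles into news stories as they arrive more concrete, the following is a minimal sketch of a generic single-pass incremental clustering scheme. It is not the algorithm of the paper: the abstract does not specify the text representation, the similarity measure, or the threshold, so the bag-of-words vectors, the cosine similarity, the value 0.3, and all names (`vectorize`, `cosine`, `IncrementalClusterer`) are assumptions made purely for illustration. The two-layer modification is not sketched, since the abstract gives no details about it.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumed representation, not from the paper)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Single-pass clustering: each article joins the most similar
    existing cluster, or starts a new one if none is similar enough."""

    def __init__(self, threshold=0.3):  # threshold value is hypothetical
        self.threshold = threshold
        self.centroids = []  # summed term counts, one Counter per cluster
        self.members = []    # article ids, one list per cluster

    def add(self, article_id, text):
        vec = vectorize(text)
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            # Fold the article into the existing cluster.
            self.centroids[best].update(vec)
            self.members[best].append(article_id)
            return best
        # No cluster is close enough: the article opens a new one.
        self.centroids.append(Counter(vec))
        self.members.append([article_id])
        return len(self.centroids) - 1
```

In such a scheme, each hourly batch of collected articles would be fed to `add` one article at a time. Every new article is compared against all existing clusters, so the cost per article grows with the number of clusters; this is the kind of cost that a layered variant could plausibly aim to reduce.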