Analyzing and Comparing On-Line News Sources via (Two-Layer) Incremental Clustering

Fun with Algorithms | Pub Date: 2016-06-08 | DOI: 10.4230/LIPIcs.FUN.2016.9

Francesco Cambi, P. Crescenzi, L. Pagli


Citations: 2

Abstract

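In this paper, we analyse the contents of the web sites of two Italian news agencies and of four of the most popular Italian newspapers, in order to answer questions such as which news stories are the most relevant, what the average lifetime of a news story is, and how much the sites differ from one another. To this end, we have developed a web-based application that collects, every hour, the articles in the main column of the six web sites, groups the articles into news stories with an incremental clustering algorithm, and finally lets the user see the answers to the above questions. We have also designed and implemented a two-layer modification of the incremental clustering algorithm and carried out a preliminary experimental evaluation of it: the two-layer clustering turns out to be extremely efficient in terms of running time, and it achieves quite good precision and recall.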
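The abstract does not say which features, similarity measure, or threshold the authors' incremental clustering uses, nor how the two-layer variant or the precision/recall evaluation is defined. The sketch below is therefore only an illustration of the general idea under stated assumptions: a generic single-pass scheme in which each article, in arrival order, joins the most similar existing cluster (cosine similarity on bag-of-words term counts) if the similarity reaches a cut-off, and otherwise opens a new cluster. The pairwise precision/recall function is one common way to score a clustering against a hand-labelled grouping; it is not necessarily the measure used in the paper. The names `incremental_cluster`, `pairwise_precision_recall`, and `THRESHOLD` (and its value) are hypothetical.

```python
"""Single-pass incremental clustering sketch (assumptions, not the paper's method).

Assumed: articles are plain strings, represented as lower-cased term-count
vectors; an article joins the closest cluster centroid if cosine similarity
is >= THRESHOLD, otherwise it starts a new cluster.
"""
import math
import re
from collections import Counter
from itertools import combinations

THRESHOLD = 0.3  # hypothetical similarity cut-off, not taken from the paper


def vectorize(text: str) -> Counter:
    """Term-count vector; no stemming or stop-word removal (an assumption)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def incremental_cluster(articles, threshold=THRESHOLD):
    """Assign each article, in arrival order, to a cluster; return cluster labels."""
    centroids = []  # summed term vectors; cosine is scale-invariant, so no averaging needed
    labels = []
    for text in articles:
        vec = vectorize(text)
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the article into the chosen cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # open a new cluster (a new "news story")
            labels.append(len(centroids) - 1)
    return labels


def pairwise_precision_recall(predicted, gold):
    """Pairwise scores: a pair is positive when both articles share a cluster."""
    n = len(predicted)
    pred_pairs = {(i, j) for i, j in combinations(range(n), 2) if predicted[i] == predicted[j]}
    gold_pairs = {(i, j) for i, j in combinations(range(n), 2) if gold[i] == gold[j]}
    true_pos = len(pred_pairs & gold_pairs)
    precision = true_pos / len(pred_pairs) if pred_pairs else 1.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 1.0
    return precision, recall


if __name__ == "__main__":
    articles = [
        "Earthquake hits central Italy, rescue teams at work",
        "Rescue teams at work after earthquake in central Italy",
        "Serie A: Juventus wins the derby",
        "Juventus wins the derby in Serie A",
    ]
    labels = incremental_cluster(articles)
    print(labels)  # → [0, 0, 1, 1]
    print(pairwise_precision_recall(labels, [0, 0, 1, 1]))  # → (1.0, 1.0)
```

Because each article is compared only against the current cluster centroids, the cost of processing one new article grows with the number of clusters rather than the number of articles seen so far, which is what makes a single-pass scheme attractive for hourly collection. The threshold trades precision for recall: raising it splits stories into more, tighter clusters, while lowering it merges more articles together.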




Source journal

Fun with Algorithms

Self-citation rate

0.00%


Latest articles in this journal

Coordinating "7 Billion Humans" Is Hard
Chess is hard even for a single player
How Fast Can We Play Tetris Greedily With Rectangular Pieces?
Cooperating in Video Games? Impossible! Undecidability of Team Multiplayer Games
Card-Based ZKP Protocols for Takuzu and Juosan