Analyzing and Comparing On-Line News Sources via (Two-Layer) Incremental Clustering

Francesco Cambi, P. Crescenzi, L. Pagli
Published in: Fun with Algorithms (2016-06-08)
DOI: 10.4230/LIPIcs.FUN.2016.9
Citations: 2

Abstract

In this paper, we analyse the content of the websites of two Italian news agencies and four of the most popular Italian newspapers, in order to answer questions such as which news stories are the most relevant, what the average lifetime of a news story is, and how much the sites differ from one another. To this end, we have developed a web-based application that collects the articles in the main column of the six websites every hour, groups the articles into news stories using an incremental clustering algorithm, and lets the user see the answers to the above questions. We have also designed and implemented a two-layer modification of the incremental clustering algorithm and carried out a preliminary experimental evaluation of it: the two-layer clustering turns out to be extremely efficient in terms of running time, and it achieves quite good precision and recall.
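The abstract does not specify how articles are represented or compared. As a rough illustration only, the following Python sketch shows one common form of incremental (single-pass) clustering: each incoming article is turned into a bag-of-words vector, compared by cosine similarity with the existing cluster centroids, and either joins the most similar cluster or opens a new one. The tokenizer, the similarity measure, the `threshold` value, and the names `IncrementalClusterer`, `tokenize`, and `cosine` are assumptions made for this example, not details taken from the paper, and the two-layer variant is not modeled.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lower-cased bag-of-words term counts (assumed representation; no stemming or stop words)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class IncrementalClusterer:
    """Hypothetical single-pass clusterer: an article joins its most similar
    cluster if the similarity reaches `threshold`, otherwise it starts a new one."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold          # illustrative value; would need tuning on real data
        self.centroids: list[Counter] = []  # summed term counts of each cluster's articles
        self.members: list[list[str]] = []  # article ids belonging to each cluster

    def add(self, article_id: str, text: str) -> int:
        """Assign one article to a cluster and return that cluster's index."""
        vec = tokenize(text)
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            # Cosine is scale-invariant, so summed counts act like a mean centroid.
            self.centroids[best].update(vec)
            self.members[best].append(article_id)
            return best
        self.centroids.append(vec)
        self.members.append([article_id])
        return len(self.centroids) - 1


c = IncrementalClusterer(threshold=0.5)
c.add("a1", "Earthquake hits central Italy overnight")
c.add("a2", "Strong earthquake hits central Italy")
c.add("a3", "Parliament approves new budget law")
print(c.members)  # [['a1', 'a2'], ['a3']]
```

Because each new article is compared only with the current centroids, an hourly batch can be processed without reclustering earlier articles, which is what makes an incremental scheme a natural fit for periodic crawling. The cost of each insertion still grows with the number of clusters, which is one plausible motivation for a layered variant.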