Efficient extraction of news articles based on RSS crawling

2010 International Conference on Machine and Web Intelligence. Pub Date: 2010-11-29. DOI: 10.1109/ICMWI.2010.5647851

George Adam, C. Bouras, V. Poulopoulos

Citations: 13

Abstract

The expansion of the World Wide Web has left a vast number of Internet users facing the major problem of discovering the information they want. Hundreds of web pages and weblogs are inevitably created or changed every day. The main problem arising from this continuous generation and alteration of web pages is the discovery of useful information, a task that is difficult even for experienced Internet users. Many mechanisms have been built and presented to address information discovery on the Internet; most are based on crawlers that browse the WWW, download pages, and collect information that might interest the user. In this manuscript we describe a mechanism that fetches web pages containing news articles from major news portals and blogs. The mechanism is built to support tools that acquire news articles from all over the world, process them, and present them back to end users in a personalized manner.
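The abstract does not spell out how the mechanism works, so the following is only a minimal sketch of the general idea named in the title: reading an RSS feed to find links to news article pages that have not been fetched yet. It assumes an RSS 2.0 feed layout; the sample feed, the `example.com` URLs, and the function name `extract_article_links` are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the sketch runs without network access.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>http://example.com/news/1</link>
      <pubDate>Mon, 29 Nov 2010 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/news/2</link>
      <pubDate>Mon, 29 Nov 2010 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def extract_article_links(feed_xml, seen):
    """Return (title, link) pairs for feed items whose link is not in `seen`.

    `seen` is a set of already-known links; it is updated in place so that a
    later poll of the same feed reports only new articles.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link and link not in seen:
            seen.add(link)
            fresh.append((title, link))
    return fresh


seen_links = set()
new_articles = extract_article_links(SAMPLE_FEED, seen_links)
```

In a real crawler the feed text would come from an HTTP request, and each returned link would then be downloaded and passed to an article-extraction step. Polling the feed instead of crawling the whole site is what keeps the set of pages to fetch small.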


