An approach for accessing data from hidden web using intelligent agent technology

2013 3rd IEEE International Advance Computing Conference (IACC), Pub Date: 2013-05-13, DOI: 10.1109/IADCC.2013.6514329

Lohit Singh, Dilip Kumar Sharma

Citations: 2

Abstract

A large amount of information on the Web is hidden from users because traditional search engines cannot access or index it. These search engines crawl only by following hypertext links, so they ignore forms that require a login or any other authorization step. The hidden Web is this deepest part of the Web, which traditional Web crawlers cannot reach, and obtaining content from it is a challenging task. In addition, many websites today contain dynamic pages, and this dynamic nature makes it difficult for traditional crawlers to retrieve information. The paper briefly discusses earlier efforts to solve this problem and then presents a comparative study of previously defined architectures across various parameters. Based on this analysis, it proposes a framework that uses intelligent agent technology to access the hidden Web.
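The abstract's central observation is that a traditional crawler reaches pages only by following hypertext links and skips forms. The following is a minimal sketch of that limitation, not the authors' agent-based framework, whose details are not given here. The names `LinkAndFormParser`, `link_only_crawl`, and the in-memory `SITE` are hypothetical, and real HTTP fetching is replaced by a dictionary lookup for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndFormParser(HTMLParser):
    """Collects <a href> targets and <form action> values from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            # The form is noticed, but a link-only crawler has no values to submit.
            self.forms.append(attrs.get("action") or "")


def link_only_crawl(site, start):
    """Breadth-first crawl that follows hyperlinks only (hypothetical helper).

    `site` is an in-memory mapping of URL -> HTML standing in for real HTTP.
    Returns (indexed_urls, skipped_form_actions).
    """
    indexed, skipped_forms = [], []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        html = site.get(url)
        if html is None:
            continue
        indexed.append(url)
        parser = LinkAndFormParser()
        parser.feed(html)
        parser.close()
        # Forms are recorded as unexplored entry points into hidden content.
        skipped_forms.extend(urljoin(url, action) for action in parser.forms)
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed, skipped_forms


# Hypothetical site: the result page is reachable only by submitting the form.
SITE = {
    "http://example.com/": '<a href="/about">About</a>'
                           '<form action="/search" method="get"><input name="q"></form>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.com/search?q=agents": "<p>Records from the back-end database</p>",
}

indexed, forms = link_only_crawl(SITE, "http://example.com/")
# indexed == ["http://example.com/", "http://example.com/about"]
# forms   == ["http://example.com/search"]
```

The search-result page never appears in `indexed`, which is the gap the abstract describes: content behind a form stays hidden unless something fills in and submits the form, the role the paper assigns to its intelligent-agent framework.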
