I. López, P. A. Alvarez-Carrillo, E. Fernández-González
2008 Electronics, Robotics and Automotive Mechanics Conference (CERMA '08), published 2008-09-30. DOI: 10.1109/CERMA.2008.28 (https://doi.org/10.1109/CERMA.2008.28)
An Evolutionary Model for Measuring Document Relevance in a Focused Web Spider
Exploring the Web in search of relevant information is a difficult task because of the vast number of documents it stores and the heterogeneity of those documents. Automated systems such as search engines help users cope with the size of the Web. However, the results these systems produce usually contain documents from a wide variety of topics with little or no relevance to the end user. In this work, we propose a model that a Web spider can use to selectively explore the Web for relevant documents. The model assesses document relevance using two criteria: content and structure. These two criteria are integrated in a fuzzy predicate that indicates the degree of relevance of a document with respect to a user-defined topic. The parameters of the proposed model are generated by a genetic algorithm that solves a bi-criteria optimization problem.
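To make the idea of a fuzzy relevance predicate concrete, the following is a minimal sketch of how a content degree and a structure degree could be combined into one relevance degree in [0, 1]. The abstract does not specify the membership functions, the fuzzy operator, or the parameters, so everything here is an assumption for illustration: the names `RelevanceParams`, `content_degree`, `structure_degree`, and `relevance` are hypothetical, the two membership functions are simple ratios, and the weighted geometric mean is just one possible compensatory conjunction. The weights stand in for the kind of parameters a genetic algorithm might tune.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RelevanceParams:
    """Hypothetical tunable parameters (not taken from the paper)."""
    w_content: float = 0.6    # weight of the content criterion
    w_structure: float = 0.4  # weight of the structure criterion


def content_degree(doc_terms: set[str], topic_terms: set[str]) -> float:
    """Assumed content membership: fraction of topic terms found in the document."""
    if not topic_terms:
        return 0.0
    return len(doc_terms & topic_terms) / len(topic_terms)


def structure_degree(relevant_inlinks: int, total_inlinks: int) -> float:
    """Assumed structure membership: share of in-links from pages judged relevant."""
    if total_inlinks == 0:
        return 0.0
    return relevant_inlinks / total_inlinks


def relevance(content: float, structure: float, p: RelevanceParams) -> float:
    """Weighted geometric mean of the two degrees (one possible fuzzy 'and')."""
    total = p.w_content + p.w_structure
    return (content ** (p.w_content / total)) * (structure ** (p.w_structure / total))


if __name__ == "__main__":
    params = RelevanceParams()
    c = content_degree({"fuzzy", "web", "spider"}, {"web", "spider", "crawler", "topic"})
    s = structure_degree(relevant_inlinks=3, total_inlinks=4)
    print(relevance(c, s, params))
```

With a geometric mean, a document that scores zero on either criterion gets a relevance of zero, while partial scores compensate for each other according to the weights. A spider could then follow links only from documents whose relevance exceeds a chosen threshold, which is itself a parameter this sketch leaves out.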