A. Tripathy, Nilakshi Joshi, Steffy Thomas, Shweta U. Shetty, Namitha Thomas
VEDD - a visual wrapper for extraction of data using DOM tree
2012 International Conference on Communication, Information & Computing Technology (ICCICT). Published 2012-12-31. DOI: 10.1109/ICCICT.2012.6398114
Citations: 8
Abstract
The World Wide Web plays an important role in searching for information on the data network, and users are constantly exposed to an ever-growing flood of information. A wrapper is an application that helps retrieve Search Result Records (SRRs) from multiple search engines, making the search more efficient and reliable. The VEDD wrapper extracts the relevant SRRs from three search engines by filtering out noisy and redundant records. Finally, the unique set of records is displayed on a common VEDD search result page. The extraction is performed using the Document Object Model (DOM) tree. The paper presents a series of data filters to detect and remove irrelevant data from the web page. The data filters are also used to further improve the similarity check of data records. In addition, visual cues from the underlying browser rendering engine are used to locate and extract the relevant data region from the deep web by means of a keyword matching technique.
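The filtering and similarity-check steps described in the abstract can be illustrated with a minimal Python sketch. It is not the VEDD implementation: the record shape `(title, url, snippet)`, the function names `is_noisy`, `is_duplicate` and `merge_results`, the keyword test, and the 0.9 title-similarity threshold are all assumptions chosen for illustration, and the DOM-based extraction and visual cues are left out.

```python
from difflib import SequenceMatcher


def is_noisy(record, keyword):
    """Assumed noise filter: drop records with no title, a non-HTTP URL,
    or no occurrence of the query keyword in the title or snippet."""
    title, url, snippet = record
    if not title.strip() or not url.startswith(("http://", "https://")):
        return True
    text = f"{title} {snippet}".lower()
    return keyword.lower() not in text


def is_duplicate(a, b, threshold=0.9):
    """Assumed similarity check: same URL (ignoring a trailing slash)
    or near-identical titles according to difflib."""
    if a[1].rstrip("/") == b[1].rstrip("/"):
        return True
    ratio = SequenceMatcher(None, a[0].lower(), b[0].lower()).ratio()
    return ratio >= threshold


def merge_results(result_lists, keyword):
    """Merge records from several engines into one list of unique records,
    keeping the first occurrence of each."""
    unique = []
    for records in result_lists:
        for record in records:
            if is_noisy(record, keyword):
                continue
            if any(is_duplicate(record, kept) for kept in unique):
                continue
            unique.append(record)
    return unique
```

Calling `merge_results` with one list of records per search engine and the query keyword returns the combined list with noisy and redundant records removed, which corresponds to the unique set shown on a common result page.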