Development of a framework for sub-topic discovery from the Web
Eray Uluhan, B. Badur
PICMET '08 - 2008 Portland International Conference on Management of Engineering & Technology, 2008-07-27
DOI: 10.1109/PICMET.2008.4599696
Citations: 4
Abstract
The motivation behind discovering sub-topics, or topic-specific keywords, from Web pages is to help a user who lacks knowledge and experience of a topic find its important concepts without much effort. Intuitively, such a user would start by querying search engines, visiting some of the returned pages, and spending a lot of time deciding what is important about the topic and what is not. In this study, we try to mine the important sub-topics or key concepts of a given topic automatically from HTML-based Web pages. Starting with a search query, the system gathers the top-ranking pages returned by a search engine and selects the informative pages among them. These pages are further processed to extract important phrases, and data mining techniques are then applied to these phrases to find candidate sub-topics. Each candidate phrase is scored according to its relevance to the search query across the Web. Using the proposed technique, the user should be able to quickly learn the sub-topics or key concepts of a topic without the ordeal of browsing a large number of non-informative pages returned by the search engine.
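The abstract does not specify the page-selection heuristic, the phrase-extraction method, the data mining technique, or the relevance score, so the following Python sketch only illustrates one plausible shape for such a pipeline under stated assumptions. A minimum word count stands in for informative-page selection, stopword-bounded word n-grams stand in for phrase extraction, a document-frequency support threshold stands in for the mining step, and pointwise mutual information (PMI) over search-engine hit counts is swapped in as one way to measure relevance "across the Web". All function names (`is_informative`, `extract_phrases`, `frequent_phrases`, `pmi_score`, `discover_subtopics`), the `hits` callback, and the `total` parameter are hypothetical; fetching pages from a search engine is out of scope.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Small illustrative stopword list (assumption; a real system would use a fuller list).
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "of", "on", "or", "the", "to", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_informative(text: str, min_words: int = 50) -> bool:
    """Hypothetical heuristic: treat a page as informative if it has enough words."""
    return len(tokenize(text)) >= min_words


def extract_phrases(text: str, max_n: int = 3) -> set:
    """Return the set of word n-grams (1..max_n) that neither start nor end with a stopword."""
    words = tokenize(text)
    phrases = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrases.add(" ".join(gram))
    return phrases


def frequent_phrases(pages: Iterable[str], min_support: int) -> Dict[str, int]:
    """Keep phrases that occur in at least min_support pages (document frequency)."""
    df: Counter = Counter()
    for page in pages:
        df.update(extract_phrases(page))  # each page counts a phrase at most once
    return {p: c for p, c in df.items() if c >= min_support}


def pmi_score(query: str, phrase: str, hits: Callable[[str], int], total: int) -> float:
    """PMI from hit counts: log(total * hits(q AND p) / (hits(q) * hits(p)))."""
    joint = hits(f"{query} AND {phrase}")
    hq, hp = hits(query), hits(phrase)
    if joint == 0 or hq == 0 or hp == 0:
        return float("-inf")  # undefined: no evidence of co-occurrence
    return math.log(total * joint / (hq * hp))


def discover_subtopics(query: str, pages: List[str], hits: Callable[[str], int], total: int,
                       min_words: int = 50, min_support: int = 2,
                       top_k: int = 10) -> List[Tuple[str, float]]:
    """Select informative pages, mine frequent phrases, and rank them by PMI with the query."""
    informative = [p for p in pages if is_informative(p, min_words)]
    candidates = frequent_phrases(informative, min_support)
    query_words = set(tokenize(query))
    # Drop phrases made only of query words; they restate the topic rather than a sub-topic.
    candidates = {p: c for p, c in candidates.items() if not set(p.split()) <= query_words}
    scored = [(p, pmi_score(query, p, hits, total)) for p in candidates]
    scored = [(p, s) for p, s in scored if s > float("-inf")]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

In use, `hits` would wrap a search engine's estimated result count and `total` would approximate the number of indexed pages; both are assumptions of this sketch. PMI rewards phrases that co-occur with the query more often than chance would predict, which is one common way to approximate Web-scale relevance, but the paper may use a different measure.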