Mining the Web for generating thematic metadata from textual data

Chien-Chung Huang, Shui-Lung Chuang, Lee-Feng Chien
DOI: 10.1109/ICDE.2004.1320065 (https://doi.org/10.1109/ICDE.2004.1320065)
Published in: Proceedings. 20th International Conference on Data Engineering
Publication date: 2004-03-30
Citations: 2

Abstract

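Conventional tools for automatic metadata creation mostly extract named entities or patterns from texts and annotate them with information about persons, locations, dates, and so on. However, this kind of entity-type information is often too primitive for more advanced intelligent applications such as concept-based search. Here, we try to generate semantically deep metadata with limited human intervention. The main idea behind our approach is to use Web mining and categorization techniques to create thematic metadata. The proposed approach comprises three computational modules: feature extraction, HCQF (hier-concept query formulation), and text instance categorization. The feature extraction module sends the names of text instances to Web search engines, and the returned highly ranked search-result pages are used to describe them.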
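The abstract gives only an outline of the pipeline, so the following is a minimal Python sketch of the general idea rather than the authors' implementation. It assumes that a text instance can be described by a bag of words built from search-result snippets, and that it can be assigned to the category whose own search-result description is most similar under cosine similarity. The `fake_search` function, its tiny corpus, the tokenizer, and the choice of cosine similarity are all hypothetical stand-ins. The HCQF module is not modeled, because the abstract does not describe it. Here each category is simply queried by its own name.

```python
import math
from collections import Counter

# Hypothetical stand-in for a Web search engine: maps a query to the
# snippet texts of its top-ranked results. A real system would call a
# search API here; this tiny corpus exists only for illustration.
FAKE_RESULTS = {
    "Linus Torvalds": [
        "Linus Torvalds created the Linux kernel",
        "Torvalds maintains kernel software development",
    ],
    "computer science": [
        "software and kernel development in computer science",
        "algorithms and software systems",
    ],
    "biology": [
        "cells genes and proteins in biology",
        "evolution of species and genes",
    ],
}


def fake_search(query):
    """Return the snippets for a query (empty list if unknown)."""
    return FAKE_RESULTS.get(query, [])


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def extract_features(name, search=fake_search):
    """Describe a name by term counts over its search-result snippets."""
    counts = Counter()
    for snippet in search(name):
        counts.update(tokenize(snippet))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(name, categories, search=fake_search):
    """Assign a name to the most similar category (assumed scoring rule)."""
    features = extract_features(name, search)
    scores = {
        c: cosine(features, extract_features(c, search)) for c in categories
    }
    return max(scores, key=scores.get), scores


label, scores = categorize("Linus Torvalds", ["computer science", "biology"])
print(label, scores)
```

In this toy corpus the instance shares terms such as "kernel" and "software" with one category and none with the other, which is what drives the assignment. With real search results, stop-word removal and term weighting would likely be needed before the similarity scores become meaningful.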