
Latest publications: Proceedings. 20th International Conference on Data Engineering

Personalization of queries in database systems
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320030
G. Koutrika, Y. Ioannidis
As information becomes available in increasing amounts to a wide spectrum of users, the need for a shift towards a more user-centered information access paradigm arises. We develop a personalization framework for database systems based on user profiles and identify the basic architectural modules required to support it. We define a preference model that assigns to each atomic query condition a personal degree of interest and provide a mechanism to compute the degree of interest in any complex query condition based on the degrees of interest in the constituent atomic ones. Preferences are stored in profiles. At query time, personalization proceeds in two steps: (a) preference selection and (b) preference integration into the original user query. We formulate the main personalization step, i.e. preference selection, as a graph computation problem and provide an efficient algorithm for it. We also discuss results of experimentation with a prototype query personalization system.
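The composition of interest degrees described above can be sketched as follows. This is a minimal illustration under stated assumptions: product for conjunctions, max for disjunctions, and a flat dictionary as the profile; the paper defines its own preference model and combining functions.

```python
# Hypothetical sketch: compute the degree of interest in a complex query
# condition from the degrees of its atomic conditions stored in a profile.
# The combiners (product for AND, max for OR) are illustrative assumptions.

def degree(cond, profile):
    """cond is an atomic condition (str) or a tuple ('and'|'or', sub, sub, ...)."""
    if isinstance(cond, str):
        return profile.get(cond, 0.0)  # atoms absent from the profile carry no interest
    op, *subs = cond
    degrees = [degree(s, profile) for s in subs]
    if op == 'and':
        result = 1.0
        for d in degrees:
            result *= d                # conjunction: product of sub-degrees
        return result
    if op == 'or':
        return max(degrees)            # disjunction: best sub-degree
    raise ValueError(f"unknown operator: {op}")

profile = {"genre='comedy'": 0.9, "year>2000": 0.5}
print(degree(('and', "genre='comedy'", "year>2000"), profile))  # 0.45
```

Nested conditions compose recursively, so arbitrarily complex boolean conditions reduce to the atomic degrees stored in the profile.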
Citations: 167
XML query processing
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320100
D. Florescu, Donald Kossmann
XQuery is starting to gain significant traction as a language for querying and transforming XML data. It is used in a variety of products; examples to date include XML database systems, XML document repositories, XML data integration, workflow systems, and publish-and-subscribe systems. In addition, XPath, of which XQuery is a superset, is used in products such as Web browsers. Although the W3C XQuery specification has not yet attained Recommendation status, and the definition of the language has not entirely stabilized, a number of alternative proposals to implement and optimize XQuery have appeared both in industry and in the research community. Given the wide range of applications for which XQuery is applicable, a wide spectrum of techniques has been proposed for XQuery processing. Some of these techniques are useful only for certain applications; others are general-purpose. The goal of this seminar is to give an overview of the existing approaches to processing XQuery expressions and to detail the most important techniques. The presenters have experience designing and building an industrial-strength XQuery engine [1]. The seminar will give details of that XQuery engine, but will also give extensive coverage of other XQuery engines and of the state of the art in the research community. The agenda for the seminar is as follows: (1) Introduction to XQuery: Motivation, XQuery data model, XQuery type system, Basic query language concepts; (2) Internal Representation of XML Data: DOM, SAX Events, TokenStream, Skeleton, Vertical Partitioning; (3) XQuery Algebras: XQuery Core vs. Relational Algebra, XQuery Algebras from Research Projects; (4) XPath Query Processing: Transducers, Automata, etc.; (5) XQuery Optimization: XML query equivalence, Rewrite Rules, Cost Models; (6) XQuery Runtime Systems: Iterator Models, Algorithms for XQuery Operators; (7) XML Indexes: Value and path indexes, others; (8) XQuery Products and Prototypes: XQRL/BEA, Galax, Saxon, etc. (as available); (9) Advanced Query Processing Techniques, Related Topics: Querying compressed XML data, Multi-Query Optimization, Publish/Subscribe and XML Information Filtering, XML Data Integration, XML Updates, XML integrity constraints; (10) Summary.
Citations: 6
Superimposed applications using SPARCE
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320089
S. Murthy, D. Maier, L. Delcambre, S. Bowers
People often impose new interpretations onto existing information. In the process, they work with information in two layers: a base layer, where the original information resides, and a superimposed layer, where only the new interpretations reside. Abstractions defined in the Superimposed Pluggable Architecture for Contexts and Excerpts (SPARCE) ease communication between the two layers. SPARCE provides three key abstractions for superimposed information management: mark, context, and excerpt. We demonstrate two applications, RIDPad and Schematics Browser, for use in the appeal process of the US Forest Service (USFS).
Citations: 11
LexEQUAL: supporting multilexical queries in SQL
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320075
A. Kumaran, J. Haritsa
Current database systems offer support for storing multilingual data, but are not capable of querying across languages, an important consideration in today's global economy. We therefore propose a new multilexical operator called LexEQUAL that extends the standard lexicographic matching in database systems to matching of text data across languages, specifically for names, which form close to twenty percent of text corpora. The implementation of the LexEQUAL operator is based on transforming matches in language space into parameterized approximate matches in the equivalent phoneme space. A detailed evaluation of our approach on a real data set shows that there exist settings of the algorithm parameters with which it is possible to achieve both good recall and precision.
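The phoneme-space matching idea can be illustrated with a toy stand-in: map names through plain Soundex (standing in for the paper's phonemic transform) and accept pairs whose codes fall within an edit-distance threshold. The function names and the threshold value are assumptions for illustration, not LexEQUAL's actual implementation.

```python
# Toy phonetic matching: Soundex stands in for the phonemic transform, and
# Levenshtein distance provides the parameterized approximate match.

def soundex(name):
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    out, prev = name[0].upper(), codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "hw":        # h and w do not separate identical codes
            prev = code
    return (out + "000")[:4]      # pad/truncate to the standard 4 characters

def edit_distance(a, b):
    # Rolling-array Levenshtein distance.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def lexequal(a, b, threshold=1):
    """Approximate match in the phonetic code space."""
    return edit_distance(soundex(a), soundex(b)) <= threshold
```

For example, `lexequal("Catherine", "Katherine")` holds because the two spellings map to codes differing by a single edit, while unrelated names stay apart.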
Citations: 2
Go green: recycle and reuse frequent patterns
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319990
G. Cong, B. Ooi, K. Tan, A. Tung
In constrained data mining, users can specify constraints to prune the search space to avoid mining uninteresting knowledge. This is typically done by specifying some initial values of the constraints that are subsequently refined iteratively until satisfactory results are obtained. Existing mining schemes treat each iteration as a distinct mining process, and fail to exploit the information generated between iterations. We propose to salvage knowledge that is discovered from an earlier iteration of mining to enhance subsequent rounds of mining. In particular, we look at how frequent patterns can be recycled. Our proposed strategy operates in two phases. In the first phase, frequent patterns obtained from an early iteration are used to compress a database. In the second phase, subsequent mining processes operate on the compressed database. We propose two compression strategies and adapt three existing frequent pattern mining techniques to exploit the compressed database. Results from our extensive experimental study show that our proposed recycling algorithms outperform their nonrecycling counterpart by an order of magnitude.
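The two-phase recycle-and-reuse idea can be sketched with a simplistic compression scheme, an assumption for illustration rather than either of the paper's actual strategies: each occurrence of a previously mined frequent itemset collapses into a single pattern token, so later mining rounds scan a smaller database.

```python
# Illustrative compression (not the paper's exact strategy): replace each
# occurrence of a recycled frequent itemset with one token, shrinking the
# transactions that subsequent mining iterations must scan.

def compress(db, patterns):
    """db: list of transactions (sets of items); patterns: token -> frozenset."""
    compressed = []
    for txn in db:
        txn = set(txn)
        for token, items in patterns.items():
            if items <= txn:                 # the pattern occurs in this transaction
                txn = (txn - items) | {token}
        compressed.append(txn)
    return compressed

db = [{"a", "b", "c"}, {"a", "b", "d"}, {"c", "d"}]
print(compress(db, {"P1": frozenset({"a", "b"})}))
```

Here `{"a", "b"}` is a pattern salvaged from an earlier iteration; the first two transactions shrink to `{"P1", "c"}` and `{"P1", "d"}`, and the adapted mining algorithms would then expand tokens on demand.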
Citations: 7
Hash-merge join: a non-blocking join algorithm for producing fast and early join results
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320002
M. Mokbel, Ming Lu, Walid G. Aref
We introduce the hash-merge join algorithm (HMJ for short), a new nonblocking join algorithm that deals with data items arriving from remote sources over unpredictable, slow, or bursty network traffic. The HMJ algorithm is designed with two goals in mind: (1) minimize the time to produce the first few results, and (2) produce join results even if the two sources of the join operator occasionally get blocked. The HMJ algorithm has two phases: the hashing phase and the merging phase. The hashing phase employs an in-memory hash-based join algorithm that produces join results as quickly as data arrives. The merging phase is responsible for producing join results if the two sources are blocked. The two phases are connected via a flushing policy that flushes in-memory parts to disk storage once memory is exhausted. Experimental results show that HMJ combines the advantages of two state-of-the-art nonblocking join algorithms (XJoin and Progressive Merge Join) while avoiding their shortcomings.
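The hashing phase behaves like a symmetric (pipelined) hash join. A minimal sketch, leaving out HMJ's flushing policy and disk-based merging phase:

```python
# Symmetric hash join: each arriving tuple is inserted into its own side's
# hash table and immediately probes the other side's table, so matches
# stream out as soon as data arrives (the core of HMJ's hashing phase).

from collections import defaultdict

def symmetric_hash_join(stream):
    """stream yields ('L'|'R', key, payload) tuples in arrival order."""
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    other = {"L": "R", "R": "L"}
    for side, key, payload in stream:
        tables[side][key].append(payload)          # build on own side
        for match in tables[other[side]][key]:     # probe opposite side
            left, right = (payload, match) if side == "L" else (match, payload)
            yield key, left, right

events = [("L", 1, "l1"), ("R", 1, "r1"), ("R", 2, "r2"), ("L", 2, "l2")]
results = list(symmetric_hash_join(events))
# each match is emitted the moment its partner tuple arrives
```

Because both tables are probed incrementally, the first result appears after the second event rather than after either input is exhausted, which is exactly the early-result property the abstract targets.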
Citations: 141
Mining the Web for generating thematic metadata from textual data
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320065
Chien-Chung Huang, Shui-Lung Chuang, Lee-Feng Chien
Conventional tools for automatic metadata creation mostly extract named entities or patterns from texts and annotate them with information about persons, locations, dates, and so on. However, this kind of entity-type information is often too primitive for more advanced intelligent applications such as concept-based search. Here, we try to generate semantically deep metadata with limited human intervention. The main idea behind our approach is to use Web mining and categorization techniques to create thematic metadata. The proposed approach comprises three computational modules: feature extraction, HCQF (hier-concept query formulation), and text instance categorization. The feature extraction module sends the names of text instances to Web search engines, and the returned highly ranked search-result pages are used to describe them.
Citations: 2
Content-based three-dimensional engineering shape search
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320043
K. Lou, K. Ramani, Sunil Prabhakar
We discuss the design and implementation of a prototype 3D engineering shape search system. The system incorporates multiple feature vectors, relevance feedback, query by example and browsing, a flexible definition of shape similarity, and efficient execution through multidimensional indexing and clustering. To give users more information for judging the similarity of 3D engineering shapes, a 3D interface that allows users to manipulate shapes is proposed and implemented to present the search results. The system allows users to specify which feature vectors should be used to perform the search. The system is used to conduct extensive experimentation on real data to test the effectiveness of various feature vectors for shape - the first such comparison of this type. The test results show that the descending order of the average precision of the feature vectors is: principal moments, moment invariants, geometric parameters, and eigenvalues. In addition, a multistep similarity search strategy is proposed and tested to improve the effectiveness of 3D engineering shape search. The multistep approach is shown to be more effective than the one-shot search approach when a fixed number of shapes is retrieved.
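The multistep strategy follows the classic filter-and-refine pattern. In this minimal sketch, the cheap filter distance, the expensive measure, and the candidate multiplier are illustrative assumptions rather than the system's actual feature vectors:

```python
# Filter-and-refine sketch of multistep similarity search. For the result
# to be exact, the cheap distance should lower-bound the costly one; here
# both are toy stand-ins (first-coordinate gap vs. full Euclidean distance).

import math

def multistep_search(query, shapes, cheap, costly, k, filter_factor=2):
    # Step 1: prune the collection with the inexpensive distance.
    candidates = sorted(shapes, key=lambda s: cheap(query, s))[:k * filter_factor]
    # Step 2: rank only the survivors with the expensive similarity measure.
    return sorted(candidates, key=lambda s: costly(query, s))[:k]

shapes = [(0.0, 0.0), (1.0, 5.0), (1.1, 0.1), (9.0, 9.0)]
query = (1.0, 0.0)
top = multistep_search(query, shapes,
                       cheap=lambda q, s: abs(q[0] - s[0]),
                       costly=math.dist, k=2)
# top -> the two shapes closest to the query under the full distance
```

The saving comes from evaluating the expensive measure on only `k * filter_factor` candidates instead of the whole collection.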
Citations: 35
BIDE: efficient mining of frequent closed sequences
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319986
Jianyong Wang, Jiawei Han
Previous studies have presented convincing arguments that a frequent-pattern mining algorithm should mine not all frequent patterns but only the closed ones, because the latter leads not only to a more compact yet complete result set but also to better efficiency. However, most previously developed closed-pattern mining algorithms work under the candidate maintenance-and-test paradigm, which is inherently costly in both runtime and space usage when the support threshold is low or the patterns become long. We present BIDE, an efficient algorithm for mining frequent closed sequences without candidate maintenance. We adopt a novel sequence closure-checking scheme called bidirectional extension, and prune the search space more deeply than previous algorithms by using the BackScan pruning method and the Scan-Skip optimization technique. A thorough performance study with both sparse and dense real-life data sets demonstrates that BIDE significantly outperforms the previous algorithms: it consumes order(s) of magnitude less memory and can be more than an order of magnitude faster. It is also linearly scalable in terms of database size.
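The closedness property that BIDE mines can be illustrated with a brute-force checker, usable only at toy scale; BIDE's contribution is precisely avoiding this kind of candidate enumeration and maintenance.

```python
# Brute-force illustration of closed sequences: a frequent sequence is
# closed if no proper super-sequence has the same support. Exponential in
# sequence length, so suitable only for tiny example databases.

from itertools import combinations

def is_subseq(s, t):
    it = iter(t)
    return all(x in it for x in s)    # consumes `it`, preserving order

def support(seq, db):
    return sum(is_subseq(seq, t) for t in db)

def closed_sequences(db, min_sup):
    # Enumerate every subsequence of every database sequence (toy scale).
    freq = {}
    for t in db:
        for n in range(1, len(t) + 1):
            for idx in combinations(range(len(t)), n):
                s = tuple(t[i] for i in idx)
                if s not in freq:
                    freq[s] = support(s, db)
    freq = {s: c for s, c in freq.items() if c >= min_sup}
    # Keep only sequences with no equally supported proper super-sequence.
    return {s: c for s, c in freq.items()
            if not any(len(u) > len(s) and c == c2 and is_subseq(s, u)
                       for u, c2 in freq.items())}

db = [("A", "B", "C"), ("A", "B"), ("B", "C")]
print(closed_sequences(db, min_sup=2))
```

On this database, ("A",) and ("C",) are frequent but not closed, since ("A","B") and ("B","C") carry the same supports; the closed set {("B",): 3, ("A","B"): 2, ("B","C"): 2} still determines the support of every frequent sequence.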
引用次数: 732
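For contrast with the candidate maintenance-and-test paradigm the BIDE abstract criticizes, here is a minimal Python sketch of that naive baseline: enumerate frequent sequences level by level, then keep only the closed ones (those with no frequent super-sequence of equal support). This is an illustration only, not BIDE's bidirectional-extension algorithm, and all names (`is_subsequence`, `frequent_sequences`, `closed_sequences`) are hypothetical.

```python
def is_subsequence(pat, seq):
    """True if pat occurs in seq as a (possibly non-contiguous) subsequence."""
    it = iter(seq)
    return all(item in it for item in pat)  # 'in' consumes the iterator

def support(pat, db):
    """Number of database sequences containing pat."""
    return sum(1 for seq in db if is_subsequence(pat, seq))

def frequent_sequences(db, min_sup, max_len=4):
    """Level-wise candidate generate-and-test mining (the costly paradigm)."""
    items = sorted({x for seq in db for x in seq})
    frequent = {}
    level = [(x,) for x in items]
    while level:
        next_level = []
        for pat in level:
            sup = support(pat, db)          # test every candidate against the db
            if sup >= min_sup:
                frequent[pat] = sup
                if len(pat) < max_len:      # extend only frequent patterns
                    next_level.extend(pat + (x,) for x in items)
        level = next_level
    return frequent

def closed_sequences(frequent):
    """Keep only closed patterns: no frequent super-sequence of equal support."""
    return {pat: sup for pat, sup in frequent.items()
            if not any(len(q) > len(pat) and s == sup and is_subsequence(pat, q)
                       for q, s in frequent.items())}
```

On a toy database `[('a','b','c'), ('a','b','c'), ('a','b')]` with `min_sup=2`, the full frequent set contains seven patterns, while the closed set collapses to just `{('a','b'): 3, ('a','b','c'): 2}` — the compactness the abstract refers to. The quadratic closedness filter over all frequent patterns is exactly the maintenance cost that BIDE's bidirectional extension avoids.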
Direct mesh: a multiresolution approach to terrain visualization 直接网格:一种多分辨率的地形可视化方法
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320044
K. Xu, Xiaofang Zhou, Xuemin Lin
Terrain can be approximated by a triangular mesh consisting of millions of 3D points. Multiresolution triangular mesh (MTM) structures are designed to support applications that use terrain data at variable levels of detail (LOD). Typically, an MTM adopts a tree structure in which a parent node represents a lower-resolution approximation of its descendants. Given a region of interest (ROI) and a LOD, retrieving the required terrain data from the database involves traversing the MTM tree from the root to reach all the nodes satisfying the ROI and LOD conditions. This process, while commonly used for multiresolution terrain visualization, is inefficient, as it incurs either a large number of sequential I/O operations or the fetching of a large amount of extraneous data. Various spatial indexes have been proposed in the past to address this problem; however, level-by-level tree traversal remains a common practice for obtaining topological information among the retrieved terrain data. A new MTM data structure called direct mesh is proposed. We demonstrate that with direct mesh the amount of data retrieval can be substantially reduced. Compared with existing MTM indexing methods, a significant performance improvement has been observed for real-life terrain data.
Citations: 12
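The top-down ROI/LOD traversal described in the Direct mesh abstract can be sketched as follows. This is a generic illustration of the baseline MTM query process under assumed node fields (`error`, `bbox`, `triangles`, `children`), not the paper's direct-mesh structure itself.

```python
from dataclasses import dataclass, field

@dataclass
class MTMNode:
    error: float                 # approximation error; shrinks toward the leaves
    bbox: tuple                  # (xmin, ymin, xmax, ymax)
    triangles: list              # mesh approximation stored at this node
    children: list = field(default_factory=list)

def intersects(a, b):
    """Axis-aligned bounding-box overlap test."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def query(node, roi, max_error):
    """Collect triangles of the shallowest nodes that intersect the ROI
    and already meet the requested error tolerance (LOD)."""
    if not intersects(node.bbox, roi):
        return []                              # prune subtrees outside the ROI
    if node.error <= max_error or not node.children:
        return list(node.triangles)            # coarse enough: stop refining
    out = []
    for child in node.children:                # otherwise descend one level
        out.extend(query(child, roi, max_error))
    return out
```

Note that the recursion must pass through every ancestor level before reaching the nodes it actually returns; when those nodes are scattered across disk pages, this is the sequential-I/O cost that the paper's direct mesh is designed to reduce.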