
Proceedings. 20th International Conference on Data Engineering: Latest Publications

Outrageous ideas and/or thoughts while shaving
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320096
M. Stonebraker
For this closing panel discussion, we will recruit a collection of participants from the attendees and organizers. Each will agree to present one or more outrageous ideas that are too wacky to get funded and/or incapable of being turned into least-publishable units (LPUs). Less adventuresome panelists can present their pet peeves about research activities pursued by others in the DBMS community. Risk-averse panelists can discuss more mundane problems which they would like to work on if they had more time or were excused from department committees.
Cited by: 2
Modeling uncertainties in publish/subscribe systems
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320023
Haifeng Liu, H. Jacobsen
In the publish/subscribe paradigm, information providers disseminate publications to all consumers who have expressed interest by registering subscriptions. This paradigm has found widespread application, ranging from selective information dissemination to network management. However, no existing publish/subscribe system can capture the uncertainty inherent in the information in either subscriptions or publications. In many situations, exact knowledge of either specific subscriptions or publications is not available. Moreover, especially in selective information dissemination applications, it is often more appropriate for a user to formulate her search requests or information offers in less precise terms, rather than defining a sharp limit. To address these problems, this paper proposes a new publish/subscribe model based on possibility theory and fuzzy set theory to process uncertainties for both subscriptions and publications. Furthermore, an approximate publish/subscribe matching problem is defined, and algorithms for solving it are developed and evaluated.
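The paper develops its own matching semantics; as a rough illustration of the possibility-theoretic idea of scoring rather than exact matching, the following sketch (purely hypothetical, with min-combination of triangular fuzzy predicates standing in for the model) computes the degree to which a publication satisfies an imprecise subscription:

```python
# Hypothetical sketch of fuzzy pub/sub matching (not the paper's algorithm):
# each subscription attribute carries a fuzzy membership function, and the
# overall match degree is the minimum degree over all attributes.

def triangular(a, b, c):
    """Fuzzy membership function: peaks (degree 1) at b, zero outside [a, c]."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

def match_degree(subscription, publication):
    """Min-combination of per-attribute membership degrees."""
    return min(
        (mu(publication[attr]) if attr in publication else 0.0)
        for attr, mu in subscription.items()
    )

# A subscriber interested in flights "around $300" departing "around noon".
sub = {"price": triangular(200, 300, 400), "hour": triangular(10, 12, 14)}
pub = {"price": 320, "hour": 13}
degree = match_degree(sub, pub)   # price matches to 0.8, hour to 0.5 -> min 0.5
assert abs(degree - 0.5) < 1e-9
```

A system could then deliver the publication only to subscribers whose match degree exceeds a chosen threshold, rather than requiring an exact predicate match.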
Cited by: 60
Benchmarking SAP R/3 archiving scenarios
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320046
Bernhard Zeller, A. Kemper
According to a survey by the University of California, Berkeley [P. Lyman et al., (2003)], about 5 exabytes of new information was created in 2002. This information explosion also affects the database volumes of enterprise resource planning (ERP) systems such as SAP R/3, the market leader among ERP systems. Just like the overall information explosion, the database volumes of ERP systems are growing at a tremendous rate, and some have reached a size of several terabytes. OLTP (online transaction processing) databases of this size are hard to maintain and tend to perform poorly. One way to limit the size of a database is data staging, i.e., making use of an SAP technique called archiving: data which are not needed for everyday operations are demoted from the database (disks) to tertiary storage (tapes). In cooperation with our research group, SAP is adapting its archiving techniques to accelerate the archiving process by integrating new technologies such as XML and advanced database features. However, so far no benchmark has existed to evaluate different archiving scenarios and to measure the impact of a change in the archiving technique. We therefore designed and implemented a generic benchmark which is applicable to many different system layouts and allows users to evaluate various archiving scenarios.
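The data-staging idea the abstract describes can be illustrated in miniature. The sketch below is not SAP's archiving mechanism; the types and the last-use criterion are invented for illustration of demoting cold rows from the online store:

```python
# Illustrative sketch of data staging (not SAP's API): rows not needed for
# everyday operations are demoted from the "online" store to an archive store.
from dataclasses import dataclass
from datetime import date

@dataclass
class Row:
    key: int
    last_used: date   # hypothetical staleness criterion

def archive_older_than(online, cutoff):
    """Split rows into (kept, archived) by their last-use date."""
    kept = [r for r in online if r.last_used >= cutoff]
    archived = [r for r in online if r.last_used < cutoff]
    return kept, archived

rows = [Row(1, date(2001, 5, 1)), Row(2, date(2003, 11, 3)), Row(3, date(2002, 1, 9))]
kept, archived = archive_older_than(rows, date(2002, 6, 1))
assert [r.key for r in kept] == [2]
assert [r.key for r in archived] == [1, 3]
```

A benchmark in the spirit of the paper would then measure how query and maintenance performance on the online store changes as the archived fraction grows.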
Cited by: 3
EShopMonitor: a Web content monitoring tool
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320055
N. Agrawal, Rema Ananthanarayanan, Rahul Gupta, Sachindra Joshi, R. Krishnapuram, Sumit Negi
Data presented on commerce sites runs into thousands of pages, and is typically delivered from multiple back-end sources. This makes it difficult to identify incorrect, anomalous, or interesting data such as $9.99 air fares, missing links, drastic price changes, and the addition of new products or promotions. We describe a system that monitors Web sites automatically and generates various types of reports so that the content of the site can be monitored and its quality maintained. The solution we designed and implemented consists of a site crawler that crawls dynamic pages, an information miner that learns to extract useful information from the pages based on examples provided by the user, and a reporter that can be configured by the user to answer specific queries. The tool can also be used for identifying price trends and new products or promotions at competitor sites. A pilot run of the tool has been successfully completed at the ibm.com site.
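The "reporter" component could be imagined along the following lines. This is an invented sketch, not the EShopMonitor implementation: it compares two crawls and flags drastic price changes and items that have gone missing, with the change threshold as a configurable parameter:

```python
# Hypothetical configurable report (not EShopMonitor's code): flag drastic
# price changes and missing items between two crawls of the same site.

def price_change_report(old, new, threshold=0.5):
    """Return items whose price moved by more than `threshold` (as a fraction
    of the old price), plus items that disappeared since the previous crawl."""
    flagged, missing = [], []
    for item, old_price in old.items():
        if item not in new:
            missing.append(item)            # e.g., a broken or removed page
        elif abs(new[item] - old_price) / old_price > threshold:
            flagged.append(item)            # e.g., a $9.99 air fare
    return flagged, missing

old = {"flight-LAX": 450.0, "laptop": 1200.0, "mouse": 25.0}
new = {"flight-LAX": 9.99, "laptop": 1150.0}
flagged, missing = price_change_report(old, new)
assert flagged == ["flight-LAX"]
assert missing == ["mouse"]
```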
Cited by: 5
Load shedding for aggregation queries over data streams
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320010
Brian Babcock, Mayur Datar, R. Motwani
Systems for processing continuous monitoring queries over data streams must be adaptive because data streams are often bursty and data characteristics may vary over time. We focus on one particular type of adaptivity: the ability to gracefully degrade performance via "load shedding" (dropping unprocessed tuples to reduce system load) when the demands placed on the system cannot be met in full with the available resources. Focusing on aggregation queries, we present algorithms that determine at what points in a query plan load shedding should be performed and how much load should be shed at each point in order to minimize the inaccuracy introduced into query answers. We report the results of experiments that validate our analytical conclusions.
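The paper's contribution is deciding where in the plan to shed and how much; the toy sketch below only illustrates the basic mechanism such schemes build on — dropping each tuple uniformly at random and rescaling the aggregate so the estimate stays unbiased in expectation. The function name, fixed drop rate, and error tolerance are illustrative:

```python
# Minimal sketch of sampling-based load shedding for a SUM aggregate:
# keep each tuple with probability p, then scale the running sum by 1/p
# to compensate for the dropped tuples (unbiased in expectation).
import random

def shed_and_sum(stream, p, seed=42):
    rng = random.Random(seed)
    total = 0.0
    for x in stream:
        if rng.random() < p:        # tuple survives shedding
            total += x
    return total / p                # rescale for the dropped fraction

true_sum = 50_005_000               # exact sum of 1..10_000
estimate = shed_and_sum(range(1, 10_001), p=0.1)
assert abs(estimate - true_sum) / true_sum < 0.15   # close despite 90% drops
```

The trade-off is the one the abstract names: a lower keep-probability p sheds more load but widens the variance of the answer.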
Cited by: 374
Minimization and group-by detection for nested XQueries
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320069
Alin Deutsch, Y. Papakonstantinou, Yu Xu
We extend tree pattern queries into group-by normal form tree pattern (GNFTP) queries, which are nested, perform arbitrary joins, and freely mix bag and set semantics. We describe a subset of XQuery, called OptXQuery, and provide a normalization algorithm that rewrites any OptXQuery into a GNFTP query. Key logical query optimizations can then be solved for GNFTP/OptXQuery. As a proof of concept, but also for its own importance and value in query optimization, we developed and evaluated a query minimization algorithm for GNFTP. The rich features of GNFTP/OptXQuery create key challenges that fundamentally extend the prior work on minimizing conjunctive queries. An important application of this technique is group-by detection. We extend GNFTP into extGNFTP to capture XQueries outside the OptXQuery set. The extGNFTP notation provides the logical plan optimization framework of our XQuery processor.
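The payoff of group-by detection can be shown in miniature. The sketch below is pure illustration (not the paper's algorithm): a literally-nested query that re-scans the input for each distinct key is equivalent to a single grouping pass, and recognizing this lets an optimizer swap a quadratic plan for a linear one:

```python
# Illustration of why group-by detection matters: the nested plan and the
# grouped plan compute the same answer, but in O(n^2) vs O(n).
from collections import defaultdict

def nested_plan(items):
    """For each distinct key, re-scan the whole input (a literal nested query)."""
    return {k: [v for kk, v in items if kk == k] for k, _ in items}

def grouped_plan(items):
    """Single grouping pass - the rewritten, group-by plan."""
    groups = defaultdict(list)
    for k, v in items:
        groups[k].append(v)
    return dict(groups)

items = [("a", 1), ("b", 2), ("a", 3)]
assert nested_plan(items) == grouped_plan(items) == {"a": [1, 3], "b": [2]}
```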
Cited by: 8
Efficient similarity search in large databases of tree structured objects
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320066
K. Murthy, H. Kriegel, Stefan Schönauer, T. Seidl
We implemented our new approach for efficient similarity search in large databases of tree structures. Our experiments show that filtering significantly accelerates the complex task of similarity search for tree-structured objects. Moreover, they show that no single feature of a tree is sufficient for effective filtering, but only the combination of structural and content-based filters yields good results.
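The filtering the abstract credits with the speedup follows the standard filter-and-refine contract: a cheap lower-bounding distance prunes candidates, and the expensive exact distance is computed only for survivors. The sketch below is generic, not the paper's tree filters — it uses strings, with the length difference lower-bounding the edit distance; the paper's structural and content-based filters for trees obey the same contract:

```python
# Generic filter-and-refine range search: prune with a cheap lower bound,
# refine survivors with the exact (expensive) distance.

def edit_distance(s, t):
    """Classic DP Levenshtein distance - the 'expensive' refinement step."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution/match
        prev = cur
    return prev[-1]

def range_query(db, query, eps):
    """All objects within distance eps of query; counts exact computations."""
    hits, exact_calls = [], 0
    for obj in db:
        if abs(len(obj) - len(query)) > eps:   # filter: lower bound > eps,
            continue                           # so obj cannot qualify
        exact_calls += 1
        if edit_distance(obj, query) <= eps:   # refine with exact distance
            hits.append(obj)
    return hits, exact_calls

db = ["tree", "trie", "treelike", "forest", "three"]
hits, calls = range_query(db, "tree", eps=1)
assert hits == ["tree", "trie", "three"] and calls == 3   # 2 of 5 pruned
```

Because the filter is a true lower bound, pruning never loses a correct answer; it only saves exact-distance computations, which is exactly the effect the experiments measure.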
Cited by: 3
SPINE: putting backbone into string indexing
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320008
Naresh Neelapala, Romil Mittal, J. Haritsa
The indexing technique commonly used for long strings, such as genomes, is the suffix tree, which is based on a vertical (intra-path) compaction of the underlying trie structure. We investigate an alternative approach to index building, based on horizontal (inter-path) compaction of the trie. In particular, we present SPINE, a carefully engineered horizontally-compacted trie index. SPINE consists of a backbone formed by a linear chain of nodes representing the underlying string, with the nodes connected by a rich set of edges for facilitating fast forward and backward traversals over the backbone during index construction and query search. A special feature of SPINE is that it collapses the trie into a linear structure, representing the logical extreme of horizontal compaction. We describe algorithms for SPINE construction and for searching this index to find the occurrences of query patterns. Our experimental results on a variety of real genomic and proteomic strings show that SPINE requires significantly less space than standard implementations of suffix trees. Further, SPINE takes less time for both construction and search as compared to suffix trees, especially when the index is disk-resident. Finally, the linearity of its structure makes it more amenable to integration with database engines.
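To make the backbone idea concrete, here is a deliberately simplified toy (far simpler than SPINE's actual edge structure): one node per position of the string forms the linear chain, and per-character position lists act as auxiliary edges into the backbone; a pattern search jumps to the candidate nodes for its first symbol and verifies along the chain:

```python
# Toy linear "backbone" index over a string (an illustration of the idea,
# not SPINE itself): backbone = positions 0..n-1 of the text, auxiliary
# edges = per-character position lists pointing into the backbone.
from collections import defaultdict

class BackboneIndex:
    def __init__(self, text):
        self.text = text
        self.starts = defaultdict(list)      # char -> backbone positions
        for i, ch in enumerate(text):
            self.starts[ch].append(i)

    def occurrences(self, pattern):
        """All positions where pattern occurs: jump via the first symbol's
        edges, then verify by walking forward along the backbone."""
        if not pattern:
            return []
        return [i for i in self.starts[pattern[0]]
                if self.text[i:i + len(pattern)] == pattern]

idx = BackboneIndex("GATTACAGATTACA")
assert idx.occurrences("GATTA") == [0, 7]
assert idx.occurrences("TTT") == []
```

Like SPINE's backbone, the index stays linear in the text length; SPINE's richer forward and backward edges are what remove the per-candidate verification scan this toy still performs.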
Cited by: 7
"My personal web": a seminar on personalization and privacy for web and converged services
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320098
I. Fundulaki, R. Hull, B. Kumar, D. Lieuwen, Arnaud Sahuguet
The web services paradigm holds the promise of tremendous flexibility in how services are combined to meet the needs of individual end-users. The “convergence” of networks (wireline telephony, wireless, data) further enhances the web services paradigm, by enabling the incorporation of real-time contextual information (e.g., presence and location) along with opportunities for web services to impact the physical world more immediately (e.g., a vending machine delivering a soda based on a purchase via a cell phone). But it will not be possible for most end-users to enjoy these rich and intricate possibilities unless a broad variety of personalization technologies are available that respect the end user’s legitimate need for privacy. This seminar begins with examples illustrating why personalization will be so important for the emerging web and converged services. The main body of the seminar focuses on three interrelated technologies. First is profile data management, the ability for services to share and access end-user profile data (including address, credit card, “simple” preferences, current location, current presence, ...) as appropriate for the services to be provided. Second is preference and policy management, the ability to store and execute on intricate, interrelated preferences that end-users may have (e.g., “during working hours, calls from strangers should be routed to voice-mail”; “I usually work from 9 to 6, but on Thursdays it is from 8 to 4”; ...). And third is personalized and privacy-conscious data sharing of profile data and preferences, the notion that an end-user should have complete control over what profile and preference data is shared with whom, under what circumstances, and how it is interpreted. In addition to describing emerging approaches for providing these capabilities, the seminar will describe how to add value to applications by using personalization, from both the end-user and the application provider perspectives.
Cited by: 0
Efficient execution of computation modules in a model with massive data
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320071
Gary Kratkiewicz, R. Bostwick, Geoffrey S. Knauth
Models and simulations for analyzing difficult real-world problems often deal with massive amounts of data. The data problem is compounded when the analysis must be repeatedly run to perform what-if analyses. One such model is the Integrated Consumable Item Support (ICIS) Model developed for the U.S. Defense Logistics Agency (DLA). It models DLA's ability to satisfy future wartime requirements for parts, fuel, and food. ICIS uses a number of computation modules to project demands, model sourcing, and identify potential problem items for various commodities and military services. These modules are written in a variety of computer languages and must work together to generate an ICIS analysis.
Cited by: 0