
Latest publications from the 22nd International Conference on Data Engineering Workshops (ICDEW'06)

Grid Representation for Efficient Similarity Search in Time Series Databases
Pub Date: 2006-04-03 DOI: 10.1109/ICDEW.2006.63
Guifang Duan, Yu Suzuki, K. Kawagoe
Widespread interest in time-series similarity search has increased the need for efficient techniques that reduce the dimensionality of the data so that it can be indexed easily using a multidimensional structure. In this paper, we introduce a new technique, which we call grid representation, based on a grid approximation of the data. We propose a lower-bounding distance measure that enables a bitmap approach for fast computation and searching. We also show how the grid representation can be indexed with a multidimensional index structure, and demonstrate its superiority.
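The abstract's idea of a grid approximation with a bitmap-based lower bound can be sketched as follows. The grid geometry, the one-cell-per-column bitmap, and the gap-counting lower bound are illustrative assumptions, not the authors' exact construction:

```python
import numpy as np

def grid_representation(series, n_cols, n_rows, lo, hi):
    """Approximate a time series by a bitmap over an n_rows x n_cols grid.

    Each column covers a contiguous segment of the series; the cell whose
    value band contains the segment mean is set to 1.
    """
    segments = np.array_split(np.asarray(series, dtype=float), n_cols)
    edges = np.linspace(lo, hi, n_rows + 1)  # value-band boundaries
    bitmap = np.zeros((n_rows, n_cols), dtype=np.uint8)
    for c, seg in enumerate(segments):
        r = np.clip(np.searchsorted(edges, seg.mean(), side="right") - 1,
                    0, n_rows - 1)
        bitmap[r, c] = 1
    return bitmap

def lb_grid_distance(bm_a, bm_b, cell_height):
    """Cheap lower bound on the true distance between two series: per
    column, count only the grid rows strictly separating the occupied
    cells (0 when the cells coincide or touch), so it never overestimates.
    """
    ra = bm_a.argmax(axis=0)  # row index of the set bit in each column
    rb = bm_b.argmax(axis=0)
    gap = np.maximum(np.abs(ra - rb) - 1, 0)
    return np.sqrt(np.sum((gap * cell_height) ** 2))
```

Because the bound never exceeds the true distance, candidates whose bound already exceeds the current best can be pruned without touching the raw series.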
Citations: 6
Integration Workbench: Integrating Schema Integration Tools
Pub Date: 2006-04-03 DOI: 10.1109/ICDEW.2006.69
P. Mork, A. Rosenthal, Leonard J. Seligman, Joel Korb, Ken Samuel
A key aspect of any data integration endeavor is establishing a transformation that translates instances of one or more source schemata into instances of a target schema. This schema integration task must be tackled regardless of the integration architecture or mapping formalism. In this paper we provide a task model for schema integration. We use this breakdown to motivate a workbench for schema integration in which multiple tools share a common knowledge repository. In particular, the workbench facilitates the interoperation of research prototypes for schema matching (which automatically identify likely semantic correspondences) with commercial schema mapping tools (which help produce instance-level transformations). Currently, each of these tools provides its own ad hoc representation of schemata and mappings; combining these tools requires aligning these representations. The workbench provides a common representation so that these tools can more rapidly be combined.
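The idea of a shared representation that lets a schema-matching prototype feed its guesses into a mapping tool can be sketched as follows. The `Correspondence` record and the trigram matcher are hypothetical illustrations, not the workbench's actual repository schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Correspondence:
    """A common record for matcher output: any mapping tool reading this
    format can consume any matcher producing it."""
    source_attr: str
    target_attr: str
    confidence: float

def name_matcher(source_attrs, target_attrs, threshold=0.3):
    """Toy matcher: trigram Jaccard similarity on attribute names."""
    def grams(s):
        s = s.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)} or {s}
    out = []
    for a in source_attrs:
        for b in target_attrs:
            ga, gb = grams(a), grams(b)
            sim = len(ga & gb) / len(ga | gb)
            if sim >= threshold:
                out.append(Correspondence(a, b, round(sim, 2)))
    return out
```

The point is the interface, not the matcher: replacing `name_matcher` with a smarter research prototype requires no change in the tools downstream.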
Citations: 32
Mining Spatial and Spatio-Temporal Patterns in Scientific Data
Pub Date: 2006-04-03 DOI: 10.1109/ICDEW.2006.92
Hui Yang, S. Parthasarathy
Data mining is the process of discovering hidden and meaningful knowledge in a data set. It has been successfully applied to many real-life problems, for instance, web personalization, network intrusion detection, and customized marketing. Recent advances in computational sciences have led to the application of data mining to various scientific domains, such as astronomy and bioinformatics, to facilitate the understanding of different scientific processes in the underlying domain. In this thesis work, we focus on designing and applying data mining techniques to analyze spatial and spatio-temporal data originating in scientific domains. Examples of spatial and spatio-temporal data in scientific domains include data describing protein structures and data produced by protein folding simulations, respectively. Specifically, we have proposed a generalized framework to effectively discover different types of spatial and spatio-temporal patterns in scientific data sets. Such patterns can be used to capture a variety of interactions among objects of interest and the evolutionary behavior of such interactions. We have applied the framework to analyze data from the following three application domains: bioinformatics, computational molecular dynamics, and computational fluid dynamics. Empirical results demonstrate that the discovered patterns are meaningful in the underlying domain and can provide important insights into various scientific phenomena.
Citations: 12
Using Web Knowledge to Improve the Wrapping of Web Sources
Pub Date: 2006-04-03 DOI: 10.1109/ICDEW.2006.160
Thomas Kabisch, Ronald Padur, D. Rother
When wrapping web interfaces, ontological knowledge is important to support automated interpretation of information. Developing ontologies, however, is time-consuming and not realistic in global contexts. On the other hand, the web itself provides a huge amount of knowledge that can be used instead of ontologies. Three common classes of web knowledge sources are web thesauri, search engines, and web encyclopedias. The paper investigates how web knowledge can be utilized to solve three semantic problems: parameter finding for query interfaces, labeling of values, and relabeling after interface evolution. To solve the parameter finding problem, an algorithm has been implemented that uses the web encyclopedia Wikipedia for the initial identification of parameter value candidates and the search engine Google for validation of label-value relationships. The approach has been integrated into a wrapper definition framework.
Citations: 3
Modeling and Advanced Exploitation of eChronicle ‘Narrative’ Information
Pub Date: 2006-04-03 DOI: 10.1109/ICDEW.2006.95
G. P. Zarri
In this paper, we describe NKRL (Narrative Knowledge Representation Language), a conceptual modeling formalism that takes into account the semantic characteristics of an important component of eChronicle information: ‘narrative’ documents. In these documents, the main part of the information consists of descriptions of ‘events’ that relate the real or intended behavior of some ‘actors’. Narrative documents of industrial and economic interest include news stories, corporate documents, normative and legal texts, intelligence messages, medical records, etc. NKRL employs several representational principles and some high-level inference tools.
Citations: 2
Management of Heterogeneity in the Semantic Web
Pub Date: 2006-04-03 DOI: 10.1109/ICDEW.2006.74
P. Atzeni, P. D. Nostro
As various models and languages have been proposed to handle information in the Semantic Web, it is important to be able to translate data from one to another. By referring to two specific models, namely RDF and Topic Maps, we propose a meta-modelling approach, based on previous experiences on handling heterogeneity in the database world.
Citations: 5
Efficiently Computing Inclusion Dependencies for Schema Discovery
Pub Date: 2006-04-03 DOI: 10.1109/ICDEW.2006.54
Jana Bauckmann, U. Leser, Felix Naumann
Large data integration projects must often cope with undocumented data sources. Schema discovery aims at automatically finding structures in such cases. An important class of relationships between attributes that can be detected automatically are inclusion dependencies (INDs), which provide an excellent basis for guessing foreign key constraints. INDs can be discovered by comparing the sets of distinct values of pairs of attributes. In this paper we present efficient algorithms for finding unary INDs. We first show that (and why) SQL is not suitable for this task. We then develop two algorithms that compute inclusion dependencies outside of the database. Both are much faster than the SQL-based methods; in fact, for larger schemas they are the only feasible solution. Our experiments show that we can compute all unary INDs in a schema of 1,680 attributes with a total database size of 3.2 GB in approximately 2.5 hours.
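The set-comparison idea behind unary IND discovery fits in a few lines. This in-memory version is only a toy under the assumption that all distinct values fit in RAM; the paper's algorithms are engineered precisely for the case where they do not:

```python
from itertools import permutations

def unary_inds(columns):
    """Find all unary inclusion dependencies among the given columns.

    `columns` maps a qualified column name such as 'orders.cid' to its
    values. A dependency (A, B) holds when every distinct value of A also
    occurs in B, which makes A a foreign-key candidate referencing B.
    """
    distinct = {col: set(vals) for col, vals in columns.items()}
    inds = []
    for a, b in permutations(distinct, 2):
        if distinct[a] <= distinct[b]:  # set containment test
            inds.append((a, b))         # read as: a is included in b
    return inds
```

Note the candidate space is quadratic in the number of attributes, which is why avoiding one SQL join per pair matters at the scale the paper reports.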
Citations: 31
A Temporal Clustering Method for Web Archives
Pub Date: 2006-04-03 DOI: 10.1109/ICDEW.2006.23
T. Kage, K. Sumiya
Web pages are collected and stored in Web archives, and several methods to construct Web archives have been developed. We propose a method to retrieve time series of Web pages from Web archives by using the pages’ temporal characteristics. We present two processes for searching Web archives based on the temporal relation of query keywords: one determines the relation, and the other queries Web pages based on that relation. In this paper, we discuss the two processes and present an experimental result for the method.
Citations: 2
Loosely Coupling Java Algorithms and XML Parsers: a Performance-Oriented Study
Pub Date: 2006-04-03 DOI: 10.1109/ICDEW.2006.73
G. Psaila
The adoption of XML to represent all kinds of data and documents, even complex and huge ones, is now a matter of fact. However, interfacing algorithms and applications with XML parsers requires adapting them: event-based SAX parsers need algorithms that react to events generated by the parser. But parsing and loading XML documents performs poorly compared to reading flat files, so several research efforts address this problem by improving the parsing phase, e.g., by adopting condensed or binary representations of XML documents. This paper deals with the other side of the coin: the problem of coupling algorithms with XML parsers in a way that does not require changing the active (polling-based) nature of many algorithms and provides acceptable performance during execution. This problem becomes even more important for Java algorithms, which are usually less efficient than C or C++ algorithms. This paper presents a study of the problem of loosely coupling Java algorithms with XML parsers. The coupling is loose because the algorithm should be unaware of the particular interface provided by the parser. We consider several coupling techniques and compare them by analyzing their performance. The evaluation leads us to identify the coupling techniques that perform better, depending on the specific algorithm’s needs and application scenario.
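One simple way to reconcile a push-style SAX parser with a polling-style algorithm is to buffer the parser's callbacks into an event queue that the algorithm consumes at its own pace. A minimal sketch (in Python rather than Java, and only one of several possible coupling strategies, traded off against memory for the buffered events):

```python
import xml.sax

class EventBuffer(xml.sax.ContentHandler):
    """Records SAX callbacks as (kind, payload) tuples so an active,
    polling-based algorithm can iterate over them after parsing, instead
    of being rewritten as a reactive event handler."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        if not content.strip():
            return  # ignore whitespace-only text between tags
        if self.events and self.events[-1][0] == "text":
            # coalesce text the parser delivers in several callbacks
            self.events[-1] = ("text", self.events[-1][1] + content)
        else:
            self.events.append(("text", content))

def parse_to_events(xml_text):
    """Parse a document and return its event stream in document order."""
    handler = EventBuffer()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events
```

The algorithm stays unaware of the parser's interface: it only sees the tuple queue, so swapping in a different parser requires changing only `parse_to_events`.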
Citations: 3
Managing the Forecast Factory 管理预报工厂
Pub Date: 2006-04-03 DOI: 10.1109/ICDEW.2006.76
Laura Bright, D. Maier, Bill Howe
The CORIE forecast factory consists of a set of data product generation runs that are executed daily on dedicated local resources. The goal is to maximize productivity and resource utilization while still ensuring timely completion of all forecasts. Many existing workflow management systems address low-level workflow specification and execution challenges, but do not directly address the high-level challenges posed by large-scale data product factories. In this paper we discuss several specific challenges to managing the CORIE forecast factory including planning and scheduling, improving data flow, and analyzing log data, and point out their analogs in the "physical" manufacturing world. We present solutions we have implemented to address these challenges, and present experimental results that show the benefits of these solutions.
Citations: 5