
Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management: Latest Articles

Towards Co-Evolution of Data-Centric Ecosystems.
Robert Schuler, Karl Czajkowski, Mike D'Arcy, Hongsuda Tangmunarunkit, Carl Kesselman

Database evolution is a notoriously difficult task, and it is exacerbated by the necessity to evolve database-dependent applications. As science becomes increasingly dependent on sophisticated data management, the need to evolve an array of database-driven systems will only intensify. In this paper, we present an architecture for data-centric ecosystems that allows the components to seamlessly co-evolve by centralizing the models and mappings at the data service and pushing model-adaptive interactions to the database clients. Boundary objects fill the gap where applications are unable to adapt and need a stable interface to interact with the components of the ecosystem. Finally, evolution of the ecosystem is enabled via integrated schema modification and model management operations. We present use cases from actual experiences that demonstrate the utility of our approach.

DOI: https://doi.org/10.1145/3400903.3400908 | Published: 2020-07-01
Citations: 8
Efficient classification of billions of points into complex geographic regions using hierarchical triangular mesh
Dániel Kondor, L. Dobos, I. Csabai, A. Bodor, G. Vattay, T. Budavári, A. Szalay
We present a case study on the spatial indexing and regional classification of billions of geographic coordinates from geo-tagged social network data using the Hierarchical Triangular Mesh (HTM) implemented for Microsoft SQL Server. Because the HTM library lacks certain features, we use it in conjunction with the GIS functions of SQL Server to significantly increase the efficiency of pre-filtering in spatial filter and join queries. For example, we implemented a new algorithm to compute the HTM tessellation of complex geographic regions and precomputed the intersections of HTM triangles and geographic regions for faster false-positive filtering. With full control over the index structure, HTM-based pre-filtering of simple containment searches outperforms SQL Server spatial indices by a factor of ten, and HTM-based spatial joins run about a hundred times faster.
DOI: https://doi.org/10.1145/2618243.2618245 | Published: 2014-06-30 | Pages: 4:1-4:4
Citations: 9
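The pre-filter and false-positive filtering idea in the HTM abstract above can be illustrated with a much simpler stand-in. The sketch below is not the authors' implementation: it assumes a square grid instead of HTM triangles and a disc instead of a complex geographic region, and the names `classify_cells` and `in_region` are hypothetical. It only shows why precomputing which cells lie fully inside a region lets most points skip the exact geometric test.

```python
import math

def classify_cells(cx, cy, r, cell, extent):
    """Precompute grid cells fully inside, or partially overlapping, a disc.

    Assumption: the disc lies inside the square [0, extent) x [0, extent).
    """
    full, partial = set(), set()
    n = int(math.ceil(extent / cell))
    for i in range(n):
        for j in range(n):
            x0, y0 = i * cell, j * cell
            x1, y1 = x0 + cell, y0 + cell
            # Point of the cell nearest to the disc centre.
            nx, ny = min(max(cx, x0), x1), min(max(cy, y0), y1)
            # Corner of the cell farthest from the disc centre.
            fx = x0 if abs(cx - x0) > abs(cx - x1) else x1
            fy = y0 if abs(cy - y0) > abs(cy - y1) else y1
            if math.hypot(fx - cx, fy - cy) <= r:
                full.add((i, j))        # every point of the cell is inside
            elif math.hypot(nx - cx, ny - cy) <= r:
                partial.add((i, j))     # the region boundary crosses the cell
    return full, partial

def in_region(x, y, cx, cy, r, cell, full, partial):
    """Classify one point, running the exact test only for boundary cells."""
    key = (int(x // cell), int(y // cell))
    if key in full:
        return True                               # accepted by the pre-filter
    if key in partial:
        return math.hypot(x - cx, y - cy) <= r    # exact refinement test
    return False                                  # rejected by the pre-filter

full, partial = classify_cells(5, 5, 3, 1, 10)
print(in_region(5.0, 5.0, 5, 5, 3, 1, full, partial))   # cell fully inside
print(in_region(7.9, 5.9, 5, 5, 3, 1, full, partial))   # boundary cell, exact test
```

In a database setting the cell key would typically be an indexed column, so the first two branches become an index lookup and only points in boundary cells pay for the expensive geometry test.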
SensorBench: benchmarking approaches to processing wireless sensor network data
I. Galpin, A. B. Stokes, G. Valkanas, A. Gray, N. Paton, A. Fernandes, K. Sattler, D. Gunopulos
Wireless sensor networks enable cost-effective data collection for tasks such as precision agriculture and environment monitoring. However, the resource-constrained nature of sensor nodes, which often have both limited computational capabilities and battery lifetimes, means that applications that use them must make judicious use of these resources. Research that seeks to support data intensive sensor applications has explored a range of approaches and developed many different techniques, including bespoke algorithms for specific analyses and generic sensor network query processors. However, all such proposals sit within a multi-dimensional design space, where it can be difficult to understand the implications of specific decisions and to identify optimal solutions. This paper presents a benchmark that seeks to support the systematic analysis and comparison of different techniques and platforms, enabling both development and user communities to make well informed choices. The contributions of the paper include: (i) the identification of key variables and performance metrics; (ii) the specification of experiments that explore how different types of task perform under different metrics for the controlled variables; and (iii) an application of the benchmark to investigate the behavior of several representative platforms and techniques.
DOI: https://doi.org/10.1145/2618243.2618252 | Published: 2014-06-30 | Pages: 21:1-21:12
Citations: 3
MR-microT: a MapReduce-based MicroRNA target prediction method
Ilias Kanellos, Thanasis Vergoulis, Dimitris Sacharidis, Theodore Dalamagas, A. Hatzigeorgiou, S. Sartzetakis, T. Sellis
MicroRNAs (miRNAs) are small RNA molecules that inhibit the expression of particular genes, a function that makes them useful in the treatment of many diseases. Computational methods that predict which genes are targeted by particular miRNA molecules are known as target prediction methods. In this paper, we present a MapReduce-based system, termed MR-microT, for one of the most popular and accurate, but computationally intensive, prediction methods. MR-microT offers a feature highly requested by life scientists: predicting the targets of ad-hoc miRNA molecules in near-real time through an intuitive Web interface.
DOI: https://doi.org/10.1145/2618243.2618289 | Published: 2014-06-30 | Pages: 47:1-47:4
Citations: 18
(k, d)-core anonymity: structural anonymization of massive networks
Roland Assam, Marwan Hassani, M. Brysch, T. Seidl
Networks contain vulnerable and sensitive information that poses serious privacy threats. In this paper, we introduce the k-core attack, a new attack model that stems from the k-core decomposition principle. The k-core attack undermines the privacy of some state-of-the-art techniques. We propose a novel structural anonymization technique called (k, Δ)-Core Anonymity, which harnesses the k-core attack and structurally anonymizes small and large networks. In addition, although real-world social networks are massive in nature, most existing works focus on the anonymization of networks with fewer than one hundred thousand nodes. (k, Δ)-Core Anonymity is tailored for massive networks. To the best of our knowledge, this is the first technique that provides empirical studies on structural network anonymization for massive networks. Using three real and two synthetic datasets, we demonstrate the effectiveness of our technique on small and large networks with up to 1.7 million nodes and 17.8 million edges. Our experiments reveal that our approach outperforms a state-of-the-art work in several aspects.
DOI: https://doi.org/10.1145/2618243.2618269 | Published: 2014-06-30 | Pages: 17:1-17:12
Citations: 16
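The k-core decomposition principle that the (k, d)-core anonymity abstract above builds on can be sketched in a few lines. This is the standard peeling procedure, not the paper's attack or anonymization algorithm; the function name `core_numbers` is hypothetical, and the quadratic-time minimum search is chosen for readability rather than for the massive networks the paper targets.

```python
from collections import defaultdict

def core_numbers(edges):
    """Return {node: core number} by repeatedly peeling a minimum-degree node.

    A node has core number k if it belongs to the k-core (the maximal
    subgraph in which every node has degree >= k) but not to the (k+1)-core.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nb) for n, nb in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Simple O(n) minimum search; bucket queues make this linear overall.
        n = min(remaining, key=degree.__getitem__)
        k = max(k, degree[n])           # core numbers never decrease while peeling
        core[n] = k
        remaining.remove(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1          # removing n lowers its neighbours' degree
    return core

# A triangle a-b-c with a pendant node d attached to c.
print(core_numbers([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]))
```

In this example the triangle forms the 2-core and the pendant node sits only in the 1-core, which is the kind of structural signature an adversary with k-core knowledge could try to exploit.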
Efficient processing of exploratory top-k joins
Orestis Gkorgkas, Akrivi Vlachou, C. Doulkeridis, K. Nørvåg
In this paper, we address the problem of discovering a ranked set of k distinct main objects combined with additional (accessory) objects that best fit the given preferences. This problem is challenging because it considers object combinations of variable size, where objects are combined only if the combination produces a higher score and thus becomes preferable to the user. In this way, users can explore overviews of combinations that are more suited to their preferences than single objects, without the need to explicitly specify which objects should be combined. We model this problem as a rank-join problem where each combination is represented by a set of tuples from different relations, and we call the respective query an eXploratory Top-k Join query. Existing approaches fall short of tackling this problem because they impose a fixed size of combinations, they do not distinguish combinations based on the main objects, or they do not take into account user preferences. We introduce a more efficient bounding scheme that can be used on an adaptation of the rank-join algorithm, which exploits some key properties of our problem and allows earlier termination of query processing. Our experimental evaluation demonstrates the efficiency of the proposed bounding technique.
DOI: https://doi.org/10.1145/2618243.2618280 | Published: 2014-06-30 | Pages: 35:1-35:4
Citations: 2
Integrating fault-tolerance and elasticity in a distributed data stream processing system
Kasper Grud Skat Madsen, Philip Thyssen, Yongluan Zhou
Recently there has been increasing interest in building distributed platforms for processing fast data streams. In this demonstration, we highlight the need for elasticity in distributed data stream processing systems and present Enorm, a data stream processing platform with a focus on elasticity, i.e. the ability to dynamically scale resource usage according to runtime workload fluctuations. In order to achieve dynamic scaling with minimal overhead and latency, we use an integrated approach for both fault-tolerance and elasticity. The idea is that both fault-tolerance and elasticity essentially require replicating or migrating computation states among different nodes. Integrating and sharing the state management operations between the two modules can not only provide abundant opportunities to reduce the system's runtime overhead but also simplify the system's architecture.
DOI: https://doi.org/10.1145/2618243.2618288 | Published: 2014-06-30 | Pages: 48:1-48:4
Citations: 18
Data patterns to alleviate the design of scientific workflows exemplified by a bone simulation
P. Reimann, H. Schwarz, B. Mitschang
Scientific workflows often have to process huge data sets in a multiplicity of data formats. For that purpose, they typically embed complex data provisioning tasks that transform these heterogeneous data into formats the underlying tools or services can handle. This results in an increased complexity of workflow design. As scientists typically design their scientific workflows on their own, this complexity hinders them from concentrating on their core issue, namely the experiments, analyses, or simulations they conduct. In this paper, we present the core idea of a pattern-based approach to alleviate the design of scientific workflows. This approach is particularly targeted at the needs of scientists. We exemplify and assess the pattern-based design approach by applying it to a complex scientific workflow realizing a real-world simulation of structure changes in bones.
DOI: https://doi.org/10.1145/2618243.2618279 | Published: 2014-06-30 | Pages: 43:1-43:4
Citations: 7
Skew-resistant parallel in-memory spatial join
S. Ray, Bogdan Simion, Angela Demke Brown, Ryan Johnson
Spatial join is a crucial operation in many spatial analysis applications in scientific and geographical information systems. Due to the compute-intensive nature of spatial predicate evaluation, spatial join queries can be slow even with a moderately sized dataset. Efficient parallelization of spatial join is therefore essential to achieve acceptable performance for many spatial applications. Technological trends, including the rising core count and increasingly large main memory, hold great promise in this regard. Previous parallel spatial join approaches tried to partition the dataset so that the number of spatial objects in each partition was as equal as possible. They also focused only on the filter step. However, when the more compute-intensive refinement step is included, significant processing skew may arise due to the uneven size of the objects. This processing skew significantly limits the achievable parallel performance of the spatial join queries, as the longest-running spatial partition determines the overall query execution time. Our solution is SPINOJA, a skew-resistant parallel in-memory spatial join infrastructure. SPINOJA introduces MOD-Quadtree declustering, which partitions the spatial dataset such that the amount of computation demanded by each partition is equalized and the processing skew is minimized. We compare three work metrics used to create the partitions and three load-balancing strategies to assign the partitions to multiple cores. SPINOJA uses an in-memory column-store to store the spatial tables. Our evaluation shows that SPINOJA outperforms in-memory implementations of previous spatial join approaches by a significant margin and a recently proposed in-memory spatial join algorithm by an order of magnitude.
空间连接是科学和地理信息系统中许多空间分析应用的关键操作。由于空间谓词计算的计算密集型性质,即使使用中等大小的数据集,空间连接查询也可能很慢。因此,空间连接的高效并行化对于许多空间应用程序实现可接受的性能至关重要。技术趋势,包括不断增加的核心数量和越来越大的主存,在这方面带来了很大的希望。以前的并行空间连接方法试图对数据集进行分区,使每个分区中的空间对象数量尽可能相等。他们也只关注过滤步骤。然而,当包含更多计算密集型的细化步骤时,由于对象的大小不均匀,可能会出现明显的处理偏差。这种处理倾斜极大地限制了空间连接查询可实现的并行性能,因为运行时间最长的空间分区决定了总体查询执行时间。我们的解决方案是SPINOJA,一个抗倾斜的并行内存空间连接基础设施。SPINOJA引入了mod -四叉树聚类,它对空间数据集进行分区,使每个分区所需的计算量相等,并使处理倾斜最小化。我们比较了用于创建分区的三种工作指标和用于将分区分配给多个核心的三种负载平衡策略。SPINOJA使用内存中的列存储来存储空间表。我们的评估表明,SPINOJA比以前的空间连接方法的内存实现有很大的优势,并且比最近提出的内存空间连接算法有一个数量级的优势。
DOI: 10.1145/2618243.2618262 (https://doi.org/10.1145/2618243.2618262). Published 2014-06-30. Pages 6:1-6:12.
Citations: 37
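The SPINOJA abstract above notes that the longest-running partition determines overall query time, so partitions should be assigned to cores by estimated work rather than by object count. The following is a minimal sketch of that idea using a generic greedy heuristic (longest-processing-time-first). It is not taken from the paper and is not necessarily one of the three load-balancing strategies the authors evaluate. The function name `assign_partitions` and the per-partition work estimates are hypothetical.

```python
import heapq


def assign_partitions(work, num_cores):
    """Assign partitions to cores with a greedy LPT heuristic.

    work: dict mapping a partition id to its estimated work
          (hypothetical metric, e.g. total vertex count of its objects).
    Returns (assignment, loads): the partition ids per core and the
    total estimated work per core.
    """
    # Min-heap of (current load, core index): the least-loaded core is on top.
    heap = [(0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_cores)]

    # Place the most expensive partitions first.
    for pid, w in sorted(work.items(), key=lambda kv: kv[1], reverse=True):
        load, core = heapq.heappop(heap)
        assignment[core].append(pid)
        heapq.heappush(heap, (load + w, core))

    loads = [0] * num_cores
    for load, core in heap:
        loads[core] = load
    return assignment, loads
```

Calling `assign_partitions({"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}, 2)` places partitions on two cores; the larger of the two returned loads approximates the query's critical path under this simplified model.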
Schema matching over relations, attributes, and data values
Aibo Tian, M. Kejriwal, Daniel P. Miranker
Automatic schema matching algorithms are typically concerned only with finding attribute correspondences. However, real-world data integration problems often require matchings whose arguments span all three types of elements in relational databases: relation, attribute, and data value. This paper introduces the definitions and semantics of three additional correspondence types concerning both schema and data values. These correspondences cover the higher-order mappings identified in a seminal paper by Krishnamurthy, Litwin, and Kent. It is shown that these correspondences can be automatically translated to tuple-generating dependencies (tgds), and thus this research is compatible with data integration applications that leverage tgds. Two methods for automatically identifying these correspondences are developed. One requires a limited number of duplicates across data sources. The other is a general instance-based method with no such requirement. Experiments conducted on four real-world data sets demonstrate the effectiveness of the methods.
DOI: 10.1145/2618243.2618248 (https://doi.org/10.1145/2618243.2618248). Published 2014-06-30. Pages 28:1-28:12.
Citations: 9
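To illustrate what a correspondence between a relation and a data value can look like once expressed as a tuple-generating dependency, consider a hypothetical source schema with one relation per company, ibm(date, price), and a target schema stock(date, company, price) that stores the company as a data value. The relation and attribute names are assumptions chosen for illustration and are not taken from the paper. One tgd capturing the mapping might be written as:

```latex
\forall d \,\forall p \;\bigl(\, \mathit{ibm}(d, p) \;\rightarrow\; \mathit{stock}(d, \text{`ibm'}, p) \,\bigr)
```

Here the source relation name on the left reappears as a constant data value on the right, which an attribute-only matcher could not express.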