Title: On accurate POI recommendation via transfer learning. Journal: Distributed and Parallel Databases, vol. 38(1), pp. 585-599. DOI: 10.1007/s10619-020-07299-7.
Pub Date: 2020-06-06, DOI: 10.1007/s10619-020-07295-x
Edouard Fouché, Alan Mazankiewicz, Florian Kalinke, Klemens Böhm
Title: A framework for dependency estimation in heterogeneous data streams. Journal: Distributed and Parallel Databases, vol. 39(1), pp. 415-444.
Pub Date: 2020-05-16, DOI: 10.1007/s10619-020-07296-w
Rebeca Schroeder, Raqueline R. M. Penteado, Carmem S. Hara
Title: A data distribution model for RDF. Journal: Distributed and Parallel Databases, vol. 39(1), pp. 129-167.
Pub Date: 2020-04-01, DOI: 10.1109/ICDEW49219.2020.00013
Andrea Hillenbrand, U. Störl, Shamil Nabiyev, Meike Klettke
When NoSQL database systems are used in an agile software development setting, data model changes occur frequently, so data is routinely stored in different versions. Managing versioned data incurs an overhead that can impede software development. Several data migration strategies exist that handle legacy data differently during data accesses, each with its own advantages and disadvantages. Depending on the requirements of the software application, we evaluate and compare different migration strategies using metrics such as migration costs and latency, as well as precision and recall. Ideally, the strategy selected is the one whose characteristics fulfill the service-level agreements and match the migration scenario, which depends on the query workload and on the data model changes that imply an evolution of the database schema. In this paper, we present a methodology for self-adapting data migration, which automatically adjusts migration strategies and their parameters to the migration scenario and the service-level agreements, thereby contributing to the self-management of database systems and supporting agile development.
Title: Self-adapting data migration in the context of schema evolution in NoSQL databases. Journal: Distributed and Parallel Databases, vol. 40(1), pp. 5-25.
Pub Date: 2020-04-01, DOI: 10.1109/ICDEW49219.2020.00010
M. Jibril, Philipp Götze, David Broneske, K. Sattler
Since Persistent Memory was introduced to the market in 2019 in the form of Intel's Optane DC Persistent Memory, it has found its way into many applications and systems. As Google and other cloud infrastructure providers start to incorporate Persistent Memory into their portfolios, cloud applications should exploit its inherent properties. Persistent Memory can serve as a DRAM substitute, but it guarantees persistence at the cost of lower read/write performance than standard DRAM. These properties particularly affect the performance of index structures, since they are subject to frequent updates and queries. However, adapting each and every index structure to exploit the properties of Persistent Memory is tedious. Hence, we require a general technique that hides this access gap, e.g., by using DRAM caching strategies. To exploit Persistent Memory properties for analytical index structures, we propose selective caching. It is based on a mixture of dynamic and static caching of tree nodes in DRAM to reach near-DRAM access speeds for index structures. In this paper, we evaluate selective caching on the OLAP-optimized main-memory index structure Elf, because its memory layout allows for easy caching. Our experiments show that, if configured well, selective caching with a suitable replacement strategy can keep pace with pure DRAM storage of Elf while guaranteeing persistence. These results also hold when selective caching is used for parallel workloads.
Title: Selective caching: a persistent memory approach for multi-dimensional index structures. Journal: Distributed and Parallel Databases, vol. 40(1), pp. 47-66.
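The abstract describes selective caching only at a high level, as a mixture of static and dynamic caching of tree nodes in DRAM in front of Persistent Memory. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation and does not model Elf's memory layout. It assumes the static part pins a fixed set of node ids (for example, the upper levels of the tree) and that the dynamic part uses LRU as its replacement strategy; the abstract only asks for "a suitable replacement strategy". `SelectiveCache`, `load_from_pmem`, `static_ids` and `capacity` are hypothetical names. Only reads are modelled; Persistent Memory stays the durable copy of every node.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, Set


class SelectiveCache:
    """Hypothetical DRAM cache in front of a slower persistent-memory node store.

    Assumption: static_ids are pinned in DRAM and never evicted (e.g., the
    upper tree levels); all other nodes share an LRU cache of `capacity` entries.
    """

    def __init__(self, load_from_pmem: Callable[[Hashable], bytes],
                 static_ids: Set[Hashable], capacity: int) -> None:
        self._load = load_from_pmem
        # Static part: loaded once up front and kept for the cache's lifetime.
        self._static: Dict[Hashable, bytes] = {nid: load_from_pmem(nid) for nid in static_ids}
        # Dynamic part: insertion order doubles as recency order (oldest first).
        self._dynamic: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self._capacity = capacity
        self.pmem_reads = 0  # reads served from PMem by get(); initial static loads are not counted

    def get(self, node_id: Hashable) -> bytes:
        if node_id in self._static:           # pinned node: always a DRAM hit
            return self._static[node_id]
        if node_id in self._dynamic:          # dynamic hit: mark as most recently used
            self._dynamic.move_to_end(node_id)
            return self._dynamic[node_id]
        self.pmem_reads += 1                  # miss: fall through to Persistent Memory
        node = self._load(node_id)
        if self._capacity > 0:
            self._dynamic[node_id] = node
            if len(self._dynamic) > self._capacity:
                self._dynamic.popitem(last=False)  # evict the least recently used node
        return node
```

For example, with node 0 pinned and a dynamic capacity of 2, the access sequence 0, 1, 2, 1, 3, 2 falls through to Persistent Memory four times: node 0 is always served from the static part, the second access to node 1 hits the LRU part, and node 2 is re-read after node 3 evicts it. Pinning hot upper levels avoids evicting nodes that almost every lookup traverses, while the LRU part adapts to the workload.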
Pub Date: 2020-04-01, DOI: 10.1109/ICDEW49219.2020.00009
Tobias Vinçon, Arthur Bernhardt, Lukas Weber, A. Koch, Ilia Petrov
Massive data transfers in modern data-intensive systems, resulting from low data locality and data-to-code system designs, hurt their performance and scalability. Near-data processing (NDP) and a shift to code-to-data designs may represent a viable solution, as packaging combinations of storage and compute elements on the same device has become feasible. The shift towards NDP system architectures calls for a revision of established principles. Abstractions such as data formats and layouts typically span multiple layers in traditional DBMSs, and the way they are processed is encapsulated within these layers of abstraction. NDP-style processing requires an explicit definition of cross-layer data formats and accessors to ensure that in-situ executions optimally utilize the properties of the underlying NDP storage and compute elements. In this paper, we make the case for such data format definitions and investigate the performance benefits under RocksDB and the COSMOS hardware platform.
Title: On the necessity of explicit cross-layer data formats in near-data processing systems. Journal: Distributed and Parallel Databases, vol. 40(1), pp. 27-45.
Pub Date: 2020-03-10, DOI: 10.1007/s10619-020-07286-y
Hani Al-Sayeh, Stefan Hagedorn, K. Sattler
Title: A gray-box modeling methodology for runtime prediction of Apache Spark jobs. Journal: Distributed and Parallel Databases, pp. 1-21.
Pub Date: 2020-02-08, DOI: 10.1007/s10619-020-07283-1
Jiajie Xu, J. Chen, Lihua Yin
Title: Multi-objective spatial keyword query with semantics: a distance-owner based approach. Journal: Distributed and Parallel Databases, pp. 625-647.
Pub Date: 2020-01-29, DOI: 10.1007/s10619-020-07284-0
Gabriela Suntaxi, Aboubakr Achraf El Ghazi, Klemens Böhm
Title: Secrecy and performance models for query processing on outsourced graph data. Journal: Distributed and Parallel Databases, vol. 39(1), pp. 35-77.