
Advances in database technology : proceedings. International Conference on Extending Database Technology: Latest Papers

UniCache: Efficient Log Replication through Learning Workload Patterns
Harald Ng, Kun Wu, Paris Carbone
Most of the world’s cloud data service workloads are currently backed by replicated state machines. The production-grade log replication protocols used for the job impose heavy data transfer duties on the primary server, which needs to disseminate the log commands to all the replica servers. UniCache proposes a principal solution to this problem using a learned replicated cache that enables commands to be sent over the network as compressed encodings. UniCache takes advantage of the fact that each replica has access to a consistent prefix of the replicated log, which allows the replicas to build a uniform lookup cache used for compressing and decompressing commands consistently. UniCache achieves effective speedups, lowering the primary load in application workloads with a skewed data distribution. Our experimental studies showcase a low pre-processing overhead and the highest performance gains in cross-data center deployments over wide area networks.
DOI: https://doi.org/10.48786/edbt.2023.39; Pages: 471-477; Published: 2023-01-01
Citations: 0
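The UniCache abstract above rests on one idea: because every replica applies the same log prefix in the same order, a cache that is updated deterministically stays identical on all servers, so the primary can send a short reference instead of the full command. A minimal sketch of that idea follows. It is not the authors' implementation: the learned cache is replaced here by a plain LRU cache, and the names `UniformCache`, `encode`, and `decode` are hypothetical.

```python
from collections import OrderedDict


class UniformCache:
    """Deterministic LRU cache (an assumed stand-in for UniCache's learned cache).

    Two instances that observe the same command sequence end up in the same state.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = OrderedDict()  # command -> id, ordered from least to most recent
        self.by_id = {}             # id -> command
        self.next_id = 0

    def observe(self, command):
        """Update the cache with one decided log command."""
        if command in self.slots:
            self.slots.move_to_end(command)
            return
        if len(self.slots) >= self.capacity:
            _, old_id = self.slots.popitem(last=False)  # evict least recently used
            del self.by_id[old_id]
        self.slots[command] = self.next_id
        self.by_id[self.next_id] = command
        self.next_id += 1


def encode(cache, command):
    """Primary side: send a short reference when the command is cached."""
    if command in cache.slots:
        message = ("ref", cache.slots[command])
    else:
        message = ("raw", command)
    cache.observe(command)
    return message


def decode(cache, message):
    """Replica side: resolve references against the identical local cache."""
    kind, payload = message
    command = cache.by_id[payload] if kind == "ref" else payload
    cache.observe(command)
    return command


primary, replica = UniformCache(capacity=2), UniformCache(capacity=2)
log = ["put k1", "put k2", "put k1", "put k1", "put k3", "put k2"]
wire = [encode(primary, c) for c in log]
assert [decode(replica, m) for m in wire] == log
```

With a skewed workload, repeated commands hit the cache more often, which is the situation in which the abstract reports the largest savings.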
SonicJoin: Fast, Robust and Worst-case Optimal
Ahmad Khazaie, H. Pirk
The establishment of the AGM bound on the size of intermediate results of natural join queries has led to the development of several so-called worst-case join algorithms. These algorithms provably produce intermediate results that are (asymptotically) no larger than the final result of the join. The most notable ones are the Recursive Join, its successor, the Generic Join, and the Leapfrog-Trie-Join. While algorithmically efficient, however, all of these algorithms require the availability of index structures that allow tuple lookups using the prefix of a key. Key-prefix-lookups in relational database systems are commonly supported by tree-based index structures since hash-based indices only support full-key lookups. In this paper, we study a wide variety of main-memory-oriented index structures that support key-prefix-lookups with a specific focus on supporting the Generic Join. Based on that study, we develop a novel, best-of-breed index structure called Sonic that combines the fast build and point lookup properties of hashtables with the prefix-lookup capabilities of trees and tries. To evaluate the performance of a variety of indices for worst-case optimal joins in a modern code-generating DBMS, we leveraged flexible, compile-time metaprogramming features to build a framework that creates highly efficient code, interweaving (at a microarchitectural level) a generic join implementation with any appropriate index structure. We demonstrate experimentally that in that framework, Sonic outperforms the fastest existing approaches by up to 2.5 times when supporting the Generic Join algorithm.
DOI: https://doi.org/10.48786/edbt.2023.46; Pages: 540-551; Published: 2023-01-01
Citations: 0
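To make the role of key-prefix lookups in the SonicJoin abstract concrete, here is a minimal sketch of the Generic Join on the triangle query R(a,b) ⋈ S(b,c) ⋈ T(a,c). It binds one attribute at a time and intersects the candidate values offered by each relation. The index used here is a plain dictionary of sets, chosen only for illustration; it is not the Sonic structure, and the names `build_prefix_index` and `triangle_generic_join` are hypothetical.

```python
def build_prefix_index(pairs):
    """Map the first attribute (the key prefix) to the set of second attributes."""
    index = {}
    for first, second in pairs:
        index.setdefault(first, set()).add(second)
    return index


def triangle_generic_join(r, s, t):
    """Return all (a, b, c) with (a, b) in r, (b, c) in s, and (a, c) in t."""
    r_idx = build_prefix_index(r)  # a -> {b}
    s_idx = build_prefix_index(s)  # b -> {c}
    t_idx = build_prefix_index(t)  # a -> {c}
    s_keys = set(s_idx)
    result = []
    # Bind a: it must appear as a prefix in both R and T.
    for a in set(r_idx) & set(t_idx):
        # Bind b: prefix lookup R[a], intersected with the prefixes of S.
        for b in r_idx[a] & s_keys:
            # Bind c: prefix lookups S[b] and T[a], intersected.
            for c in s_idx[b] & t_idx[a]:
                result.append((a, b, c))
    return sorted(result)


R = [(1, 2), (1, 3), (2, 3)]
S = [(2, 3), (3, 1)]
T = [(1, 3), (2, 1)]
print(triangle_generic_join(R, S, T))  # [(1, 2, 3), (2, 3, 1)]
```

Every step is a lookup by key prefix followed by a set intersection, which is why the choice of index structure dominates the running time of this family of joins.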
Reasoning over Financial Scenarios with the Vadalog System
Teodoro Baldazzi, Luigi Bellomarini, Emanuel Sallinger
DOI: https://doi.org/10.48786/edbt.2023.66; Pages: 782-791; Published: 2023-01-01
Citations: 0
Tuning the Utility-Privacy Trade-Off in Trajectory Data
Maja Schneider, P. Christen, E. Rahm, Jonathan Schneider, Lea Löffelmann
Trajectory data, often collected on a large scale with mobile sensors in smartphones and vehicles, are a valuable source for realizing smart city applications, or for improving the user experience in mobile apps. But such data can also leak private information, such as a person’s whereabouts and their points of interest (POI). These in turn can reveal sensitive information, for example a person’s age, gender, religion, or home and work address. Location privacy preserving mechanisms (LPPM) can mitigate this issue by transforming data so that private details are protected. But privacy-preservation typically comes at the cost of a loss of utility. It can be challenging to find a suitable mechanism and the right settings to satisfy privacy as well as utility. In this work, we present Privacy Tuna, an interactive open-source framework to visualize trajectory data, and intuitively estimate data utility and privacy while applying various LPPMs. Our tool makes it easy for data owners to investigate the value of their data, choose a suitable privacy-preserving mechanism and tune its parameters to achieve a good utility-privacy trade-off.
DOI: https://doi.org/10.48786/edbt.2023.78; Pages: 839-842; Published: 2023-01-01
Citations: 0
In-Network Approximate and Efficient Spatiotemporal Range Queries on Moving Objects
Guang Yang, Liang Liang
Data aggregations enable privacy-aware data analytics for moving objects. A spatiotemporal range count query is a fundamental query that aggregates the count of objects in a given spatial region and a time interval. Existing works are designed for centralized systems, which lead to issues with extensive communication and the potential for data leaks. Current in-network systems suffer from the distinct count problem (counting the same objects multiple times) and the dead space problem (excessive intra-communication from ill-suited spatial subdivisions). We propose a novel framework based on a planar graph representation for efficient privacy-aware in-network aggregate queries. Unlike conventional spatial decomposition methods, our framework uses sensor placement techniques to select sensors to reduce dead space. A submodular maximization-based method is introduced when the query distribution is known and a host of sampling methods are used when the query distribution is unknown or dynamic. We avoid double counting by tracking movements along the graph edges using discrete differential forms. We support queries with arbitrary temporal intervals with a constant-sized regression model that accelerates the query performance and reduces the storage size. We evaluate our method on real-world mobility data, which yields us a relative error of at most 13.8% with 25.6% of sensors while achieving a speedup of 3.5×, 69.81% reduction in sensors accessed, and a storage reduction of 99.96% compared to finding the exact count.
DOI: https://doi.org/10.48786/edbt.2024.04; Pages: 34-46; Published: 2023-01-01
Citations: 0
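One way to read the abstract's remark about avoiding double counting by tracking movements along edges: if sensors record signed crossings between neighbouring cells, the number of objects inside a region equals the crossings into the region minus the crossings out of it, and moves between two cells that both lie inside the region cancel out. The sketch below illustrates only this boundary-flow idea, under the assumption that the region starts empty; the event format and the name `net_count` are hypothetical, and the paper's sensor selection and regression model are not reproduced.

```python
def net_count(crossings, region, t_end):
    """Count objects inside `region` at time `t_end` from edge-crossing events.

    crossings: iterable of (time, source_cell, destination_cell) tuples.
    region:    set of cell identifiers; assumed to contain no objects initially.
    """
    count = 0
    for time, source, destination in crossings:
        if time > t_end:
            continue
        entering = source not in region and destination in region
        leaving = source in region and destination not in region
        if entering:
            count += 1
        elif leaving:
            count -= 1
        # Moves within the region (or entirely outside it) do not change the count.
    return count


events = [
    (1, "outside", "c1"),  # object A enters the region
    (2, "c1", "c2"),       # A moves inside the region: not counted again
    (3, "outside", "c2"),  # object B enters
    (5, "c2", "outside"),  # A leaves
]
region = {"c1", "c2"}
print(net_count(events, region, t_end=4))  # 2
print(net_count(events, region, t_end=6))  # 1
```

No object identifiers are needed, which is consistent with the privacy-aware framing of the abstract.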
Incremental Stream Query Merging
Ankit Chaudhary, Steffen Zeuch, V. Markl, Jeyhun Karimov
DOI: https://doi.org/10.48786/edbt.2023.51; Pages: 604-617; Published: 2023-01-01
Citations: 1
TempoGRAPHer: A Tool for Aggregating and Exploring Evolving Graphs
Evangelia Tsoukanara, Georgia Koloniari, E. Pitoura
Graphs offer a generic abstraction for modeling entities as nodes and their interactions and relationships as edges. Since most graphs evolve over time, it is important to study their evolution. To this end, we propose demonstrating TempoGRAPHer, a tool that provides an overview of the evolution of an attributed graph offering aggregation at both the time and the attribute dimensions. The tool also supports a novel exploration strategy that helps in identifying time intervals of significant growth, shrinkage, or stability. Finally, we describe a scenario that showcases the usefulness of the TempoGRAPHer tool in understanding the evolution of contacts between primary school students.
DOI: https://doi.org/10.48786/edbt.2023.79; Pages: 843-846; Published: 2023-01-01
Citations: 1
Multi-Dimensional Data Publishing With Local Differential Privacy
Gaoyuan Liu, Peng Tang, Chengyu Hu, Chongshi Jin, Shanqing Guo
This paper studies the publication of multi-dimensional data with local differential privacy (LDP). This problem raises tremendous challenges in terms of both computational efficiency and data utility. The state-of-the-art solution addresses this problem by first constructing a junction tree (a kind of probabilistic graphical model, PGM) to generate a set of noisy low-dimensional marginals of the input data and then using them to approximate the distribution of the input dataset for synthetic data generation. However, the existing solution has two severe limitations: it must calculate the marginals of a large number of attribute pairs to construct the PGM, and it handles the marginal distributions of large cliques in the PGM poorly. Both degrade the quality of the synthetic data. To address these deficiencies, based on the sparseness of the constructed PGM and the divisibility of LDP, we first propose an incremental learning-based PGM construction method. In this method, we gradually prune the edges (attribute pairs) with weak correlation and allocate more data and privacy budgets to the useful edges, thereby improving the model’s accuracy. To this end, we introduce a high-precision data accumulation technique and a low-error edge pruning technique. Second, based on joint distribution decomposition and redundancy elimination, we propose a novel marginal calculation method for the large cliques in the context of LDP. Extensive experiments on real datasets demonstrate that our solution offers desirable data utility.
DOI: https://doi.org/10.48786/edbt.2023.15; Pages: 183-194; Published: 2023-01-01
Citations: 6
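The "noisy low-dimensional marginals" mentioned in the abstract above are estimated from reports that each user perturbs locally. As background only, the sketch below shows generalized randomized response, a standard LDP primitive for estimating the marginal of a single categorical attribute. It is not the paper's PGM construction or clique method, and the function names `grr_perturb` and `grr_estimate` are hypothetical.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """Report the true value with probability p, otherwise a uniformly chosen other value."""
    k = len(domain)  # assumes k >= 2
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def grr_estimate(reports, domain, epsilon):
    """Unbiased frequency estimate: f_v = (c_v / n - q) / (p - q)."""
    k, n = len(domain), len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)   # probability of keeping the true value
    q = 1 / (e + k - 1)   # probability of reporting any one specific other value
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


rng = random.Random(7)
domain = ["a", "b", "c"]
truth = ["a"] * 14000 + ["b"] * 4000 + ["c"] * 2000
reports = [grr_perturb(v, domain, 1.0, rng) for v in truth]
print(grr_estimate(reports, domain, 1.0))
```

The estimates always sum to 1, but individual values can be negative or noisy for small populations, which is one reason allocating more users and budget to the useful attribute pairs, as the abstract describes, can improve accuracy.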
Joint Source and Schema Evolution: Insights from a Study of 195 FOSS Projects
Panos Vassiliadis, Fation Shehaj, George Kalampokis, A. Zarras
In this paper, we address the problem of the co-evolution of Free Open Source Software projects with the relational schemata that they encompass. We exploit a data set of 195 publicly available schema histories of FOSS projects hosted on GitHub, for which we locally cloned the respective projects and measured their evolution progress. Our first research question asks what percentage of the projects demonstrates a “hand-in-hand” co-evolution of schema and source code. To address this question, we defined synchronicity by allowing a bounded amount of lag between the cumulative evolution of the schema and that of the entire project. A core finding is that there are all kinds of behaviors with respect to project and schema co-evolution, resulting in only a small number of projects where the evolution of schema and project progress in sync. Moreover, we discovered that after exceeding a 5-year threshold of project life, schemata gravitate to lower rates of evolution, which practically means that, with time, the schemata stop evolving as actively as they originally did. To answer a second question, on whether evolution comes early in the life of a schema, we measured how often the cumulative progress of schema evolution exceeds the respective progress of source change, as well as the respective progress of time. The results indicate that a large majority of schemata demonstrate an early advance of schema change with respect to code evolution, and an even larger majority also demonstrate an advance of schema evolution with respect to time. Third, we asked at which time point in their lives do schemata attain a substantial
DOI: https://doi.org/10.48786/edbt.2023.03; Pages: 27-39; Published: 2023-01-01
Citations: 3
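The two measurements described in the abstract above, synchronicity within a bounded lag and how often schema progress runs ahead of source progress, can be illustrated with cumulative progress curves. The sketch below is one plausible reading, not the authors' definition: the per-period activity counts, the 10% lag threshold, and the names `cumulative_progress`, `synchronicity`, and `early_advance` are all assumptions made for illustration.

```python
from itertools import accumulate


def cumulative_progress(activity):
    """Turn per-period activity counts into cumulative fractions in [0, 1]."""
    total = sum(activity)
    if total == 0:
        raise ValueError("activity must contain at least one change")
    return [done / total for done in accumulate(activity)]


def synchronicity(schema_activity, source_activity, max_lag=0.1):
    """Fraction of periods in which the two cumulative curves differ by at most max_lag."""
    schema = cumulative_progress(schema_activity)
    source = cumulative_progress(source_activity)
    in_sync = sum(abs(a - b) <= max_lag for a, b in zip(schema, source))
    return in_sync / len(schema)


def early_advance(schema_activity, source_activity):
    """Fraction of periods in which cumulative schema progress exceeds source progress."""
    schema = cumulative_progress(schema_activity)
    source = cumulative_progress(source_activity)
    return sum(a > b for a, b in zip(schema, source)) / len(schema)


schema_changes = [8, 1, 1, 0]  # most schema change happens in the first period
source_changes = [2, 3, 3, 2]  # source code changes steadily
print(synchronicity(schema_changes, source_changes))  # 0.25
print(early_advance(schema_changes, source_changes))  # 0.75
```

In this toy project the schema is front-loaded: it is in sync with the source in only one of four periods and ahead of it in three, which mirrors the "early advance" pattern the abstract reports for a large majority of schemata.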
Recommending Unanimously Preferred Items to Groups
Karim Benouaret, K. Tan
Due to the pervasiveness of group activities in people’s daily life, group recommendation has attracted a massive research effort in both industry and academia. A fundamental challenge in group recommendation is how to aggregate the preferences of group members to select a set of items maximizing the overall satisfaction of the group; this is the focus of this paper. Specifically, we introduce a dual adjustment aggregation score, which measures the relevance of an item to a group. We then propose a recommendation scheme, termed 𝑘-dual adjustment unanimous skyline, that seeks to retrieve the 𝑘 items with the highest score, while discarding items that are unanimously considered inappropriate. Furthermore, we design and develop algorithms for computing the 𝑘-dual adjustment unanimous skyline efficiently. Finally, we demonstrate both the retrieval effectiveness and the efficiency of our approach through an extensive experimental evaluation on real datasets.
DOI: https://doi.org/10.48786/edbt.2023.29; Pages: 364-377; Published: 2023-01-01
Citations: 0