{"title":"分析工作负载趋势,提高三重存储性能","authors":"Ahmed Al-Ghezi, Lena Wiese","doi":"10.1016/j.is.2024.102420","DOIUrl":null,"url":null,"abstract":"<div><p>The Resource Description Framework (RDF) is widely used to model web data. The scale and complexity of the modeled data emphasized performance challenges on the RDF-triple stores. Workload adaption is one important strategy to deal with those challenges on the storage level. Current workload-adaption approaches lack the necessary generalization of the problem and only optimize part of the storage layer with the workload (mostly the replication). This creates a big performance gap within other data structures (e.g. indexes and cache) that could heavily benefit from the same workload adaption strategy. Moreover, the workload statistics are built collectively in most of the current approaches. Thus, the analysis process is unaware of whether workloads’ items are old or recent. However, that does not simulate the temporal trends that exist naturally in user queries which causes the analysis process to lag behind the rapid workload development. We present a novel universal adaption approach to the storage management of a distributed RDF store. The system aims to find optimal data assignments to the different indexes, replications, and join cache within the limited storage space. We present a cost model based on the workload that often contains frequent patterns. The workload is dynamically and continuously analyzed to evaluate predefined rules considering the benefits and costs of all options of assigning data to the storage structures. The objective is to reduce query execution time by letting different data containers compete on the limited storage space. By modeling the workload statistics as time series, we can apply well-known smoothing techniques allowing the importance of the workload to decay over time. That allows the universal adaption to stay tuned with potential changes in the workload trends.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102420"},"PeriodicalIF":3.0000,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437924000784/pdfft?md5=4a9d8f0acac2d10b05565ee129773c94&pid=1-s2.0-S0306437924000784-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Analyzing workload trends for boosting triple stores performance\",\"authors\":\"Ahmed Al-Ghezi, Lena Wiese\",\"doi\":\"10.1016/j.is.2024.102420\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>The Resource Description Framework (RDF) is widely used to model web data. The scale and complexity of the modeled data emphasized performance challenges on the RDF-triple stores. Workload adaption is one important strategy to deal with those challenges on the storage level. Current workload-adaption approaches lack the necessary generalization of the problem and only optimize part of the storage layer with the workload (mostly the replication). This creates a big performance gap within other data structures (e.g. indexes and cache) that could heavily benefit from the same workload adaption strategy. Moreover, the workload statistics are built collectively in most of the current approaches. Thus, the analysis process is unaware of whether workloads’ items are old or recent. 
However, that does not simulate the temporal trends that exist naturally in user queries which causes the analysis process to lag behind the rapid workload development. We present a novel universal adaption approach to the storage management of a distributed RDF store. The system aims to find optimal data assignments to the different indexes, replications, and join cache within the limited storage space. We present a cost model based on the workload that often contains frequent patterns. The workload is dynamically and continuously analyzed to evaluate predefined rules considering the benefits and costs of all options of assigning data to the storage structures. The objective is to reduce query execution time by letting different data containers compete on the limited storage space. By modeling the workload statistics as time series, we can apply well-known smoothing techniques allowing the importance of the workload to decay over time. That allows the universal adaption to stay tuned with potential changes in the workload trends.</p></div>\",\"PeriodicalId\":50363,\"journal\":{\"name\":\"Information Systems\",\"volume\":\"125 \",\"pages\":\"Article 102420\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2024-06-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0306437924000784/pdfft?md5=4a9d8f0acac2d10b05565ee129773c94&pid=1-s2.0-S0306437924000784-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0306437924000784\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306437924000784","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Analyzing workload trends for boosting triple stores performance
The Resource Description Framework (RDF) is widely used to model web data. The scale and complexity of the modeled data pose significant performance challenges for RDF triple stores. Workload adaptation is an important strategy for addressing these challenges at the storage level. Current workload-adaptation approaches lack the necessary generalization of the problem and use the workload to optimize only part of the storage layer (mostly replication). This leaves a large performance gap in other data structures (e.g., indexes and caches) that could benefit heavily from the same workload-adaptation strategy. Moreover, in most current approaches the workload statistics are accumulated collectively, so the analysis process cannot distinguish old workload items from recent ones. This fails to capture the temporal trends that occur naturally in user queries, causing the analysis to lag behind rapidly evolving workloads. We present a novel universal adaptation approach to the storage management of a distributed RDF store. The system aims to find optimal assignments of data to the different indexes, replications, and join cache within a limited storage space. We present a cost model based on the workload, which often contains frequent patterns. The workload is analyzed dynamically and continuously to evaluate predefined rules that weigh the benefits and costs of all options for assigning data to the storage structures. The objective is to reduce query execution time by letting the different data containers compete for the limited storage space. By modeling the workload statistics as time series, we can apply well-known smoothing techniques that allow the importance of workload observations to decay over time, enabling the universal adaptation to keep pace with changes in workload trends.
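As a rough illustration of the ideas summarized in the abstract (not the authors' implementation), the sketch below applies exponential smoothing to per-pattern workload counts so that older observations decay over time, and then greedily assigns candidate storage structures to a fixed space budget by benefit per unit of storage. All class names, parameters (e.g., the smoothing factor alpha), and the example patterns are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class PatternStats:
    """Smoothed access frequency for one triple pattern (illustrative)."""
    smoothed: float = 0.0


class WorkloadTracker:
    """Tracks workload statistics as a time series with exponential decay.

    alpha controls how quickly old workload observations lose importance;
    this mirrors the smoothing idea in the abstract, not the paper's exact model.
    """

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha
        self.stats: dict[str, PatternStats] = {}

    def end_of_period(self, counts_this_period: dict[str, int]) -> None:
        # Exponential smoothing: new = alpha * recent + (1 - alpha) * old.
        for p in set(self.stats) | set(counts_this_period):
            old = self.stats.setdefault(p, PatternStats()).smoothed
            recent = counts_this_period.get(p, 0)
            self.stats[p].smoothed = self.alpha * recent + (1 - self.alpha) * old


def assign_storage(candidates: list[tuple[str, float, float]], budget: float) -> list[str]:
    """Greedily pick candidate structures (index, replica, cached join result)
    given (name, benefit, size) triples and a limited storage budget.
    A sketch of the 'competition for limited space' idea only."""
    chosen = []
    # Highest benefit per unit of storage first.
    for name, benefit, size in sorted(candidates, key=lambda c: c[1] / c[2], reverse=True):
        if size <= budget:
            chosen.append(name)
            budget -= size
    return chosen


if __name__ == "__main__":
    tracker = WorkloadTracker(alpha=0.5)
    tracker.end_of_period({"(?s, rdf:type, ?o)": 120, "(?s, foaf:knows, ?o)": 10})
    tracker.end_of_period({"(?s, foaf:knows, ?o)": 200})  # workload trend shifts
    candidates = [(p, s.smoothed, 1.0) for p, s in tracker.stats.items()]
    print(assign_storage(candidates, budget=1.0))
```

In this toy run the second period shifts the workload toward the `foaf:knows` pattern, so its smoothed score overtakes the decayed `rdf:type` score and it wins the single unit of storage budget, which is the kind of trend-following behavior the decay-based analysis is meant to enable.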
About the journal:
Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems.
Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems, such as new data models or performance enhancements, and show how those innovations contribute to the goals of the application.