Tianjing Zeng, Zhewei Wei, Ge Luo, Ke Yi, Xiaoyong Du, Ji-Rong Wen
{"title":"持续的总结","authors":"Tianjing Zeng, Zhewei Wei, Ge Luo, Ke Yi, Xiaoyong Du, Ji-Rong Wen","doi":"https://dl.acm.org/doi/10.1145/3531053","DOIUrl":null,"url":null,"abstract":"<p>A <i>persistent data structure</i>, also known as a <i>multiversion data structure</i> in the database literature, is a data structure that preserves all its previous versions as it is updated over time. Every update (inserting, deleting, or changing a data record) to the data structure creates a new version, while all the versions are kept in the data structure so that any previous version can still be queried.</p><p>Persistent data structures aim at recording all versions accurately, which results in a space requirement that is at least linear to the number of updates. In many of today’s big data applications, in particular, for high-speed streaming data, the volume and velocity of the data are so high that we cannot afford to store everything. Therefore, streaming algorithms have received a lot of attention in the research community, which uses only sublinear space by sacrificing slightly on accuracy.</p><p>All streaming algorithms work by maintaining a small data structure in memory, which is usually called a <i>sketch</i>, <i>summary</i>, or <i>synopsis</i>. The summary is updated upon the arrival of every element in the stream, thus it is <i>ephemeral</i>, meaning that it can only answer queries about the current status of the stream. In this article, we aim at designing persistent summaries, thereby giving streaming algorithms the ability to answer queries about the stream at any prior time.</p>","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":"1 4","pages":""},"PeriodicalIF":2.2000,"publicationDate":"2022-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Persistent Summaries\",\"authors\":\"Tianjing Zeng, Zhewei Wei, Ge Luo, Ke Yi, Xiaoyong Du, Ji-Rong Wen\",\"doi\":\"https://dl.acm.org/doi/10.1145/3531053\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>A <i>persistent data structure</i>, also known as a <i>multiversion data structure</i> in the database literature, is a data structure that preserves all its previous versions as it is updated over time. Every update (inserting, deleting, or changing a data record) to the data structure creates a new version, while all the versions are kept in the data structure so that any previous version can still be queried.</p><p>Persistent data structures aim at recording all versions accurately, which results in a space requirement that is at least linear to the number of updates. In many of today’s big data applications, in particular, for high-speed streaming data, the volume and velocity of the data are so high that we cannot afford to store everything. Therefore, streaming algorithms have received a lot of attention in the research community, which uses only sublinear space by sacrificing slightly on accuracy.</p><p>All streaming algorithms work by maintaining a small data structure in memory, which is usually called a <i>sketch</i>, <i>summary</i>, or <i>synopsis</i>. The summary is updated upon the arrival of every element in the stream, thus it is <i>ephemeral</i>, meaning that it can only answer queries about the current status of the stream. In this article, we aim at designing persistent summaries, thereby giving streaming algorithms the ability to answer queries about the stream at any prior time.</p>\",\"PeriodicalId\":50915,\"journal\":{\"name\":\"ACM Transactions on Database Systems\",\"volume\":\"1 4\",\"pages\":\"\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2022-08-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Database Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/https://dl.acm.org/doi/10.1145/3531053\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Database Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/https://dl.acm.org/doi/10.1145/3531053","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
A persistent data structure, also known as a multiversion data structure in the database literature, is a data structure that preserves all its previous versions as it is updated over time. Every update (inserting, deleting, or changing a data record) to the data structure creates a new version, while all the versions are kept in the data structure so that any previous version can still be queried.
Persistent data structures aim at recording all versions accurately, which results in a space requirement that is at least linear to the number of updates. In many of today’s big data applications, in particular, for high-speed streaming data, the volume and velocity of the data are so high that we cannot afford to store everything. Therefore, streaming algorithms have received a lot of attention in the research community, which uses only sublinear space by sacrificing slightly on accuracy.
All streaming algorithms work by maintaining a small data structure in memory, which is usually called a sketch, summary, or synopsis. The summary is updated upon the arrival of every element in the stream, thus it is ephemeral, meaning that it can only answer queries about the current status of the stream. In this article, we aim at designing persistent summaries, thereby giving streaming algorithms the ability to answer queries about the stream at any prior time.
期刊介绍:
Heavily used in both academic and corporate R&D settings, ACM Transactions on Database Systems (TODS) is a key publication for computer scientists working in data abstraction, data modeling, and designing data management systems. Topics include storage and retrieval, transaction management, distributed and federated databases, semantics of data, intelligent databases, and operations and algorithms relating to these areas. In this rapidly changing field, TODS provides insights into the thoughts of the best minds in database R&D.