{"title":"学习数据库的集合","authors":"Angjela Davitkova, Damjan Gjurovski, S. Michel","doi":"10.48786/edbt.2024.07","DOIUrl":null,"url":null,"abstract":"In this work, we consider using deep learning models over a collection of sets to replace traditional approaches utilized in database systems. We propose solutions for data indexing, membership queries, and cardinality estimation. Unlike relational data, learned models over sets need to be permutation invariant and able to deal with variable set sizes. The proposed models are based on the DeepSets architecture and include per-element compression to achieve acceptable accuracy with modest model sizes. We further suggest a hybrid structure with bounded error guarantees using guided learning to mitigate the inherent challenges when working with set data. We outline challenges and opportunities when dealing with set data and demonstrate the suitability of the models through extensive experimental evaluation with one synthetic and two real-world datasets.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"45 1","pages":"68-80"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning over Sets for Databases\",\"authors\":\"Angjela Davitkova, Damjan Gjurovski, S. Michel\",\"doi\":\"10.48786/edbt.2024.07\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this work, we consider using deep learning models over a collection of sets to replace traditional approaches utilized in database systems. We propose solutions for data indexing, membership queries, and cardinality estimation. Unlike relational data, learned models over sets need to be permutation invariant and able to deal with variable set sizes. The proposed models are based on the DeepSets architecture and include per-element compression to achieve acceptable accuracy with modest model sizes. We further suggest a hybrid structure with bounded error guarantees using guided learning to mitigate the inherent challenges when working with set data. We outline challenges and opportunities when dealing with set data and demonstrate the suitability of the models through extensive experimental evaluation with one synthetic and two real-world datasets.\",\"PeriodicalId\":88813,\"journal\":{\"name\":\"Advances in database technology : proceedings. International Conference on Extending Database Technology\",\"volume\":\"45 1\",\"pages\":\"68-80\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Advances in database technology : proceedings. International Conference on Extending Database Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48786/edbt.2024.07\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in database technology : proceedings. International Conference on Extending Database Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48786/edbt.2024.07","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
In this work, we consider using deep learning models over a collection of sets to replace traditional approaches utilized in database systems. We propose solutions for data indexing, membership queries, and cardinality estimation. Unlike relational data, learned models over sets need to be permutation invariant and able to deal with variable set sizes. The proposed models are based on the DeepSets architecture and include per-element compression to achieve acceptable accuracy with modest model sizes. We further suggest a hybrid structure with bounded error guarantees using guided learning to mitigate the inherent challenges when working with set data. We outline challenges and opportunities when dealing with set data and demonstrate the suitability of the models through extensive experimental evaluation with one synthetic and two real-world datasets.