A Practical Theory of Generalization in Selectivity Learning
Peizhi Wu, Haoshu Xu, Ryan Marcus, Zachary G. Ives
arXiv:2409.07014 [stat.ML], 2024-09-11
Query-driven machine learning models have emerged as a promising estimation technique for query selectivities. Yet, surprisingly little is known about the efficacy of these techniques from a theoretical perspective, as substantial gaps exist between practical solutions and state-of-the-art (SOTA) theory based on the Probably Approximately Correct (PAC) learning framework. In this paper, we aim to bridge these gaps between theory and practice. First, we demonstrate that selectivity predictors induced by signed measures are learnable, which relaxes the reliance on probability measures in SOTA theory.
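To make the objects concrete, here is a minimal sketch in our own notation (illustrative, not necessarily the paper's exact formalism): the selectivity of a predicate is the fraction of tuples it selects, and a measure over the data domain induces a predictor by evaluating that measure on the region the predicate selects.

\[
\mathrm{sel}(q) \;=\; \frac{|\{x \in D : x \models q\}|}{|D|},
\qquad
\hat{s}_{\mu}(q) \;=\; \mu(R_q),
\]

where $D$ is the dataset, $q$ is a query predicate, $R_q$ is the region of the data domain that $q$ selects, and $\mu$ is a measure. Requiring $\mu$ to be a probability measure forces $\hat{s}_{\mu}(q) \in [0,1]$ and monotonicity under containment of regions; permitting $\mu$ to be a signed measure, i.e., one that may assign negative mass to some regions, enlarges the class of representable predictors, which is the relaxation referred to above.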
More importantly, beyond the PAC learning framework (which only allows us to characterize how the model behaves when both training and test workloads are drawn from the same distribution), we establish, under mild assumptions, that selectivity predictors from this class exhibit favorable out-of-distribution (OOD) generalization error bounds.
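Schematically, and purely as an illustration of the shape such a guarantee takes (our simplification, not the paper's exact statement), an OOD bound controls the error on a shifted test workload by the error on the training workload plus a shift-dependent penalty:

\[
\mathbb{E}_{q \sim \mathcal{Q}_{\mathrm{test}}}\big[\ell\big(\hat{s}(q), \mathrm{sel}(q)\big)\big]
\;\le\;
\mathbb{E}_{q \sim \mathcal{Q}_{\mathrm{train}}}\big[\ell\big(\hat{s}(q), \mathrm{sel}(q)\big)\big]
\;+\; \Delta\big(\mathcal{Q}_{\mathrm{train}}, \mathcal{Q}_{\mathrm{test}}\big)
\;+\; \varepsilon(n, \delta),
\]

where $\ell$ is a loss function, $\Delta$ measures the divergence between the training and test query distributions, and $\varepsilon(n, \delta)$ is a finite-sample term that shrinks with the number of training queries $n$ at confidence level $1-\delta$. In-distribution PAC-style bounds correspond to the special case $\Delta = 0$.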
These theoretical advances provide a better understanding of both the in-distribution and OOD generalization capabilities of query-driven selectivity learning, and they facilitate the design of two general strategies to improve OOD generalization for existing query-driven selectivity models. We empirically verify that our techniques help query-driven selectivity models generalize significantly better to OOD queries, in terms of both prediction accuracy and query latency, while maintaining their superior in-distribution generalization performance.