{"title":"Learning from Uncertain Data: From Possible Worlds to Possible Models","authors":"Jiongli Zhu, Su Feng, Boris Glavic, Babak Salimi","doi":"arxiv-2405.18549","DOIUrl":null,"url":null,"abstract":"We introduce an efficient method for learning linear models from uncertain\ndata, where uncertainty is represented as a set of possible variations in the\ndata, leading to predictive multiplicity. Our approach leverages abstract\ninterpretation and zonotopes, a type of convex polytope, to compactly represent\nthese dataset variations, enabling the symbolic execution of gradient descent\non all possible worlds simultaneously. We develop techniques to ensure that\nthis process converges to a fixed point and derive closed-form solutions for\nthis fixed point. Our method provides sound over-approximations of all possible\noptimal models and viable prediction ranges. We demonstrate the effectiveness\nof our approach through theoretical and empirical analysis, highlighting its\npotential to reason about model and prediction uncertainty due to data quality\nissues in training data.","PeriodicalId":501033,"journal":{"name":"arXiv - CS - Symbolic Computation","volume":"43 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Symbolic Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2405.18549","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
We introduce an efficient method for learning linear models from uncertain data, where uncertainty is represented as a set of possible variations in the data, leading to predictive multiplicity. Our approach leverages abstract interpretation and zonotopes, a type of convex polytope, to compactly represent these dataset variations, enabling the symbolic execution of gradient descent on all possible worlds simultaneously. We develop techniques to ensure that this process converges to a fixed point and derive closed-form solutions for this fixed point. Our method provides sound over-approximations of all possible optimal models and viable prediction ranges. We demonstrate the effectiveness of our approach through theoretical and empirical analysis, highlighting its potential to reason about model and prediction uncertainty due to data quality issues in training data.
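
To make the core idea concrete, below is a minimal sketch, not the authors' implementation, of symbolic gradient descent for least-squares regression when label uncertainty is given as a zonotope. The `Zonotope` class, the `symbolic_gd` function, the learning rate, and the toy data are all illustrative assumptions. The key observation it demonstrates: the squared-loss gradient update is affine in the weights and labels, and affine maps carry zonotopes to zonotopes exactly, so one symbolic run tracks every possible world at once.

```python
# Sketch (assumed names, not the paper's API): zonotope-valued gradient
# descent for least squares with an uncertain label vector.
import numpy as np

class Zonotope:
    """Set {center + generators @ eps : eps in [-1, 1]^k}."""
    def __init__(self, center, generators):
        self.center = np.asarray(center, dtype=float)          # shape (d,)
        self.generators = np.asarray(generators, dtype=float)  # shape (d, k)

    def affine(self, A, b):
        # Image under x -> A x + b; affine maps keep zonotopes exact.
        return Zonotope(A @ self.center + b, A @ self.generators)

    def interval(self):
        # Tight per-coordinate bounds over all eps in [-1, 1]^k.
        r = np.abs(self.generators).sum(axis=1)
        return self.center - r, self.center + r

def symbolic_gd(X, y_zono, lr=0.01, steps=2000):
    """Run gradient descent on all possible worlds simultaneously.

    Each update w <- (I - lr*(2/n) X^T X) w + lr*(2/n) X^T y is affine
    in (w, y), so the set of reachable weights stays a zonotope and the
    iteration contracts to a fixed point when lr is small enough.
    """
    n, d = X.shape
    A = np.eye(d) - lr * (2.0 / n) * X.T @ X   # linear part acting on w
    B = lr * (2.0 / n) * X.T                   # linear part acting on y
    # Start from the all-zeros model with no uncertainty.
    w = Zonotope(np.zeros(d), np.zeros((d, y_zono.generators.shape[1])))
    for _ in range(steps):
        w = Zonotope(A @ w.center + B @ y_zono.center,
                     A @ w.generators + B @ y_zono.generators)
    return w

# Toy example: 20 points whose labels may each vary by +/-0.5 independently.
rng = np.random.default_rng(0)
X = np.c_[np.ones(20), rng.normal(size=20)]
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=20)
y_zono = Zonotope(y, 0.5 * np.eye(20))  # one noise symbol per label

w = symbolic_gd(X, y_zono)
lo, hi = w.interval()
print("weight ranges over all possible worlds:", list(zip(lo, hi)))

# Prediction range at a test point: x^T w is again a zonotope.
x_test = np.array([1.0, 0.5])
p_lo, p_hi = w.affine(x_test.reshape(1, -1), np.zeros(1)).interval()
print("prediction range at x_test:", (p_lo[0], p_hi[0]))
```

In this special case the fixed point has a closed form: setting w = A w + B y yields the normal equations X^T X w = X^T y, i.e. the ordinary least-squares map applied to the zonotope's center and to each generator column; the paper derives closed-form fixed points for its more general setting.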