RIDEN: Neural-based Uniform Density Histogram for Distribution Shift Detection
Kei Yonekawa, Kazuhiro Saito, Mori Kurokawa
Proceedings of the Second International Conference on AI-ML Systems, published 2022-10-12
DOI: 10.1145/3564121.3564136
Distribution shift must be detected to keep machine learning models from degrading in performance and to keep human-mediated data analysis from reaching erroneous conclusions. For comparing unknown distributions of high-dimensional data, histograms are suitable density estimators because of their computational efficiency. Histograms used for distribution shift detection should have uniform density, meaning that each bin holds roughly the same share of the reference data; the value of this property has been demonstrated by existing tree-based and cluster-based histograms. However, existing histograms do not consider how well they generalize to out-of-sample data, which degrades detection performance at test time. In this paper, we propose RIDEN, a neural-based histogram for distribution shift detection that generalizes well to out-of-sample data. The bins of the histogram are determined by a model trained to discriminate among a handful of reference instances, so that the model reflects their underlying distribution. Because the model is trained with a batch-wise maximum entropy regularizer computed on a bootstrap sample, the bins (the regions of feature space partitioned by the model's decision boundaries) generalize, and the histogram therefore keeps a uniform density on out-of-sample data. We evaluate our method on distribution shift detection tasks using real-world datasets from multiple domains. The results show that our method outperforms state-of-the-art histogram-based methods.
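To illustrate why uniform density is convenient for histogram-based shift detection, here is a minimal Python sketch. When every bin holds the same share of the reference data, the expected count per bin under no shift is simply n/K, so a Pearson chi-squared statistic can compare test counts against that constant. The sketch uses one-dimensional quantile bins as a stand-in for the tree-based, cluster-based, or neural partitions discussed above. It is not RIDEN, and the function names (`equal_mass_edges`, `bin_counts`, `pearson_chi2_uniform`) are illustrative assumptions.

```python
import bisect
import random


def equal_mass_edges(reference, num_bins):
    """Return interior bin edges taken at reference quantiles.

    With distinct reference values, each bin holds roughly
    len(reference) / num_bins points (exactly, when num_bins divides it).
    """
    ordered = sorted(reference)
    n = len(ordered)
    return [ordered[(k * n) // num_bins] for k in range(1, num_bins)]


def bin_counts(data, edges):
    """Count points per bin; bins are (-inf, e1), [e1, e2), ..., [e_last, inf)."""
    counts = [0] * (len(edges) + 1)
    for x in data:
        counts[bisect.bisect_right(edges, x)] += 1
    return counts


def pearson_chi2_uniform(counts):
    """Pearson chi-squared statistic against equal expected counts in every bin."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    edges = equal_mass_edges(reference, num_bins=8)
    same = [rng.gauss(0.0, 1.0) for _ in range(400)]     # no shift
    shifted = [rng.gauss(1.0, 1.0) for _ in range(400)]  # mean shift
    print("no shift:", pearson_chi2_uniform(bin_counts(same, edges)))
    print("shifted: ", pearson_chi2_uniform(bin_counts(shifted, edges)))
```

With the bins fixed from the reference set and no shift, the statistic is approximately chi-squared distributed with K - 1 degrees of freedom (ignoring estimation error in the edges). A detection threshold could therefore be read from that distribution; choosing it this way is an assumption of the sketch, not a detail given in the abstract.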
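The batch-wise maximum entropy regularizer can also be illustrated with a minimal Python sketch. It assumes, as in common information-maximization objectives, that the regularizer is the Shannon entropy of the softmax outputs averaged over a batch. Here the batch is a bootstrap sample drawn with replacement from the reference set. Maximizing this entropy pushes the model to spread the batch evenly over its output bins, and each point's bin is its arg-max output. Several pieces are hypothetical stand-ins, and the abstract does not specify RIDEN's actual network, task loss, or hyperparameters:
- the nearest-prototype "model" (`prototype_logits`) and its `sharpness`
- the weight `lam`
- the `task_loss` argument, which represents the instance-discrimination loss

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_logits(x, prototypes=(-2.0, 0.0, 2.0), sharpness=4.0):
    """Hypothetical 1-D model: score each bin by negative squared distance to its prototype."""
    return [-sharpness * (x - c) ** 2 for c in prototypes]


def assign_bin(logits):
    """A point's histogram bin is the arg-max output, so decision boundaries delimit the bins."""
    return max(range(len(logits)), key=lambda j: logits[j])


def bootstrap_batch(data, batch_size, rng):
    """Draw a bootstrap sample: batch_size items chosen uniformly with replacement."""
    return [rng.choice(data) for _ in range(batch_size)]


def batch_entropy(prob_rows):
    """Shannon entropy (nats) of the bin-assignment distribution averaged over the batch."""
    num_bins = len(prob_rows[0])
    mean = [sum(row[j] for row in prob_rows) / len(prob_rows) for j in range(num_bins)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def regularized_loss(task_loss, prob_rows, lam=0.1):
    """Subtracting lam * entropy rewards spreading the batch evenly across bins (lam is hypothetical)."""
    return task_loss - lam * batch_entropy(prob_rows)


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [rng.uniform(-3.0, 3.0) for _ in range(200)]
    batch = bootstrap_batch(reference, 64, rng)
    balanced = [softmax(prototype_logits(x)) for x in batch]
    collapsed = [softmax([5.0, 0.0, 0.0]) for _ in batch]  # every point pushed into bin 0
    print("balanced entropy: ", batch_entropy(balanced))
    print("collapsed entropy:", batch_entropy(collapsed))
    print("upper bound log K:", math.log(3))
    print("bins of first 5 points:", [assign_bin(prototype_logits(x)) for x in batch[:5]])
```

The entropy of a K-way distribution is at most log K, and that bound is reached only when every bin receives equal average mass. Subtracting lam times this term from the training loss therefore favours partitions with uniform density. Per the abstract, computing the term on a bootstrap sample is what lets the resulting bins generalize to out-of-sample data.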