Title: Testing the equality of distributions using integrated maximum mean discrepancy
Authors: Tianxuan Ding, Zhimei Li, Yaowu Zhang
Journal: Journal of Statistical Planning and Inference, Volume 236, Article 106246
Publication date: 2024-10-25
DOI: 10.1016/j.jspi.2024.106246
URL: https://www.sciencedirect.com/science/article/pii/S0378375824001034
Citations: 0
Abstract
Comparing and testing for the homogeneity of two independent random samples is a fundamental statistical problem with applications across many fields. However, existing methods may not be effective when the data is complex or high-dimensional. We propose a new method that integrates the maximum mean discrepancy (MMD) with a Gaussian kernel over all one-dimensional projections of the data. We derive the closed-form expression of the integrated MMD and prove its validity as a distributional similarity metric. We estimate the integrated MMD using U-statistic theory and study its asymptotic behavior under the null hypothesis and two kinds of alternative hypotheses. We demonstrate that our method retains the benefits of the MMD and outperforms existing methods on both synthetic and real datasets, especially when the data is complex and high-dimensional.
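The idea of averaging a Gaussian-kernel MMD over one-dimensional projections can be illustrated with a rough sketch. This is not the authors' closed-form statistic, which the abstract does not state. It is a Monte Carlo stand-in that assumes the projections are drawn uniformly from the unit sphere, uses a fixed bandwidth, and calibrates the test by permutation. The function names (`mmd2_unbiased_1d`, `projected_mmd2`, `permutation_pvalue`) and all parameters are hypothetical.

```python
import numpy as np


def mmd2_unbiased_1d(x, y, bandwidth=1.0):
    """Unbiased U-statistic estimate of the squared MMD between two 1-D samples,
    using the Gaussian kernel k(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2))."""
    m, n = len(x), len(y)

    def gram(a, b):
        diff = a[:, None] - b[None, :]
        return np.exp(-diff**2 / (2.0 * bandwidth**2))

    kxx, kyy, kxy = gram(x, x), gram(y, y), gram(x, y)
    # Drop the diagonal so within-sample averages run over distinct pairs only.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()


def projected_mmd2(X, Y, n_directions=100, bandwidth=1.0, rng=None):
    """Average the 1-D unbiased MMD^2 over random unit directions.

    Assumption: directions are uniform on the unit sphere; this Monte Carlo
    average only approximates an integral over all projections.
    """
    rng = np.random.default_rng(rng)
    dirs = rng.standard_normal((n_directions, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    stats = [mmd2_unbiased_1d(X @ u, Y @ u, bandwidth) for u in dirs]
    return float(np.mean(stats))


def permutation_pvalue(X, Y, n_perm=99, seed=0, **kwargs):
    """Permutation p-value for H0: both samples share one distribution.

    The same integer seed is passed to every call of projected_mmd2, so the
    observed and permuted statistics use an identical set of directions.
    """
    perm_rng = np.random.default_rng(seed)
    observed = projected_mmd2(X, Y, rng=seed, **kwargs)
    pooled = np.vstack([X, Y])
    m = len(X)
    exceed = 0
    for _ in range(n_perm):
        idx = perm_rng.permutation(len(pooled))
        stat = projected_mmd2(pooled[idx[:m]], pooled[idx[m:]], rng=seed, **kwargs)
        exceed += stat >= observed
    return (exceed + 1) / (n_perm + 1)
```

A typical call would pass two arrays of shape `(m, d)` and `(n, d)` to `permutation_pvalue` and reject homogeneity when the returned value falls below the chosen level. The permutation step is a generic calibration device used here for simplicity. The paper instead studies the asymptotic behavior of its own estimator.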
About the journal:
The Journal of Statistical Planning and Inference offers itself as a multifaceted and all-inclusive bridge between classical aspects of statistics and probability, and the emerging interdisciplinary aspects that have the potential to revolutionize the subject. While we maintain our traditional strength in statistical inference, design, classical probability, and large sample methods, we also have a far more inclusive and broadened scope to keep up with the new problems that confront us as statisticians, mathematicians, and scientists.
We publish high-quality articles in all branches of statistics, probability, discrete mathematics, machine learning, and bioinformatics. We also especially welcome well-written and up-to-date review articles on fundamental themes of statistics, probability, machine learning, and general biostatistics. Thoughtful letters to the editors, interesting problems in need of a solution, and short notes carrying an element of elegance or beauty are equally welcome.