Inference on regression model with misclassified binary response
Arindam Chatterjee, Tathagata Bandyopadhyay, Ayoushman Bhattacharya
Journal of Statistical Planning and Inference, Vol. 231, Article 106121 (publication date 2023-11-29). DOI: 10.1016/j.jspi.2023.106121
https://www.sciencedirect.com/science/article/pii/S0378375823000903
Misclassification of binary responses, if ignored, may severely bias the maximum likelihood estimators (MLEs) of regression parameters. For such data, researchers in many application areas widely use a binary regression model that incorporates non-differential classification errors. We strongly caution against indiscriminate use of this model: the unknown misclassification probabilities are confounded with the regression parameters, which causes a serious estimation problem and may lead to highly biased estimates. To overcome this problem, we propose using an internal validation sample in addition to the main sample. Assuming differential classification errors, we consider MLEs of the regression parameters based on the joint likelihood of the main sample and the internal validation sample. We then develop a rigorous asymptotic theory for the joint MLEs under standard assumptions. To make inference easy to implement, we propose a bootstrap approximation to the asymptotic distribution and prove its consistency. Simulation studies suggest that even an extremely small validation sample may vastly improve inference. Finally, the methodology is illustrated with real-life survey data.
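For orientation, the two models contrasted in the abstract can be written as follows. The notation is assumed, since the abstract does not give it: pi(x; beta) = F(x^T beta) = P(Y = 1 | x) for a known link F, and theta_y(x) = P(Y* = 1 | Y = y, x) is the probability of recording Y* = 1 when the true response is y. Under non-differential errors theta_y does not depend on x. With an internal validation sample V alongside the main sample M, and assuming validation units are selected at random, the joint log-likelihood adds a term for each group.

```latex
% Non-differential errors: \theta_y(x) \equiv \theta_y (constant in x)
P(Y^{*}=1 \mid x) = \theta_1 F(x^{\top}\beta) + \theta_0\{1 - F(x^{\top}\beta)\}
                  = \theta_0 + (\theta_1 - \theta_0)\,F(x^{\top}\beta)

% Differential errors; main sample M observes (x_i, y_i^*), validation sample V observes (x_j, y_j^*, y_j)
\ell(\beta,\theta) = \sum_{i \in M} \log\Big[ \theta_1(x_i)^{y_i^{*}}\{1-\theta_1(x_i)\}^{1-y_i^{*}}\,\pi_i
      + \theta_0(x_i)^{y_i^{*}}\{1-\theta_0(x_i)\}^{1-y_i^{*}}\,(1-\pi_i) \Big]
  + \sum_{j \in V} \log\Big[ \theta_{y_j}(x_j)^{y_j^{*}}\{1-\theta_{y_j}(x_j)\}^{1-y_j^{*}}\,\pi_j^{y_j}(1-\pi_j)^{1-y_j} \Big],
\qquad \pi_i = F(x_i^{\top}\beta)
```

In the first display, theta_0 shifts and theta_1 - theta_0 rescales the curve F(x^T beta). The misclassification probabilities can therefore be separated from beta only through the shape of F over the observed range of x, which is one way to see the confounding the abstract warns about. In the second display, the validation units contribute terms in which Y is observed, and these directly inform theta_y(.).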
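The following is a minimal pure-Python sketch of maximizing a main-plus-validation joint likelihood. It rests on assumptions of ours, not the paper's: there is a single covariate x; a logistic link is used both for P(Y = 1 | x) and for the misclassification probabilities theta_y(x) = P(Y* = 1 | Y = y, x), which depend on x and are therefore differential; the internal validation subsample is drawn at random; and the data are simulated. The MLE is computed with the EM algorithm, which treats the true response as missing for main-sample units. EM is our choice for illustration and is not necessarily the authors' algorithm. All function names (`fit_logistic`, `joint_loglik`, `em_fit`, and so on) are hypothetical, and the bootstrap step from the abstract is not shown.

```python
import math
import random

def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def lin(c, x):
    """Linear predictor c[0] + c[1] * x for a single covariate."""
    return c[0] + c[1] * x

def bern(p, y):
    """Bernoulli probability of outcome y (0 or 1) when P(1) = p."""
    return p if y == 1 else 1.0 - p

def fit_logistic(xs, targets, weights, init, steps=25):
    """Newton-Raphson for weighted logistic regression with fractional targets in [0, 1]."""
    b0, b1 = init
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t, w in zip(xs, targets, weights):
            p = expit(b0 + b1 * x)
            r, v = w * (t - p), w * p * (1.0 - p)
            g0, g1 = g0 + r, g1 + r * x  # score
            h00, h01, h11 = h00 + v, h01 + v * x, h11 + v * x * x  # information
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step: information^-1 times score
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return (b0, b1)

def joint_loglik(beta, g0, g1, main, valid):
    """Joint log-likelihood of main units (x, y*) and validation units (x, y*, y)."""
    ll = 0.0
    for x, ys in main:  # true Y unobserved: sum over Y in {0, 1}
        pi = expit(lin(beta, x))
        ll += math.log(bern(expit(lin(g1, x)), ys) * pi
                       + bern(expit(lin(g0, x)), ys) * (1.0 - pi))
    for x, ys, y in valid:  # true Y observed
        theta = expit(lin(g1 if y == 1 else g0, x))
        ll += math.log(bern(theta, ys) * bern(expit(lin(beta, x)), y))
    return ll

def simulate(n, rng, beta, g0, g1):
    """Draw (x, y*, y); theta_y(x) = P(Y* = 1 | Y = y, x) depends on x (differential)."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < expit(lin(beta, x)) else 0
        ys = 1 if rng.random() < expit(lin(g1 if y == 1 else g0, x)) else 0
        rows.append((x, ys, y))
    return rows

def em_fit(main, valid, iters=100, tol=1e-6):
    """Maximise the joint likelihood by EM, treating Y as missing in the main sample."""
    vx, vys, vy = zip(*valid)
    mx, mys = zip(*main)
    xs, ys_all = list(mx) + list(vx), list(mys) + list(vys)
    # Start from validation-only fits: noisy but consistent.
    beta = fit_logistic(vx, vy, [1.0] * len(vx), (0.0, 0.0))
    g1 = fit_logistic(vx, vys, vy, (0.0, 0.0))
    g0 = fit_logistic(vx, vys, [1 - y for y in vy], (0.0, 0.0))
    ll_start = prev = joint_loglik(beta, g0, g1, main, valid)
    for _ in range(iters):
        # E-step: posterior P(Y = 1 | y*, x) for each main-sample unit.
        w = []
        for x, ys in main:
            pi = expit(lin(beta, x))
            a = bern(expit(lin(g1, x)), ys) * pi
            b = bern(expit(lin(g0, x)), ys) * (1.0 - pi)
            w.append(a / (a + b))
        p1 = w + [float(y) for y in vy]  # P(Y = 1) per unit, main then validation
        # M-step: three weighted logistic regressions.
        beta = fit_logistic(xs, p1, [1.0] * len(xs), beta)
        g1 = fit_logistic(xs, ys_all, p1, g1)
        g0 = fit_logistic(xs, ys_all, [1.0 - p for p in p1], g0)
        cur = joint_loglik(beta, g0, g1, main, valid)
        done, prev = cur - prev < tol, cur
        if done:
            break
    return beta, g0, g1, ll_start, prev

rng = random.Random(2023)
TRUE_BETA, TRUE_G0, TRUE_G1 = (-0.5, 1.0), (-2.0, 0.5), (2.0, 0.5)
rows = simulate(950, rng, TRUE_BETA, TRUE_G0, TRUE_G1)
valid = rows[:150]  # internal validation sample: true Y also measured
main = [(x, ys) for x, ys, _ in rows[150:]]  # main sample: only Y* observed
beta_hat, g0_hat, g1_hat, ll_start, ll_end = em_fit(main, valid)
print("beta_hat =", tuple(round(b, 3) for b in beta_hat))
```

EM starts from the validation-only fits, so the small validation sample anchors the misclassification parameters. The larger main sample then sharpens the estimate of beta. Each M-step maximizes its weighted logistic objective, so the joint log-likelihood should not decrease from one iteration to the next.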
About the journal:
The Journal of Statistical Planning and Inference offers itself as a multifaceted and all-inclusive bridge between classical aspects of statistics and probability and the emerging interdisciplinary aspects that have the potential to revolutionize the subject. While we maintain our traditional strength in statistical inference, design, classical probability, and large sample methods, we also have a far more inclusive and broadened scope to keep up with the new problems that confront us as statisticians, mathematicians, and scientists.
We publish high-quality articles in all branches of statistics, probability, discrete mathematics, machine learning, and bioinformatics. We especially welcome well-written, up-to-date review articles on fundamental themes of statistics, probability, machine learning, and general biostatistics. Thoughtful letters to the editors, interesting problems in need of a solution, and short notes carrying an element of elegance or beauty are equally welcome.