Title: Residual's influence index (RINFIN), bad leverage and unmasking in high dimensional L2‐regression
Author: Y. Yatracos
Journal: Statistical Analysis and Data Mining: The ASA Data Science Journal
Published: 2021-09-26
DOI: https://doi.org/10.1002/sam.11550
Citation count: 0
Abstract
In linear regression of $Y$ on $X \in \mathbb{R}^p$ with parameters $\beta \in \mathbb{R}^{p+1}$, statistical inference is unreliable when observations are obtained from the gross-error model $F_{\epsilon,G} = (1 - \epsilon)F + \epsilon G$ instead of the assumed probability $F$; here $G$ is the gross-error probability and $0 < \epsilon < 1$. The residual's influence index (RINFIN) at $(x, y)$ is introduced. Its components also measure the local influence of $x$ on the residual, and a large value flags a bad leverage case (from $G$), thus causing unmasking. Large-sample properties of RINFIN are presented to confirm the significance of the findings, but often the large difference in the RINFIN scores of the data is itself indicative. RINFIN is successful with microarray data, simulated high-dimensional data, and classic regression data sets. RINFIN's performance improves as $p$ increases, and RINFIN can be used in multiple-response linear regression.
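To make the gross-error model concrete, the following minimal sketch draws cases from $(1 - \epsilon)F + \epsilon G$ and compares the L2 (least-squares) fit on all cases with the fit on the cases from $F$ only. It does not implement RINFIN, whose formula is not given in the abstract. The particular choices of $F$ (standard normal covariates with a linear response), $G$ (covariates shifted away from the bulk, responses unrelated to the linear model), the coefficient values, and the function names `sample_gross_error` and `l2_fit` are all assumptions made for illustration.

```python
import numpy as np


def sample_gross_error(n, p, eps, rng):
    """Draw n cases from (1 - eps) * F + eps * G (F and G are illustrative choices)."""
    beta = np.ones(p + 1)                  # hypothetical intercept and p slopes
    from_g = rng.random(n) < eps           # True marks a case drawn from G
    x = rng.normal(size=(n, p))            # F: standard normal covariates
    y = beta[0] + x @ beta[1:] + rng.normal(size=n)
    # G (assumed): covariates far from the bulk, responses ignoring the linear model
    x[from_g] += 10.0
    y[from_g] = rng.normal(size=from_g.sum())
    return x, y, from_g


def l2_fit(x, y):
    """Least-squares fit with an intercept; returns coefficients and residuals."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, y - design @ coef


rng = np.random.default_rng(0)
x, y, from_g = sample_gross_error(n=200, p=5, eps=0.1, rng=rng)

coef_all, resid_all = l2_fit(x, y)                      # fit on contaminated data
coef_clean, _ = l2_fit(x[~from_g], y[~from_g])          # fit on cases from F only

print("coefficients, all cases: ", np.round(coef_all, 2))
print("coefficients, F cases:   ", np.round(coef_clean, 2))
print("mean |residual|, G cases:", np.abs(resid_all[from_g]).mean())
print("mean |residual|, F cases:", np.abs(resid_all[~from_g]).mean())
```

In this setup the cases from $G$ pull the fitted coefficients away from the values used to generate the cases from $F$. Comparing the residual summaries for the two groups gives a rough sense of whether the least-squares residuals alone would single out the $G$ cases, which is the masking problem that an index such as RINFIN is meant to address.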