Jean-François Plante, Maxime Larocque, Michel Adès
In supervised learning, feature selection methods identify the most relevant predictors to include in a model. For linear models, the inclusion or exclusion of each variable may be represented as a vector of bits playing the role of the genetic material that defines the model. Genetic algorithms reproduce the strategies of natural selection on a population of models to identify the best. We derive the distribution of the importance scores for parallel genetic algorithms under the null hypothesis that none of the features has predictive power. This distribution provides an objective threshold for feature selection that does not require the visual inspection of a bubble plot. We also introduce the eradication strategy, akin to forward stepwise selection, where the genes of useful variables are sequentially forced into the models. The method is illustrated on real data, and simulation studies are run to describe its performance.
Objective model selection with parallel genetic algorithms using an eradication strategy. Jean-François Plante, Maxime Larocque, Michel Adès. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(2), 636-654. Published 2023-06-05. DOI: 10.1002/cjs.11775
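The bit-vector encoding described in the genetic-algorithm abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the single-point crossover, the truncation selection, the toy fitness function, and the use of pooled inclusion frequency as an "importance score" are all assumptions made for illustration. In practice the fitness would be a model-quality criterion computed from a fitted linear model.

```python
import random

def random_mask(n_features, rng):
    """One 'chromosome': bit j is 1 if feature j is included in the model."""
    return [rng.randint(0, 1) for _ in range(n_features)]

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: prefix from one parent, suffix from the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(mask, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]

def evolve(fitness, n_features, pop_size=20, generations=10, rate=0.05, seed=0):
    """Run one genetic algorithm and return its final population of masks."""
    rng = random.Random(seed)
    population = [random_mask(n_features, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged (truncation selection, assumed here).
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        population = survivors + children
    return population

def inclusion_frequency(populations, n_features):
    """Fraction of models, pooled over all runs, that include each feature."""
    models = [mask for pop in populations for mask in pop]
    return [sum(m[j] for m in models) / len(models) for j in range(n_features)]

# Hypothetical toy fitness: reward agreement with a hidden "true" model.
# A real application would score a fitted linear model instead.
TRUE_MASK = [1, 1, 0, 0, 0, 0, 0, 0]

def toy_fitness(mask):
    return sum(1 for bit, truth in zip(mask, TRUE_MASK) if bit == truth)

# Several independent runs, standing in for the "parallel" populations.
runs = [evolve(toy_fitness, len(TRUE_MASK), seed=s) for s in range(5)]
scores = inclusion_frequency(runs, len(TRUE_MASK))
print(scores)
```

Each entry of `scores` lies between 0 and 1. Choosing a cutoff for such scores is the step for which the abstract proposes a threshold derived from their null distribution, in place of visual inspection.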
The existence of immune or cured individuals in a population and whether there is sufficient follow-up in a sample of censored observations on their lifetimes to be confident of their presence are questions of major importance in medical survival analysis. Here we give a detailed analysis of a statistic designed to test for sufficient follow-up in a sample. Assuming an i.i.d. censoring model, we obtain exact finite-sample and asymptotic distributions for the statistic, and use these to calculate the power of a test based on it. A particularly useful finding is that the asymptotic distribution of the test statistic is parameter-free in the null case when follow-up is insufficient. The methods are illustrated with application to a glioma cancer dataset.
Finite sample and asymptotic distributions of a statistic for sufficient follow-up in cure models. Ross Maller, Sidney Resnick, Soudabeh Shemehsavar. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(2), 359-379. Published 2023-04-19. DOI: 10.1002/cjs.11771
In this article, we study parameter estimation for measurement error models by combining the Bayes method with the instrumental variable approach. We derive the posterior distribution of the parameters under different priors, with known and unknown variance parameters, respectively, and calculate the Bayes estimator (BE) of the parameters under quadratic loss. However, it is difficult to obtain an explicit expression for the BE because of the complex multiple integrals involved. Therefore, we adopt the linear Bayes method, which does not specify the form of the prior and avoids these complicated integral calculations, to obtain an expression for the linear Bayes estimator (LBE) for different priors. We prove that this LBE is superior to the two-stage least squares estimator under the mean squared error matrix criterion. Numerical simulations show that our LBE is very close to the true parameter value whether the variance parameters are known or unknown, and that it gradually approaches the BE as the sample size increases. Our results indicate that this instrumental variable approach is valid for measurement error models.
Bayesian instrumental variable estimation in linear measurement error models. Qi Wang, Lichun Wang, Liqun Wang. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(2), 500-531. Published 2023-04-19. DOI: 10.1002/cjs.11773
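The two-stage least squares estimator that the measurement-error abstract above uses as a benchmark can be sketched in a few lines of Python. This shows only the comparator, not the paper's linear Bayes estimator, whose formula is not given in the abstract. The scalar model, the single instrument, the simulated data, and all function and variable names are assumptions made for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Sample covariance (divisor n) between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def ols(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope

def two_stage_least_squares(z, w, y):
    """Stage 1: regress the error-prone regressor w on the instrument z.
    Stage 2: regress y on the stage-1 fitted values."""
    a0, a1 = ols(z, w)
    w_hat = [a0 + a1 * zi for zi in z]
    return ols(w_hat, y)

# Simulated data under an assumed model:
#   x = z + v      (true regressor, never observed)
#   w = x + u      (observed with measurement error u)
#   y = 1 + 2 x + e
rng = random.Random(0)
n, beta = 5000, 2.0
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
w = [xi + rng.gauss(0, 1) for xi in x]
y = [1.0 + beta * xi + rng.gauss(0, 1) for xi in x]

_, naive_slope = ols(w, y)                        # attenuated by the error in w
_, iv_slope = two_stage_least_squares(z, w, y)    # uses the instrument z
print(naive_slope, iv_slope)
```

In this simulated setup, regressing `y` directly on the error-prone `w` gives a slope biased toward zero, while the two-stage estimate should sit close to the assumed value `beta = 2`. With a single instrument, the second-stage slope is algebraically equal to `cov(z, y) / cov(z, w)`.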