Paul A. Smith, Peter G.M. van der Heijden, Maarten Cruyff, Francesco Pantalone, Hannes Diener, Kim Dunstan
We investigate the use of multiple linked lists for population size estimation and to estimate the relationships between covariates appearing on the lists. Over the lists, the covariates aim to measure the same concept. The relationships between the covariates are not fully known because of missing values on the covariates: some cases do not appear in some lists; some cases are on one or more of the lists but have missing covariate values on some of the lists; and some cases are not observed in any list. In earlier work, multiple system estimation has been combined with latent class analysis to give a consensus estimate where an underlying dichotomous categorical covariate is measured differently in different lists. This was applied to ethnicity covariates in New Zealand with two levels, Māori and non-Māori. In this paper, we apply this approach to ethnicity covariates with a larger number of categories, and find that it produces satisfactory results with four categories. We assess the purity of the latent classes using entropy and conditional probability measures. We also examine the evolution of annual estimates from multiple lists (where one list is the population census) over 2013–2020, finding that the estimated latent class proportions are very stable. We assess the impact of disclosure control measures on the outputs.
Population Size Estimation Using Covariates Having Missing Values and Measurement Error: Estimating Ethnic Group Sizes in New Zealand. Paul A. Smith, Peter G.M. van der Heijden, Maarten Cruyff, Francesco Pantalone, Hannes Diener, Kim Dunstan. Australian & New Zealand Journal of Statistics, volume 67, issue 3, pages 432-453. Published 2025-06-19. DOI: https://doi.org/10.1111/anzs.70014. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.70014
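As a rough illustration of two ideas named in the abstract above (estimating population size from linked lists, and using entropy to judge how pure the latent classes are), the following Python sketch may help. It is not the authors' method. It assumes the simplest two-list case with independent lists (the classical Lincoln-Petersen estimator) instead of a multiple system model with latent classes, and it assumes a commonly used relative entropy statistic for class separation. The function names and the example counts are hypothetical.

```python
import math


def two_list_population_estimate(n_both, n_only_a, n_only_b):
    """Two-list (dual-system) estimate of total population size.

    Assumes the two lists are independent, so the number of people missed
    by both lists is estimated as n_only_a * n_only_b / n_both.
    """
    if n_both <= 0:
        raise ValueError("need at least one case observed on both lists")
    unseen = n_only_a * n_only_b / n_both
    return n_both + n_only_a + n_only_b + unseen


def relative_entropy(posteriors):
    """Relative entropy of posterior class membership probabilities.

    posteriors: list of rows, one per case, each row holding the
    probabilities of belonging to each of K latent classes (summing to 1).
    Returns 1.0 when every case is assigned to one class with certainty
    and 0.0 when every case is spread evenly over all classes.
    """
    n_cases = len(posteriors)
    n_classes = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # treat 0 * log(0) as 0
                total -= p * math.log(p)
    return 1.0 - total / (n_cases * math.log(n_classes))


if __name__ == "__main__":
    # Hypothetical counts: 50 on both lists, 150 only on A, 100 only on B.
    print(two_list_population_estimate(50, 150, 100))  # 600.0
    # Hypothetical posteriors for three cases and four classes.
    print(relative_entropy([
        [0.97, 0.01, 0.01, 0.01],
        [0.05, 0.90, 0.03, 0.02],
        [0.25, 0.25, 0.25, 0.25],
    ]))
```

With more than two lists, the independence assumption is usually relaxed by fitting a log-linear model to the cross-classified list counts, which this sketch does not attempt.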
Fused lasso regression is a popular method for identifying homogeneous groups and sparsity patterns in regression coefficients based on either the presumed order or a more general graph structure of the covariates. However, the traditional fused lasso may yield misleading outcomes in the presence of outliers. In this paper, we propose an extension of the fused lasso, namely the robust adaptive fused lasso (RAFL), which pursues homogeneity and sparsity patterns in regression coefficients while accounting for potential outliers within the data. By using Huber's loss or Tukey's biweight loss, RAFL can resist outliers in the responses or in both the responses and the covariates. We also demonstrate that when the adaptive weights are properly chosen, the proposed RAFL achieves consistency in variable selection, consistency in grouping and asymptotic normality. Furthermore, a novel optimization algorithm, which employs the alternating direction method of multipliers, embedded with an accelerated proximal gradient algorithm, is developed to solve RAFL efficiently. Our simulation study shows that RAFL offers substantial improvements in terms of both grouping accuracy and prediction accuracy compared with the fused lasso, particularly when dealing with contaminated data. Additionally, an analysis of real cookie data demonstrates the effectiveness of RAFL.
Homogeneity and Sparsity Pursuit Using Robust Adaptive Fused Lasso. Le Chang, Yanlin Shi. Australian & New Zealand Journal of Statistics, volume 67, issue 2, pages 157-174. Published 2025-06-19. DOI: https://doi.org/10.1111/anzs.70010. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.70010
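To make the structure of a robust fused lasso criterion more concrete, the Python sketch below evaluates one plausible form of the objective described in the abstract above: a Huber loss on the residuals, plus a weighted penalty on each coefficient (sparsity) and a weighted penalty on differences between adjacent coefficients (homogeneity). This is an assumption-laden illustration, not the authors' formulation or algorithm. It assumes covariates with a simple chain ordering, unit weights when none are supplied, and the common Huber tuning constant 1.345. The function and parameter names are hypothetical, and the sketch only evaluates the objective without minimizing it.

```python
def huber_loss(residual, delta=1.345):
    """Huber's loss: quadratic for small residuals, linear for large ones."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def robust_fused_lasso_objective(X, y, beta, lam_sparse, lam_fuse,
                                 w_sparse=None, w_fuse=None, delta=1.345):
    """Evaluate a Huber-loss fused lasso objective for chain-ordered covariates.

    X: list of rows (each a list of p covariate values); y: list of responses;
    beta: list of p coefficients. w_sparse (length p) and w_fuse (length p - 1)
    are adaptive weights; both default to 1.0 in this sketch.
    """
    p = len(beta)
    if w_sparse is None:
        w_sparse = [1.0] * p
    if w_fuse is None:
        w_fuse = [1.0] * (p - 1)

    # Robust data-fit term: Huber loss of each residual y_i - x_i' beta.
    loss = 0.0
    for row, target in zip(X, y):
        fitted = sum(x * b for x, b in zip(row, beta))
        loss += huber_loss(target - fitted, delta)

    # Sparsity term: weighted absolute values of the coefficients.
    sparsity = sum(w * abs(b) for w, b in zip(w_sparse, beta))

    # Fusion term: weighted absolute differences of adjacent coefficients.
    fusion = sum(w * abs(beta[j + 1] - beta[j]) for j, w in enumerate(w_fuse))

    return loss + lam_sparse * sparsity + lam_fuse * fusion


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [1.0, 1.0]
    print(robust_fused_lasso_objective(X, y, [1.0, 1.0], 1.0, 1.0))  # 2.0
    print(robust_fused_lasso_objective(X, y, [1.0, 0.0], 1.0, 1.0))  # 2.5
```

In this toy example the equal-coefficient vector has the lower objective because it fits both responses exactly and pays no fusion penalty. A large residual contributes only linearly to the Huber term, which is what limits the influence of outlying responses.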