Pub Date: 2024-09-05; DOI: 10.1007/s42952-024-00280-5
Yoshihide Kakizawa
Nonparametric density estimation for nonnegative data is considered in the situation where a random sample is not directly available and the data are instead observed under length-biased sampling. Because location-scale kernels suffer from the so-called boundary bias problem, the approach in this paper applies asymmetric kernels. Several nonparametric density estimators are proposed. The mean integrated squared error, strong consistency, and asymptotic normality of the estimators are investigated. Simulation studies and a real data analysis illustrate the estimators.
Title: Asymmetric kernel density estimation for biased data. Journal: Journal of the Korean Statistical Society.
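The idea of combining an asymmetric kernel with a length-bias correction can be sketched as follows. This is a minimal illustration and not the estimators proposed in the paper: it assumes a gamma kernel with shape `x / b + 1` and scale `b`, and it assumes the observations follow the length-biased density g(y) = y f(y) / mu, so that each kernel term is weighted by 1 / y and rescaled by a harmonic-mean estimate of mu. The function names and the bandwidth `b` are hypothetical.

```python
import math
import random


def gamma_pdf(y, shape, scale):
    """Density of a Gamma(shape, scale) distribution at y > 0 (log-scale evaluation)."""
    log_pdf = ((shape - 1.0) * math.log(y) - y / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def length_biased_gamma_kde(x, sample, b):
    """Estimate the target density f at x >= 0 from length-biased observations.

    Assumption: the sample comes from g(y) = y * f(y) / mu, so E[1 / Y] = 1 / mu.
    """
    n = len(sample)
    mu_hat = n / sum(1.0 / y for y in sample)   # harmonic-mean estimate of mu
    shape = x / b + 1.0                         # kernel shape varies with x
    weighted = sum(gamma_pdf(y, shape, b) / y for y in sample)
    return mu_hat * weighted / n


if __name__ == "__main__":
    # Target f is Gamma(2, 1), so the length-biased draws are Gamma(3, 1).
    rng = random.Random(0)
    draws = [rng.gammavariate(3.0, 1.0) for _ in range(5000)]
    for x in (0.5, 1.0, 2.0):
        print(x, length_biased_gamma_kde(x, draws, 0.05), x * math.exp(-x))
```

Because the kernel is supported on the positive half-line, no probability mass is placed below zero, which is the motivation for asymmetric kernels near the boundary.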
Pub Date: 2024-08-26; DOI: 10.1007/s42952-024-00287-y
Wei Yu
Community detection is a significant problem in network data analysis. In this paper, we implement community detection by minimizing an objective function based on the difference between the adjacency matrix and its expected value, and explain the rationale for this objective function. To solve the optimization problem, we propose a new algorithm that draws on ideas from Markov chain Monte Carlo and low-discrepancy sequences in the field of random simulation. We introduce a new indicator that compares the performance of methods by measuring the similarity between the true and the estimated communities. Synthetic and real networks are analyzed to investigate the effectiveness of the new method. The results show that the performance of the proposed method is stable in all simulated scenarios and that, in most cases, it outperforms existing methods.
Title: Community detection for networks based on Monte Carlo type algorithms. Journal: Journal of the Korean Statistical Society.
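An objective of the kind described above, the discrepancy between the adjacency matrix and its fitted expectation, can be sketched as follows. This is an illustrative assumption rather than the paper's objective or algorithm: the expected value is taken to be the within-block and between-block edge frequency under a given labeling, and the search is a plain random single-node relabeling that keeps non-worsening moves. The names `block_loss` and `random_search` are hypothetical.

```python
import itertools
import random


def block_loss(adj, labels, k):
    """Sum over node pairs of (A_ij - fitted block mean)^2 for a given labeling."""
    n = len(adj)
    sums = [[0.0] * k for _ in range(k)]
    counts = [[0] * k for _ in range(k)]
    pairs = list(itertools.combinations(range(n), 2))
    for i, j in pairs:
        a, b = sorted((labels[i], labels[j]))
        sums[a][b] += adj[i][j]
        counts[a][b] += 1
    loss = 0.0
    for i, j in pairs:
        a, b = sorted((labels[i], labels[j]))
        p_hat = sums[a][b] / counts[a][b]   # fitted expectation of A_ij
        loss += (adj[i][j] - p_hat) ** 2
    return loss


def random_search(adj, k, steps, seed=0):
    """Relabel one random node per step; keep the move if the loss does not increase."""
    rng = random.Random(seed)
    n = len(adj)
    labels = [rng.randrange(k) for _ in range(n)]
    best = block_loss(adj, labels, k)
    for _ in range(steps):
        i = rng.randrange(n)
        old = labels[i]
        labels[i] = rng.randrange(k)
        new = block_loss(adj, labels, k)
        if new <= best:
            best = new
        else:
            labels[i] = old                 # reject the move
    return labels, best
```

For two disjoint cliques, the true labeling gives a loss of exactly zero, while a labeling that mixes the cliques gives a strictly positive loss.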
Pub Date: 2024-08-21; DOI: 10.1007/s42952-024-00286-z
Erindi Allaj
We propose a new estimator of the integrated volatility in the presence of observed noise variables, measured, for example, by the trading volume or the bid-ask spread. We find that, under specific conditions, the proposed estimator is consistent and that the error between the proposed estimator and the integrated volatility, adjusted for the noise effects, has the same asymptotic distribution as the realized volatility estimator in the absence of noise effects. Finally, our results are validated by a simulation and an empirical study.
Title: Integrated volatility estimation: the case of observed noise variables. Journal: Journal of the Korean Statistical Society.
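For reference, the benchmark mentioned above, the realized volatility estimator, is the sum of squared log returns over the sampling grid. The sketch below shows only this noise-free baseline; it does not implement the paper's noise-adjusted estimator, and the function name is hypothetical.

```python
import math


def realized_volatility(prices):
    """Sum of squared log returns of a positive price series (noise-free baseline)."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return sum(r * r for r in log_returns)


if __name__ == "__main__":
    print(realized_volatility([100.0, 101.0, 100.0, 102.0]))
```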
Pub Date: 2024-08-07; DOI: 10.1007/s42952-024-00284-1
Dongguen Kim, Heejin Kim, Yejin Kim, Minwoo Chae, Young Myoung Ko, Young-Mok Bae, Hyungsub Sim, Young Chan Oh, Keum Hwan Noh
The importance of the back-end process in semiconductor manufacturing has recently received significant attention from global manufacturers. The analysis of manufacturing data often provides crucial insights into problems inherent in the manufacturing processes. An important goal of the back-end process is to improve the yield of final products, called packages. A simple way to achieve this goal is to characterize low-quality wafers based on the analysis of manufacturing data and discard them before proceeding to the packaging step. Alternatively, this paper proposes a novel packaging method that significantly improves the package yield using statistical models scoring the quality of dies. We prove that the proposed packaging method is optimal and conduct thorough numerical experiments, showing its superiority.
Title: Using statistical models for optimal packaging in semiconductor manufacturing processes. Journal: Journal of the Korean Statistical Society.
Pub Date: 2024-07-30; DOI: 10.1007/s42952-024-00283-2
Seung Hyun Moon, Young Kyung Lee, Byeong U. Park
This paper introduces a powerful bias reduction technique applied to local linear additive regression. The main idea is to make use of a parametric family. Existing techniques based on this idea use a parametric model that is linear in the parameter. In this paper, we generalize these approaches by allowing nonlinear parametric families. We develop the methodology and theory for response variables taking values in a general separable Hilbert space. Under mild conditions, our proposed approach not only offers flexibility but also achieves bias reduction while maintaining the variance of the local linear additive regression estimators. We also provide numerical evidence that supports our approach.
Title: Generalized parametric help in Hilbertian additive regression. Journal: Journal of the Korean Statistical Society.
Pub Date: 2024-07-29; DOI: 10.1007/s42952-024-00281-4
Sang Gil Kang, Yongku Kim
This article proposes objective Bayesian multiple testing procedures for a normal model. The challenging task of considering all the configurations of true and false null hypotheses is addressed here by ordering the null hypotheses based on their Bayes factors. This approach reduces the number of models compared in the posterior search from \(2^k\) to \(k+1\), for \(k\) null hypotheses. Furthermore, the consistency of the proposed multiple testing procedures is established and their behavior is analyzed with simulated and real examples. In addition, the proposed procedures are compared with classical and Bayesian multiple testing procedures in all the possible configurations of true and false ordered null hypotheses.
Title: Objective Bayesian multiple testing for k normal populations. Journal: Journal of the Korean Statistical Society.
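The reduction from 2^k to k+1 candidate models can be illustrated with a short sketch. Once the null hypotheses are ordered by their Bayes factors, only nested configurations need to be compared: the j hypotheses with the weakest support for the null are treated as false, for j = 0, ..., k. The convention assumed here (a smaller Bayes factor means weaker support for the null) and the function name are illustrative assumptions, not taken from the paper.

```python
def candidate_models(bayes_factors):
    """Return the k + 1 nested sets of null hypotheses treated as false.

    Assumed convention: bayes_factors[i] compares null i against its
    alternative, so a smaller value means weaker support for the null.
    """
    order = sorted(range(len(bayes_factors)), key=lambda i: bayes_factors[i])
    return [frozenset(order[:j]) for j in range(len(order) + 1)]


if __name__ == "__main__":
    models = candidate_models([3.0, 0.1, 1.2])
    # k = 3 hypotheses: 4 nested candidates instead of 2**3 = 8 configurations.
    print(len(models), 2 ** 3)
    for rejected in models:
        print(sorted(rejected))
```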
Pub Date: 2024-07-26; DOI: 10.1007/s42952-024-00279-y
Zhenzhen Fu, Ke Yang, Yaohua Rong, Yu Shu
Missing data are prevalent in many fields. Among the missing-data mechanisms, nonignorable missingness is the most challenging for model identification. In this paper, we propose an estimation method for semiparametric regression models with nonignorable missing responses. Specifically, we first construct a parametric model for the propensity score and apply the generalized method of moments to obtain the estimated propensity score. For nonignorable missing responses, based on the inverse probability weighting approach, we propose the penalized garrotized kernel machine method to flexibly capture the complex nonlinear relationships between the response and the predictors, allow for interactions between the predictors, and eliminate redundant variables automatically. A cyclical coordinate descent algorithm is provided to solve the corresponding optimization problems. Numerical results and a real data analysis indicate that our proposed method achieves better prediction performance than the competing ones.
Title: Kernel machine in semiparametric regression with nonignorable missing responses. Journal: Journal of the Korean Statistical Society.
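The inverse probability weighting step can be sketched in isolation. The example below assumes the propensity scores are already available (the paper estimates them by the generalized method of moments, which is not shown) and computes a normalized weighted mean of the observed responses. It does not implement the garrotized kernel machine; the function name is hypothetical.

```python
def ipw_mean(responses, observed, propensities):
    """Normalized inverse-probability-weighted mean of the observed responses.

    responses[i] is ignored when observed[i] is falsy; propensities[i] is the
    (assumed known or previously estimated) probability that response i is observed.
    """
    numerator = 0.0
    denominator = 0.0
    for y, d, p in zip(responses, observed, propensities):
        if d:
            numerator += y / p       # up-weight units that were unlikely to be observed
            denominator += 1.0 / p
    return numerator / denominator


if __name__ == "__main__":
    print(ipw_mean([1.0, None, 3.0, 4.0], [1, 0, 1, 1], [0.5, 0.5, 1.0, 1.0]))
```

The same weights can be attached to each observed unit's loss term when fitting a regression model instead of a mean.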
Pub Date: 2024-07-23; DOI: 10.1007/s42952-024-00282-3
Hojun You, Wei-Ying Wu, Chae Young Lim, Kyubaek Yoon, Jongeun Choi
Multiplicative errors, together with spatially referenced observations, often arise in geodetic applications, particularly with light detection and ranging (LiDAR) measurements. However, regression involving multiplicative errors remains relatively unexplored in such applications. In this regard, we present a penalized modified least squares estimator to handle the complexities of a multiplicative error structure while identifying significant variables in spatially dependent observations. The proposed estimator can also be applied to classical additive error spatial regression. By establishing asymptotic properties of the proposed estimator under increasing domain asymptotics with stochastic sampling design, we provide a rigorous foundation for its effectiveness. A comprehensive simulation study confirms the superior performance of our proposed estimator in accurately estimating and selecting parameters, outperforming existing approaches. To demonstrate its real-world applicability, we employ our proposed method, along with other alternative techniques, to estimate a rotational landslide surface using LiDAR measurements. The results highlight the efficacy and potential of our approach in tackling complex spatial regression problems involving multiplicative errors.
Title: Spatial regression with multiplicative errors, and its application with LiDAR measurements. Journal: Journal of the Korean Statistical Society.
Pub Date: 2024-07-19; DOI: 10.1007/s42952-024-00278-z
Yajie Mi, Lei Wang
In the era of big data, online updating problems have attracted extensive attention. In practice, the covariate set of the models may change according to the conditions of the data streams. In this paper, we propose a two-stage online debiased lasso estimation and inference method for high-dimensional heterogenous linear regression models with new variables added midway. In the first stage, a homogenization strategy is used to represent the heterogenous models by defining pseudo covariates and responses. In the second stage, we apply the online debiased lasso estimation procedure to obtain the final estimator. Theoretically, the asymptotic normality of the heterogenous online debiased lasso estimator (HODL) is established. The finite-sample performance of the proposed estimators is studied through simulation studies and a real data example.
Title: Online debiased lasso estimation and inference for heterogenous updating regressions. Journal: Journal of the Korean Statistical Society.
Pub Date: 2024-07-14; DOI: 10.1007/s42952-024-00277-0
Jing Zhang, Zhensheng Huang
Motivated by settings in which different groups carry different information under a heteroscedastic error structure, we propose the groupwise scaled envelope model, which is invariant to scale changes and permits distinct regression coefficients and heteroscedastic error structures across groups. It retains the ability of scaled envelope methods to preserve scale invariance while allowing both the regression coefficients and the error structures to differ across groups. Further, we derive the maximum likelihood estimators and their theoretical properties, including parameter identifiability, asymptotic distribution, and consistency of the groupwise scaled envelope estimator. Lastly, simulation studies and a real-data example demonstrate the advantages of the groupwise scaled envelope estimators, including a comparison with the standard model estimators, groupwise envelope estimators, scaled envelope estimators, and separate scaled envelope estimators.
Title: Scale invariant and efficient estimation for groupwise scaled envelope model. Journal: Journal of the Korean Statistical Society.