Pub Date: 2022-03-31 | DOI: 10.29220/csam.2022.29.2.193
The use of support vector machines in semi-supervised classification
Hyun Bae, Hyungwoo Kim, S. Shin
Semi-supervised learning has gained significant attention in recent applications. In this article, we provide a selective overview of popular semi-supervised methods and then propose a simple but effective algorithm for semi-supervised classification using the support vector machine (SVM), one of the most popular binary classifiers in the machine learning community. The idea is simple: first, we apply dimension reduction to the unlabeled observations and cluster them in the reduced space to assign labels. An SVM is then fitted to the combined set of labeled and unlabeled observations to construct a classification rule. The use of the SVM lets us extend the method to a nonlinear counterpart via the kernel trick. Our numerical experiments under various scenarios demonstrate that the proposed method is promising for semi-supervised classification.
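The pipeline the abstract describes can be sketched in a few lines. In the sketch below, PCA stands in for the dimension-reduction step and k-means for the clustering step; the abstract does not commit to these specific components, and all data here are simulated.

```python
# A minimal sketch of the reduce -> cluster -> pseudo-label -> SVM pipeline.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(40, 10))
y_lab = (X_lab[:, 0] > 0).astype(int)       # small labeled set
X_unl = rng.normal(size=(400, 10))          # large unlabeled pool

# Step 1: reduce dimension, then cluster the unlabeled points in the reduced space.
pca = PCA(n_components=2).fit(np.vstack([X_lab, X_unl]))
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pca.transform(X_unl))

# Step 2: name each cluster by majority vote of the labeled points it attracts.
lab_clusters = km.predict(pca.transform(X_lab))
cluster_to_label = {c: np.bincount(y_lab[lab_clusters == c], minlength=2).argmax()
                    for c in range(2)}
y_pseudo = np.array([cluster_to_label[c] for c in km.labels_])

# Step 3: train a (kernel) SVM on the labeled + pseudo-labeled data.
clf = SVC(kernel="rbf").fit(np.vstack([X_lab, X_unl]),
                            np.concatenate([y_lab, y_pseudo]))
print(clf.score(X_lab, y_lab))
```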
{"title":"The use of support vector machines in semi-supervised classification","authors":"Hyun Bae, Hyungwoo Kim, S. Shin","doi":"10.29220/csam.2022.29.2.193","DOIUrl":"https://doi.org/10.29220/csam.2022.29.2.193","url":null,"abstract":"Semi-supervised learning has gained significant attention in recent applications. In this article, we provide a selective overview of popular semi-supervised methods and then propose a simple but e ff ective algorithm for semi-supervised classification using support vector machines (SVM), one of the most popular binary classifiers in a machine learning community. The idea is simple as follows. First, we apply the dimension reduction to the unlabeled observations and cluster them to assign labels on the reduced space. SVM is then employed to the combined set of labeled and unlabeled observations to construct a classification rule. The use of SVM enables us to extend it to the nonlinear counterpart via kernel trick. Our numerical experiments under various scenarios demonstrate that the proposed method is promising in semi-supervised classification.","PeriodicalId":44931,"journal":{"name":"Communications for Statistical Applications and Methods","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2022-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43802971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-31 | DOI: 10.29220/csam.2022.29.1.707
Variational Bayesian inference for binary image restoration using Ising model
Moon-Yeop Jang, Younshik Chung
{"title":"Variational Bayesian inference for binary image restoration using Ising model","authors":"Moon-Yeop Jang, Younshik Chung","doi":"10.29220/csam.2022.29.1.707","DOIUrl":"https://doi.org/10.29220/csam.2022.29.1.707","url":null,"abstract":"","PeriodicalId":44931,"journal":{"name":"Communications for Statistical Applications and Methods","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44792264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-31 | DOI: 10.29220/csam.2022.29.1.765
Evaluation of interest rate-linked DLSs
Man-Kyun Kim, Seongjoo Song
Derivative-linked securities (DLS) are a type of derivative that offers an agreed return when the underlying asset price moves within a specified range by the maturity date. The underlying assets of DLS are diverse, including interest rates, exchange rates, crude oil, and gold. A German 10-year bond rate-linked DLS and a USD-GBP CMS rate-linked DLS recently became a social issue in Korea owing to huge losses to investors. In this regard, this paper describes the payoff structure of these products and evaluates their prices and fair coupon rates, as well as risk measures such as Value-at-Risk (VaR) and Tail-Value-at-Risk (TVaR). We examine how risky these products were and whether their coupon rates were appropriate. We use the Hull-White model as the stochastic model for the underlying assets and Monte Carlo (MC) methods to obtain numerical results. The no-arbitrage prices of the German 10-year bond rate-linked DLS and the USD-GBP CMS rate-linked DLS at the center of the social issue turned out to be 0.9662 and 0.9355 of the original investment, respectively. Considering that the Korean government bond rate in 2018 was about 2%, these values are quite low. The fair coupon rates that make the prices of the DLS equal to the original investment are computed as 4.76% for the German 10-year bond rate-linked DLS and 7% for the USD-GBP CMS rate-linked DLS; their actual coupon rates were 1.4% and 3.5%. The 95% VaR and TVaR of the loss for the German 10-year bond rate-linked DLS are 37.30% and 64.45% of the initial investment, and those for the USD-GBP CMS rate-linked DLS are 73.98% and 87.43%. Summing up the numerical results, the DLS products of our interest were indeed quite unfavorable to individual investors.
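The pricing machinery can be illustrated with a stripped-down Monte Carlo sketch. The Hull-White drift term is taken as a constant (a Vasicek-style simplification rather than a calibrated theta(t)), and the payoff is a stylized range product, not the actual DLS terms; every parameter below is illustrative.

```python
# Minimal sketch: Euler simulation of dr = (theta - a*r)dt + sigma*dW,
# a stylized barrier payoff, and the resulting price, VaR, and TVaR.
import numpy as np

rng = np.random.default_rng(1)
a, sigma, r0 = 0.05, 0.01, 0.02              # mean reversion, vol, initial rate
T, n_steps, n_paths = 1.0, 252, 100_000
dt = T / n_steps
theta = a * 0.02                              # flat long-run level (illustrative)

r = np.full(n_paths, r0)
integral_r = np.zeros(n_paths)                # for the discount factor
r_min = np.full(n_paths, r0)                  # running minimum of the rate
for _ in range(n_steps):
    r = r + (theta - a * r) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    integral_r += r * dt
    r_min = np.minimum(r_min, r)

coupon, barrier = 0.035, 0.015                # stylized terms, not the real product
payoff = np.where(r_min > barrier,
                  1 + coupon,                               # full redemption + coupon
                  1 + coupon - 25 * (barrier - r_min))      # loss grows with the breach
discounted = np.exp(-integral_r) * payoff
price = discounted.mean()

loss = 1 - discounted                         # loss per unit of investment
var95 = np.quantile(loss, 0.95)
tvar95 = loss[loss >= var95].mean()
print(f"price={price:.4f}  VaR95={var95:.4f}  TVaR95={tvar95:.4f}")
```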
{"title":"Evaluation of interest rate-linked DLSs","authors":"Man-Kyun Kim, Seongjoo Song","doi":"10.29220/csam.2022.29.1.765","DOIUrl":"https://doi.org/10.29220/csam.2022.29.1.765","url":null,"abstract":"Derivative-linked securities (DLS) is a type of derivatives that o ff er an agreed return when the underlying asset price moves within a specified range by the maturity date. The underlying assets of DLS are diverse such as interest rates, exchange rates, crude oil, or gold. A German 10-year bond rate-linked DLS and a USD-GBP CMS rate-linked DLS have recently become a social issue in Korea due to a huge loss to investors. In this regard, this paper accounts for the payo ff structure of these products and evaluates their prices and fair coupon rates as well as risk measures such as Value-at-Risk (VaR) and Tail-Value-at-Risk (TVaR). We would like to examine how risky these products were and whether or not their coupon rates were appropriate. We use Hull-White Model as the stochastic model for the underlying assets and Monte Carlo (MC) methods to obtain numerical results. The no-arbitrage prices of the German 10-year bond rate-linked DLS and the USD-GBP CMS rate-linked DLS at the center of the social issue turned out to be 0.9662% and 0.9355% of the original investment, respectively. Considering that Korea government bond rate for 2018 is about 2%, these values are quite low. The fair coupon rates that make the prices of DLS equal to the original investment are computed as 4.76% for the German 10-year bond rate-linked DLS and 7% for the USD-GBP CMS rate-linked DLS. Their actual coupon rates were 1.4% and 3.5%. The 95% VaR and TVaR of the loss for German 10-year bond rate-linked DLS are 37.30% and 64.45%, and those of the loss for USD-GBP CMS rate-linked DLS are 73.98% and 87.43% of the initial investment. Summing up the numerical results obtained, we could see that the DLS products of our interest were indeed quite unfavorable to individual investors.","PeriodicalId":44931,"journal":{"name":"Communications for Statistical Applications and Methods","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43795993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-31 | DOI: 10.29220/csam.2022.29.1.681
The exponential generalized log-logistic model: Bagdonavičius-Nikulin test for validation and non-Bayesian estimation methods
M. Ibrahim, K. Aidi, Mir Masoom Ali, H. Yousof
A modified Bagdonavičius-Nikulin chi-square goodness-of-fit test is defined and studied. Different non-Bayesian estimation methods under complete-sample schemes are considered, discussed, and compared, including the maximum likelihood, least squares, Cramér-von Mises, weighted least squares, left-tail Anderson-Darling, and right-tail Anderson-Darling estimation methods. Numerical simulation studies compare these estimation methods using different sample sizes and three different combinations of parameters, and the Barzilai-Borwein algorithm is employed in a simulation study to assess the performance of the estimators as the sample size tends to infinity. The potentiality of the EG-LL model is illustrated using three real data sets, and the model is compared with many other well-known generalizations; the new model proved worthy in modeling breaking-stress, survival-time, and medical data sets. Using the Bagdonavičius-Nikulin goodness-of-fit test for validation, we propose a modified chi-square goodness-of-fit test for the EG-LL model and analyze a lymphoma data set consisting of times (in months) from diagnosis to death for 31 individuals with advanced non-Hodgkin's lymphoma clinical symptoms. Based on the MLEs, the modified Bagdonavičius-Nikulin goodness-of-fit test recovered the loss of information due to the grouping of the data.
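As one concrete example of the estimation methods being compared, the sketch below carries out Cramér-von Mises minimum-distance estimation, shown for the plain log-logistic distribution (scipy's `fisk`) rather than the EG-LL model itself, whose CDF is not reproduced here; the data are simulated.

```python
# Minimal sketch of Cramer-von Mises (CvM) estimation: minimize the CvM
# distance between the fitted CDF and the empirical CDF, then compare
# with the MLE.
import numpy as np
from scipy.stats import fisk
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = np.sort(fisk.rvs(c=2.5, scale=1.5, size=200, random_state=rng))
n = len(x)
i = np.arange(1, n + 1)

def cvm_objective(params):
    c, scale = params
    if c <= 0 or scale <= 0:
        return np.inf
    u = fisk.cdf(x, c=c, scale=scale)
    return 1 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)

res = minimize(cvm_objective, x0=[1.0, 1.0], method="Nelder-Mead")
c_mle, _, scale_mle = fisk.fit(x, floc=0)     # MLE for comparison
print("CvM estimates:", res.x, " MLE:", (c_mle, scale_mle))
```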
{"title":"The exponential generalized log-logistic model: Bagdonavičius-Nikulin test for validation and non-Bayesian estimation methods","authors":"M. Ibrahim, K. Aidi, Mir Masoom Ali, H. Yousof","doi":"10.29220/csam.2022.29.1.681","DOIUrl":"https://doi.org/10.29220/csam.2022.29.1.681","url":null,"abstract":"A modified Bagdonaviˇcius-Nikulin chi-square goodness-of-fit is defined and studied. The lymphoma data is analyzed using the modified goodness-of-fit test statistic. Di ff erent non-Bayesian estimation methods under complete samples schemes are considered, discussed and compared such as the maximum likelihood least square estimation method, the Cramer-von Mises estimation method, the weighted least square estimation method, the left tail-Anderson Darling estimation method and the right tail Anderson Darling estimation method. Numerical simulation studies are performed for comparing these estimation methods. The potentiality of the new model is illustrated using three real data sets and compared with many other well-known generalizations. Cram´er-von-Mises estimation, the weighted least square estimation, the left tail-Anderson Darling estimation, and the right tail Anderson Darling estimation methods. Numerical simulation studies were performed for comparing these estimation methods using di ff erent sample sizes and three di ff erent combinations of parameters. The potentiality of the EG-LL model is illustrated using three real data sets and the model is compared with many other well-known generalizations. The new model was proven worthy in modeling breaking stress, survival times and medical data sets. The Barzilai-Borwein algorithm is employed via a simulation study for assessing the performance of the estimators with di ff erent sample sizes as sample size tends to ∞ . Using the Bagdonaviˇcius-Nikulin goodness-of-fit test for validation, we propose a modified chi-square GOF tests for the EG-LL model. We have analyzed a lymphoma data set consisting of times (in months) from diagnosis to death for 31 individuals with advanced non Hodgkin’s lymphoma clinical symptoms by using our model under the modified Bagdonaviˇcius-Nikulin goodness-of-fit test statistic. Based on the MLEs, the modified Bagdon-aviˇcius-Nikulin goodness-of-fit test recovered the loss of information for the grouping data and fol-24","PeriodicalId":44931,"journal":{"name":"Communications for Statistical Applications and Methods","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43278462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-31 | DOI: 10.29220/csam.2022.29.1.733
Sparse vector heterogeneous autoregressive model with nonconvex penalties
Andrew Jaeho Shin, Minsu Park, Changryong Baek
High-dimensional time series have gained considerable attention in recent years. The sparse vector heterogeneous autoregressive (VHAR) model proposed by Baek and Park (2020) uses the adaptive lasso with a debiasing procedure in estimation and showed superb forecasting performance for realized volatilities. This paper extends the sparse VHAR model by considering nonconvex penalties such as SCAD and MCP, whose penalty design allows possible bias reduction. Finite-sample performances of the three estimation methods are compared through Monte Carlo simulation. Our study shows, first, that taking cross-sectional correlations into account reduces bias. Second, nonconvex penalties perform better when the sample size is small, whereas the adaptive lasso with debiasing performs well as the sample size increases. An empirical analysis based on 20 multinational realized volatilities is also provided.
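The nonconvex penalties at the center of the extension are easy to write down. The sketch below implements the standard SCAD and MCP penalty functions (with the conventional a = 3.7 and an illustrative gamma = 3) to show how they flatten out for large coefficients, which is the source of the bias reduction mentioned above.

```python
# Minimal sketch of the SCAD and MCP penalties that replace the lasso's
# l1 penalty; both grow like lam*|t| near zero but level off for large |t|.
import numpy as np

def scad(t, lam, a=3.7):
    t = np.abs(t)
    return np.where(t <= lam, lam * t,
           np.where(t <= a * lam,
                    (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
                    lam**2 * (a + 1) / 2))

def mcp(t, lam, gamma=3.0):
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2 * gamma),
                    gamma * lam**2 / 2)

grid = np.linspace(-3, 3, 7)
print(scad(grid, lam=1.0))   # constant beyond a*lam: large coefs are not shrunk
print(mcp(grid, lam=1.0))    # same idea, capped at gamma*lam**2/2
```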
{"title":"Sparse vector heterogeneous autoregressive model with nonconvex penalties","authors":"Andrew Jaeho Shin, Minsu Park, Changryong Baek","doi":"10.29220/csam.2022.29.1.733","DOIUrl":"https://doi.org/10.29220/csam.2022.29.1.733","url":null,"abstract":"High dimensional time series is gaining considerable attention in recent years. The sparse vector heterogeneous autoregressive (VHAR) model proposed by Baek and Park (2020) uses adaptive lasso and debiasing procedure in estimation, and showed superb forecasting performance in realized volatilities. This paper extends the sparse VHAR model by considering non-convex penalties such as SCAD and MCP for possible bias reduction from their penalty design. Finite sample performances of three estimation methods are compared through Monte Carlo simulation. Our study shows first that taking into cross-sectional correlations reduces bias. Second, nonconvex penalties performs better when the sample size is small. On the other hand, the adaptive lasso with debiasing performs well as sample size increases. Also, empirical analysis based on 20 multinational realized volatilities is provided.","PeriodicalId":44931,"journal":{"name":"Communications for Statistical Applications and Methods","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42257723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-31 | DOI: 10.29220/csam.2022.29.1.783
Exploring COVID-19 in mainland China during the lockdown of Wuhan via functional data analysis
Xingdi Li, Panpan Zhang, Q. Feng
In this paper, we analyze time series data on the case and death counts of COVID-19, which broke out in China in December 2019; the study period covers the lockdown of Wuhan. We exploit functional data analysis methods to analyze the collected time series data. The analysis is divided into three parts. First, functional principal component analysis is conducted to investigate the modes of variation. Second, we carry out functional canonical correlation analysis to explore the relationship between confirmed and death cases. Finally, we utilize a clustering method based on the expectation-maximization (EM) algorithm to cluster the counts of confirmed cases, where the number of clusters is determined via a cross-validation approach. In addition, we compare the clustering results with publicly available migration data.
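A bare-bones version of the first step, functional PCA on curves observed on a common grid, looks as follows; real case-count curves would be smoothed first, and the simulated "regions" below are purely illustrative.

```python
# Minimal sketch of FPCA: discretize the covariance operator on the grid,
# take its eigendecomposition, and project curves onto the eigenfunctions.
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 60)                       # 60 "days"
n = 30                                          # 30 "regions"
scores_true = rng.normal(size=(n, 2))           # toy curves with two smooth modes
curves = (scores_true[:, [0]] * np.sin(np.pi * t) +
          scores_true[:, [1]] * np.cos(2 * np.pi * t) +
          0.05 * rng.normal(size=(n, t.size)))

mean_curve = curves.mean(axis=0)
centered = curves - mean_curve
w = t[1] - t[0]                                 # quadrature weight of the grid
cov = centered.T @ centered / n * w             # discretized covariance operator
evals, evecs = np.linalg.eigh(cov)
evals, evecs = evals[::-1], evecs[:, ::-1]      # descending eigenvalue order
evecs = evecs / np.sqrt(w)                      # normalize in L2, not in R^60

fpc_scores = centered @ evecs[:, :2] * w        # first two FPC scores per curve
print("variance explained by first two FPCs:",
      (evals[:2] / evals.sum()).round(3))
```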
{"title":"Exploring COVID-19 in mainland China during the lockdown of Wuhan via functional data analysis","authors":"Xingdi Li, Panpan Zhang, Q. Feng","doi":"10.29220/csam.2022.29.1.783","DOIUrl":"https://doi.org/10.29220/csam.2022.29.1.783","url":null,"abstract":"In this paper, we analyze the time series data of the case and death counts of COVID-19 that broke out in China in December, 2019. The study period is during the lockdown of Wuhan. We exploit functional data analysis methods to analyze the collected time series data. The analysis is divided into three parts. First, the functional principal component analysis is conducted to investigate the modes of variation. Second, we carry out the functional canonical correlation analysis to explore the relationship between confirmed and death cases. Finally, we utilize a clustering method based on the Expectation-Maximization (EM) algorithm to run the cluster analysis on the counts of confirmed cases, where the number of clusters is determined via a cross-validation approach. Besides, we compare the clustering results with some migration data available to the public.","PeriodicalId":44931,"journal":{"name":"Communications for Statistical Applications and Methods","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42255196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-31 | DOI: 10.29220/csam.2022.29.1.721
How to improve oil consumption forecast using google trends from online big data?: the structured regularization methods for large vector autoregressive model
Ji-Eun Choi, D. Shin
We forecast the US oil consumption level by taking advantage of Google Trends, the search volumes of specific terms that people search for on Google. We focus on whether proper selection of Google Trends terms leads to an improvement in forecast performance for oil consumption. As forecast models, we consider least absolute shrinkage and selection operator (LASSO) regression and the structured regularization method for the large vector autoregressive (VAR-L) model of Nicholson et al. (2017), both of which automatically select the Google Trends terms and the lags of the predictors. An out-of-sample forecast comparison reveals that reducing the high-dimensional Google Trends data set to a low-dimensional one via the LASSO and VAR-L models produces better forecast performance for oil consumption than frequently used forecast models such as the autoregressive model, the autoregressive distributed lag model, and the vector error correction model.
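The LASSO selection step can be sketched as follows: stack lags of each search-volume series into one design matrix and let cross-validated LASSO zero out irrelevant (term, lag) pairs. The series are simulated stand-ins for Google Trends data, and `LassoCV` stands in for the paper's exact tuning procedure.

```python
# Minimal sketch: lag-stacked design matrix + cross-validated LASSO.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
T, n_terms, p = 200, 20, 3                    # periods, search terms, max lag
trends = rng.normal(size=(T, n_terms))
y = np.zeros(T)                               # target driven by lag-1 of terms 0 and 1
y[1:] = 0.5 * trends[:-1, 0] + 0.3 * trends[:-1, 1] + 0.1 * rng.normal(size=T - 1)

# block l holds every term at lag l; rows are aligned with y[p:]
X = np.hstack([trends[p - l:T - l] for l in range(1, p + 1)])
y_target = y[p:]

model = LassoCV(cv=5).fit(X, y_target)
selected = np.flatnonzero(model.coef_)
print("selected predictor columns:", selected)  # expect columns 0 and 1 (lag-1 terms)
```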
{"title":"How to improve oil consumption forecast using google trends from online big data?: the structured regularization methods for large vector autoregressive model","authors":"Ji-Eun Choi, D. Shin","doi":"10.29220/csam.2022.29.1.721","DOIUrl":"https://doi.org/10.29220/csam.2022.29.1.721","url":null,"abstract":"We forecast the US oil consumption level taking advantage of google trends. The google trends are the search volumes of the specific search terms that people search on google. We focus on whether proper selection of google trend terms leads to an improvement in forecast performance for oil consumption. As the forecast models, we consider the least absolute shrinkage and selection operator (LASSO) regression and the structured regularization method for large vector autoregressive (VAR-L) model of Nicholson et al. (2017), which select automatically the google trend terms and the lags of the predictors. An out-of-sample forecast comparison reveals that reducing the high dimensional google trend data set to a low-dimensional data set by the LASSO and the VAR-L models produces better forecast performance for oil consumption compared to the frequently-used forecast models such as the autoregressive model, the autoregressive distributed lag model and the vector error correction model.","PeriodicalId":44931,"journal":{"name":"Communications for Statistical Applications and Methods","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46194908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-31 | DOI: 10.29220/csam.2022.29.1.745
Grid-based Gaussian process models for longitudinal genetic data
Wonil Chung
Although various statistical methods have been developed to map time-dependent genetic factors, most identified genetic variants can explain only a small portion of the estimated genetic variation in longitudinal traits. Gene-gene and gene-time/environment interactions are known to be important putative sources of the missing heritability. However, mapping epistatic gene-gene interactions is extremely difficult due to the very large parameter spaces of models containing such interactions. In this paper, we develop a Gaussian process (GP) based nonparametric Bayesian variable selection method for longitudinal data. It maps multiple genetic markers without restricting attention to pairwise interactions. Rather than modeling each main and interaction term explicitly, the GP model measures the importance of each marker, regardless of whether it is mostly due to a main effect or some interaction effect(s), via an unspecified function. To improve the flexibility of the GP model, we propose a novel grid-based method for the within-subject dependence structure. The proposed method can accurately approximate complex covariance structures. The dimension of the covariance matrix depends only on the number of fixed grid points, although each subject may have different numbers of measurements at different time points. The deviance information criterion (DIC) and the Bayesian predictive information criterion (BPIC) are proposed for selecting an optimal number of grid points. To efficiently draw posterior samples, we combine a hybrid Monte Carlo method with a partially collapsed Gibbs (PCG) sampler. We apply the proposed GP model to a mouse dataset on age-related body weight.
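The grid-based device can be sketched directly: the GP covariance K is evaluated once on a small fixed grid, and each subject's irregular visit times reach K through a linear interpolation matrix, so the matrix whose size matters never grows with the number of measurements. The kernel, grid size, and noise level below are illustrative choices, not the paper's full variable-selection model.

```python
# Minimal sketch of a grid-based GP covariance: subject-level covariance
# is A_i @ K_grid @ A_i.T, where A_i interpolates grid values to visits.
import numpy as np

grid = np.linspace(0, 1, 8)                       # fixed grid points

def rbf(s, t, var=1.0, ls=0.3):
    return var * np.exp(-0.5 * (s[:, None] - t[None, :])**2 / ls**2)

K_grid = rbf(grid, grid)                          # 8 x 8, shared by all subjects

def interp_matrix(times, grid):
    """Linear-interpolation matrix mapping grid values to a subject's times."""
    A = np.zeros((len(times), len(grid)))
    for r, tt in enumerate(times):
        j = np.clip(np.searchsorted(grid, tt) - 1, 0, len(grid) - 2)
        w = (tt - grid[j]) / (grid[j + 1] - grid[j])
        A[r, j], A[r, j + 1] = 1 - w, w
    return A

rng = np.random.default_rng(5)
times_i = np.sort(rng.uniform(0, 1, size=5))      # subject i: 5 irregular visits
A_i = interp_matrix(times_i, grid)
cov_i = A_i @ K_grid @ A_i.T + 0.1 * np.eye(5)    # subject covariance + noise
print(cov_i.shape)                                 # (5, 5), while K stayed 8 x 8
```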
{"title":"Grid-based Gaussian process models for longitudinal genetic data","authors":"Wonil Chung","doi":"10.29220/csam.2022.29.1.745","DOIUrl":"https://doi.org/10.29220/csam.2022.29.1.745","url":null,"abstract":"Although various statistical methods have been developed to map time-dependent genetic factors, most identified genetic variants can explain only a small portion of the estimated genetic variation in longitudinal traits. Gene-gene and gene-time / environment interactions are known to be important putative sources of the missing heritability. However, mapping epistatic gene-gene interactions is extremely di ffi cult due to the very large parameter spaces for models containing such interactions. In this paper, we develop a Gaussian process (GP) based nonparametric Bayesian variable selection method for longitudinal data. It maps multiple genetic markers without restricting to pairwise interactions. Rather than modeling each main and interaction term explicitly, the GP model measures the importance of each marker, regardless of whether it is mostly due to a main e ff ect or some interaction e ff ect(s), via an unspecified function. To improve the flexibility of the GP model, we propose a novel grid-based method for the within-subject dependence structure. The proposed method can accurately approximate complex covariance structures. The dimension of the covariance matrix depends only on the number of fixed grid points although each subject may have di ff erent numbers of measurements at di ff erent time points. The deviance information criterion (DIC) and the Bayesian predictive information criterion (BPIC) are proposed for selecting an optimal number of grid points. To e ffi ciently draw posterior samples, we combine a hybrid Monte Carlo method with a partially collapsed Gibbs (PCG) sampler. We apply the proposed GP model to a mouse dataset on age-related body weight.","PeriodicalId":44931,"journal":{"name":"Communications for Statistical Applications and Methods","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48031644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-31 | DOI: 10.29220/csam.2022.29.1.807
Multiple change-point estimation in spectral representation
Jaehee Kim
{"title":"Multiple change-point estimation in spectral representation","authors":"Jaehee Kim","doi":"10.29220/csam.2022.29.1.807","DOIUrl":"https://doi.org/10.29220/csam.2022.29.1.807","url":null,"abstract":"","PeriodicalId":44931,"journal":{"name":"Communications for Statistical Applications and Methods","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44217583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-20 | DOI: 10.29220/csam.2022.29.4.453
A guideline for the statistical analysis of compositional data in immunology
Jinkyung Yoo, Zequn Sun, M. Greenacre, Q. Ma, Dongjun Chung, Young Min Kim
The study of immune cellular composition is of great scientific interest in immunology because of the generation of multiple large-scale data sets. From a statistical point of view, such immune cellular data should be treated as compositional: each element is positive, and all the elements sum to a constant, which can in general be set to one. Standard statistical methods are not directly applicable to compositional data because they do not appropriately handle the correlations between compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio transformations and the generalized linear model with the Dirichlet distribution, discuss their theoretical foundations, and illustrate their applications with immune cellular fraction data generated from colorectal cancer patients.
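The log-ratio idea is easy to sketch. Below, simulated Dirichlet cell fractions are mapped through the centered log-ratio (CLR) transform, and one coordinate is dropped before an ordinary regression fit, since CLR rows sum to zero and are therefore collinear; the zero replacement that real immunology data require is noted but not implemented.

```python
# Minimal sketch of CLR-transformed compositional regression on toy data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
n = 50
fractions = rng.dirichlet(alpha=[2.0, 3.0, 4.0, 1.0], size=n)  # rows sum to 1

def clr(x):
    # centered log-ratio; assumes strictly positive parts (zero counts
    # in real immune-cell data need imputation/replacement first)
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

Z = clr(fractions)                        # CLR coordinates; each row sums to 0
outcome = 1.5 * Z[:, 0] + rng.normal(scale=0.2, size=n)  # toy clinical outcome

# drop one column before OLS because CLR coordinates are linearly dependent
fit = LinearRegression().fit(Z[:, :-1], outcome)
print(fit.coef_.round(3))
```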
{"title":"A guideline for the statistical analysis of compositional data in immunology","authors":"Jinkyung Yoo, Zequn Sun, M. Greenacre, Q. Ma, Dongjun Chung, Young Min Kim","doi":"10.29220/csam.2022.29.4.453","DOIUrl":"https://doi.org/10.29220/csam.2022.29.4.453","url":null,"abstract":"The study of immune cellular composition has been of great scientific interest in immunology because of the generation of multiple large-scale data. From the statistical point of view, such immune cellular data should be treated as compositional. In compositional data, each element is positive, and all the elements sum to a constant, which can be set to one in general. Standard statistical methods are not directly applicable for the analysis of compositional data because they do not appropriately handle correlations between the compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio transformations and the generalized linear model with Dirichlet distribution, discuss their theoretical foundations, and illustrate their applications with immune cellular fraction data generated from colorectal cancer patients.","PeriodicalId":44931,"journal":{"name":"Communications for Statistical Applications and Methods","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2022-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49479326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}