
Latest articles from Communications for Statistical Applications and Methods

The use of support vector machines in semi-supervised classification
IF 0.4 Q4 STATISTICS & PROBABILITY Pub Date : 2022-03-31 DOI: 10.29220/csam.2022.29.2.193
Hyun Bae, Hyungwoo Kim, S. Shin
Semi-supervised learning has gained significant attention in recent applications. In this article, we provide a selective overview of popular semi-supervised methods and then propose a simple but effective algorithm for semi-supervised classification using support vector machines (SVM), one of the most popular binary classifiers in the machine learning community. The idea is simple: first, we apply dimension reduction to the unlabeled observations and cluster them to assign labels in the reduced space. An SVM is then fitted to the combined set of labeled and unlabeled observations to construct a classification rule. Using an SVM allows the method to be extended to its nonlinear counterpart via the kernel trick. Our numerical experiments under various scenarios demonstrate that the proposed method is promising for semi-supervised classification.
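The steps in this abstract translate directly into a short pipeline. Below is a minimal sketch of that general recipe, not the authors' exact procedure: PCA is assumed for the dimension-reduction step, k-means for the clustering step, integer class labels (0, ..., K-1) are assumed, and `semi_supervised_svm` is a hypothetical helper name.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def semi_supervised_svm(X_lab, y_lab, X_unlab, n_components=2, kernel="rbf", C=1.0):
    y_lab = np.asarray(y_lab, dtype=int)

    # 1. Dimension reduction fitted on the unlabeled observations.
    pca = PCA(n_components=n_components).fit(X_unlab)

    # 2. Cluster the unlabeled points in the reduced space (one cluster per class).
    k = len(np.unique(y_lab))
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pca.transform(X_unlab))

    # 3. Give each cluster the majority class of the labeled points that fall into it.
    lab_clusters = km.predict(pca.transform(X_lab))
    overall_majority = np.bincount(y_lab).argmax()
    cluster_to_class = {}
    for c in range(k):
        members = y_lab[lab_clusters == c]
        cluster_to_class[c] = np.bincount(members).argmax() if members.size else overall_majority
    y_pseudo = np.array([cluster_to_class[c] for c in km.labels_])

    # 4. Fit a (kernel) SVM on the labeled and pseudo-labeled observations together.
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pseudo])
    return SVC(kernel=kernel, C=C).fit(X_all, y_all)
```

A call such as `semi_supervised_svm(X_lab, y_lab, X_unlab)` returns a fitted classifier whose decision rule uses both the labeled and the pseudo-labeled observations; swapping the kernel argument gives the nonlinear counterpart.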
Citations: 0
Variational Bayesian inference for binary image restoration using Ising model
IF 0.4 Q4 STATISTICS & PROBABILITY Pub Date : 2022-01-31 DOI: 10.29220/csam.2022.29.1.707
Moon-Yeop Jang, Younshik Chung
{"title":"Variational Bayesian inference for binary image restoration using Ising model","authors":"Moon-Yeop Jang, Younshik Chung","doi":"10.29220/csam.2022.29.1.707","DOIUrl":"https://doi.org/10.29220/csam.2022.29.1.707","url":null,"abstract":"","PeriodicalId":44931,"journal":{"name":"Communications for Statistical Applications and Methods","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44792264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Evaluation of interest rate-linked DLSs
IF 0.4 Q4 STATISTICS & PROBABILITY Pub Date : 2022-01-31 DOI: 10.29220/csam.2022.29.1.765
Man-Kyun Kim, Seongjoo Song
Derivative-linked securities (DLS) are a type of derivative that offers an agreed return when the underlying asset price moves within a specified range by the maturity date. The underlying assets of DLS are diverse, including interest rates, exchange rates, crude oil, and gold. A German 10-year bond rate-linked DLS and a USD-GBP CMS rate-linked DLS recently became a social issue in Korea due to huge losses to investors. In this regard, this paper explains the payoff structure of these products and evaluates their prices and fair coupon rates, as well as risk measures such as Value-at-Risk (VaR) and Tail-Value-at-Risk (TVaR). We examine how risky these products were and whether their coupon rates were appropriate. We use the Hull-White model as the stochastic model for the underlying assets and Monte Carlo (MC) methods to obtain numerical results. The no-arbitrage prices of the German 10-year bond rate-linked DLS and the USD-GBP CMS rate-linked DLS at the center of the social issue turned out to be 0.9662% and 0.9355% of the original investment, respectively. Considering that the Korean government bond rate for 2018 is about 2%, these values are quite low. The fair coupon rates that make the prices of the DLS equal to the original investment are computed as 4.76% for the German 10-year bond rate-linked DLS and 7% for the USD-GBP CMS rate-linked DLS, whereas their actual coupon rates were 1.4% and 3.5%. The 95% VaR and TVaR of the loss for the German 10-year bond rate-linked DLS are 37.30% and 64.45% of the initial investment, and those for the USD-GBP CMS rate-linked DLS are 73.98% and 87.43%. Summing up the numerical results, we can see that the DLS products of interest were indeed quite unfavorable to individual investors.
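As a rough illustration of the pricing-and-risk exercise described in this abstract, the sketch below simulates short-rate paths from Hull-White-type dynamics with a constant drift term, applies a stylized knock-in payoff, and reports a Monte Carlo price together with the 95% VaR and TVaR of the loss. Every parameter value, the barrier, and the payoff formula (including the leverage factor of 50) are illustrative assumptions, not the actual product terms or the paper's calibration.

```python
import numpy as np

def dls_mc_summary(r0=0.005, a=0.05, theta=0.0003, sigma=0.006,
                   barrier=-0.002, coupon=0.014, T=1.0, n_steps=252,
                   n_paths=100_000, alpha=0.95, seed=1):
    """Monte Carlo price, VaR and TVaR for a stylized rate-linked note."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    r = np.full(n_paths, r0)
    integral_r = np.zeros(n_paths)          # for the discount factor exp(-integral of r dt)
    knocked_in = np.zeros(n_paths, dtype=bool)

    for _ in range(n_steps):
        dW = rng.standard_normal(n_paths) * np.sqrt(dt)
        r += (theta - a * r) * dt + sigma * dW   # Hull-White-type dynamics, constant drift term
        integral_r += r * dt
        knocked_in |= r < barrier                # simplified knock-in condition

    # Stylized payoff: principal plus coupon unless knocked in; if knocked in,
    # the principal loss grows with the terminal shortfall below the barrier.
    payoff = np.where(knocked_in,
                      np.maximum(1.0 - 50.0 * np.maximum(barrier - r, 0.0), 0.0),
                      1.0 + coupon)
    discounted = np.exp(-integral_r) * payoff

    price = discounted.mean()
    loss = 1.0 - discounted                      # loss relative to the initial investment
    var = np.quantile(loss, alpha)
    tvar = loss[loss >= var].mean()
    return price, var, tvar

price, var95, tvar95 = dls_mc_summary()
```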
Citations: 0
The exponential generalized log-logistic model: Bagdonavičius-Nikulin test for validation and non-Bayesian estimation methods
IF 0.4 Q4 STATISTICS & PROBABILITY Pub Date : 2022-01-31 DOI: 10.29220/csam.2022.29.1.681
M. Ibrahim, K. Aidi, Mir Masoom Ali, H. Yousof
A modified Bagdonavičius-Nikulin chi-square goodness-of-fit test is defined and studied. The lymphoma data are analyzed using the modified goodness-of-fit test statistic. Different non-Bayesian estimation methods under complete-sample schemes are considered, discussed, and compared, such as the maximum likelihood estimation method, the least squares estimation method, the Cramér-von Mises estimation method, the weighted least squares estimation method, the left-tail Anderson-Darling estimation method, and the right-tail Anderson-Darling estimation method. Numerical simulation studies are performed to compare these estimation methods using different sample sizes and three different combinations of parameters. The potentiality of the EG-LL model is illustrated using three real data sets, and the model is compared with many other well-known generalizations. The new model proved worthy in modeling breaking stress, survival times, and medical data sets. The Barzilai-Borwein algorithm is employed via a simulation study to assess the performance of the estimators with different sample sizes as the sample size tends to ∞. Using the Bagdonavičius-Nikulin goodness-of-fit test for validation, we propose a modified chi-square GOF test for the EG-LL model. We analyzed a lymphoma data set consisting of the times (in months) from diagnosis to death for 31 individuals with advanced non-Hodgkin's lymphoma clinical symptoms, using our model under the modified Bagdonavičius-Nikulin goodness-of-fit test statistic. Based on the MLEs, the modified Bagdonavičius-Nikulin goodness-of-fit test recovered the loss of information for the grouped data.
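Cramér-von Mises estimation, one of the non-Bayesian methods listed above, chooses the parameters that minimize the distance between the fitted CDF and the empirical CDF. The sketch below applies it to the plain two-parameter log-logistic distribution (scipy's `fisk`) as a stand-in, since the exponential generalized log-logistic model itself is not shipped with scipy; `cvm_fit_loglogistic` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats, optimize

def cvm_fit_loglogistic(x):
    """Cramér-von Mises minimum-distance fit of a two-parameter log-logistic
    distribution (scipy's 'fisk'): minimize the CvM distance between the
    model CDF and the empirical CDF."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    plotting_pos = (2 * np.arange(1, n + 1) - 1) / (2 * n)

    def cvm_distance(params):
        shape, scale = np.exp(params)            # optimize on the log scale to keep both positive
        F = stats.fisk.cdf(x, c=shape, scale=scale)
        return 1.0 / (12 * n) + np.sum((F - plotting_pos) ** 2)

    res = optimize.minimize(cvm_distance, x0=np.log([1.0, np.median(x)]),
                            method="Nelder-Mead")
    shape, scale = np.exp(res.x)
    return shape, scale

# Quick check on simulated data: the recovered parameters should be near (2.5, 3.0).
sample = stats.fisk.rvs(c=2.5, scale=3.0, size=500, random_state=0)
print(cvm_fit_loglogistic(sample))
```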
Citations: 4
Sparse vector heterogeneous autoregressive model with nonconvex penalties
IF 0.4 Q4 STATISTICS & PROBABILITY Pub Date : 2022-01-31 DOI: 10.29220/csam.2022.29.1.733
Andrew Jaeho Shin, Minsu Park, Changryong Baek
High-dimensional time series have gained considerable attention in recent years. The sparse vector heterogeneous autoregressive (VHAR) model proposed by Baek and Park (2020) uses the adaptive lasso and a debiasing procedure in estimation, and showed superb forecasting performance for realized volatilities. This paper extends the sparse VHAR model by considering non-convex penalties such as SCAD and MCP, whose penalty design allows possible bias reduction. Finite-sample performances of the three estimation methods are compared through Monte Carlo simulation. Our study shows, first, that taking cross-sectional correlations into account reduces bias. Second, nonconvex penalties perform better when the sample size is small, whereas the adaptive lasso with debiasing performs well as the sample size increases. An empirical analysis based on 20 multinational realized volatilities is also provided.
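The SCAD and MCP penalties mentioned above have simple closed forms. The sketch below writes them out with the conventional defaults (a = 3.7 for SCAD, gamma = 3 for MCP), together with a univariate HAR-style design matrix built from daily, weekly, and monthly averages of a realized-volatility series; it illustrates the ingredients only and does not reproduce the paper's sparse VHAR estimator, adaptive weighting, or debiasing step.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001), evaluated elementwise."""
    b = np.abs(beta)
    linear = lam * b
    quadratic = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))
    constant = lam**2 * (a + 1) / 2
    return np.where(b <= lam, linear, np.where(b <= a * lam, quadratic, constant))

def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax concave penalty (Zhang, 2010), evaluated elementwise."""
    b = np.abs(beta)
    return np.where(b <= gamma * lam, lam * b - b**2 / (2 * gamma), gamma * lam**2 / 2)

def har_design(rv, lags=(1, 5, 22)):
    """HAR-style regressors: daily, weekly and monthly moving averages of a
    realized-volatility series, aligned with the one-step-ahead target."""
    rv = np.asarray(rv, dtype=float)
    p = max(lags)
    X = np.column_stack([
        np.array([rv[t - l:t].mean() for t in range(p, len(rv))]) for l in lags
    ])
    y = rv[p:]
    return X, y
```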
Citations: 0
Exploring COVID-19 in mainland China during the lockdown of Wuhan via functional data analysis
IF 0.4 Q4 STATISTICS & PROBABILITY Pub Date : 2022-01-31 DOI: 10.29220/csam.2022.29.1.783
Xingdi Li, Panpan Zhang, Q. Feng
In this paper, we analyze time series data on the case and death counts of COVID-19, which broke out in China in December 2019; the study period covers the lockdown of Wuhan. We exploit functional data analysis methods to analyze the collected time series data. The analysis is divided into three parts. First, functional principal component analysis is conducted to investigate the modes of variation. Second, we carry out functional canonical correlation analysis to explore the relationship between confirmed cases and deaths. Finally, we utilize a clustering method based on the Expectation-Maximization (EM) algorithm to cluster the counts of confirmed cases, where the number of clusters is determined via a cross-validation approach. In addition, we compare the clustering results with migration data available to the public.
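The first step of the analysis, functional principal component analysis, reduces to a centered SVD when all curves are observed on a common, equally spaced grid. The sketch below shows that simplified version; real functional data analyses typically add smoothing or basis expansions, and the example curves here are synthetic rather than the COVID-19 counts.

```python
import numpy as np

def functional_pca(curves, n_components=2):
    """Functional PCA for curves observed on a common, equally spaced grid:
    eigendecomposition of the sample covariance of the centered curves via SVD."""
    X = np.asarray(curves, dtype=float)           # shape (n_curves, n_grid)
    mean_curve = X.mean(axis=0)
    Xc = X - mean_curve
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)
    explained = eigvals / eigvals.sum()
    eigenfunctions = Vt[:n_components]            # discretized modes of variation
    scores = Xc @ eigenfunctions.T                # FPC scores of each curve
    return mean_curve, eigenfunctions, scores, explained[:n_components]

# Synthetic example: 30 exponential-growth curves observed on 60 time points.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 60)
curves = np.exp(3 * t)[None, :] * (1 + 0.1 * rng.standard_normal((30, 60)))
mean_curve, phi, scores, expl = functional_pca(curves, n_components=2)
```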
Citations: 1
How to improve oil consumption forecast using google trends from online big data?: the structured regularization methods for large vector autoregressive model
IF 0.4 Q4 STATISTICS & PROBABILITY Pub Date : 2022-01-31 DOI: 10.29220/csam.2022.29.1.721
Ji-Eun Choi, D. Shin
We forecast the US oil consumption level taking advantage of Google Trends, which reports the search volumes of specific terms that people search for on Google. We focus on whether a proper selection of Google Trends terms leads to an improvement in forecast performance for oil consumption. As the forecast models, we consider the least absolute shrinkage and selection operator (LASSO) regression and the structured regularization method for the large vector autoregressive (VAR-L) model of Nicholson et al. (2017), which automatically select the Google Trends terms and the lags of the predictors. An out-of-sample forecast comparison reveals that reducing the high-dimensional Google Trends data set to a low-dimensional one via the LASSO and VAR-L models produces better forecast performance for oil consumption than frequently used forecast models such as the autoregressive model, the autoregressive distributed lag model, and the vector error correction model.
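A minimal version of this LASSO forecasting setup can be written with scikit-learn: stack own lags of the consumption series with lagged search-volume columns and let `LassoCV` select the relevant terms and lags. The function name, the lag length of 12, and the plain k-fold cross-validation are illustrative assumptions; a time-series-aware validation scheme and the structured VAR-L penalties used in the paper are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def lasso_forecast(y, trends, max_lag=12):
    """One-step-ahead forecast of a series from its own lags and lagged
    search-volume predictors, with the LASSO selecting terms and lags."""
    y = np.asarray(y, dtype=float)                     # consumption level, length n
    trends = np.asarray(trends, dtype=float)           # one column per search term, length n
    rows, targets = [], []
    for t in range(max_lag, len(y)):
        own_lags = y[t - max_lag:t][::-1]              # y_{t-1}, ..., y_{t-max_lag}
        trend_lags = trends[t - max_lag:t][::-1].ravel()
        rows.append(np.concatenate([own_lags, trend_lags]))
        targets.append(y[t])
    X, target = np.array(rows), np.array(targets)

    model = LassoCV(cv=5).fit(X[:-1], target[:-1])     # hold out the last observation
    return model.predict(X[-1:])[0], model
```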
Citations: 0
Grid-based Gaussian process models for longitudinal genetic data
IF 0.4 Q4 STATISTICS & PROBABILITY Pub Date : 2022-01-31 DOI: 10.29220/csam.2022.29.1.745
Wonil Chung
Although various statistical methods have been developed to map time-dependent genetic factors, most identified genetic variants can explain only a small portion of the estimated genetic variation in longitudinal traits. Gene-gene and gene-time/environment interactions are known to be important putative sources of the missing heritability. However, mapping epistatic gene-gene interactions is extremely difficult due to the very large parameter spaces for models containing such interactions. In this paper, we develop a Gaussian process (GP) based nonparametric Bayesian variable selection method for longitudinal data. It maps multiple genetic markers without restricting to pairwise interactions. Rather than modeling each main and interaction term explicitly, the GP model measures the importance of each marker, regardless of whether it is mostly due to a main effect or some interaction effect(s), via an unspecified function. To improve the flexibility of the GP model, we propose a novel grid-based method for the within-subject dependence structure. The proposed method can accurately approximate complex covariance structures. The dimension of the covariance matrix depends only on the number of fixed grid points, although each subject may have different numbers of measurements at different time points. The deviance information criterion (DIC) and the Bayesian predictive information criterion (BPIC) are proposed for selecting an optimal number of grid points. To efficiently draw posterior samples, we combine a hybrid Monte Carlo method with a partially collapsed Gibbs (PCG) sampler. We apply the proposed GP model to a mouse dataset on age-related body weight.
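The grid-based idea described above, in which the kernel matrix is built only on a fixed set of grid points and each subject's irregular measurement times are mapped onto that grid, can be sketched as follows. The squared-exponential kernel and the nearest-grid-point mapping are assumptions made for illustration; the paper's actual within-subject dependence structure may differ.

```python
import numpy as np

def grid_gp_covariance(obs_times, grid, lengthscale=1.0, variance=1.0, nugget=1e-6):
    """Grid-based approximation of a GP covariance for one subject: the kernel
    matrix is built on the fixed grid only, and each observation time is
    mapped to its nearest grid point."""
    grid = np.asarray(grid, dtype=float)
    d2 = (grid[:, None] - grid[None, :]) ** 2
    K_grid = variance * np.exp(-0.5 * d2 / lengthscale**2) + nugget * np.eye(len(grid))

    idx = np.abs(np.asarray(obs_times, dtype=float)[:, None] - grid[None, :]).argmin(axis=1)
    return K_grid[np.ix_(idx, idx)]

# Example: a subject measured at four irregular ages, a grid of 10 equally spaced points.
grid = np.linspace(0, 20, 10)
K_subject = grid_gp_covariance([1.3, 4.8, 9.1, 15.7], grid, lengthscale=5.0)
```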
Citations: 1
Multiple change-point estimation in spectral representation
IF 0.4 Q4 STATISTICS & PROBABILITY Pub Date : 2022-01-31 DOI: 10.29220/csam.2022.29.1.807
Jaehee Kim
{"title":"Multiple change-point estimation in spectral representation","authors":"Jaehee Kim","doi":"10.29220/csam.2022.29.1.807","DOIUrl":"https://doi.org/10.29220/csam.2022.29.1.807","url":null,"abstract":"","PeriodicalId":44931,"journal":{"name":"Communications for Statistical Applications and Methods","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44217583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A guideline for the statistical analysis of compositional data in immunology
IF 0.4 Q4 STATISTICS & PROBABILITY Pub Date : 2022-01-20 DOI: 10.29220/csam.2022.29.4.453
Jinkyung Yoo, Zequn Sun, M. Greenacre, Q. Ma, Dongjun Chung, Young Min Kim
The study of immune cellular composition has been of great scientific interest in immunology because of the generation of multiple large-scale data sets. From the statistical point of view, such immune cellular data should be treated as compositional. In compositional data, each element is positive and all the elements sum to a constant, which can in general be set to one. Standard statistical methods are not directly applicable to the analysis of compositional data because they do not appropriately handle correlations between the compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio transformations and the generalized linear model with the Dirichlet distribution, discuss their theoretical foundations, and illustrate their applications with immune cellular fraction data generated from colorectal cancer patients.
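The log-ratio transformations reviewed in the paper can be sketched directly: the centered log-ratio (CLR) divides each component by the geometric mean of its composition, and the additive log-ratio (ALR) uses one component as the reference. The zero-replacement pseudo-count and the example cell fractions below are illustrative assumptions.

```python
import numpy as np

def clr(x, eps=1e-6):
    """Centered log-ratio transform of compositions (rows summing to one).
    A small pseudo-count keeps zero components from breaking the log."""
    x = np.asarray(x, dtype=float) + eps
    x = x / x.sum(axis=1, keepdims=True)              # renormalize after the pseudo-count
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)    # subtract the log geometric mean per row

def alr(x, ref=-1, eps=1e-6):
    """Additive log-ratio transform using one component (default: the last) as reference."""
    x = np.asarray(x, dtype=float) + eps
    x = x / x.sum(axis=1, keepdims=True)
    return np.log(np.delete(x, ref, axis=1) / x[:, [ref]])

# Example: immune cell fractions for three samples (hypothetical values).
fractions = np.array([[0.60, 0.25, 0.15],
                      [0.55, 0.30, 0.15],
                      [0.70, 0.20, 0.10]])
Z = clr(fractions)   # CLR-transformed values can be fed to standard regression models
```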
Citations: 6