{"title":"etrm: Energy Trading and Risk Management in R","authors":"Anders D. Sleire","doi":"10.32614/rj-2022-013","DOIUrl":"https://doi.org/10.32614/rj-2022-013","url":null,"abstract":"","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69958723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multivariate Subgaussian Stable Distributions in R","authors":"B. Swihart, J. P. Nolan","doi":"10.32614/rj-2022-056","DOIUrl":"https://doi.org/10.32614/rj-2022-056","url":null,"abstract":"","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69959039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-12-01; Epub Date: 2021-08-17; DOI: 10.32614/rj-2021-072
Andrew G Allmon, J S Marron, Michael G Hudgens
High-dimensional low sample size (HDLSS) data sets frequently emerge in many biomedical applications. The direction-projection-permutation (DiProPerm) test is a two-sample hypothesis test for comparing two high-dimensional distributions. The DiProPerm test is exact, i.e., the type I error is guaranteed to be controlled at the nominal level for any sample size, and thus is applicable in the HDLSS setting. This paper discusses the key components of the DiProPerm test, introduces the diproperm R package, and demonstrates the package on a real-world data set.
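The three stages named by the test (direction, projection, permutation) can be sketched as follows. This is a minimal pure-Python illustration, not the diproperm package's interface: it assumes the mean-difference vector as the direction and the gap between projected group means as the univariate statistic, and the function names `md_statistic` and `diproperm_sketch` are hypothetical.

```python
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def md_statistic(x, y):
    """Direction step, projection step, and univariate statistic.

    Assumption: the direction is the normalized mean-difference vector and
    the statistic is the absolute gap between the projected group means.
    """
    diff = [a - b for a, b in zip(mean_vector(x), mean_vector(y))]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0.0:
        return 0.0  # identical group means: no separating direction
    direction = [d / norm for d in diff]
    # Project every observation onto the direction (one score per row).
    px = [sum(r * w for r, w in zip(row, direction)) for row in x]
    py = [sum(r * w for r, w in zip(row, direction)) for row in y]
    return abs(sum(px) / len(px) - sum(py) / len(py))


def diproperm_sketch(x, y, n_perm=200, seed=0):
    """Permutation step: relabel, recompute direction and statistic, compare."""
    rng = random.Random(seed)
    observed = md_statistic(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if md_statistic(pooled[:n_x], pooled[n_x:]) >= observed:
            exceed += 1
    # Counting the observed labeling itself keeps the p-value above zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

The direction is recomputed on every permuted relabeling, so the observed statistic is compared against a null distribution that has gone through the same direction-finding step.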
{"title":"diproperm: An R Package for the DiProPerm Test.","authors":"Andrew G Allmon, J S Marron, Michael G Hudgens","doi":"10.32614/rj-2021-072","DOIUrl":"https://doi.org/10.32614/rj-2021-072","url":null,"abstract":"High-dimensional low sample size (HDLSS) data sets frequently emerge in many biomedical applications. The direction-projection-permutation (DiProPerm) test is a two-sample hypothesis test for comparing two high-dimensional distributions. The DiProPerm test is exact, i.e., the type I error is guaranteed to be controlled at the nominal level for any sample size, and thus is applicable in the HDLSS setting. This paper discusses the key components of the DiProPerm test, introduces the diproperm R package, and demonstrates the package on a real-world data set.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9202909/pdf/nihms-1809552.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40026217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The R Journal: Changes in R 4.0–4.1","authors":"T. Kalibera, S. Meyer, K. Hornik","doi":"10.32614/CORE","DOIUrl":"https://doi.org/10.32614/CORE","url":null,"abstract":"","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86262117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rachael C Aikens, Joseph Rigdon, Justin Lee, Michael Baiocchi, Andrew B Goldstone, Peter Chiu, Y Joseph Woo, Jonathan H Chen
In a block-randomized controlled trial, individuals are subdivided by prognostically important baseline characteristics (e.g., age group, sex, or smoking status), prior to randomization. This step reduces the heterogeneity between the treatment groups with respect to the baseline factors most important to determining the outcome, thus enabling more precise estimation of treatment effect. The stratamatch package extends this approach to the observational setting by implementing functions to separate an observational data set into strata and interrogate the quality of different stratification schemes. Once an acceptable stratification is found, treated and control individuals can be matched by propensity score within strata, thereby recapitulating the block-randomized trial design for the observational study. The stratification scheme implemented by stratamatch applies a "pilot design" approach (Aikens, Greaves, and Baiocchi 2019) to estimate a quantity called the prognostic score (Hansen 2008), which is used to divide individuals into strata. The potential benefits of such an approach are twofold. First, stratifying the data enables more computationally efficient matching of large data sets. Second, methodological studies suggest that using a prognostic score to inform the matching process increases the precision of the effect estimate and reduces sensitivity to bias from unmeasured confounding factors (Aikens et al. 2019; Leacy and Stuart 2014; Antonelli, Cefalu, Palmer, and Agniel 2018). A common mistake is to believe reserving more data for the analysis phase of a study is always better. Instead, the stratamatch approach suggests how clever use of data in the design phase of large studies can lead to major benefits in the robustness of the study conclusions.
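The pilot-design idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the stratamatch API: the record layout (`x`, `treated`, `y`), the single-covariate least-squares prognostic model, the equal-sized strata, and the names `fit_line` and `pilot_stratify` are all hypothetical choices made for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single covariate.

    Assumes the xs are not all identical (otherwise the slope is undefined).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def pilot_stratify(records, pilot_fraction=0.1, n_strata=4, seed=0):
    """records: list of dicts with keys 'x' (covariate), 'treated' (bool), 'y' (outcome)."""
    rng = random.Random(seed)
    controls = [i for i, r in enumerate(records) if not r["treated"]]
    # Set aside a pilot sample of controls; it is used only in the design phase.
    pilot = set(rng.sample(controls, max(2, int(pilot_fraction * len(controls)))))
    # Fit the prognostic model (expected outcome without treatment) on the pilot set.
    a, b = fit_line([records[i]["x"] for i in pilot],
                    [records[i]["y"] for i in pilot])
    analysis = [i for i in range(len(records)) if i not in pilot]
    # Rank the remaining individuals by prognostic score and cut into strata.
    ranked = sorted(analysis, key=lambda i: a + b * records[i]["x"])
    strata = {i: rank * n_strata // len(ranked) for rank, i in enumerate(ranked)}
    return pilot, strata
```

Propensity-score matching would then be run separately inside each stratum of the analysis set; the pilot individuals are excluded from that step.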
{"title":"stratamatch: Prognostic Score Stratification Using a Pilot Design.","authors":"Rachael C Aikens, Joseph Rigdon, Justin Lee, Michael Baiocchi, Andrew B Goldstone, Peter Chiu, Y Joseph Woo, Jonathan H Chen","doi":"10.32614/RJ-2021-063","DOIUrl":"https://doi.org/10.32614/RJ-2021-063","url":null,"abstract":"In a block-randomized controlled trial, individuals are subdivided by prognostically important baseline characteristics (e.g., age group, sex, or smoking status), prior to randomization. This step reduces the heterogeneity between the treatment groups with respect to the baseline factors most important to determining the outcome, thus enabling more precise estimation of treatment effect. The stratamatch package extends this approach to the observational setting by implementing functions to separate an observational data set into strata and interrogate the quality of different stratification schemes. Once an acceptable stratification is found, treated and control individuals can be matched by propensity score within strata, thereby recapitulating the block-randomized trial design for the observational study. The stratification scheme implemented by stratamatch applies a \"pilot design\" approach (Aikens, Greaves, and Baiocchi 2019) to estimate a quantity called the prognostic score (Hansen 2008), which is used to divide individuals into strata. The potential benefits of such an approach are twofold. First, stratifying the data enables more computationally efficient matching of large data sets. Second, methodological studies suggest that using a prognostic score to inform the matching process increases the precision of the effect estimate and reduces sensitivity to bias from unmeasured confounding factors (Aikens et al. 2019; Leacy and Stuart 2014; Antonelli, Cefalu, Palmer, and Agniel 2018). A common mistake is to believe reserving more data for the analysis phase of a study is always better. Instead, the stratamatch approach suggests how clever use of data in the design phase of large studies can lead to major benefits in the robustness of the study conclusions.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9273035/pdf/nihms-1619246.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40597502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-01; Epub Date: 2021-06-07; DOI: 10.32614/RJ-2021-033
Eashwar V Somasundaram, Shael E Brown, Adam Litzler, Jacob G Scott, Raoul R Wadhwa
Several persistent homology software libraries have been implemented in R. Specifically, the Dionysus, GUDHI, and Ripser libraries have been wrapped by the TDA and TDAstats CRAN packages. These software packages are powerful analysis tools that are computationally expensive and, to our knowledge, have not been formally benchmarked. Here, we analyze runtime and memory growth for the 2 R packages and the 3 underlying libraries. We find that datasets with fewer than 3 dimensions can be evaluated with persistent homology fastest by the GUDHI library in the TDA package. For higher-dimensional datasets, the Ripser library in the TDAstats package is the fastest. Ripser and TDAstats are also the most memory-efficient tools to calculate persistent homology.
{"title":"Benchmarking R packages for Calculation of Persistent Homology.","authors":"Eashwar V Somasundaram, Shael E Brown, Adam Litzler, Jacob G Scott, Raoul R Wadhwa","doi":"10.32614/RJ-2021-033","DOIUrl":"https://doi.org/10.32614/RJ-2021-033","url":null,"abstract":"Several persistent homology software libraries have been implemented in R. Specifically, the Dionysus, GUDHI, and Ripser libraries have been wrapped by the TDA and TDAstats CRAN packages. These software represent powerful analysis tools that are computationally expensive and, to our knowledge, have not been formally benchmarked. Here, we analyze runtime and memory growth for the 2 R packages and the 3 underlying libraries. We find that datasets with less than 3 dimensions can be evaluated with persistent homology fastest by the GUDHI library in the TDA package. For higher-dimensional datasets, the Ripser library in the TDAstats package is the fastest. Ripser and TDAstats are also the most memory-efficient tools to calculate persistent homology.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8434812/pdf/nihms-1733366.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39409270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Çavuş, Olgun Aydın, Ozan Evkaya, Ozancan Özdemir, Deniz Bezer, Ugur Dar
Why R? Turkey 2021 was organized as a three-day online conference on April 16-18, 2021, to bring together researchers and professionals from Turkey. We aimed to promote the R community in Turkey by bringing together R users with different backgrounds such as genetics, sociology, finance, economics, and biostatistics. There were 8 thematic sessions and 18 invited speakers. This article describes the preparation phase, the technical details, and the impact of the conference on its audience.
{"title":"Conference Report of Why R? Turkey 2021","authors":"M. Çavuş, Olgun Aydın, Ozan Evkaya, Ozancan Özdemir, Deniz Bezer, Ugur Dar","doi":"10.32614/RJ-2021","DOIUrl":"https://doi.org/10.32614/RJ-2021","url":null,"abstract":"The Why R? Turkey 2021 as a three-day online conference was organized to bring together researchers and professionals from Turkey on April 16-17-18, 2021. We hereby aimed to promote the R community in Turkey by bringing R users with different backgrounds such as genetics, sociology, finance, economy, bio-statistics. There were 8 thematic sessions and 18 invited speakers. In this article, it is aimed to describe the preparation phase, technical details, and the impact of the conference on audience.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72897054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emily Morris, Kevin He, Yanming Li, Yi Li, Jian Kang
High-dimensional variable selection in the proportional hazards (PH) model has many successful applications in different areas. In practice, data may involve confounding variables that do not satisfy the PH assumption, in which case the stratified proportional hazards (SPH) model can be adopted to control the confounding effects by stratification without directly modeling the confounding effects. However, there is a lack of computationally efficient statistical software for high-dimensional variable selection in the SPH model. In this work an R package, SurvBoost, is developed to implement the gradient boosting algorithm for fitting the SPH model with high-dimensional covariate variables. Simulation studies demonstrate that in many scenarios SurvBoost can achieve better selection accuracy and reduce computational time substantially compared to the existing R package that implements boosting algorithms without stratification. The proposed R package is also illustrated by an analysis of gene expression data with survival outcome in The Cancer Genome Atlas study. In addition, a detailed hands-on tutorial for SurvBoost is provided.
{"title":"SurvBoost: An R Package for High-Dimensional Variable Selection in the Stratified Proportional Hazards Model via Gradient Boosting.","authors":"Emily Morris, Kevin He, Yanming Li, Yi Li, Jian Kang","doi":"10.32614/rj-2020-018","DOIUrl":"https://doi.org/10.32614/rj-2020-018","url":null,"abstract":"High-dimensional variable selection in the proportional hazards (PH) model has many successful applications in different areas. In practice, data may involve confounding variables that do not satisfy the PH assumption, in which case the stratified proportional hazards (SPH) model can be adopted to control the confounding effects by stratification without directly modeling the confounding effects. However, there is a lack of computationally efficient statistical software for high-dimensional variable selection in the SPH model. In this work an R package, SurvBoost, is developed to implement the gradient boosting algorithm for fitting the SPH model with high-dimensional covariate variables. Simulation studies demonstrate that in many scenarios SurvBoost can achieve better selection accuracy and reduce computational time substantially compared to the existing R package that implements boosting algorithms without stratification. The proposed R package is also illustrated by an analysis of gene expression data with survival outcome in The Cancer Genome Atlas study. In addition, a detailed hands-on tutorial for SurvBoost is provided.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8174798/pdf/nihms-1656432.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39084202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chi Hyun Lee, Heng Zhou, Jing Ning, Diane D Liu, Yu Shen
Data subject to length-biased sampling are frequently encountered in various applications including prevalent cohort studies and are considered as a special case of left-truncated data under the stationarity assumption. Many semiparametric regression methods have been proposed for length-biased data to model the association between covariates and the survival outcome of interest. In this paper, we present a brief review of the statistical methodologies established for the analysis of length-biased data under the Cox model, which is the most commonly adopted semiparametric model, and introduce an R package CoxPhLb that implements these methods. Specifically, the package includes features such as fitting the Cox model to explore covariate effects on survival times and checking the proportional hazards model assumptions and the stationarity assumption. We illustrate usage of the package with a simulated data example and a real dataset, the Channing House data, which are publicly available.
{"title":"CoxPhLb: An R Package for Analyzing Length Biased Data under Cox Model.","authors":"Chi Hyun Lee, Heng Zhou, Jing Ning, Diane D Liu, Yu Shen","doi":"10.32614/rj-2020-024","DOIUrl":"https://doi.org/10.32614/rj-2020-024","url":null,"abstract":"Data subject to length-biased sampling are frequently encountered in various applications including prevalent cohort studies and are considered as a special case of left-truncated data under the stationarity assumption. Many semiparametric regression methods have been proposed for length-biased data to model the association between covariates and the survival outcome of interest. In this paper, we present a brief review of the statistical methodologies established for the analysis of length-biased data under the Cox model, which is the most commonly adopted semiparametric model, and introduce an R package CoxPhLb that implements these methods. Specifically, the package includes features such as fitting the Cox model to explore covariate effects on survival times and checking the proportional hazards model assumptions and the stationarity assumption. We illustrate usage of the package with a simulated data example and a real dataset, the Channing House data, which are publicly available.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7595345/pdf/nihms-1638580.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38657972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}