Lee S McDaniel, Nicholas C Henderson, Paul J Rathouz
Generalized estimating equation solvers in R only allow for a few pre-determined options for the link and variance functions. We provide a package, geeM, which is implemented entirely in R and allows for user specified link and variance functions. The sparse matrix representations provided in the Matrix package enable a fast implementation. To gain speed, we make use of analytic inverses of the working correlation when possible and a trick to find quick numeric inverses when an analytic inverse is not available. Through three examples, we demonstrate the speed of geeM, which is not much worse than C implementations like geepack and gee on small data sets and faster on large data sets.
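The analytic-inverse idea mentioned in the abstract can be illustrated with a small sketch. It is written in Python rather than R, it is not taken from geeM, and the choice of the exchangeable working correlation (1 on the diagonal, rho elsewhere) is an assumption made here for illustration. For that structure the inverse has the closed form (1/(1-rho)) * (I - rho/(1+(n-1)rho) * J), where J is the all-ones matrix, so no numeric inversion is needed. The function names are hypothetical.

```python
def exchangeable_corr(n, rho):
    """Build the n x n exchangeable correlation matrix: 1 on the diagonal, rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def exchangeable_inverse(n, rho):
    """Closed-form inverse of the exchangeable correlation matrix (illustrative, not geeM code)."""
    if n == 1:
        return [[1.0]]
    if not (-1.0 / (n - 1) < rho < 1.0):
        raise ValueError("rho must lie in (-1/(n-1), 1) for a positive definite matrix")
    a = 1.0 / (1.0 - rho)
    c = rho / (1.0 + (n - 1) * rho)
    diag, off = a * (1.0 - c), -a * c
    return [[diag if i == j else off for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Plain dense matrix product, used only to check the closed form."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


if __name__ == "__main__":
    n, rho = 5, 0.3
    product = matmul(exchangeable_corr(n, rho), exchangeable_inverse(n, rho))
    # Largest deviation from the identity matrix; should be at rounding-error level.
    print(max(abs(product[i][j] - (1.0 if i == j else 0.0))
              for i in range(n) for j in range(n)))
```

The closed form costs O(n^2) to write down for a cluster of size n, against O(n^3) for a general numeric inverse, which is the kind of saving the abstract alludes to.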
{"title":"Fast Pure R Implementation of GEE: Application of the Matrix Package.","authors":"Lee S McDaniel, Nicholas C Henderson, Paul J Rathouz","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Generalized estimating equation solvers in R only allow for a few pre-determined options for the link and variance functions. We provide a package, <b>geeM</b>, which is implemented entirely in R and allows for user specified link and variance functions. The sparse matrix representations provided in the <b>Matrix</b> package enable a fast implementation. To gain speed, we make use of analytic inverses of the working correlation when possible and a trick to find quick numeric inverses when an analytic inverse is not available. Through three examples, we demonstrate the speed of <b>geeM</b>, which is not much worse than C implementations like <b>geepack</b> and <b>gee</b> on small data sets and faster on large data sets.</p>","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2013-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4289620/pdf/nihms-607237.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32974591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The heuristic k-means algorithm, widely used for cluster analysis, does not guarantee optimality. We developed a dynamic programming algorithm for optimal one-dimensional clustering. The algorithm is implemented as an R package called Ckmeans.1d.dp. We demonstrate its advantage in optimality and runtime over the standard iterative k-means algorithm.
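A minimal sketch of dynamic programming for one-dimensional k-means follows. It is written in Python, uses a simple O(k n^2) recurrence, and is not claimed to match the implementation in Ckmeans.1d.dp; the function name `kmeans_1d_dp` is hypothetical. The idea is that, once the points are sorted, every optimal cluster is a contiguous run, so the best cost of splitting the first i points into m clusters can be built from the best cost of splitting a shorter prefix into m-1 clusters.

```python
def kmeans_1d_dp(values, k):
    """Return (total within-cluster sum of squares, clusters) for optimal 1-D k-means."""
    x = sorted(values)
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    # Prefix sums of x and x^2 give the cost of any contiguous run in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssq(j, i):
        """Sum of squared deviations from the mean for the run x[j:i], with j < i."""
        s = s1[i] - s1[j]
        return max(s2[i] - s2[j] - s * s / (i - j), 0.0)

    inf = float("inf")
    # cost[m][i]: best cost of splitting the first i sorted points into m clusters.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for i in range(m, n + 1):
            for j in range(m - 1, i):  # the last cluster is x[j:i]
                c = cost[m - 1][j] + ssq(j, i)
                if c < cost[m][i]:
                    cost[m][i] = c
                    split[m][i] = j

    # Walk the stored split points backwards to recover the clusters.
    clusters = []
    i = n
    for m in range(k, 0, -1):
        j = split[m][i]
        clusters.append(x[j:i])
        i = j
    clusters.reverse()
    return cost[k][n], clusters


if __name__ == "__main__":
    print(kmeans_1d_dp([1.0, 1.1, 0.9, 5.0, 5.2, 9.9, 10.1], 3))
```

Unlike the iterative heuristic, this recurrence has no random initialization, so repeated runs on the same data return the same partition.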
{"title":"Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming","authors":"Haizhou Wang, Mingzhou Song","doi":"10.32614/RJ-2011-015","DOIUrl":"https://doi.org/10.32614/RJ-2011-015","url":null,"abstract":"The heuristic k-means algorithm, widely used for cluster analysis, does not guarantee optimality. We developed a dynamic programming algorithm for optimal one-dimensional clustering. The algorithm is implemented as an R package called Ckmeans.1d.dp. We demonstrate its advantage in optimality and runtime over the standard iterative k-means algorithm.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69958431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ckmeans.1d.dp: Optimal <i>k</i>-means Clustering in One Dimension by Dynamic Programming.","authors":"Haizhou Wang, Mingzhou Song","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The heuristic <i>k</i>-means algorithm, widely used for cluster analysis, does not guarantee optimality. We developed a dynamic programming algorithm for optimal one-dimensional clustering. The algorithm is implemented as an R package called <b>Ckmeans.1d.dp</b>. We demonstrate its advantage in optimality and runtime over the standard iterative <i>k</i>-means algorithm.</p>","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5148156/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72211872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When the prevalence of a disease or of some other binary characteristic is small, group testing (also known as pooled testing) is frequently used to estimate the prevalence and/or to identify individuals as positive or negative. We have developed the binGroup package as the first package designed to address the estimation problem in group testing. We present functions to estimate an overall prevalence for a homogeneous population. Also, for this setting, we have functions to aid in the very important choice of the group size. When individuals come from a heterogeneous population, our group testing regression functions can be used to estimate an individual probability of disease positivity by using the group observations only. We illustrate our functions with data from a multiple vector transfer design experiment and a human infectious disease prevalence study.
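The simplest version of the estimation problem described above can be sketched as follows. This is a Python illustration and not the binGroup API; it assumes equal group sizes and a perfectly accurate test, and the function names are hypothetical. Under those assumptions a group of size s tests positive with probability 1 - (1 - p)^s, so inverting the observed proportion of positive groups gives the maximum likelihood estimate of the individual prevalence p.

```python
def group_positive_prob(p, size):
    """Probability that a pooled group of `size` individuals tests positive (perfect test assumed)."""
    return 1.0 - (1.0 - p) ** size


def estimate_prevalence(positives, groups, size):
    """MLE of individual prevalence from the number of positive groups (equal group sizes assumed)."""
    if groups <= 0 or size <= 0 or not 0 <= positives <= groups:
        raise ValueError("need groups > 0, size > 0 and 0 <= positives <= groups")
    return 1.0 - (1.0 - positives / groups) ** (1.0 / size)


if __name__ == "__main__":
    # Hypothetical numbers: 19 of 100 pools of size 10 test positive.
    print(estimate_prevalence(19, 100, 10))
    # How the chance of a positive pool grows with group size at 2% prevalence.
    for s in (1, 5, 10, 20):
        print(s, group_positive_prob(0.02, s))
```

The second loop hints at why group size matters: very large pools are almost always positive and carry little information, while very small pools save few tests.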
{"title":"binGroup: A Package for Group Testing","authors":"C. Bilder, Boan Zhang, F. Schaarschmidt, J. Tebbs","doi":"10.32614/RJ-2010-016","DOIUrl":"https://doi.org/10.32614/RJ-2010-016","url":null,"abstract":"When the prevalence of a disease or of some other binary characteristic is small, group testing (also known as pooled testing) is frequently used to estimate the prevalence and/or to identify individuals as positive or negative. We have developed the binGroup package as the first package designed to address the estimation problem in group testing. We present functions to estimate an overall prevalence for a homogeneous population. Also, for this setting, we have functions to aid in the very important choice of the group size. When individuals come from a heterogeneous population, our group testing regression functions can be used to estimate an individual probability of disease positivity by using the group observations only. We illustrate our functions with data from a multiple vector transfer design experiment and a human infectious disease prevalence study.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69958389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christopher R Bilder, Boan Zhang, Frank Schaarschmidt, Joshua M Tebbs
{"title":"binGroup: A Package for Group Testing.","authors":"Christopher R Bilder, Boan Zhang, Frank Schaarschmidt, Joshua M Tebbs","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>When the prevalence of a disease or of some other binary characteristic is small, group testing (also known as pooled testing) is frequently used to estimate the prevalence and/or to identify individuals as positive or negative. We have developed the binGroup package as the first package designed to address the estimation problem in group testing. We present functions to estimate an overall prevalence for a homogeneous population. Also, for this setting, we have functions to aid in the very important choice of the group size. When individuals come from a heterogeneous population, our group testing regression functions can be used to estimate an individual probability of disease positivity by using the group observations only. We illustrate our functions with data from a multiple vector transfer design experiment and a human infectious disease prevalence study.</p>","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3152446/pdf/nihms267443.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"29925479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Miwa et al. (2003) proposed a numerical algorithm for evaluating multivariate normal probabilities. Starting with version 0.9-0 of the mvtnorm package (Hothorn et al., 2001; Genz et al., 2008), this algorithm is available to the R community. We give a brief introduction to Miwa’s procedure and compare it, with respect to computing time and accuracy, to a quasi-randomized Monte-Carlo procedure proposed by Genz and Bretz (1999), which has been available through mvtnorm for some years now. The new algorithm is applicable to problems with dimension smaller than 20, whereas the procedures by Genz and Bretz (1999) can be used to evaluate 1000-dimensional normal distributions. At the end of this article, a suggestion is given for choosing a suitable algorithm in different situations.
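The trade-off between a deterministic evaluation and a Monte Carlo estimate can be seen in a toy case. The sketch below is in Python and implements neither Miwa's procedure nor the Genz and Bretz algorithm; it is a swapped-in illustration using the bivariate positive-orthant probability, for which the closed form 1/4 + arcsin(rho)/(2*pi) is known, compared with plain (not quasi-randomized) Monte Carlo sampling. The function names and the sample size are assumptions made for this example.

```python
import math
import random


def orthant_exact(rho):
    """P(X > 0, Y > 0) for a standard bivariate normal with correlation rho."""
    return 0.25 + math.asin(rho) / (2.0 * math.pi)


def orthant_monte_carlo(rho, n_samples, seed=0):
    """Plain Monte Carlo estimate of the same probability (illustrative only)."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Build a correlated pair from two independent standard normals.
        x, y = z1, rho * z1 + scale * z2
        if x > 0.0 and y > 0.0:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    rho = 0.5
    print("exact      :", orthant_exact(rho))
    print("monte carlo:", orthant_monte_carlo(rho, 200_000, seed=1))
```

The deterministic value is exact up to rounding, while the sampling error of the Monte Carlo estimate shrinks only with the square root of the sample size; in exchange, sampling approaches extend to dimensions where deterministic evaluation becomes impractical.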
{"title":"mvtnorm: New numerical algorithm for multivariate normal probabilities","authors":"Xuefei Mi, Tetsuhisa Miwa, T. Hothorn","doi":"10.15488/3835","DOIUrl":"https://doi.org/10.15488/3835","url":null,"abstract":"Miwa et al. (2003) proposed a numerical algorithm for evaluating multivariate normal probabilities. Starting with version 0.9-0 of the mvtnorm package (Hothorn et al., 2001; Genz et al., 2008), this algorithm is available to the R community. We give a brief introduction to Miwa’s procedure and compare it, with respect to computing time and accuracy, to a quasi-randomized Monte-Carlo procedure proposed by Genz and Bretz (1999), which has been available through mvtnorm for some years now. The new algorithm is applicable to problems with dimension smaller than 20, whereas the procedures by Genz and Bretz (1999) can be used to evaluate 1000-dimensional normal distributions. At the end of this article, a suggestion is given for choosing a suitable algorithm in different situations.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77916968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}