R J.: Latest Publications

Package wsbackfit for Smooth Backfitting Estimation of Generalized Structured Models
Pub Date : 2021-01-01 DOI: 10.32614/rj-2021-042
J. Roca-Pardiñas, M. Rodríguez-Álvarez, S. Sperlich
A package is introduced that provides the weighted smooth backfitting estimator for a large family of popular semiparametric regression models. This family is known as generalized structured models, comprising, for example, generalized varying coefficient models, generalized additive models, and mixtures, potentially including parametric parts. The kernel-based weighted smooth backfitting belongs to the statistically most efficient procedures for this model class. Its asymptotic properties are well understood thanks to the large body of literature about this estimator. The introduced weights allow for the inclusion of sampling weights, trimming, and efficient estimation under heteroscedasticity. Further options facilitate easy handling of aggregated data, prediction, and the presentation of estimation results. Cross-validation methods are provided which can be used for model and bandwidth selection.
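As a concrete illustration of that interface, here is a minimal sketch on simulated data; it assumes the sback()/sb() formula interface from the package documentation, and that h = -1 requests cross-validated bandwidth selection.

```r
# Minimal sketch (simulated data): a generalized additive logistic model
# fitted by weighted smooth backfitting. h = -1 is assumed to request
# K-fold cross-validated bandwidth selection, per the package docs.
library(wsbackfit)

set.seed(1)
n  <- 500
df <- data.frame(x1 = runif(n), x2 = runif(n))
df$y <- rbinom(n, 1, plogis(sin(2 * pi * df$x1) + df$x2^2 - 1))

fit <- sback(y ~ sb(x1, h = -1) + sb(x2, h = -1),
             data = df, family = "binomial", KfoldCV = 5)
summary(fit)
```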
Citations: 0
Wide-to-tall Data Reshaping Using Regular Expressions and the nc Package
Pub Date : 2021-01-01 DOI: 10.32614/rj-2021-029
T. Hocking
Regular expressions are powerful tools for extracting tables from non-tabular text data. Capturing regular expressions that describe the information to extract from column names can be especially useful when reshaping a data table from wide (few rows with many regularly named columns) to tall (fewer columns with more rows). We present the R package nc (short for named capture), which provides functions for wide-to-tall data reshaping using regular expressions. We describe the main new ideas of nc, and provide detailed comparisons with related R packages (stats, utils, data.table, tidyr, tidyfast, tidyfst, reshape2, cdata).
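A minimal sketch of the idea, based on the iris example from the package documentation (assumed here): the four Part.Dimension columns are melted into a tall table, with the named regex groups becoming new columns.

```r
# Melt iris from wide to tall: column names like "Sepal.Length" are matched
# by the regex (.*)[.](.*); the named groups become the "part" and "dim"
# columns, and the measurements land in a single "cm" column.
library(nc)

iris.tall <- capture_melt_single(
  iris,
  part = ".*",      # first capture group: Sepal or Petal
  "[.]",            # literal dot separating the two groups
  dim  = ".*",      # second capture group: Length or Width
  value.name = "cm")
head(iris.tall)
```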
Citations: 1
StratigrapheR: Concepts for Litholog Generation in R
Pub Date : 2021-01-01 DOI: 10.32614/rj-2021-039
Sébastien Wouters, A. Silva, F. Boulvain, X. Devleeschouwer
The StratigrapheR package proposes new concepts for the generation of lithological logs, or lithologs, in R. The generation of lithologs in a scripting environment opens new opportunities for the processing and analysis of stratified geological data. Among the new concepts presented are new plotting and data processing methodologies, new general R functions, and computer-oriented data conventions. The package structure allows for these new concepts to be further improved, which can be done independently by any R user. The current limitations of the package are highlighted, along with the limitations in R for geological data processing, to help identify the best paths for improvements.

Introduction
StratigrapheR is a package implemented in the open-source programming environment R. StratigrapheR endeavors to explore new concepts to process stratified geological data. These concepts are provided to answer a major difficulty posed by such data; namely, a large amount of field observations of varied nature, sometimes localized and small-scale, can carry information on large-scale processes. Visualizing the relevant observations all at once is therefore difficult. The usual answer to this problem in successions of stratified rocks is to report observations in a schematic form: the lithological log, or litholog (e.g., Fig. 1). The litholog is an essential tool in sedimentology and stratigraphy and proves to be equally invaluable in other fields such as volcanology, igneous petrology, or paleontology. Ideally, any data contained in a litholog should be available in a reproducible form. Therefore, the challenge at hand is what we would call "from art to useful data": how can we best extract and/or process the information contained in a litholog, designed to be as visually informative as possible (see again Fig. 1)?

[Figure 1 residue: bed numbers and a litholog legend (lamellar and branching stromatoporoids, lamellar and branching tabulate corals, brachiopods, crinoids, receptaculitids, small and large fenestrae, hiatus).]
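A minimal sketch of the litholog workflow, assuming the litholog()/whiteSet()/multigons() functions behave as described in the paper; the bed table below is hypothetical.

```r
# Build a basic litholog from a hypothetical bed table and draw it:
# l/r are the lower/upper boundaries of each bed, h the hardness
# (drawn as litholog width), i the bed identifiers.
library(StratigrapheR)

beds <- data.frame(l = c(0, 1, 2.5),
                   r = c(1, 2.5, 4),
                   h = c(4, 3, 5),
                   i = c("B1", "B2", "B3"))

basic.log <- litholog(l = beds$l, r = beds$r, h = beds$h, i = beds$i)

whiteSet(xlim = c(0, 6), ylim = c(0, 4))            # open an empty canvas
multigons(basic.log$i, x = basic.log$xy, y = basic.log$dt)  # one polygon per bed
```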
Citations: 0
Automating Reproducible, Collaborative Clinical Trial Document Generation with the listdown Package
Pub Date : 2021-01-01 DOI: 10.32614/rj-2021-051
M. Kane, Xun Jiang, Simon Urbanek
The conveyance of clinical trial explorations and analysis results from a statistician to a clinical investigator is a critical component of the drug development and clinical research cycle. Automating the process of generating documents for data descriptions, summaries, exploration, and analysis allows the statistician to provide a more comprehensive view of the information captured by a clinical trial, and efficient generation of these documents allows the statistician to focus more on the conceptual development of a trial or trial analysis and less on the implementation of the summaries and results on which decisions are made. This paper explores the use of the listdown package for automating reproducible documents in clinical trials, which facilitates the collaboration between statisticians and clinicians, as well as defining an analysis pipeline for document generation.

Background and Introduction
The conveyance of clinical trial explorations and analysis results from a statistician to a clinical investigator is an often overlooked but critical component of the drug development and clinical research cycle. Graphs, tables, and other analysis artifacts are at the nexus of these collaborations. They facilitate identifying problems and bugs in the data preparation and processing stage, they help to build an intuitive understanding of mechanisms of disease and their treatment, they elucidate prognostic and predictive relationships, they provide insight that results in new hypotheses, and they convince researchers of analyses testing hypotheses. Despite their importance, the process of generating these artifacts is usually done in an ad-hoc manner. This is partially because of the nuance and diversity of the hypotheses and scientific questions being interrogated and, to a lesser degree, the variation in clinical data formatting. The usual process has a statistician providing a standard set of artifacts, receiving feedback, and providing updates based on that feedback. Work performed for one trial is rarely leveraged for others, and as a result, a large amount of work needs to be reproduced for each trial. There are two glaring problems with this approach. First, each analysis of a trial requires a substantial amount of error-prone work. While the variation between trials means some work needs to be done for preparation, exploration, and analysis, many aspects of these trials could be better automated, resulting in greater efficiency and accuracy. Second, because this work is challenging, it often occupies the majority of the statistician's effort. Less time is spent on trial design and analysis, and this portion is taken up by a clinician who often has less expertise with the statistical aspects of the trial. As a result, the extra effort spent on processing data undermines the statistician's role as a collaborator and relegates them to service provider. Tools leveraging existing work are needed to provide holistic views on trials more efficiently; this would reduce the workload and make trial design and analysis more accurate and comprehensive.
The richness of the R package ecosystem (R Core Team, 2012), and in particular its emphasis on analysis, visualization, reproducibility, and dissemination, makes the goal of creating these tools for clinical trials feasible. Table generation is supported by packages including tableone (Yoshida and Bartel, 2020), gt (Iannone et al., 2020), and gtsummary (Sjoberg et al., 2020). Visualization is accomplished with packages including ggplot2 (Wickham, 2016) and survminer (Kassambara et al., 2020). Interactive presentations of the data can even be provided with DT (Xie et al., 2020), plotly (Sievert, 2020), and trelliscopejs (Hafen and Schloerke, 2020). It should also be recognized that work building on these tools for clinical trial data is already underway: the reporting package of Harrell Jr (2020) provides graphical summaries of clinical trials and is used with markdown (Allaire et al., 2020) to generate particular trial report types with specific formatting.

The recently released listdown package (Kane et al., 2020) automates the process of generating reproducible (R Markdown) documents. Objects derived from summaries, explorations, or analyses are stored hierarchically in an R list that defines the structure of the document. These objects are referred to as computational components, since they derive from computations, as opposed to the prose that makes up the narrative components of a document. The computational components capture and structure the objects to be presented; a listdown object is then created to describe how those objects, and the document, are to be rendered. Separating how the computational components are created from how they are shown to the user provides two advantages. First, it decouples data processing and analysis from data exploration and visualization. For computationally intensive analyses, this separation is critical to avoid redundant computation when small changes are made to the presentation, and it discourages placing computationally intensive code in the R Markdown document itself. Second, it provides the flexibility to quickly change how computational components are visualized or summarized, or even how the document is rendered: moving from an interactive .html document to a static .pdf document becomes much easier than replacing functions and parameters throughout an R Markdown document.

The package has been found particularly useful in reporting on and research with clinical trial data. In particular, it has been used in collaborations focused on analyzing past trial data to design new trials, and in trial monitoring, where trial telemetry (enrollment, response, etc.) is reported and preliminary analyses are conveyed to clinicians. The associated presentations require little narrative context, since clinicians usually understand the collected data well. At the same time, the large number of hierarchical, heterogeneous artifacts (tables and plots of several types) can be automated where composing R Markdown documents by hand would be inconvenient and inefficient.

The rest of this paper describes the concepts implemented in the listdown package for automated, reproducible document generation and demonstrates their use with a simplified, synthetic clinical trial data set whose variables are typical of a non-small-cell lung cancer trial. The data set comes from the forceps package (Kane, 2020), which at the time of writing is still under development and not available on CRAN, but can be installed from its development repository. The following subsections use the trial data to construct a pipeline for document generation. Both the data and the pipeline are simple compared to most analyses of this type, but they suffice to illustrate the relevant concepts, and both the analysis and the concepts translate readily to real applications. The final section discusses the use of the package and its current directions.

The process of analyzing data can be described with the classic waterfall model (Benington, 1983), in which an output (an analysis presentation or service) depends on a sequence of tasks that come before it. This dependence structure implies that if a problem is detected at a given stage of the analysis production, all downstream parts must be rerun to reflect the change. A graphical depiction of the waterfall model, specific to data analysis (clinical or otherwise), is given in Figure 1 (caption: the data analysis waterfall). Data exploration and visualization are integral to all stages of production and are often the means of identifying problems and refining the analysis.

Data acquisition and preprocessing. Data acquisition refers to the part of the analysis pipeline in which data are retrieved from some managed data store for integration into the pipeline. These data sets may be retrieved as tables from a database, from case report forms, from Analysis Data Model (ADaM) data formatted according to the Clinical Data Interchange Standards Consortium standards (CDISC, 2020), from electronic health records, or from other clinical real-world data (RWD) formats. The data are then transformed into a format suitable for analysis. In our example, these steps are handled by importing the data sets from the forceps package and using a few of its functions to create a single trial data set, reducing the prominence of these components in the pipeline; while these steps are critical, the focus of this paper is on incorporating the listdown package into the later stages. Concretely, the data corresponding to trial outcomes, patient adverse events, patient biomarkers, and patient demography are loaded and transformed with the forceps and dplyr (Wickham et al., 2020) packages into a single data set with one row per patient and one variable per column. The data also include longitudinal adverse-event information, which is stored as a nested data frame in the ae_long column of the resulting data set. The accompanying code is truncated in the source; the recoverable part is:

library(forceps)
library(dplyr)
data(lc_adsl, lc_adverse_events, lc_biomarkers, lc_demography)
lc_trial <- ... (truncated: a pipeline joining the four data sets on "usubjid", nesting adverse events as "ae_long", with biomarkers = lc_biomarkers)
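A minimal sketch of the computational-component workflow described above, assuming listdown(), ld_rmarkdown_header(), and ld_make_chunks() behave as in the package documentation; the components below are toy stand-ins rather than trial artifacts.

```r
# Build a hierarchical list of computational components, save it, and have
# listdown write an R Markdown document that presents each component.
library(listdown)
library(ggplot2)

cc <- list(
  Summary = list(`Arm counts` = table(sample(c("arm A", "arm B"), 100, TRUE))),
  Plots   = list(`Highway mpg` = ggplot(mpg, aes(hwy)) + geom_histogram()))
saveRDS(cc, "cc.rds")

# load_cc_expr captures the expression used to reload the components when
# the generated document is knit.
ld  <- listdown(load_cc_expr = readRDS("cc.rds"), package = "ggplot2")
doc <- c(as.character(ld_rmarkdown_header("A Toy Trial Report")),
         ld_make_chunks(ld))
writeLines(doc, "report.Rmd")   # render with rmarkdown::render("report.Rmd")
```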
{"title":"Automating Reproducible, Collaborative Clinical Trial Document Generation with the listdown Package","authors":"M. Kane, Xun Jiang, Simon Urbanek","doi":"10.32614/rj-2021-051","DOIUrl":"https://doi.org/10.32614/rj-2021-051","url":null,"abstract":"The conveyance of clinical trial explorations and analysis results from a statistician to a clinical investigator is a critical component to the drug development and clinical research cycle. Automating the process of generating documents for data descriptions, summaries, exploration, and analysis allows statistician to provide a more comprehensive view of the information captured by a clinical trial and efficient generation of these documents allows the statistican to focus more on the conceptual development of a trial or trial analysis and less on the implementation of the summaries and results on which decisions are made. This paper explores the use of the listdown package for automating reproducible documents in clinical trials that facilitate the collaboration between statisticians and clinicians as well as defining an analysis pipeline for document generation. Background and Introduction The conveyance of clinical trial explorations and analysis results from a statistician to a clinical investigator is an often overlooked but critical component to the drug development and clinical research cycle. Graphs, tables, and other analysis artifacts are at the nexus of these collaborations. They facilitate identifying problems and bugs in the data preparation and processing stage, they help to build an intuitive understanding of mechanisms of disease and their treatment, they elucidate prognostic and predictive relationships, they provide insight that results in new hypotheses, and they convince researchers of analyses testing hypotheses. Despite their importance, the process of generating these artifacts is usually done in an ad-hoc manner. This is partially because of the nuance and diversity of the hypotheses and scientific questions being interrogated and, to a lesser degree, the variation in clinical data formatting. The usual process usually has a statistician providing a standard set of artifacts, receiving feedback, and providing an updates based on feedback. Work performed for one trial is rarely leveraged on others and as a result, a large amount of work needs to be reproduced for each trial. There are two glaring problems with this approach. First, each analysis of a trial requires a substantial amount of error-prone work. While the variation between trials means some work needs to be done for preparation, exploration, and analysis, there are many aspects of these trials that could be better automated resulting in greater efficiency and accuracy. Second, because this work is challenging, it often occupies the majority of the statisticians effort. Less time is spent on trial design and analysis and the this portion is taken up by a clinician who often has less expertise with the statistical aspects of the trial. As a result, the extra effort spent on processing data undermines statisticians role as a collaborator and relegates them to service provider. 
Need tools leveraging existing work to more efficiently provide holistic views on trials ","PeriodicalId":20974,"journal":{"name":"R J.","volume":"1 1","pages":"556"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90848279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Analyzing Dependence between Point Processes in Time Using IndTestPP
Pub Date : 2021-01-01 DOI: 10.32614/rj-2021-049
A. Cebrián, J. Asín
The need to analyze the dependence between two or more point processes in time appears in many modeling problems related to the occurrence of events, such as the occurrence of climate events at different spatial locations or synchrony detection in spike train analysis. The package IndTestPP provides a general framework for all the steps in this type of analysis, and one of its main features is the implementation of three families of tests to study independence given the intensities of the processes, which are not only useful to assess independence but also to identify factors causing dependence. The package also includes functions for generating different types of dependent point processes, and implements computational statistical inference tools using them. An application to characterize the dependence between the occurrence of extreme heat events in three Spanish locations using the package is shown.
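As a package-independent illustration of how such dependent processes can arise, the following base-R sketch uses a common-shock construction, one standard way of generating two dependent homogeneous Poisson processes; the package's own generators may differ.

```r
# Common-shock construction: two Poisson processes N1 and N2 are dependent
# because both contain the occurrence times of a shared third process.
set.seed(42)
T.end  <- 100
shared <- cumsum(rexp(200, rate = 0.3))   # common component
own1   <- cumsum(rexp(200, rate = 0.7))   # component unique to N1
own2   <- cumsum(rexp(200, rate = 0.5))   # component unique to N2

N1 <- sort(c(shared, own1)); N1 <- N1[N1 <= T.end]  # rate 0.3 + 0.7 = 1.0
N2 <- sort(c(shared, own2)); N2 <- N2[N2 <= T.end]  # rate 0.3 + 0.5 = 0.8
# N1 and N2 share the 'shared' occurrence times, inducing dependence that
# independence tests such as those in IndTestPP are designed to detect.
```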
Citations: 0
miRecSurv Package: Prentice-Williams-Peterson Models with Multiple Imputation of Unknown Number of Previous Episodes
Pub Date : 2021-01-01 DOI: 10.32614/rj-2021-082
D. Moriña, G. Hernández-Herrera, A. Navarro
Left censoring can occur with relative frequency when analysing recurrent events in epidemiological studies, especially observational ones. Concretely, including individuals in a cohort study who were already at risk before its effective initiation may leave prior episodes unobserved, which easily leads to biased and inefficient estimates. The miRecSurv package is based on the use of models with episode-specific baseline hazards, with multiple imputation of the number of prior episodes, when unknown, by means of the COMPoisson distribution, a very flexible count distribution that can handle over-, sub-, and equidispersion; a stratified model depending on whether the individual had or had not previously been at risk; and the use of a frailty term. The usage of the package is illustrated by means of a real data example based on an occupational cohort study and a simulation study.
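For orientation, a Prentice-Williams-Peterson model with episode-specific baseline hazards can be fit with the survival package as sketched below on simulated data; this is not miRecSurv's own interface, which additionally handles the imputation of an unknown number of prior episodes.

```r
# PWP total-time model: counting-process Surv() intervals, strata(episode)
# for episode-specific baseline hazards, cluster(id) for robust SEs.
library(survival)

set.seed(10)
d <- data.frame(id      = rep(1:50, each = 2),
                start   = rep(c(0, 5), 50),
                stop    = rep(c(5, 9), 50) + runif(100),
                status  = rbinom(100, 1, 0.6),
                episode = rep(1:2, 50),
                x       = rnorm(100))

fit <- coxph(Surv(start, stop, status) ~ x + strata(episode) + cluster(id),
             data = d)
summary(fit)
```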
Citations: 1
clustcurv: An R Package for Determining Groups in Multiple Curves
Pub Date : 2021-01-01 DOI: 10.32614/rj-2021-032
Nora M. Villanueva, M. Sestelo, Luís Meira-Machado, J. Roca-Pardiñas
In many situations it is of interest to ascertain whether groups can be identified among a collection of curves, especially when confronted with a considerable number of curves. This paper introduces an R package, known as clustcurv, for determining clusters of curves with an automatic selection of their number. The package can be used for determining groups in multiple survival curves as well as in multiple regression curves. Moreover, it can be used with large numbers of curves. An illustration of the use of clustcurv is provided, using both real data examples and artificial data.
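A minimal sketch, assuming the survclustcurves() interface described in the paper; the data are simulated, with one survival curve per hospital.

```r
# Cluster multiple Kaplan-Meier curves with automatic selection of the
# number of groups; the grouping factor 'hospital' defines the curves.
library(clustcurv)

set.seed(3)
n  <- 300
df <- data.frame(hospital = sample(paste0("H", 1:6), n, replace = TRUE),
                 time     = rexp(n, rate = 0.1),
                 status   = rbinom(n, 1, 0.8))

fit <- survclustcurves(time = df$time, status = df$status, x = df$hospital,
                       algorithm = "kmeans", nboot = 50)
fit  # prints the selected number of groups and the cluster assignments
```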
Citations: 1
RPESE: Risk and Performance Estimators Standard Errors with Serially Dependent Data
Pub Date : 2021-01-01 DOI: 10.32614/rj-2021-106
A. Christidis, R. Martin
The Risk and Performance Estimators Standard Errors package RPESE implements a new method for computing accurate standard errors of risk and performance estimators when returns are serially dependent. The new method makes use of the representation of a risk or performance estimator as a summation of a time series of influence-function (IF) transformed returns, and computes estimator standard errors using a sophisticated method of estimating the spectral density at frequency zero of the time series of IF-transformed returns. Two additional packages used by RPESE are introduced, namely RPEIF which computes and provides graphical displays of the IF of risk and performance estimators, and RPEGLMEN which implements a regularized Gamma generalized linear model polynomial fit to the periodogram of the time series of the IF-transformed returns. A Monte Carlo study shows that the new method provides more accurate estimates of standard errors for risk and performance estimators compared to well-known alternative methods in the presence of serial correlation.
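A minimal sketch, assuming EstimatorSE() is the entry point with the estimator.fun and se.method arguments described in the package vignette; the monthly return series is simulated and stands in for real data.

```r
# Standard error of the expected-shortfall (ES) estimator for a serially
# dependent return series, using the IF-based spectral-density method.
library(RPESE)
library(xts)

set.seed(7)
dates <- seq(as.Date("2010-01-31"), by = "month", length.out = 120)
ret   <- xts(rnorm(120, mean = 0.005, sd = 0.04), order.by = dates)

# "IFcorAdapt" is assumed to select the correlation-adaptive IF method,
# per the vignette's naming of the se.method options.
out <- EstimatorSE(ret, estimator.fun = "ES", se.method = "IFcorAdapt")
out
```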
Citations: 1
The R Quest: from Users to Developers
Pub Date : 2021-01-01 DOI: 10.32614/rj-2021-111
Simon Urbanek
R is not a programming language, and this produces the inherent dichotomy between analytics and software engineering. With the emergence of data science, the opportunity exists to bridge this gap, especially through teaching practices.

Genesis: How did we get here?
The article "Software Engineering and R Programming: A Call to Action" summarizes the dichotomy between analytics and software engineering in the R ecosystem, provides examples where this leads to problems, and proposes what we as R users can do to bridge the gap.

Data Analytic Language
The fundamental basis of the dichotomy is inherent in the evolution of S and R: they are not programming languages, but they ended up being mistaken for such. S was designed to be a data analytic language: to turn ideas into software quickly and faithfully, often used in a "non-programming" style (Chambers, 1998). Its original goal was to enable statisticians to apply code written in programming languages (at the time mostly FORTRAN) to analyze data quickly and interactively, for some suitable definition of "interactive" at the time (Becker, 1994). The success of S, and then R, can be traced to the ability to perform data analysis by applying existing tools to data in creative ways. A data analysis is a quest: at every step we learn more about the data, which informs our decisions about next steps. Whether it is an exploratory data analysis leveraging graphics, computing statistics, or fitting models, the final goal is typically not known ahead of time; it is obtained by an iterative process of applying tools that we as analysts think may lead us further (Tukey, 1977). It is important to note that this is exactly the opposite of software engineering, where there is a well-defined goal: a specification or desired outcome, which simply needs to be expressed in a way understandable to the computer.
Citations: 0
openSkies - Integration of Aviation Data into the R Ecosystem
Pub Date : 2021-01-01 DOI: 10.32614/rj-2021-095
Rafael Ayala, D. Ayala, L. S. Vidal, David Ruiz
Aviation data has become increasingly accessible to the public thanks to the adoption of technologies such as Automatic Dependent Surveillance-Broadcast (ADS-B) and Mode S, which provide aircraft information over publicly accessible radio channels. Furthermore, the OpenSky Network provides multiple public resources to access such air traffic data from a large network of ADS-B receivers. Here, we present openSkies, the first R package for processing public air traffic data. The package provides an interface to the OpenSky Network resources, standardized data structures to represent the different entities involved in air traffic data, and functionalities to analyze and visualize such data. Furthermore, the portability of the implemented data structures makes openSkies easily reusable by other packages, therefore laying the foundation of aviation data engineering in R.
Next, the tools currently available for visualization (of air traffic) and clustering (of aircraft trajectories) are presented. Finally, a tool for decoding raw ADS-B messages is introduced. Examples with real data are provided in the corresponding sections to illustrate the features of the package. In order to establish standardized data structures that can serve as a foundation for the future development of air traffic analysis in R, a set of R6 classes representing frequently involved entities is implemented. R6 was chosen as the class system because it allows formal definitions of reference classes, and because, compared with base R's reference class system, R6 has a smaller memory footprint and faster object instantiation, field access, and field setting. Furthermore, R6 classes are portable, so the classes defined in openSkies can be reused by other packages. In its current version (1.1.3), openSkies defines classes for the following entities: aircraft, airports, flight instances, flight routes, single aircraft state vectors, and series of aircraft state vectors (Figure 1).
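A minimal sketch of retrieving a series of state vectors, assuming the getAircraftStateVectorsSeries() interface described in the paper; the ICAO address and time window are placeholders, and the call requires network access to the OpenSky Network.

```r
# Retrieve the state vectors of one aircraft over a time window, sampled
# every 5 minutes, as a series-of-state-vectors object.
library(openSkies)

vectors <- getAircraftStateVectorsSeries(
  aircraft  = "403003",                 # placeholder ICAO 24-bit address
  startTime = "2020-11-04 10:30:00",
  endTime   = "2020-11-04 11:30:00",
  timeZone  = "Europe/London",
  timeResolution = 300)                 # seconds between state vectors
```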
{"title":"openSkies - Integration of Aviation Data into the R Ecosystem","authors":"Rafael Ayala, D. Ayala, L. S. Vidal, David Ruiz","doi":"10.32614/rj-2021-095","DOIUrl":"https://doi.org/10.32614/rj-2021-095","url":null,"abstract":"Aviation data has become increasingly more accessible to the public thanks to the adoption of technologies such as Automatic Dependent Surveillance-Broadcast (ADS-B) and Mode S, which provide aircraft information over publicly accessible radio channels. Furthermore, the OpenSky Network provides multiple public resources to access such air traffic data from a large network of ADS-B receivers. Here, we present openSkies , the first R package for processing public air traffic data. The package provides an interface to the OpenSky Network resources, standardized data structures to represent the different entities involved in air traffic data and functionalities to analyze and visualize such data. Furthermore, the portability of the implemented data structures makes openSkies easily reusable by other packages, therefore laying the foundation of aviation data engineering in R.","PeriodicalId":20974,"journal":{"name":"R J.","volume":"1 1","pages":"485"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89877538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3