ROBustness In Network (robin): an R Package for Comparison and Validation of Communities
V. Policastro, D. Righelli, A. Carissimo, L. Cutillo, I. Feis
The R Journal, 2021, p. 292. DOI: 10.32614/rj-2021-040
In network analysis, many community detection algorithms have been developed; however, their implementations leave unaddressed the question of statistically validating the results. Here we present robin (ROBustness In Network), an R package that assesses the robustness of the community structure found in a network by one or more methods, giving an indication of their reliability. The procedure first tests whether the community structure found by a set of algorithms is statistically significant, and then compares two selected detection algorithms on the same graph to choose the one that better fits the network of interest. We demonstrate the use of our package on the American College Football benchmark dataset.
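A minimal sketch of this two-step workflow, following the package vignette as we recall it (function and argument names such as prepGraph, random, robinRobust, and robinCompare are taken from the package documentation and may differ across versions):

```r
library(robin)

# American College Football benchmark network shipped with the package
fball <- system.file("example/football.gml", package = "robin")
graph <- prepGraph(file = fball, file.format = "gml")

# Step 1: significance -- compare the structure found by Louvain on the
# real graph against the same procedure on a rewired random graph
graphRandom <- random(graph = graph)
proc <- robinRobust(graph = graph, graphRandom = graphRandom,
                    method = "louvain", measure = "vi")
plotRobin(graph = graph, model1 = proc$Mean, model2 = proc$MeanRandom,
          measure = "vi")

# Step 2: model choice -- compare two detection algorithms on the same graph
comp <- robinCompare(graph = graph, method1 = "fastGreedy",
                     method2 = "louvain", measure = "vi")
```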
{"title":"ROBustness In Network (robin): an R Package for Comparison and Validation of Communities","authors":"V. Policastro, D. Righelli, A. Carissimo, L. Cutillo, I. Feis","doi":"10.32614/rj-2021-040","DOIUrl":"https://doi.org/10.32614/rj-2021-040","url":null,"abstract":"In network analysis, many community detection algorithms have been developed, however, their implementation leaves unaddressed the question of the statistical validation of the results. Here we present robin(ROBustness In Network), an R package to assess the robustness of the community structure of a network found by one or more methods to give indications about their reliability. The procedure initially detects if the community structure found by a set of algorithms is statistically significant and then compares two selected detection algorithms on the same graph to choose the one that better fits the network of interest. We demonstrate the use of our package on the American College Football benchmark dataset.","PeriodicalId":20974,"journal":{"name":"R J.","volume":"17 1","pages":"292"},"PeriodicalIF":0.0,"publicationDate":"2021-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80573135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R
Jouni Helske, M. Vihola
The R Journal, 2021, p. 471. DOI: 10.32614/RJ-2021-103

We present the R package bssm for Bayesian non-linear/non-Gaussian state space modelling. Unlike existing packages, bssm allows easy-to-use approximate inference based on Gaussian approximations such as the Laplace approximation and the extended Kalman filter. The package also accommodates discretely observed latent diffusion processes. Inference is based on fully automatic, adaptive Markov chain Monte Carlo (MCMC) on the hyperparameters, with optional importance sampling post-correction to eliminate any approximation bias. The package also implements direct pseudo-marginal MCMC and delayed acceptance pseudo-marginal MCMC using intermediate approximations. It offers an easy-to-use interface for defining models with linear-Gaussian state dynamics and non-Gaussian observation models, and has an Rcpp interface for specifying custom non-linear and diffusion models. State space models are a flexible tool for analysing a variety of time series data, and bssm supports fully Bayesian inference for a large class of such models with several alternative MCMC sampling strategies; all computationally intensive parts of the package are implemented in C++.
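A minimal sketch on simulated data, using the bsm_ng constructor and run_mcmc as documented as we recall them (the halfnormal prior helper is part of the package interface; defaults and argument names may differ by version):

```r
library(bssm)
set.seed(1)

# Toy data: Poisson counts driven by a latent random-walk level
n <- 100
level <- cumsum(rnorm(n, sd = 0.1))
y <- rpois(n, exp(level))

# Linear-Gaussian state dynamics with a non-Gaussian (Poisson) observation model
model <- bsm_ng(y, sd_level = halfnormal(0.1, 1), distribution = "poisson")

# Adaptive MCMC with importance-sampling post-correction ("is2")
fit <- run_mcmc(model, iter = 10000, particles = 10, mcmc_type = "is2")
summary(fit)
```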
{"title":"bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R","authors":"Jouni Helske, M. Vihola","doi":"10.32614/RJ-2021-103","DOIUrl":"https://doi.org/10.32614/RJ-2021-103","url":null,"abstract":"We present an R package bssm for Bayesian non-linear/non-Gaussian state space modelling. Unlike the existing packages, bssm allows for easy-to-use approximate inference based on Gaussian approximations such as the Laplace approximation and the extended Kalman filter. The package accommodates also discretely observed latent diffusion processes. The inference is based on fully automatic, adaptive Markov chain Monte Carlo (MCMC) on the hyperparameters, with optional importance sampling post-correction to eliminate any approximation bias. The package implements also a direct pseudo-marginal MCMC and a delayed acceptance pseudo-marginal MCMC using intermediate approximations. The package offers an easy-to-use interface to define models with linear-Gaussian state dynamics with non-Gaussian observation models, and has an Rcpp interface for specifying custom non-linear and diffusion models. models are a flexible tool for analysing a variety of time series data. Here we introduced the R package bssm for fully Bayesian state space modelling for a large class of models with several alternative MCMC sampling strategies. All computationally intensive parts of the package are","PeriodicalId":20974,"journal":{"name":"R J.","volume":"55 1","pages":"471"},"PeriodicalIF":0.0,"publicationDate":"2021-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90068033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Finding Optimal Normalizing Transformations via bestNormalize
Ryan A. Peterson
The R Journal, 2021, p. 310. DOI: 10.32614/rj-2021-041

The bestNormalize R package was designed to help users find a transformation that can effectively normalize a vector regardless of its actual distribution. Each of the many normalization techniques that have been developed has its own strengths and weaknesses, and deciding which to use before the data are fully observed is difficult or impossible. This package facilitates choosing between a range of possible transformations and automatically returns the best one, i.e., the one that makes the data look the most normal. To evaluate and compare normalization efficacy across a suite of possible transformations, we developed a statistic based on a goodness-of-fit test divided by its degrees of freedom. Transformations can be seamlessly trained and applied to newly observed data, and can be used in conjunction with caret and recipes for data preprocessing in machine learning workflows. Custom transformations and normalization statistics are supported.
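A short sketch of the core workflow (training on one vector, then applying and inverting the trained transformation on new data); the sample data here are illustrative:

```r
library(bestNormalize)
set.seed(100)

x <- rgamma(250, shape = 1, rate = 1)   # skewed training vector

BN <- bestNormalize(x)   # fit candidate transformations, keep the best
BN$chosen_transform      # inspect which transformation was selected

x_new  <- rgamma(50, shape = 1, rate = 1)
xt_new <- predict(BN, newdata = x_new)                  # transform new data
x_back <- predict(BN, newdata = xt_new, inverse = TRUE) # back-transform
```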
{"title":"Finding Optimal Normalizing Transformations via bestNormalize","authors":"Ryan A. Peterson","doi":"10.32614/rj-2021-041","DOIUrl":"https://doi.org/10.32614/rj-2021-041","url":null,"abstract":"The bestNormalize R package was designed to help users find a transformation that can effectively normalize a vector regardless of its actual distribution. Each of the many normalization techniques that have been developed has its own strengths and weaknesses, and deciding which to use until data are fully observed is difficult or impossible. This package facilitates choosing between a range of possible transformations and will automatically return the best one, i.e., the one that makes data look the most normal. To evaluate and compare the normalization efficacy across a suite of possible transformations, we developed a statistic based on a goodness of fit test divided by its degrees of freedom. Transformations can be seamlessly trained and applied to newly observed data, and can be implemented in conjunction with caret and recipes for data preprocessing in machine learning workflows. Custom transformations and normalization statistics are supported.","PeriodicalId":20974,"journal":{"name":"R J.","volume":"19 1","pages":"310"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74589020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A New Versatile Discrete Distribution
Rolf Turner
The R Journal, 2021, p. 427. DOI: 10.32614/rj-2021-067

This paper introduces a new flexible distribution for discrete data. Approximate moment estimators of the parameters of the distribution, to be used as starting values for numerical optimization procedures, are discussed. “Exact” moment estimation, effected via a numerical procedure, and maximum likelihood estimation are considered. The quality of the results produced by these estimators is assessed via simulation experiments. Several examples are given of fitting instances of the new distribution to real and simulated data. It is noted that the new distribution is a member of the exponential family. Expressions for the gradient and Hessian of the log-likelihood of the new distribution are derived. The former facilitates the numerical maximization of the likelihood with optim(); the latter provides a means of calculating or estimating the covariance matrix of the parameter estimates. A discrepancy between estimates of the covariance matrix obtained by inverting the Hessian and those obtained by Monte Carlo methods is discussed.
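The paper's distribution is not reproduced here, but the estimation mechanics it describes (numerical likelihood maximization with optim() using an analytic gradient, plus a covariance estimate from the inverted Hessian) can be illustrated with a simple stand-in model; everything below is a generic example, not the paper's distribution:

```r
set.seed(42)
y <- rpois(200, lambda = 3)   # stand-in discrete data

# Negative log-likelihood and its gradient in theta = log(lambda)
nll  <- function(theta) -sum(dpois(y, exp(theta), log = TRUE))
grad <- function(theta) -(sum(y) - length(y) * exp(theta))

fit <- optim(par = 0, fn = nll, gr = grad, method = "BFGS", hessian = TRUE)
exp(fit$par)        # maximum likelihood estimate of lambda
solve(fit$hessian)  # variance of theta-hat via the inverted Hessian
```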
{"title":"A New Versatile Discrete Distribution","authors":"Rolf Turner","doi":"10.32614/rj-2021-067","DOIUrl":"https://doi.org/10.32614/rj-2021-067","url":null,"abstract":"This paper introduces a new flexible distribution for discrete data. Approximate moment estimators of the parameters of the distribution, to be used as starting values for numerical optimization procedures, are discussed. “Exact” moment estimation, effected via a numerical procedure, and maximum likelihood estimation, are considered. The quality of the results produced by these estimators is assessed via simulation experiments. Several examples are given of fitting instances of the new distribution to real and simulated data. It is noted that the new distribution is a member of the exponential family. Expressions for the gradient and Hessian of the log-likelihood of the new distribution are derived. The former facilitates the numerical maximization of the likelihood with optim(); the latter provides means of calculating or estimating the covariance matrix of of the parameter estimates. A discrepancy between estimates of the covariance matrix obtained by inverting the Hessian and those obtained by Monte Carlo methods is discussed.","PeriodicalId":20974,"journal":{"name":"R J.","volume":"154 1","pages":"427"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72696852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"We Need Trustworthy R Packages","authors":"Will Landau","doi":"10.32614/rj-2021-109","DOIUrl":"https://doi.org/10.32614/rj-2021-109","url":null,"abstract":"","PeriodicalId":20974,"journal":{"name":"R J.","volume":"85 1","pages":"661"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84316073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

dad: an R Package for Visualisation, Classification and Discrimination of Multivariate Groups Modelled by their Densities
R. Boumaza, Pierre Santagostini, Smail Yousfi, S. Demotes-Mainard
The R Journal, 2021, p. 90. DOI: 10.32614/rj-2021-071
Multidimensional scaling (MDS), hierarchical cluster analysis (HCA) and discriminant analysis (DA) are classical techniques for data consisting of n individuals and p variables. When the individuals are divided into T groups, the R package dad associates with each group a multivariate probability density function and then carries out these techniques on the densities, which are estimated from the data under consideration. The techniques are based on distance measures between densities: chi-square, Hellinger, Jeffreys, Jensen-Shannon and Lp for discrete densities; Hellinger, Jeffreys, L2 and 2-Wasserstein for Gaussian densities; and L2 for numeric non-Gaussian densities estimated by the Gaussian kernel method. Practical methods help the user give meaning to the outputs in the context of MDS and HCA, and search for an optimal prediction in the context of DA based on the leave-one-out misclassification ratio. Functions for data management and basic statistical calculations on groups are also provided.
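A hedged sketch of the density-based MDS and HCA steps on a familiar grouped dataset; as.folder, fmdsd and fhclustd are taken from the package documentation as we recall it, and argument details may differ by version:

```r
library(dad)

# Treat each iris species as a group of multivariate observations
irisf <- as.folder(iris, groups = "Species")

# MDS on distances between the per-group densities (Gaussian by default)
mds <- fmdsd(irisf)
plot(mds)

# Hierarchical clustering of the groups from the same kind of distances
hc <- fhclustd(irisf)
plot(hc)
```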
{"title":"dad: an R Package for Visualisation, Classification and Discrimination of Multivariate Groups Modelled by their Densities","authors":"R. Boumaza, Pierre Santagostini, Smail Yousfi, S. Demotes-Mainard","doi":"10.32614/rj-2021-071","DOIUrl":"https://doi.org/10.32614/rj-2021-071","url":null,"abstract":"Multidimensional scaling (MDS), hierarchical cluster analysis (HCA) and discriminant analysis (DA) are classical techniques which deal with data made of n individuals and p variables. When the individuals are divided into T groups, the R package dad associates with each group a multivariate probability density function and then carries out these techniques on the densities which are estimated by the data under consideration. These techniques are based on distance measures between densities: chi-square, Hellinger, Jeffreys, Jensen-Shannon and L p for discrete densities, Hellinger , Jeffreys, L 2 and 2-Wasserstein for Gaussian densities, and L 2 for numeric non Gaussian densities estimated by the Gaussian kernel method. Practical methods help the user to give meaning to the outputs in the context of MDS and HCA, and to look for an optimal prediction in the context of DA based on the one-leave-out misclassification ratio. Some functions for data management or basic statistics calculations on groups are annexed.","PeriodicalId":20974,"journal":{"name":"R J.","volume":"12 1","pages":"90"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85463630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

g2f as a Novel Tool to Find and Fill Gaps in Metabolic Networks
Daniel Osório, Kelly Botero, A. Velasco, Nicolás Mendoza-Mejía, Felipe Rojas-Rodríguez, G. Barreto, Janneth González
The R Journal, 2021, p. 6. DOI: 10.32614/rj-2021-064
When building a genome-scale metabolic model, there are typically several dead-end metabolites and substrates that cannot be imported, produced, or used by any reaction incorporated in the network. The presence of these dead-end metabolites can block the net flux of the objective function when it is evaluated through Flux Balance Analysis (FBA); even when flux is not blocked, they increase bias in the biological conclusions. The refinement needed to restore the connectivity of the network can be carried out manually or with computational algorithms. The g2f package was designed as a tool to find the gaps caused by dead-end metabolites and fill them with stoichiometric reactions from a reference, filtering candidate reactions using a weighting function. Additionally, the algorithm can download the full set of gene-associated stoichiometric reactions for a specific organism from the KEGG database. Our package is compatible with both R 3.6.0 and R 4.0.0.
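A hedged sketch of the gap-filling workflow; getReference and gapFill follow the package interface as we recall it, and modelReactions stands in for the user's own set of model reactions:

```r
library(g2f)

# Download KEGG's gene-associated stoichiometric reactions for E. coli
reference <- getReference(organism = "eco")

# modelReactions: a character vector of the model's stoichiometric
# reactions (hypothetical placeholder for the user's own model)
filled <- gapFill(reactionList = modelReactions,
                  reference = reference,
                  limit = 0.25)   # keep candidates below this addition cost
```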
{"title":"g2f as a Novel Tool to Find and Fill Gaps in Metabolic Networks","authors":"Daniel Osório, Kelly Botero, A. Velasco, Nicolás Mendoza-Mejía, Felipe Rojas-Rodríguez, G. Barreto, Janneth González","doi":"10.32614/rj-2021-064","DOIUrl":"https://doi.org/10.32614/rj-2021-064","url":null,"abstract":"During the building of a genome-scale metabolic model, there are several dead-end metabolites and substrates which cannot be imported, produced nor used by any reaction incorporated in the network. The presence of these dead-end metabolites can block out the net flux of the objective function when it is evaluated through Flux Balance Analysis (FBA), and when it is not blocked, bias in the biological conclusions increase. In this aspect, the refinement to restore the connectivity of the network can be carried out manually or using computational algorithms. The g2f package was designed as a tool to find the gaps from dead-end metabolites and fill them from the stoichiometric reactions of a reference, filtering candidate reactions using a weighting function. Additionally, this algorithm allows to download all the set of gene-associated stoichiometric reactions for a specific organism from the KEGG database. Our package is compatible with both 4.0.0 and 3.6.0 R versions.","PeriodicalId":20974,"journal":{"name":"R J.","volume":"26 1","pages":"6"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78277396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

The bdpar Package: Big Data Pipelining Architecture for R
Miguel Ferreiro-Díaz, T. Cotos-Yáñez, J. R. Méndez, David Ruano-Ordás
The R Journal, 2021, p. 130. DOI: 10.32614/rj-2021-065
In recent years, big data has become a useful paradigm for exploiting multiple sources to find relevant knowledge in real domains (such as designing personalized marketing campaigns or helping to palliate the effects of several deadly diseases). Big data programming tools and methods have evolved over time from a MapReduce archetype to a pipeline-based one; in particular, pipelining schemes have become the most reliable way of processing and analysing large amounts of data. To this end, this work introduces bdpar, a new, highly customizable pipeline-based framework (built on the OOP paradigm provided by the R6 package) able to execute multiple pre-processing tasks over heterogeneous data sources. To increase flexibility and performance, bdpar provides helpful features such as (i) a novel object-based pipe operator (%>|%), (ii) the ability to easily design and deploy new (and customized) input data parsers, tasks, and pipelines, (iii) only-once execution, which skips previously processed information (instances) and guarantees that only new input data and new pipelines are executed, (iv) the capability to perform serial or parallel operations according to the user's needs, and (v) a debugging mechanism that allows users to check the status of each instance (and find possible errors) throughout the process.
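A minimal sketch of invoking the framework's built-in pipeline; runPipeline and DefaultPipeline follow the package's documented interface as we recall it, and "sample_data/" is a hypothetical input folder:

```r
library(bdpar)

# Process every file under a folder with the built-in default pipeline;
# caching gives the only-once execution behaviour described above
output <- runPipeline(path = "sample_data/",
                      pipeline = DefaultPipeline$new(),
                      cache = TRUE,
                      verbose = FALSE)
```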
{"title":"The bdpar Package: Big Data Pipelining Architecture for R","authors":"Miguel Ferreiro-Díaz, T. Cotos-Yáñez, J. R. Méndez, David Ruano-Ordás","doi":"10.32614/rj-2021-065","DOIUrl":"https://doi.org/10.32614/rj-2021-065","url":null,"abstract":"In the last years, big data has become a useful paradigm for taking advantage of multiple sources to find relevant knowledge in real domains (such as the design of personalized marketing campaigns or helping to palliate the effects of several mortal diseases). Big data programming tools and methods have evolved over time from a MapReduce to a pipeline-based archetype. Concretely the use of pipelining schemes has become the most reliable way of processing and analysing large amounts of data. To this end, this work introduces bdpar, a new highly customizable pipeline-based framework (using the OOP paradigm provided by R6 package) able to execute multiple pre-processing tasks over heterogeneous data sources. Moreover, to increase the flexibility and performance, bdpar provides helpful features such as (i) the definition of a novel object-based pipe operator (%>|%), (ii) the ability to easily design and deploy new (and customized) input data parsers, tasks and pipelines, (iii) only-once execution which avoids the execution of previously processed information (instances) guaranteeing that only new both input data and pipelines are executed, (iv) the capability to perform serial or parallel operations according to the user needs, (v) the inclusion of a debugging mechanism which allows users to check the status of each instance (and find possible errors) throughout the process.","PeriodicalId":20974,"journal":{"name":"R J.","volume":"43 1","pages":"130"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77507422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A GUIded tour of Bayesian regression
A. Ramírez‐Hassan, Mateo Graciano-Londoño
The R Journal, 2021, p. 57. DOI: 10.32614/rj-2021-081

This paper presents a graphical user interface (GUI) for carrying out Bayesian regression analysis in a user-friendly, drag-and-drop environment that requires no programming skills. It is designed for teaching and applied purposes at an introductory level. Our GUI is an interactive web application built with shiny and other R libraries. We work through several applications to highlight the potential of our GUI for applied researchers and practitioners. In addition, the Help option in the main tab panel contains an extended version of this paper, in which we present the basic theory underlying all the regression models implemented in our GUI, together with further applications of each model.
{"title":"A GUIded tour of Bayesian regression","authors":"A. Ramírez‐Hassan, Mateo Graciano-Londoño","doi":"10.32614/rj-2021-081","DOIUrl":"https://doi.org/10.32614/rj-2021-081","url":null,"abstract":"This paper presents a graphical user interface (GUI) to carry out a Bayesian regression analysis in a very friendly environment without any programming skills (drag and drop). This paper is designed for teaching and applied purposes at an introductory level. Our GUI is based on an interactive web application using shiny, and libraries from R. We carry out some applications to highlight the potential of our GUI for applied researchers and practitioners. In addition, the Help option in the main tap panel has an extended version of this paper, where we present the basic theory underlying all regression models that we developed in our GUI, and more applications associated with each model.","PeriodicalId":20974,"journal":{"name":"R J.","volume":"13 1","pages":"57"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82087117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

BayesSPsurv: An R Package to Estimate Bayesian (Spatial) Split-Population Survival Models
Brandon Bolte, N. Schmidt, Sergio Béjar, N. Huynh, Bumba Mukherjee
The R Journal, 2021, p. 595. DOI: 10.32614/rj-2021-068
Survival data often include a fraction of units that are susceptible to an event of interest as well as a fraction of “immune” units. In many applications, spatial clustering in unobserved risk factors across nearby units can also affect their survival rates and their odds of becoming immune. To address these methodological challenges, this article introduces our BayesSPsurv R package, which fits parametric Bayesian spatial split-population survival (cure) models that account for spatial autocorrelation in both subpopulations of the user's time-to-event data. Spatial autocorrelation is modeled with spatially weighted frailties, which are estimated using a conditionally autoregressive prior. The user can also fit parametric cure models with or without non-spatial i.i.d. frailties, and each model can incorporate time-varying covariates. BayesSPsurv also includes functions to conduct pre-estimation spatial autocorrelation tests, visualize results, and assess model performance, all of which are illustrated using data on post-civil-war peace survival.
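A hedged sketch of fitting the spatial model; spatialSPsurv follows the package's documented interface as we recall it, and every name below (mydata, the covariates, the adjacency matrix adj) is a hypothetical placeholder for the user's own data:

```r
library(BayesSPsurv)

# Spatial split-population Weibull model: one equation for survival
# duration, one for the probability of being "cured" (immune)
fit <- spatialSPsurv(
  duration = duration ~ x1 + x2,            # survival-stage equation
  immune   = cured ~ x1,                    # immunity-stage equation
  Y0 = "t0", LY = "lastyear", S = "sp_id",  # spell start, last year, spatial id
  data = mydata, N = 1000, burn = 100, thin = 10,
  w = c(1, 1, 1), m = 10, form = "Weibull",
  A = adj                                   # spatial adjacency matrix
)
```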
{"title":"BayesSPsurv: An R Package to Estimate Bayesian (Spatial) Split-Population Survival Models","authors":"Brandon Bolte, N. Schmidt, Sergio Béjar, N. Huynh, Bumba Mukherjee","doi":"10.32614/rj-2021-068","DOIUrl":"https://doi.org/10.32614/rj-2021-068","url":null,"abstract":"Survival data often include a fraction of units that are susceptible to an event of interest as well as a fraction of “immune” units. In many applications, spatial clustering in unobserved risk factors across nearby units can also affect their survival rates and odds of becoming immune. To address these methodological challenges, this article introduces our BayesSPsurv R-package, which fits parametric Bayesian Spatial split-population survival (cure) models that can account for spatial autocorrelation in both subpopulations of the user’s time-to-event data. Spatial autocorrelation is modeled with spatially weighted frailties, which are estimated using a conditionally autoregressive prior. The user can also fit parametric cure models with or without non-spatial i.i.d. frailties, and each model can incorporate time-varying covariates. BayesSPsurv also includes various functions to conduct pre-estimation spatial autocorrelation tests, visualize results, and assess model performance, all of which are illustrated using data on post-civil war peace survival.","PeriodicalId":20974,"journal":{"name":"R J.","volume":"3 1","pages":"595"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89243776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}