
arXiv - STAT - Other Statistics: Latest Papers

Interpret the estimand framework from a causal inference perspective
Pub Date : 2024-06-29 DOI: arxiv-2407.00292
Jinghong Zeng
The estimand framework proposed by ICH in 2017 has brought fundamental changes in the pharmaceutical industry. It clearly describes how a treatment effect in a clinical question should be precisely defined and estimated, through attributes including treatments, endpoints and intercurrent events. However, ideas around the estimand framework are commonly expressed in prose, and different interpretations of this framework may exist. This article aims to interpret the estimand framework through its underlying theory, the causal inference framework based on potential outcomes. The statistical origin and formula of an estimand are given through the causal inference framework, with all attributes translated into statistical terms. The article describes how the five strategies proposed by ICH for handling intercurrent events are incorporated into the statistical formula of an estimand, and it suggests a new strategy for handling intercurrent events. The roles of target populations and analysis sets in the estimand framework are compared and discussed based on the statistical formula of an estimand. This article recommends continued study of the causal inference theories behind the estimand framework and improving the framework's methodological comprehensibility and availability.
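To make the potential-outcomes reading concrete, the sketch below simulates both potential outcomes and both potential intercurrent-event indicators for every subject, then compares two estimands: a treatment-policy estimand (the outcome is used regardless of the intercurrent event) and a composite estimand (the intercurrent event is folded into a binary responder endpoint). It then checks that a randomized difference in means recovers each one. The data-generating model, effect size, event probabilities and responder threshold are all hypothetical, and this is an illustration of the general idea, not the article's formula or its proposed new strategy.

```python
import random

def simulate(n=200_000, seed=1):
    """Simulate both potential outcomes for n subjects (hypothetical model)."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)      # Y(0): outcome under control
        y1 = y0 + 0.5                 # Y(1): outcome under treatment (constant effect)
        e0 = rng.random() < 0.20      # E(0): intercurrent event under control
        e1 = rng.random() < 0.10      # E(1): intercurrent event under treatment
        z = rng.random() < 0.5        # randomized treatment assignment
        subjects.append((y0, y1, e0, e1, z))
    return subjects

def composite(y, e):
    """Composite strategy: a responder (y > 0) only if no intercurrent event occurred."""
    return 1.0 if (y > 0 and not e) else 0.0

def mean(values):
    return sum(values) / len(values)

subjects = simulate()

# True estimands: computable here only because the simulation reveals both potential outcomes.
tp_true = mean([y1 - y0 for y0, y1, _, _, _ in subjects])
comp_true = mean([composite(y1, e1) - composite(y0, e0) for y0, y1, e0, e1, _ in subjects])

# Observed trial: each subject reveals only the outcome and event for the assigned arm.
treated = [s for s in subjects if s[4]]
control = [s for s in subjects if not s[4]]
tp_hat = mean([s[1] for s in treated]) - mean([s[0] for s in control])
comp_hat = (mean([composite(s[1], s[3]) for s in treated])
            - mean([composite(s[0], s[2]) for s in control]))

print(f"treatment policy: true={tp_true:.3f}, estimate={tp_hat:.3f}")
print(f"composite:        true={comp_true:.3f}, estimate={comp_hat:.3f}")
```

The two estimands answer different clinical questions from the same trial, which is why the choice of intercurrent-event strategy has to be fixed before the analysis.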
Citations: 0
Fractal dimension, and the problems traps of its estimation
Pub Date : 2024-06-27 DOI: arxiv-2406.19885
Carlos Sevcik
This chapter deals with error and uncertainty in data and treats their measurement methods and meaning. It shows that uncertainty is a natural property of many data sets. Uncertainty is fundamental to the survival of living species, and uncertainty of the "chaos" type, which occurs in many systems, is fundamental to understanding these systems.
Citations: 0
Transfer Learning for High Dimensional Robust Regression
Pub Date : 2024-06-25 DOI: arxiv-2406.17567
Xiaohui Yuan, Shujie Ren
Transfer learning has become an essential technique for using information from source datasets to improve performance on a target task. However, in the context of high-dimensional data, heterogeneity arises from heteroscedastic variance or inhomogeneous covariate effects. To address this problem, this paper proposes a robust transfer learning method based on Huber regression, designed specifically for scenarios in which the transferable source data set is known. The method effectively mitigates the impact of data heteroscedasticity, improving estimation and prediction accuracy. Moreover, when the transferable source data set is unknown, the paper introduces an efficient detection algorithm to identify informative sources. The effectiveness of the proposed method is demonstrated through numerical simulations and an empirical analysis of superconductor data.
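Huber regression, the building block named in the abstract, replaces the squared loss with one that is quadratic for small residuals and linear for large ones, which bounds the influence of outliers. The sketch below fits a straight line by iteratively reweighted least squares (IRLS) with Huber weights on a single dataset. It is a generic illustration, not the paper's transfer-learning estimator; the threshold `delta = 1.345` (a common default for residuals scaled by a robust standard deviation), the MAD-based scale and the toy data are assumptions.

```python
import random
import statistics

def huber_weight(r, delta):
    """IRLS weight for the Huber loss: 1 inside [-delta, delta], delta / |r| outside."""
    a = abs(r)
    return 1.0 if a <= delta else delta / a

def weighted_line_fit(x, y, w):
    """Closed-form weighted least squares for y = b0 + b1 * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def huber_line_fit(x, y, delta=1.345, iters=50):
    """Huber regression of y on x by IRLS; residuals are scaled by a MAD estimate."""
    b0, b1 = weighted_line_fit(x, y, [1.0] * len(x))       # start from OLS
    for _ in range(iters):
        resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        scale = statistics.median(abs(r) for r in resid) / 0.6745 or 1.0
        weights = [huber_weight(r / scale, delta) for r in resid]
        b0, b1 = weighted_line_fit(x, y, weights)
    return b0, b1

rng = random.Random(0)
x = list(range(20))
y = [2 + 3 * xi + rng.gauss(0, 0.5) for xi in x]   # true line: y = 2 + 3x
y[-1] += 100                                      # one gross outlier
ols = weighted_line_fit(x, y, [1.0] * len(x))
huber = huber_line_fit(x, y)
print(f"OLS slope {ols[1]:.2f}, Huber slope {huber[1]:.2f}")
```

The single outlier drags the OLS slope well away from 3, while the Huber fit stays close to it; a transfer-learning variant would apply the same robust loss when combining source and target data.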
Citations: 0
Active Learning for Fair and Stable Online Allocations
Pub Date : 2024-06-20 DOI: arxiv-2406.14784
Riddhiman Bhattacharya, Thanh Nguyen, Will Wei Sun, Mohit Tawarmalani
We explore an active learning approach for dynamic fair resource allocation problems. Unlike previous work that assumes full feedback from all agents on their allocations, we consider feedback from a select subset of agents at each epoch of the online resource allocation process. Despite this restriction, our proposed algorithms provide regret bounds that are sub-linear in the number of time periods for various measures, including fairness metrics commonly used in resource allocation problems and stability considerations in matching mechanisms. The key insight of our algorithms lies in adaptively identifying the most informative feedback using dueling upper and lower confidence bounds. With this strategy, we show that efficient decision-making does not require extensive feedback and produces efficient outcomes for a variety of problem classes.
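The idea of comparing upper and lower confidence bounds to decide whose feedback is worth requesting can be illustrated generically. The sketch below keeps a Hoeffding interval for each agent's mean feedback (assumed to lie in [0, 1]), treats an agent as still "dueling" with the current leader while its upper bound reaches the leader's lower bound, and queries the contender with the widest interval. The selection rule, the confidence level `delta` and the data layout are assumptions made for illustration; this is not the authors' algorithm.

```python
import math

def bounds(samples, delta=0.05):
    """Hoeffding confidence interval for the mean of [0, 1]-valued samples."""
    n = len(samples)
    if n == 0:
        return 0.0, 1.0                       # no data: the whole range is plausible
    mean = sum(samples) / n
    radius = math.sqrt(math.log(2 / delta) / (2 * n))
    return max(0.0, mean - radius), min(1.0, mean + radius)

def pick_agent_to_query(feedback, delta=0.05):
    """Pick the agent whose feedback is most informative under a UCB/LCB duel.

    feedback maps agent name -> list of observed rewards in [0, 1].
    """
    ci = {a: bounds(s, delta) for a, s in feedback.items()}
    leader = max(ci, key=lambda a: ci[a][0])  # agent with the highest lower bound
    # Agents whose upper bound still reaches the leader's lower bound remain in contention.
    contenders = [a for a in ci if ci[a][1] >= ci[leader][0]]
    # Among contenders, the widest interval is the most uncertain, hence most informative.
    return max(contenders, key=lambda a: ci[a][1] - ci[a][0])

feedback = {
    "A": [0.9, 0.8, 0.85] * 30,   # many observations: narrow interval
    "B": [0.7, 0.9],              # few observations: wide interval overlapping A
    "C": [0.1] * 90,              # confidently worse than A
}
print(pick_agent_to_query(feedback))   # B
```

Agent C is never queried because its upper bound falls below A's lower bound, so asking C for feedback could not change the decision.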
Citations: 0
Stock Volume Forecasting with Advanced Information by Conditional Variational Auto-Encoder
Pub Date : 2024-06-19 DOI: arxiv-2406.19414
Parley R Yang, Alexander Y Shestopaloff
We demonstrate the use of a Conditional Variational Auto-Encoder (CVAE) to improve forecasts of daily stock volume time series in both short- and long-term forecasting tasks, using advanced information on input variables such as rebalancing dates. The CVAE generates non-linear time series as out-of-sample forecasts, which are more accurate and match the correlations in the actual data more closely than traditional linear models. These generative forecasts can also be used for scenario generation, which aids interpretation. We further discuss correlations in non-stationary time series and other potential extensions of the CVAE forecasts.
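A CVAE is trained by minimizing a negative evidence lower bound: a reconstruction term plus a KL penalty that keeps the approximate posterior q(z | x, c) close to a prior, where c is the conditioning information (in this setting, something like known rebalancing dates). The sketch below computes both terms for one sample, assuming a diagonal-Gaussian encoder output, a standard-normal prior and a Gaussian decoder with fixed variance, and it uses the reparameterization trick. These distributional choices are common defaults assumed for illustration; the networks themselves are omitted, `toy_decoder` is a hypothetical stand-in, and nothing here is claimed to match the paper's architecture.

```python
import math
import random

def kl_diag_gaussian_to_std_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I).

    Writing the sample this way lets an autodiff framework pass gradients to mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def gaussian_nll(x, x_hat, sigma=1.0):
    """Negative log-likelihood of x under N(x_hat, sigma^2 I): the reconstruction term."""
    const = 0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(0.5 * ((xi - xh) / sigma) ** 2 + const for xi, xh in zip(x, x_hat))

def cvae_loss(x, cond, mu, log_var, decoder, rng):
    """Negative ELBO for one sample; `decoder(z, cond)` stands in for a network."""
    z = reparameterize(mu, log_var, rng)
    x_hat = decoder(z, cond)                  # the decoder also sees the condition
    return gaussian_nll(x, x_hat) + kl_diag_gaussian_to_std_normal(mu, log_var)

def toy_decoder(z, cond):
    """Hypothetical decoder: shifts each latent coordinate by the condition value."""
    return [z[0] + cond[0], z[1] + cond[0]]

rng = random.Random(0)
loss = cvae_loss(x=[1.0, 2.0], cond=[1.0], mu=[0.1, 0.2], log_var=[-1.0, -1.0],
                 decoder=toy_decoder, rng=rng)
print(f"negative ELBO for one sample: {loss:.3f}")
```

At forecast time the encoder is dropped: z is drawn from the prior and decoded together with the known future condition, which is what makes scenario generation possible.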
Citations: 0
Producing treatment hierarchies in network meta-analysis using probabilistic models and treatment-choice criteria
Pub Date : 2024-06-15 DOI: arxiv-2406.10612
Theodoros Evrenoglou, Adriani Nikolakopoulou, Guido Schwarzer, Gerta Rücker, Anna Chaimani
A key output of network meta-analysis (NMA) is the relative ranking of the treatments; nevertheless, it has attracted a lot of criticism. This is mainly because ranking is an influential output that is prone to over-interpretation even when relative effects imply small differences between treatments. To date, common ranking methods rely on metrics that lack a straightforward interpretation, and it is still unclear how to measure their uncertainty. We introduce a novel framework for estimating treatment hierarchies in NMA. First, we formulate a mathematical expression that defines a treatment choice criterion (TCC) based on clinically important values. This TCC is applied to the study treatment effects to generate paired data indicating treatment preferences or ties. Then, we synthesize the paired data across studies using an extension of the so-called "Bradley-Terry" model. We assign to each treatment a latent variable interpreted as the treatment "ability" and estimate the ability parameters within a regression model. Higher ability estimates correspond to higher positions in the final ranking. We further extend our model to adjust for covariates that may affect treatment selection. We illustrate the proposed approach and compare it with alternatives in two datasets: a network comparing 18 antidepressants for major depression and a network comparing 6 antihypertensives for the incidence of diabetes. Our approach provides a robust and interpretable treatment hierarchy that accounts for clinically important values and is presented alongside uncertainty measures. Overall, the proposed framework offers a novel approach to ranking in NMA based on concrete criteria and guards against over-interpretation of unimportant differences between treatments.
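The pipeline described above has two steps: a treatment choice criterion turns each study's relative effect into a preference or a tie, and a Bradley-Terry model turns the resulting paired data into "ability" scores. The sketch below mirrors that shape under simplifying assumptions. The criterion is a hypothetical rule that declares a tie when the estimated difference is smaller than a clinically important value `mcid`, ties count as half a win for each treatment, and abilities are fitted with the classic minorization-maximization (MM) iteration for the basic Bradley-Terry model, with no covariates or study weights. The authors' extended model and tie handling may differ, and the study effects are made-up numbers.

```python
def tcc(effect, mcid):
    """Hypothetical treatment choice criterion for one study comparison.

    effect: estimated benefit of treatment a over treatment b (larger favours a).
    Returns 1.0 if a is preferred, 0.0 if b is preferred, 0.5 for a tie.
    """
    if effect >= mcid:
        return 1.0
    if effect <= -mcid:
        return 0.0
    return 0.5

def bradley_terry(comparisons, iters=500):
    """Fit Bradley-Terry abilities with the MM algorithm.

    comparisons: list of (a, b, score) with score in {0, 0.5, 1} from a's perspective.
    Returns abilities p normalized to sum to 1, so that P(a beats b) = p[a] / (p[a] + p[b]).
    """
    items = sorted({t for a, b, _ in comparisons for t in (a, b)})
    wins = {t: 0.0 for t in items}
    for a, b, s in comparisons:
        wins[a] += s
        wins[b] += 1.0 - s
    p = {t: 1.0 / len(items) for t in items}
    for _ in range(iters):
        denom = {t: 0.0 for t in items}
        for a, b, _ in comparisons:           # sum of n_ij / (p_i + p_j) per treatment
            d = 1.0 / (p[a] + p[b])
            denom[a] += d
            denom[b] += d
        p = {t: wins[t] / denom[t] for t in items}
        total = sum(p.values())
        p = {t: v / total for t, v in p.items()}
    return p

studies = [  # (treatment a, treatment b, estimated benefit of a over b): made-up values
    ("A", "B", 0.8), ("A", "B", 0.6), ("B", "C", 0.7),
    ("A", "C", 1.2), ("B", "C", 0.1), ("A", "C", 0.9),
    ("C", "A", 0.2),
]
pairs = [(a, b, tcc(e, mcid=0.5)) for a, b, e in studies]
ability = bradley_terry(pairs)
ranking = sorted(ability, key=ability.get, reverse=True)
print(ranking)   # ['A', 'B', 'C']
```

Because differences below `mcid` become ties rather than wins, small effects move the abilities only a little, which is the mechanism the abstract relies on to avoid over-interpreting unimportant differences.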
Citations: 0
Advances in Machine Learning, Statistical Methods, and AI for Single-Cell RNA Annotation Using Raw Count Matrices in scRNA-seq Data
Pub Date : 2024-06-07 DOI: arxiv-2406.05258
Megha Patel, Nimish Magre, Himanshi Motwani, Nik Bear Brown
Single-cell RNA sequencing (scRNA-seq) has revolutionized our ability to analyze gene expression at the resolution of individual cells, providing unprecedented insights into cellular heterogeneity and complex biological systems. This paper reviews various advanced computational and machine learning techniques tailored for the analysis of scRNA-seq data, emphasizing their roles in different stages of the data processing pipeline.
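Most pipelines that start from a raw count matrix first correct for differences in sequencing depth between cells before any annotation method sees the data. The sketch below shows one widely used convention: scale each cell's counts to a common total (10,000 here) and apply log(1 + x). The target total, the cells-by-genes list layout and the toy matrix are assumptions, and the review covers many alternative normalization schemes.

```python
import math

def normalize_log1p(counts, target_sum=10_000):
    """Library-size normalization followed by log1p.

    counts: list of cells, each a list of raw integer counts per gene.
    Cells with zero total counts are returned as all zeros.
    """
    out = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            out.append([0.0] * len(cell))
            continue
        out.append([math.log1p(c * target_sum / total) for c in cell])
    return out

raw = [
    [10, 0, 90],     # cell 1: 100 counts in total
    [100, 0, 900],   # cell 2: same composition, 10x the sequencing depth
]
norm = normalize_log1p(raw)
print(norm[0] == norm[1])   # True: depth differences are removed
```

The log transform then compresses the heavy right tail of expression values, which many downstream clustering and classification methods assume.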
Citations: 0
Bayesian Nonparametric Quasi Likelihood
Pub Date : 2024-05-31 DOI: arxiv-2405.20601
Antonio R. Linero
A recent trend in Bayesian research has been revisiting generalizations of the likelihood that enable Bayesian inference without requiring the specification of a model for the data-generating mechanism. This paper focuses on a Bayesian nonparametric extension of Wedderburn's quasi-likelihood, using Bayesian additive regression trees to model the mean function. Here, the analyst posits only a structural relationship between the mean and variance of the outcome. We show that this approach provides a unified, computationally efficient framework for extending Bayesian decision tree ensembles to many new settings, including simplex-valued and heavily heteroskedastic data. We also introduce Bayesian strategies for inferring the dispersion parameter of the quasi-likelihood, a task complicated by the fact that the quasi-likelihood itself contains no information about this parameter. Despite these challenges, we are able to inject updates for the dispersion parameter into a Markov chain Monte Carlo inference scheme in a way that, in the parametric setting, leads to a Bernstein-von Mises result for the stationary distribution of the resulting Markov chain. We illustrate the utility of our approach on a variety of synthetic and non-synthetic datasets.
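Wedderburn's quasi-likelihood needs only a mean-variance relationship Var(Y) = φ V(μ); the quasi-log-likelihood of one observation is Q(μ; y) = ∫ from y to μ of (y − t) / (φ V(t)) dt. The sketch below evaluates that integral numerically for an arbitrary variance function, checks it against the closed form available for V(μ) = μ, and computes the usual Pearson moment estimate of φ. This is the frequentist building block only; the paper's BART mean model and its Bayesian treatment of φ are not reproduced, and the example numbers are made up.

```python
import math

def quasi_loglik(y, mu, variance, phi=1.0, steps=2000):
    """Q(mu; y) = integral from y to mu of (y - t) / (phi * V(t)) dt, by Simpson's rule.

    Assumes variance(t) > 0 on the whole interval between y and mu.
    """
    if steps % 2:
        steps += 1                                # Simpson's rule needs an even count

    def integrand(t):
        return (y - t) / (phi * variance(t))

    h = (mu - y) / steps
    total = integrand(y) + integrand(mu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(y + k * h)
    return total * h / 3

def pearson_dispersion(ys, mus, variance, n_params):
    """Moment estimate: phi_hat = sum((y - mu)^2 / V(mu)) / (n - p)."""
    chi2 = sum((y - m) ** 2 / variance(m) for y, m in zip(ys, mus))
    return chi2 / (len(ys) - n_params)

def poisson_var(t):
    """Variance function V(mu) = mu, as in a Poisson-type model."""
    return t

q_num = quasi_loglik(3.0, 5.0, poisson_var)
q_exact = 3.0 * math.log(5.0 / 3.0) - (5.0 - 3.0)   # y log(mu / y) - (mu - y)
phi_hat = pearson_dispersion([2, 4, 6], [3, 3, 5], poisson_var, n_params=1)
print(f"numeric {q_num:.6f}, closed form {q_exact:.6f}, phi_hat {phi_hat:.4f}")
```

Q is zero at μ = y and negative elsewhere, and φ only rescales it, which is why the quasi-likelihood itself carries no information about φ and a separate estimate is needed.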
Citations: 0
Differentially Private Boxplots
Pub Date : 2024-05-30 DOI: arxiv-2405.20415
Kelly Ramsay, Jairo Diaz-Rodriguez
Despite the potential of differentially private data visualization to harmonize data analysis and privacy, research in this area remains relatively underdeveloped. Boxplots are a widely used visualization for summarizing a dataset and for comparing multiple datasets. Consequently, we introduce a differentially private boxplot. We evaluate its effectiveness for displaying the location, scale, skewness and tails of a given empirical distribution. In our theoretical exposition, we show that the location and scale of the boxplot are estimated with optimal sample complexity, and that the skewness and tails are estimated consistently. In simulations, we show that this boxplot performs similarly to a non-private boxplot and outperforms a boxplot naively constructed from existing differentially private quantile algorithms. Additionally, we conduct a real data analysis of Airbnb listings, which shows that comparable analysis can be achieved through differentially private boxplot visualization.
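A private boxplot rests on private quantile estimates. A standard tool is the exponential mechanism for quantiles: the sorted data, clipped to a public range [lo, hi], split that range into intervals; each interval is chosen with probability proportional to its length times exp(ε · utility / 2), where the utility is minus the distance between the interval's rank and the target rank; a value is then drawn uniformly inside the chosen interval. The sketch below implements that mechanism and spends ε/3 on each quartile by basic composition. The public range, the even budget split and the omission of private whiskers and outliers are assumptions, and the paper's construction may well differ; this is the kind of naive quantile-based baseline the abstract compares against.

```python
import math
import random

def dp_quantile(data, q, epsilon, lo, hi, rng):
    """epsilon-DP estimate of the q-th quantile via the exponential mechanism.

    Assumes the bounds lo < hi are fixed in advance (public), not derived from the data.
    """
    xs = sorted(min(max(x, lo), hi) for x in data)
    pts = [lo] + xs + [hi]
    n = len(xs)
    target = q * n
    log_w = []
    for i in range(n + 1):                     # interval i = [pts[i], pts[i + 1]]
        length = pts[i + 1] - pts[i]
        utility = -abs(i - target)             # i sorted points precede this interval
        log_w.append(math.log(length) + epsilon * utility / 2 if length > 0 else -math.inf)
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]    # shifted for numerical stability
    i = rng.choices(range(n + 1), weights=weights)[0]
    return rng.uniform(pts[i], pts[i + 1])

def dp_quartiles(data, epsilon, lo, hi, seed=None):
    """Private (Q1, median, Q3), spending epsilon / 3 on each by basic composition."""
    rng = random.Random(seed)
    return tuple(dp_quantile(data, q, epsilon / 3, lo, hi, rng) for q in (0.25, 0.5, 0.75))

data = list(range(1, 101))
print(dp_quartiles(data, epsilon=1.0, lo=0, hi=101, seed=0))
```

Larger ε concentrates the draw on the interval at the target rank; smaller ε spreads it across neighbouring intervals, which is the privacy-accuracy trade-off visible in a private boxplot.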
Citations: 0
Scaling up archival text analysis with the blockmodeling of n-gram networks -- A case study of Bulgaria's representation in the Osservatore Romano (January-May 1877)
Pub Date : 2024-05-30 DOI: arxiv-2405.20156
Fabio Ashtar Telarico
This paper seeks to bridge the gap between archival text analysis and network analysis by applying network clustering methods to analyze the coverage of Bulgaria in 123 issues of the newspaper Osservatore Romano published between January and May 1877. Using optical character recognition and generalized homogeneity blockmodeling, the study constructs networks of relevant keywords. The networks including the keyword sets Bulgaria and Russia are fairly isomorphic, and they largely overlap with those for Germany, Britain, and War. In structural terms, the blockmodel of the two networks exhibits a clear core-semiperiphery-periphery structure that reflects relations between concepts in the newspaper's coverage. The newspaper's lexical choices effectively delegitimised the Bulgarian national revival, highlighting the influence of the Holy See on the newspaper's editorial line.
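Before any blockmodel can be fitted, the OCR output has to become a network. The sketch below shows one simple way to do it: tokenize each issue, keep only tokens from a keyword list, and add a weighted edge between two keywords whenever they occur within `window - 1` tokens of each other (`window=2` links only adjacent words, that is, bigrams). The keyword list, the window size, the lower-casing tokenizer and the made-up Italian snippets are all assumptions, and the generalized homogeneity blockmodeling step used in the paper is not shown.

```python
import re
from collections import Counter

def keyword_network(documents, keywords, window=2):
    """Weighted keyword network from a list of texts.

    Adds one unit of edge weight for every pair of distinct keyword tokens that occur
    at most `window - 1` positions apart; returns a Counter keyed by sorted pairs.
    """
    keywords = {k.lower() for k in keywords}
    edges = Counter()
    for text in documents:
        tokens = re.findall(r"\w+", text.lower())
        for i, tok in enumerate(tokens):
            if tok not in keywords:
                continue
            for other in tokens[i + 1:i + window]:
                if other in keywords and other != tok:
                    edges[tuple(sorted((tok, other)))] += 1
    return edges

issues = [  # made-up snippets standing in for OCR output
    "La Russia e la Bulgaria: notizie della guerra.",
    "La guerra in Bulgaria preoccupa la Germania e la Russia.",
]
net = keyword_network(issues, ["Russia", "Bulgaria", "Germania", "guerra"], window=5)
for pair, weight in sorted(net.items()):
    print(pair, weight)
```

The resulting weighted edge list is the kind of input a blockmodeling routine would then partition into core, semiperiphery and periphery positions.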
Citations: 0