
arXiv - STAT - Other Statistics: Latest Papers

How to survive the Squid Games using probability theory
Pub Date : 2024-09-09 DOI: arxiv-2409.05263
Elena Moltchanova, Miguel Moyers-González, Geertrui Van de Voorde, José Felipe Voloch, Philipp Wacker
In this paper, we consider how probability theory can be used to determine the survival strategy in two of the "Squid Game" and "Squid Game: The Challenge" challenges: the Hopscotch and the Warships. We show how Hopscotch can be easily tackled with the knowledge of the binomial distribution, taught in introductory statistics courses, while Warships is a much more complex problem, which can be tackled at different levels.
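As a rough illustration of the binomial reasoning mentioned for Hopscotch, the sketch below assumes a simplified version of the game that the abstract does not spell out: a bridge of `n_steps` steps, each with one safe and one unsafe panel; players crossing in a fixed order; each unknown step guessed once, independently, with probability `p_safe` of being right; and a wrong guess eliminating the guesser while revealing the safe panel. Under those assumptions the number of wrong guesses over the whole bridge is Binomial(`n_steps`, 1 - `p_safe`), and the player in position `k` survives when at most `k - 1` wrong guesses occur. The function name and the 18-step default are illustrative choices, not taken from the paper.

```python
from math import comb


def survival_probability(position, n_steps=18, p_safe=0.5):
    """Probability that the player in 1-based `position` crosses the bridge.

    Assumed simplified rules (hypothetical, see the lead-in): the total number
    of wrong guesses is Binomial(n_steps, 1 - p_safe), and the player survives
    if at most `position - 1` wrong guesses happen overall.
    """
    p_wrong = 1.0 - p_safe
    max_wrong = min(position - 1, n_steps)
    # Binomial CDF evaluated at max_wrong.
    return sum(
        comb(n_steps, w) * p_wrong**w * p_safe ** (n_steps - w)
        for w in range(max_wrong + 1)
    )


if __name__ == "__main__":
    for position in (1, 5, 10, 15):
        print(position, round(survival_probability(position), 4))
```

Under these assumptions the survival probability rises steeply with position, which is why going late is attractive in this simplified model.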
Citations: 0
Censored Data Forecasting: Applying Tobit Exponential Smoothing with Time Aggregation
Pub Date : 2024-09-09 DOI: arxiv-2409.05412
Diego J. Pedregal, Juan R. Trapero
This study introduces a novel approach to forecasting by Tobit Exponential Smoothing with time aggregation constraints. This model, a particular case of the Tobit Innovations State Space system, handles censored observed time series effectively, such as sales data, with known and potentially variable censoring levels over time. The paper provides a comprehensive analysis of the model structure, including its representation in system equations and the optimal recursive estimation of states. It also explores the benefits of time aggregation in state space systems, particularly for inventory management and demand forecasting. Through a series of case studies, the paper demonstrates the effectiveness of the model across various scenarios, including hourly and daily censoring levels. The results highlight the model's ability to produce accurate forecasts and confidence bands comparable to those from uncensored models, even under severe censoring conditions. The study further discusses the implications for inventory policy, emphasizing the importance of avoiding spiral-down effects in demand estimation. The paper concludes by showcasing the superiority of the proposed model over standard methods, particularly in reducing lost sales and excess stock, thereby optimizing inventory costs. This research contributes to the field of forecasting by offering a robust model that effectively addresses the challenges of censored data and time aggregation.
Citations: 0
Cross-sectional personal network analysis of adult smoking in rural areas
Pub Date : 2024-08-27 DOI: arxiv-2408.14832
Bianca-Elena Mihăilă, Marian-Gabriel Hâncean, Matjaž Perc, Jürgen Lerner, Iulian Oană, Marius Geantă, José Luis Molina, Cosmina Cioroboiu
While research on adolescent smoking is extensive, little attention has been given to smoking behaviors among rural middle-aged and older adults. This study examines the role of personal networks and sociodemographic factors in predicting smoking status in a rural Romanian community. Using a link-tracing sampling method, we gathered data from 76 participants out of 83 in Leresti, Arges County. Face-to-face interviews collected sociodemographic data and network information, including smoking status and relational dynamics. We applied multilevel logistic regression models to predict smoking behaviors (current smokers, former smokers, and non-smokers) based on individual characteristics and network influences. Results indicate that social networks significantly influence smoking behaviors. For current smokers, having a smoking family member greatly increased the odds of smoking (OR = 2.51, 95% CI: 1.62, 3.91, p < 0.001). Similarly, non-smoking family members increased the likelihood of being a non-smoker (OR = 1.64, 95% CI: 1.04, 2.61, p < 0.05). Women were less likely to smoke, highlighting sex differences in behavior. These findings emphasize the critical role of social networks in shaping smoking habits, advocating for targeted interventions in rural areas.
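The reported odds ratios, confidence intervals and p-value thresholds can be cross-checked with a few lines of arithmetic. The sketch below assumes the intervals are symmetric Wald intervals on the log-odds scale (an assumption; the abstract does not say how they were computed) and recovers the coefficient, its standard error, the z statistic and a two-sided normal p-value. The function name `wald_summary` is hypothetical.

```python
from math import erfc, log, sqrt


def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover (beta, se, z, p) from an odds ratio and its 95% CI.

    Assumes a Wald interval: exp(beta - z_crit * se) to exp(beta + z_crit * se).
    """
    beta = log(odds_ratio)                      # coefficient on the log-odds scale
    se = (log(ci_high) - log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p_value = erfc(abs(z) / sqrt(2.0))          # two-sided standard normal tail
    return beta, se, z, p_value


if __name__ == "__main__":
    # Values quoted in the abstract.
    for label, args in {
        "smoking family member": (2.51, 1.62, 3.91),
        "non-smoking family member": (1.64, 1.04, 2.61),
    }.items():
        beta, se, z, p = wald_summary(*args)
        print(f"{label}: beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.4f}")
```

Under the Wald assumption, both recovered p-values fall below the thresholds stated in the abstract (0.001 and 0.05 respectively).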
Citations: 0
Modeling information spread across networks with communities using a multitype branching process framework
Pub Date : 2024-08-08 DOI: arxiv-2408.04456
Alina Dubovskaya, Caroline B. Pena, David J. P. O'Sullivan
The dynamics of information diffusion in complex networks is widely studied in an attempt to understand how individuals communicate and how information travels and reaches individuals through interactions. However, complex networks often present community structure, and tools to analyse information diffusion on networks with communities are needed. In this paper, we develop theoretical tools using multi-type branching processes to model and analyse simple contagion information spread on a broad class of networks with community structure. We show how, by using limited information about the network (the degree distribution within and between communities), we can calculate standard statistical characteristics of the dynamics of information diffusion, such as the extinction probability, hazard function, and cascade size distribution. These properties can be estimated not only for the entire network but also for each community separately. Furthermore, we estimate the probability of information spreading from one community to another where it is not currently spreading. We demonstrate the accuracy of our framework by applying it to two specific examples: the Stochastic Block Model and a log-normal network with community structure. We show how the initial seeding location affects the observed cascade size distribution on a heavy-tailed network and that our framework accurately captures this effect.
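To make the notion of an extinction probability in a multi-type branching process concrete, here is a minimal sketch. It does not reproduce the paper's construction from degree distributions; instead it assumes, purely for illustration, that an individual of type `i` produces a Poisson number of type-`j` offspring with mean `mean_matrix[i][j]`. The extinction probabilities are then the smallest solution of the fixed-point equation q_i = exp(sum_j m_ij (q_j - 1)), which the code finds by iterating from zero. All names and the example matrix are hypothetical.

```python
from math import exp


def extinction_probabilities(mean_matrix, tol=1e-12, max_iter=100_000):
    """Extinction probability per starting type for a multi-type
    Galton-Watson process with independent Poisson offspring counts.

    mean_matrix[i][j] is the assumed mean number of type-j offspring
    of one type-i individual.
    """
    n = len(mean_matrix)
    q = [0.0] * n  # starting from 0 converges to the minimal fixed point
    for _ in range(max_iter):
        new_q = [
            exp(sum(mean_matrix[i][j] * (q[j] - 1.0) for j in range(n)))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_q, q)) < tol:
            return new_q
        q = new_q
    return q


if __name__ == "__main__":
    # Two communities: spreading is mostly internal, with weak cross-links.
    means = [[1.5, 0.3],
             [0.2, 0.8]]
    print(extinction_probabilities(means))
```

Each entry of the result is the probability that a cascade seeded by a single individual of that type eventually dies out, which is one way to see how the seeding community can matter.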
Citations: 0
Asymptotic confidence intervals for the difference and the ratio of the weighted kappa coefficients of two diagnostic tests subject to a paired design
Pub Date : 2024-07-31 DOI: arxiv-2407.21387
Jose Antonio Roldan-Nofuentes, Saad Bouh Sidaty-Regad
The weighted kappa coefficient of a binary diagnostic test is a measure of the beyond-chance agreement between the diagnostic test and the gold standard, and depends on the sensitivity and specificity of the diagnostic test, on the disease prevalence and on the relative importance between the false positives and the false negatives. This article studies the comparison of the weighted kappa coefficients of two binary diagnostic tests subject to a paired design through confidence intervals. Three asymptotic confidence intervals are studied for the difference between the parameters and five other intervals for the ratio. Simulation experiments were carried out to study the coverage probabilities and the average lengths of the intervals, giving some general rules for application. A method is also proposed to calculate the sample size necessary to compare the two weighted kappa coefficients through a confidence interval. A program in R has been written to solve the problem studied and it is available as supplementary material. The results were applied to a real example of the diagnosis of malaria.
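The dependence of the weighted kappa coefficient on sensitivity, specificity, prevalence and a weighting index can be sketched as follows. The formula used here is the commonly cited form from the diagnostic-test literature, kappa(c) = p q Y / (p (1 - Q) c + q Q (1 - c)), with Y the Youden index and Q the probability of a positive test; the abstract itself gives no formula, so treat this as an assumption about the definition rather than the paper's own code. The weighting index `c` in [0, 1] expresses the relative importance of the two error types.

```python
def weighted_kappa(se, sp, prevalence, c):
    """Weighted kappa coefficient of a binary diagnostic test.

    se, sp      : sensitivity and specificity
    prevalence  : disease prevalence p
    c           : weighting index in [0, 1] (assumed parameterisation)
    """
    p, q = prevalence, 1.0 - prevalence
    youden = se + sp - 1.0
    big_q = p * se + q * (1.0 - sp)  # probability of a positive test result
    return (p * q * youden) / (p * (1.0 - big_q) * c + q * big_q * (1.0 - c))


if __name__ == "__main__":
    # Hypothetical accuracy values for two tests evaluated on the same population.
    k1 = weighted_kappa(se=0.90, sp=0.80, prevalence=0.30, c=0.5)
    k2 = weighted_kappa(se=0.85, sp=0.90, prevalence=0.30, c=0.5)
    print("difference:", k1 - k2, "ratio:", k1 / k2)
```

The difference and ratio printed at the end are the two quantities for which the article studies confidence intervals; the intervals themselves are not reproduced here.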
Citations: 0
Comparison of the likelihood ratios of two diagnostic tests subject to a paired design: confidence intervals and sample size
Pub Date : 2024-07-31 DOI: arxiv-2407.21382
Jose Antonio Roldan-Nofuentes, Saad Bouh Sidaty-Regad
Positive and negative likelihood ratios are parameters which are used to assess and compare the effectiveness of binary diagnostic tests. Both parameters only depend on the sensitivity and specificity of the diagnostic test and are equivalent to a relative risk. This article studies the comparison of the likelihood ratios of two binary diagnostic tests subject to a paired design through confidence intervals. Six approximate confidence intervals are presented for the ratio of the likelihood ratios, and simulation experiments are carried out to study the coverage probabilities and the average lengths of the intervals considered, and some general rules of application are proposed. A method is also proposed to determine the sample size necessary to estimate the ratio between the likelihood ratios with a determined precision. The results were applied to the diagnosis of coronary artery disease.
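For readers who want the definitions behind this abstract, the short sketch below computes the standard positive and negative likelihood ratios, LR+ = Se / (1 - Sp) and LR- = (1 - Se) / Sp, and the ratio of likelihood ratios between two tests. The accuracy values are made up for illustration, and the confidence intervals studied in the article are not implemented.

```python
def likelihood_ratios(se, sp):
    """Return (LR+, LR-) for a binary test with sensitivity se and specificity sp."""
    lr_pos = se / (1.0 - sp)      # how much a positive result raises the odds of disease
    lr_neg = (1.0 - se) / sp      # how much a negative result lowers the odds of disease
    return lr_pos, lr_neg


if __name__ == "__main__":
    # Hypothetical sensitivities and specificities for two tests.
    lr_pos_1, lr_neg_1 = likelihood_ratios(se=0.90, sp=0.80)
    lr_pos_2, lr_neg_2 = likelihood_ratios(se=0.85, sp=0.90)
    print("ratio of positive LRs:", lr_pos_1 / lr_pos_2)
    print("ratio of negative LRs:", lr_neg_1 / lr_neg_2)
```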
Citations: 0
Computational methods to simultaneously compare the predictive values of two diagnostic tests with missing data: EM-SEM algorithms and multiple imputation
Pub Date : 2024-07-30 DOI: arxiv-2407.21190
Jose Antonio Roldan-Nofuentes
Predictive values are measures of the clinical accuracy of a binary diagnostic test, and depend on the sensitivity and the specificity of the diagnostic test and on the disease prevalence among the population being studied. This article studies hypothesis tests to simultaneously compare the predictive values of two binary diagnostic tests in the presence of missing data. The hypothesis tests were solved applying two computational methods: the expectation maximization and the supplemented expectation maximization algorithms, and multiple imputation. Simulation experiments were carried out to study the sizes and the powers of the hypothesis tests, giving some general rules of application. Two R programmes were written to apply each method, and they are available as supplementary material for the manuscript. The results were applied to the diagnosis of Alzheimer's disease.
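The dependence of predictive values on sensitivity, specificity and prevalence follows from Bayes' theorem. The sketch below shows that relationship only; it is not the EM, SEM or multiple-imputation procedure of the article, and the input values are hypothetical.

```python
def predictive_values(se, sp, prevalence):
    """Return (PPV, NPV) of a binary diagnostic test via Bayes' theorem."""
    p, q = prevalence, 1.0 - prevalence
    ppv = p * se / (p * se + q * (1.0 - sp))          # P(disease | positive test)
    npv = q * sp / (q * sp + p * (1.0 - se))          # P(no disease | negative test)
    return ppv, npv


if __name__ == "__main__":
    # The same test looks very different at low and high prevalence.
    for prev in (0.01, 0.30):
        print(prev, predictive_values(se=0.90, sp=0.80, prevalence=prev))
```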
Citations: 0
How Books Tell a History of Statistics in Portugal: Works of Foreigners, Estrangeirados, and Others
Pub Date : 2024-07-28 DOI: arxiv-2407.19433
Dinis Pestana, Rui Santos
Foreigners and "estrangeirados", an expression meaning "people going to a foreign country ["estrangeiro"] getting there further education", had a leading role in the development of Mathematical Statistics in Portugal. In what concerns Statistics, "estrangeirados" in the nineteenth century were mainly liberal intellectuals exiled for political reasons. From 1930 onwards, the research funding authority sent university professors abroad, and hired foreign researchers to stay in Portuguese institutions, and some of them were instrumental in the importation of new concepts and methods of inferential statistics. After 1970, there was a huge program of sending young researchers abroad for doctoral studies. At the same time, many new universities and polytechnic institutes have been created in Portugal. After that, aside from foreigners who choose to have a research career in those institutions and the "estrangeirados" who had returned and created programs of doctoral studies, others, who hadn't the opportunity of studying abroad, began to play a decisive role in the development of Statistics in Portugal. The publication of handbooks on Probability and Statistics, thesis and core papers in Portuguese scientific journals, and also of works for the layman, reveals how Statistics progressed from descriptive to a mathematical discipline used for inference in all fields of knowledge, from natural sciences to methodology of scientific research.
Citations: 0
The Impact of Foreign Players in the English Premier League: A Mathematical Analysis
Pub Date : 2024-07-27 DOI: arxiv-2407.19285
Amit K Chattopadhyay, A. Abdul, Sudhir Jain
We undertake extensive analysis of English Premier League data over the period 2009/10 to 2017/18 to identify and rank key factors affecting the economic and footballing performances of the teams. Alternative end-of-season league tables are generated by re-ranking the teams based on five different descriptors: total expenditure, total funds spent on players, total funds spent on foreign players, the ratio of foreign to British players and the overall profit. The unequal distribution of resources and expenditure between the clubs is analyzed through Lorenz curves. A comparative analysis of the differences between the alternative tables and the conventional end-of-season league table establishes the most likely factors to influence the performances of the teams that we also rank using Principal Component Analysis. We find that the top teams in the league are also those that tend to have the highest expenditure overall, for all players, including foreign players; they also have the highest ratios of foreign to British players. Our statistical and machine learning study also indicates that successful performance on the field may not guarantee healthy profits at the end of the season.
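A Lorenz curve of the kind mentioned above can be computed directly from a list of club-level amounts. The sketch below is a generic construction, not the authors' code: it sorts the values, accumulates them, and returns the points (cumulative share of clubs, cumulative share of the total). The expenditure figures in the example are invented for illustration.

```python
def lorenz_curve(values):
    """Return Lorenz curve points as (population share, cumulative value share).

    The curve starts at (0, 0) and ends at (1, 1); the further it bends below
    the diagonal, the more unequal the distribution.
    """
    ordered = sorted(values)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, value in enumerate(ordered, start=1):
        running += value
        points.append((i / n, running / total))
    return points


if __name__ == "__main__":
    # Hypothetical season expenditures for five clubs (arbitrary units).
    spending = [20, 35, 50, 120, 275]
    for share_clubs, share_spend in lorenz_curve(spending):
        print(f"{share_clubs:.2f} of clubs -> {share_spend:.2f} of spending")
```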
Citations: 0
Hybrid summary statistics: neural weak lensing inference beyond the power spectrum
Pub Date : 2024-07-26 DOI: arxiv-2407.18909
T. Lucas Makinen, Tom Charnock, Natalia Porqueres, Axel Lapel, Alan Heavens, Benjamin D. Wandelt
In inference problems, we often have domain knowledge which allows us to define summary statistics that capture most of the information content in a dataset. In this paper, we present a hybrid approach, where such physics-based summaries are augmented by a set of compressed neural summary statistics that are optimised to extract the extra information that is not captured by the predefined summaries. The resulting statistics are very powerful inputs to simulation-based or implicit inference of model parameters. We apply this generalisation of Information Maximising Neural Networks (IMNNs) to parameter constraints from tomographic weak gravitational lensing convergence maps to find summary statistics that are explicitly optimised to complement angular power spectrum estimates. We study several dark matter simulation resolutions in low- and high-noise regimes. We show that i) the information-update formalism extracts at least $3\times$ and up to $8\times$ as much information as the angular power spectrum in all noise regimes, ii) the network summaries are highly complementary to existing 2-point summaries, and iii) our formalism allows for networks with smaller, physically-informed architectures to match much larger regression networks with far fewer simulations needed to obtain asymptotically optimal inference.
Citations: 0