Stylianos Kampakis, Melody Yuan, Oritsebawo Paul Ikpobe, Linas Stankevicius
In the evolving domain of cryptocurrency markets, accurate token valuation remains a critical aspect influencing investment decisions and policy development. Whilst the prevailing equation of exchange pricing model offers a quantitative valuation approach based on the interplay between token price, transaction volume, supply, and either velocity or holding time, it exhibits intrinsic shortcomings. Specifically, the model may not consistently delineate the relationship between average token velocity and holding time. This paper aims to refine this equation, enhancing the depth of insight into token valuation methodologies.
Improving the Equation of Exchange for Cryptoasset Valuation Using Empirical Data. Stylianos Kampakis, Melody Yuan, Oritsebawo Paul Ikpobe, Linas Stankevicius. arXiv - STAT - Other Statistics, 2024-03-07. https://doi.org/arxiv-2403.04914
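For orientation, the equation of exchange mentioned in the abstract is usually written M × V = P × Q. The sketch below assumes that textbook form and the common simplification that average holding time is the reciprocal of velocity, which is the very relationship the abstract says may not hold consistently. It is not the refined equation from the paper; the function names and the figures are hypothetical.

```python
def token_price(transaction_volume: float, supply: float, velocity: float) -> float:
    """Implied price per token from M * V = P * Q.

    transaction_volume: fiat value transacted per period (the P * Q side)
    supply: number of tokens in circulation
    velocity: average number of times a token changes hands per period
    """
    if supply <= 0 or velocity <= 0:
        raise ValueError("supply and velocity must be positive")
    monetary_base = transaction_volume / velocity  # M = P * Q / V
    return monetary_base / supply


def token_price_from_holding_time(
    transaction_volume: float, supply: float, holding_time: float
) -> float:
    """Same valuation, assuming holding_time = 1 / velocity (a simplification)."""
    if holding_time <= 0:
        raise ValueError("holding_time must be positive")
    return token_price(transaction_volume, supply, 1.0 / holding_time)


if __name__ == "__main__":
    # Hypothetical figures: 1,000,000 transacted per year, 500,000 tokens,
    # each token changing hands 4 times a year (held 0.25 years on average).
    print(token_price(1_000_000, 500_000, 4.0))
    print(token_price_from_holding_time(1_000_000, 500_000, 0.25))
```

Under the reciprocal assumption both calls give the same price; any gap between an empirically measured velocity and the reciprocal of measured holding time would show up as a disagreement between the two.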
Mine Dogucu, Sinem Demirci, Harry Bendekgey, Federica Zoe Ricci, Catalina M. Medina
The presence of data science has been profound in the scientific community in almost every discipline. An important part of the data science education expansion has been at the undergraduate level. We conducted a systematic literature review to (1) specify current evidence and knowledge gaps in undergraduate data science education and (2) inform policymakers and data science educators/practitioners about the present status of data science education research. The majority of the publications in data science education that met our search criteria were available open-access. Our results indicate that data science education research lacks empirical data and reproducibility. Not all disciplines contribute equally to the field of data science education. Computer science and data science as a separate field emerge as the leading contributors to the literature. In contrast, fields such as statistics, mathematics, as well as other fields closely related to data science exhibit a limited presence in studies. We recommend that federal agencies and researchers 1) invest in empirical data science education research; 2) diversify research efforts to enrich the spectrum of types of studies; 3) encourage scholars in key data science fields that are currently underrepresented in the literature to contribute more to research and publications.
Undergraduate data science education: Who has the microphone and what are they saying? Mine Dogucu, Sinem Demirci, Harry Bendekgey, Federica Zoe Ricci, Catalina M. Medina. arXiv - STAT - Other Statistics, 2024-03-06. https://doi.org/arxiv-2403.03387
In December 2023 the Florida State Seminoles became the first Power 5 school to have an undefeated season and miss selection for the College Football Playoff (CFP). To assess this decision, we employed an Elo ratings model to rank the teams and found that the selection committee's decision was justified and that Florida State was not one of the four best teams in college football in that season (ranking only 11th!). We extended this analysis to all other years of the CFP and found that the top four teams by Elo ratings differ greatly from the four teams selected in almost every year of the CFP's existence. Furthermore, we found that there have been more egregious non-selections, including in 2022, when Alabama was ranked first by Elo ratings but was not selected. The analysis suggests that the current criteria are too subjective and that a ratings model should be implemented to provide transparency for the sport, its teams, and its fans.
An analysis of the NCAA college football playoff team selections using an Elo ratings model. Benjamin Lucas. arXiv - STAT - Other Statistics, 2024-03-06. https://doi.org/arxiv-2403.03862
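The abstract does not spell out the Elo model it uses. As a rough illustration only, the sketch below implements the conventional Elo update (logistic expected score on a 400-point scale, fixed K-factor). The K value, the starting rating of 1500, the team labels and the absence of any margin-of-victory or home-field adjustment are all assumptions, not details from the paper.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability-like expected score of team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def rank(games, initial: float = 1500.0, k: float = 20.0):
    """Process (winner, loser) pairs in order; return teams sorted by rating."""
    ratings = {}
    for winner, loser in games:
        r_w = ratings.get(winner, initial)
        r_l = ratings.get(loser, initial)
        ratings[winner], ratings[loser] = update(r_w, r_l, 1.0, k)
    return sorted(ratings, key=ratings.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical three-team season.
    print(rank([("A", "B"), ("B", "C"), ("A", "C")]))
```

Because each rating change depends on the opponent's rating at the time, an unbeaten team with a weak schedule can end up below a team with losses, which is the kind of outcome the abstract reports for Florida State.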
Mattia Stival, Lorenzo Schiavon, Stefano Campostrini
In several countries, including Italy, a prominent approach to population health surveillance involves conducting repeated cross-sectional surveys at short intervals of time. These surveys gather information on the health status of individual respondents, including details on their behaviors, risk factors, and relevant socio-demographic information. While the collected data undoubtedly provide valuable information, modeling such data presents several challenges. For instance, in health risk models, it is essential to consider behavioral information, spatio-temporal dynamics, and disease co-occurrence. In response to these challenges, our work proposes a multivariate spatio-temporal logistic model for chronic disease diagnoses. Predictors are modeled using individual risk factor covariates and a latent individual propensity to the disease. Leveraging a state space formulation of the model, we construct a framework in which spatio-temporal heterogeneity in regression parameters is informed by exogenous spatial information, corresponding to different spatial contextual risk factors that may affect health and the occurrence of chronic diseases in different ways. To explore the utility and effectiveness of our method, we analyze behavioral and risk factor surveillance data (PASSI) collected in Italy, a country well known for its marked administrative, social and territorial diversity, which is reflected in high variability in morbidity among population subgroups.
A Bayesian approach to uncover spatio-temporal determinants of heterogeneity in repeated cross-sectional health surveys. Mattia Stival, Lorenzo Schiavon, Stefano Campostrini. arXiv - STAT - Other Statistics, 2024-02-29. https://doi.org/arxiv-2402.19162
Andrea Priulla, Alessandro Albano, Nicoletta D'Angelo, Massimo Attanasio
This paper explores the influence of Italian high school students' proficiency in mathematics and the Italian language on their university enrolment choices, specifically focusing on STEM (Science, Technology, Engineering, and Mathematics) courses. We distinguish between students from scientific and humanistic backgrounds in high school, providing valuable insights into their enrolment preferences. Furthermore, we investigate potential gender differences in response to similar previous educational choices and achievements. The study employs gradient boosting methodology, known for its high predictive performance and ability to capture non-linear relationships within data, and adjusts for variables related to the socio-demographic characteristics of the students and their previous educational achievements. Our analysis reveals significant differences in enrolment choices based on previous high school achievements. The findings shed light on the complex interplay of academic proficiency, gender, and high school background in shaping students' choices regarding university education, with implications for educational policy and future research endeavours.
A machine learning approach to predict university enrolment choices through students' high school background in Italy. Andrea Priulla, Alessandro Albano, Nicoletta D'Angelo, Massimo Attanasio. arXiv - STAT - Other Statistics, 2024-02-29. https://doi.org/arxiv-2403.13819
In recent years, the integration of gamification into educational settings has garnered significant attention as a means to enhance student engagement and learning outcomes. By leveraging gamified elements such as points and leaderboards, educators aim to promote active participation, motivation, and deeper understanding among students. This study investigates the effects of gamification on student engagement in a flipped classroom environment. The findings suggest that gamification strategies, when effectively implemented, can have a positive impact on student motivation and engagement. This paper concludes with recommendations for educators, potential challenges such as superficial engagement and demotivation, and future directions for research to address these challenges and further explore the potential of gamification in fostering student success.
Levelling Up Learning: Exploring the Impact of Gamification in Flipped Classrooms. Eilidh Jack, Craig Alexander, Elinor M Jones. arXiv - STAT - Other Statistics, 2024-02-28. https://doi.org/arxiv-2402.18313
Due to the complexity of order statistics, the finite sample behaviour of robust statistics is generally not analytically solvable. While the Monte Carlo method can provide approximate solutions, its convergence rate is typically very slow, making the computational cost to achieve the desired accuracy unaffordable for ordinary users. In this paper, we propose an approach analogous to the Fourier transformation to decompose the finite sample structure of the uniform distribution. By obtaining sets of sequences that are consistent with parametric distributions for the first four sample moments, we can approximate the finite sample behavior of other estimators with significantly reduced computational costs. This article reveals the underlying structure of randomness and presents a novel approach to integrate multiple assumptions.
Robust estimations from distribution structures: V. Non-asymptotic. Tuobang Li. arXiv - STAT - Other Statistics, 2024-02-26. https://doi.org/arxiv-2403.18951
Dmitry Logashenko, Alexander Litvinenko, Raul Tempone, Ekaterina Vasilyeva, Gabriel Wittum
We investigate the applicability of the well-known multilevel Monte Carlo (MLMC) method to the class of density-driven flow problems, in particular the problem of salinisation of coastal aquifers. As a test case, we solve the uncertain Henry saltwater intrusion problem. Unknown porosity, permeability and recharge parameters are modelled using random fields. The classical deterministic Henry problem is non-linear and time-dependent, and can easily take several hours of computing time. Uncertain settings require the solution of multiple realisations of the deterministic problem, and the total computational cost increases drastically. Instead of computing hundreds of random realisations, typically the mean value and the variance are computed. Standard methods such as Monte Carlo or surrogate-based methods are a good choice, but they compute all stochastic realisations on the same, often very fine, mesh. They also do not balance the stochastic and discretisation errors. These facts motivated us to apply the MLMC method. We demonstrate that by solving the Henry problem on multi-level spatial and temporal meshes, the MLMC method reduces the overall computational and storage costs. To reduce the computing cost further, parallelisation is performed in both physical and stochastic spaces. To solve each deterministic scenario, we run the parallel multigrid solver ug4 in a black-box fashion.
Uncertainty quantification in the Henry problem using the multilevel Monte Carlo method. Dmitry Logashenko, Alexander Litvinenko, Raul Tempone, Ekaterina Vasilyeva, Gabriel Wittum. arXiv - STAT - Other Statistics, 2024-02-22. https://doi.org/arxiv-2403.17018
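To make the multilevel idea concrete, here is a minimal sketch of the MLMC telescoping estimator on a toy problem. Everything in it is an assumption for illustration: a midpoint-rule integral of exp(w·x) with a uniform random w stands in for the Henry problem solver, the mesh with 2^level cells stands in for the spatial and temporal mesh hierarchy, and the per-level sample counts are picked by hand rather than by the optimal allocation rule. It does not reproduce the authors' setup or the ug4 interface.

```python
import math
import random


def q_level(w: float, level: int) -> float:
    """Midpoint-rule approximation of the integral of exp(w * x) over [0, 1]
    on a mesh of 2**level cells (toy stand-in for a deterministic solve)."""
    n_cells = 2 ** level
    h = 1.0 / n_cells
    return h * sum(math.exp(w * (i + 0.5) * h) for i in range(n_cells))


def mlmc_mean(samples_per_level, rng: random.Random) -> float:
    """Estimate E[Q_L] as E[Q_0] + sum over l >= 1 of E[Q_l - Q_{l-1}].

    Each correction term uses the same random input on the fine and the
    coarse mesh, so its variance is small and few samples are needed there.
    """
    estimate = 0.0
    for level, n_samples in enumerate(samples_per_level):
        total = 0.0
        for _ in range(n_samples):
            w = rng.random()  # uncertain parameter, uniform on [0, 1)
            fine = q_level(w, level)
            coarse = q_level(w, level - 1) if level > 0 else 0.0
            total += fine - coarse
        estimate += total / n_samples
    return estimate


if __name__ == "__main__":
    # Many cheap coarse samples, few expensive fine ones.
    print(mlmc_mean([4000, 1000, 250, 60], random.Random(0)))
```

Most samples are drawn on the coarsest mesh, where each evaluation is cheapest, while the finest mesh is evaluated only a few dozen times. This is the mechanism behind the cost reduction the abstract describes relative to running every realisation on one fine mesh.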
The Portuguese School of Extremes and Applications is nowadays well recognised by the international scientific community, and in my opinion, the organisation of a NATO Advanced Study Institute on Statistical Extremes and Applications, which took place at Vimeiro in the summer of 1983, was a landmark for the international recognition of the group. The rate of publication has been very high and the topics under investigation in the area of Extremes have been quite diverse. In this article, attention will be paid essentially to some of the scientific achievements of the author in this field.
The PORTSEA (Portuguese School of Extremes and Applications) and a few personal scientific achievements. M. Ivette Gomes. arXiv - STAT - Other Statistics, 2024-02-22. https://doi.org/arxiv-2402.14414
Suppose the lifetime of a large sample of batteries in routine use is measured. A confidence interval is computed as 394 plus/minus 1.96 times 4.6 days. The standard interpretation is that if we repeatedly draw samples and compute confidence intervals, about 95% of the intervals will cover the unknown true lifetime. What can be said about the particular interval 394 plus/minus 1.96 times 4.6 has not been clear. We clarify this by using an epistemic interpretation of probability. The conclusion is that the statement "a realised (computed) confidence interval covers the parameter with the probability given by the confidence level" is valid, unless there are relevant and recognisable subsets of the sample.
A computed 95% confidence interval does cover the true value with probability 0.95 if epistemically interpreted. Dan Hedlin. arXiv - STAT - Other Statistics, 2024-02-15. https://doi.org/arxiv-2402.10000
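The arithmetic behind the battery example, and the repeated-sampling interpretation the abstract calls standard, can be checked with a short sketch. The interval endpoints follow directly from the numbers in the abstract. The simulation settings (normally distributed lifetimes, a standard deviation of 46 days, samples of 100 batteries) are assumptions chosen only so that the standard error is close to 4.6; they do not come from the paper.

```python
import math
import random


def normal_ci(estimate: float, std_error: float, z: float = 1.96):
    """Interval estimate plus/minus z times the standard error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


def coverage(true_mean: float, sigma: float, n: int, repeats: int,
             rng: random.Random) -> float:
    """Share of repeated-sample intervals that cover the true mean."""
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = sum(sample) / n
        variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
        low, high = normal_ci(mean, math.sqrt(variance / n))
        hits += low <= true_mean <= high
    return hits / repeats


if __name__ == "__main__":
    print(normal_ci(394.0, 4.6))  # roughly (384.98, 403.02) days
    print(coverage(394.0, 46.0, 100, 2000, random.Random(1)))
```

The simulation speaks only to the long-run frequency of coverage. Whether the single realised interval from 384.98 to 403.02 days can itself be assigned probability 0.95 is the epistemic question the paper addresses.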