{"title":"Adaptations on the Use of p-Values for Statistical Inference: An Interpretation of Messages from Recent Public Discussions","authors":"E. Verykouki, Chris Nakas","doi":"10.3390/stats6020035","DOIUrl":"https://doi.org/10.3390/stats6020035","url":null,"abstract":"P-values have played a central role in the advancement of research in virtually all scientific fields; however, there has been significant controversy over their use. “The ASA president’s task force statement on statistical significance and replicability” has provided a solid basis for resolving the quarrel, but although the significance part is clearly dealt with, the replicability part raises further discussions. Given the clear statement regarding significance, in this article, we consider the validity of p-value use for statistical inference as de facto. We briefly review the bibliography regarding the relevant controversy in recent years and illustrate how already proposed approaches, or slight adaptations thereof, can be readily implemented to address both significance and reproducibility, adding credibility to empirical study findings. The definitions used for the notions of replicability and reproducibility are also clearly described. We argue that any p-value must be reported along with its corresponding s-value followed by (1−α)% confidence intervals and the rejection replication index.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44422807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
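The reporting recipe argued for in the abstract above — every p-value accompanied by its s-value and a (1−α)% confidence interval — is easy to operationalize. A minimal sketch, assuming a one-sample z-setting (the function names and toy numbers are illustrative, not from the paper); the s-value is the base-2 surprisal −log2(p):

```python
import math
from statistics import NormalDist

def s_value(p):
    """Shannon information (surprisal) of a p-value, in bits: -log2(p)."""
    return -math.log2(p)

def z_test_report(xbar, mu0, sigma, n, alpha=0.05):
    """One-sample z-test: two-sided p-value, s-value, and (1-alpha) CI.

    Illustrative helper, assuming a known-sigma z-setting.
    """
    nd = NormalDist()
    se = sigma / math.sqrt(n)
    z = (xbar - mu0) / se
    p = 2 * (1 - nd.cdf(abs(z)))          # two-sided p-value
    zcrit = nd.inv_cdf(1 - alpha / 2)     # normal quantile for the CI
    return p, s_value(p), (xbar - zcrit * se, xbar + zcrit * se)

# p = 0.05 carries about 4.3 bits of information against the null
# (s-values make "borderline" p-values look as weak as they are)
```

A p-value of 0.05 maps to s ≈ 4.32 bits, i.e. no more surprising than four heads in a row from a fair coin — the intuition the s-value recommendation trades on.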
{"title":"Recurring Errors in Studies of Gender Differences in Variability","authors":"T. Hill, Rosalind Arden","doi":"10.3390/stats6020033","DOIUrl":"https://doi.org/10.3390/stats6020033","url":null,"abstract":"The past quarter century has seen a resurgence of research on the controversial topic of gender differences in variability, in part because of its potential implications for the issue of under- and over-representation of various subpopulations of our society, with respect to different traits. Unfortunately, several basic statistical, inferential, and logical errors are being propagated in studies on this highly publicized topic. These errors include conflicting interpretations of the numerical significance of actual variance ratio values; a mistaken claim about variance ratios in mixtures of distributions; incorrect inferences from variance ratio values regarding the relative roles of sociocultural and biological factors; and faulty experimental designs. Most importantly, without knowledge of the underlying distributions, the standard variance ratio test statistic is shown to have no implications for tail ratios. The main aim of this note is to correct the scientific record and to illuminate several of these key errors in order to reduce their further propagation. For concreteness, the arguments will focus on one highly influential paper.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47022726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
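The note's central technical claim — that a variance ratio by itself implies nothing about tail ratios without knowledge of the underlying distributions — can be illustrated with a toy counterexample (hypothetical distributions, not taken from the paper): a standard normal and a Laplace distribution with the same mean and variance, so the variance ratio is exactly 1, still differ in how much mass lies beyond two standard deviations.

```python
import math
from statistics import NormalDist

# Two symmetric distributions, both with mean 0 and variance 1:
# a standard normal, and a Laplace with scale b = 1/sqrt(2) (Var = 2*b**2 = 1).
b = 1 / math.sqrt(2)

def normal_tail(c):
    """P(X > c) for a standard normal."""
    return 1 - NormalDist().cdf(c)

def laplace_tail(c):
    """P(X > c) for a zero-mean Laplace with scale b."""
    return 0.5 * math.exp(-c / b)

variance_ratio = 1.0          # identical variances by construction
tail_ratio = laplace_tail(2) / normal_tail(2)
# variance_ratio == 1, yet tail_ratio is about 1.3:
# equal variances, unequal tails beyond two standard deviations
```

The same variance ratio is compatible with materially different tail ratios, which is exactly why inferring tail behavior from a variance ratio test statistic alone is an error.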
{"title":"Detecting Regional Differences in Italian Health Services during Five COVID-19 Waves","authors":"Lucio Palazzo, Riccardo Ievoli","doi":"10.3390/stats6020032","DOIUrl":"https://doi.org/10.3390/stats6020032","url":null,"abstract":"During the waves of the COVID-19 pandemic, both national and/or territorial healthcare systems have been severely stressed in many countries. The availability (and complexity) of data requires proper comparisons for understanding differences in the performance of health services. With this aim, we propose a methodological approach to compare the performance of the Italian healthcare system at the territorial level, i.e., considering NUTS 2 regions. Our approach consists of three steps: the choice of a distance measure between available time series, the application of weighted multidimensional scaling (wMDS) based on this distance, and, finally, a cluster analysis on the MDS coordinates. We separately consider daily time series regarding the deceased, intensive care units, and ordinary hospitalizations of patients affected by COVID-19. The proposed procedure identifies four clusters apart from two outlier regions. Changes between the waves at a regional level emerge from the main results, allowing the pressure on territorial health services to be mapped between 2020 and 2022.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42503535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
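The three-step pipeline described — a distance between time series, MDS on the distances, clustering on the resulting coordinates — can be sketched as follows. This sketch substitutes classical (unweighted) MDS for the paper's weighted variant, and the tiny distance matrix is hypothetical:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: embed points in R^k given a pairwise distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigval, eigvec = np.linalg.eigh(B)
    idx = np.argsort(eigval)[::-1][:k]       # top-k eigenpairs
    scale = np.sqrt(np.clip(eigval[idx], 0, None))
    return eigvec[:, idx] * scale            # n x k coordinates

# toy distance matrix between 4 "regional" series (hypothetical numbers):
# series 0-1 resemble each other, series 2-3 resemble each other
D = np.array([[0, 1, 4, 4],
              [1, 0, 4, 4],
              [4, 4, 0, 1],
              [4, 4, 1, 0]], dtype=float)
X = classical_mds(D, k=2)
# a clustering step (e.g. k-means) on X would then recover the two groups
```

In the paper's setting, D would come from a distance between the regional COVID-19 time series and the MDS step would carry weights; the embedding-then-cluster logic is the same.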
{"title":"Model Selection with Missing Data Embedded in Missing-at-Random Data","authors":"Keiji Takai, Kenichi Hayashi","doi":"10.3390/stats6020031","DOIUrl":"https://doi.org/10.3390/stats6020031","url":null,"abstract":"When models are built with missing data, an information criterion is needed to select the best model among the various candidates. Using a conventional information criterion for missing data may lead to the selection of the wrong model when data are not missing at random. Conventional information criteria implicitly assume that any subset of missing-at-random data is also missing at random, and thus the maximum likelihood estimator is assumed to be consistent; that is, it is assumed that the estimator will converge to the true value. However, this assumption may not be practical. In this paper, we develop an information criterion that works even for not-missing-at-random data, so long as the largest missing data set is missing at random. Simulations are performed to show the superiority of the proposed information criterion over conventional criteria.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43087422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
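To make concrete what "conventional information criterion" selection looks like as the baseline the paper improves on, here is a minimal complete-data AIC sketch (toy data; this is not the proposed criterion, which is built for data with a not-missing-at-random part):

```python
import numpy as np

def aic(y, yhat, k):
    """Gaussian AIC up to additive constants: n*log(RSS/n) + 2k."""
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 50)    # truth: a linear trend

scores = {}
for degree in (0, 1):                          # constant vs linear candidate
    coef = np.polyfit(x, y, degree)
    scores[degree] = aic(y, np.polyval(coef, x), degree + 1)
best = min(scores, key=scores.get)             # model minimizing AIC
```

With missing data, the likelihood inside such a criterion is the observed-data likelihood, and the paper's point is that its consistency (and hence the criterion's validity) silently assumes every missing subset is missing at random.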
{"title":"The Network Bass Model with Behavioral Compartments","authors":"G. Modanese","doi":"10.3390/stats6020030","DOIUrl":"https://doi.org/10.3390/stats6020030","url":null,"abstract":"A Bass diffusion model is defined on an arbitrary network, with the additional introduction of behavioral compartments, such that nodes can have different probabilities of receiving the information/innovation from the source and transmitting it to other nodes. The dynamics are described by a large system of non-linear ordinary differential equations, whose numerical solutions can be analyzed in dependence on diffusion parameters, network parameters, and relations between the compartments. For example, in a simple case with two compartments (Enthusiasts and Sceptics about the innovation), we consider cases in which the “publicity” and imitation terms act differently on the compartments, and individuals from one compartment do not imitate those of the other, thus increasing the polarization of the system and creating sectors of the population where adoption becomes very slow. For some categories of scale-free networks, we also investigate the dependence on the features of the networks of the diffusion peak time and of the time at which adoptions reach 90% of the population.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43460600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
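The compartmental dynamics can be caricatured in a mean-field (fully mixed) version with two compartments; the parameter values below are hypothetical, and the full model in the paper is a large ODE system over an arbitrary network:

```python
import numpy as np

# Mean-field Bass dynamics with two behavioral compartments
# (index 0 = Enthusiasts, index 1 = Sceptics); hypothetical parameters.
p = np.array([0.03, 0.005])   # "publicity" (innovation) coefficients
q = np.array([0.4, 0.1])      # imitation coefficients
w = np.array([0.5, 0.5])      # population share of each compartment

F = np.zeros(2)               # adopter fraction in each compartment
dt = 0.01
for _ in range(int(60 / dt)): # forward-Euler integration to t = 60
    # enthusiasts imitate everyone; sceptics imitate only other sceptics
    imitation = np.array([w @ F, w[1] * F[1]])
    F = F + dt * (1 - F) * (p + q * imitation)
# enthusiasts saturate while sceptic adoption lags far behind,
# reproducing the polarization effect described in the abstract
```

Restricting whom sceptics imitate is what slows adoption in their sector; on a network, the same asymmetry acts edge by edge rather than through the mean field.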
{"title":"Efficient Two-Stage Analysis for Complex Trait Association with Arbitrary Depth Sequencing Data","authors":"Zheng Xu, Song Yan, Shuai Yuan, Cong Wu, Sixia Chen, Zifang Guo, Yun Li","doi":"10.3390/stats6010029","DOIUrl":"https://doi.org/10.3390/stats6010029","url":null,"abstract":"Sequencing-based genetic association analysis is typically performed by first generating genotype calls from sequence data and then performing association tests on the called genotypes. Standard approaches require accurate genotype calling (GC), which can be achieved either with high sequencing depth (typically available in a small number of individuals) or via computationally intensive multi-sample linkage disequilibrium (LD)-aware methods. We propose a computationally efficient two-stage combination approach for association analysis, in which single-nucleotide polymorphisms (SNPs) are screened in the first stage via a rapid maximum likelihood (ML)-based method on sequence data directly (without first calling genotypes), and then the selected SNPs are evaluated in the second stage by performing association tests on genotypes from multi-sample LD-aware calling. Extensive simulation- and real data-based studies show that the proposed two-stage approaches can save 80% of the computational costs and still obtain more than 90% of the power of the classical method to genotype all markers at various depths d≥2.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42241283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
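The two-stage idea — a cheap screen on raw dosages, followed by careful tests on the survivors — can be sketched on simulated data (all sizes, noise levels, and statistics below are illustrative placeholders, not the paper's ML screen or its LD-aware caller):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 500, 2000                                       # individuals, SNPs
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)    # true genotypes
causal = 7
y = 0.5 * G[:, causal] + rng.normal(0, 1, n)           # phenotype

# Stage 1: cheap marginal screen on noisy dosages (proxy for low-depth
# sequence data, used without calling genotypes first)
dosage = G + rng.normal(0, 0.5, size=(n, m))
score = np.abs(np.corrcoef(dosage.T, y)[:m, m])        # |corr(SNP_j, y)|
keep = np.argsort(score)[::-1][: m // 20]              # top 5% go to stage 2

# Stage 2: association test on accurately called genotypes, screened SNPs only
def t_stat(g, y):
    """Marginal t-statistic for one SNP (df = n - 2)."""
    r = np.corrcoef(g, y)[0, 1]
    return r * np.sqrt((len(y) - 2) / (1 - r ** 2))

stage2 = {j: t_stat(G[:, j], y) for j in keep}
# only 5% of markers need the expensive stage-2 treatment,
# which is where the computational savings come from
```

The savings scale with the screening fraction: stage 2, the expensive part, runs on a small subset, while stage 1 touches all markers only with a fast statistic.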
{"title":"A Phylogenetic Regression Model for Studying Trait Evolution on Network","authors":"Dwueng-Chwuan Jhwueng","doi":"10.3390/stats6010028","DOIUrl":"https://doi.org/10.3390/stats6010028","url":null,"abstract":"A phylogenetic regression model that incorporates the network structure allowing the reticulation event to study trait evolution is proposed. The parameter estimation is achieved through the maximum likelihood approach, where an algorithm is developed by taking a phylogenetic network in eNewick format as the input to build up the variance–covariance matrix. The model is applied to study the common sunflower, Helianthus annuus, by investigating its traits used to respond to drought conditions. Results show that our model provides acceptable estimates of the parameters, where most of the traits analyzed were found to have a significant correlation with drought tolerance.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42084304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
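At its core, a phylogenetic regression is generalized least squares with a variance–covariance matrix derived from the tree or network. A minimal numerical sketch with a hand-made 4-taxon matrix (hypothetical values; the paper builds this matrix from a phylogenetic network in eNewick format):

```python
import numpy as np

# Phylogenetic GLS: beta_hat = (X' V^{-1} X)^{-1} X' V^{-1} y,
# where V encodes shared evolutionary history between taxa.
# Toy V for 4 taxa forming two related pairs (hypothetical branch lengths).
V = np.array([[1.0, 0.7, 0.2, 0.2],
              [0.7, 1.0, 0.2, 0.2],
              [0.2, 0.2, 1.0, 0.6],
              [0.2, 0.2, 0.6, 1.0]])
X = np.column_stack([np.ones(4), [0.1, 0.3, 0.5, 0.9]])  # intercept + predictor trait
y = np.array([1.1, 1.4, 1.8, 2.5])                       # response trait

Vinv = np.linalg.inv(V)
beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # GLS estimate
# beta[1] is the evolutionary association between the two traits,
# corrected for the non-independence of related taxa
```

A reticulation event changes only how V is assembled (a hybrid taxon inherits covariance from both parents); the estimation step itself is unchanged.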
{"title":"Consecutive-k1 and k2-out-of-n: F Structures with a Single Change Point","authors":"I. Triantafyllou, M. Chalikias","doi":"10.3390/stats6010027","DOIUrl":"https://doi.org/10.3390/stats6010027","url":null,"abstract":"In the present paper, we establish a new consecutive-type reliability model with a single change point. The proposed structure has two common failure criteria and consists of two different types of components. The general framework for constructing the so-called consecutive-k1 and k2-out-of-n: F system with a single change point is launched. In addition, the number of path sets of the proposed structure is determined with the aid of a combinatorial approach. Moreover, two crucial performance characteristics of the proposed model are studied. The numerical investigation carried out reveals that the behavior of the new structure is outperforming against its competitors.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46095423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
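For the classic single-criterion consecutive-k-out-of-n:F system (a simpler relative of the two-criteria change-point structure studied here), the number of path sets can be counted by brute force, which is useful for checking combinatorial formulas at small n:

```python
from itertools import product

def path_sets(n, k):
    """Count component-state vectors in which a consecutive-k-out-of-n:F
    system works, i.e. vectors with no run of k or more failed (0) components.
    Brute force over all 2**n states; small n only."""
    count = 0
    for state in product((0, 1), repeat=n):
        run, works = 0, True
        for s in state:
            run = run + 1 if s == 0 else 0   # current run of failures
            if run >= k:
                works = False
                break
        if works:
            count += 1
    return count

# e.g. path_sets(4, 2) counts binary strings of length 4 avoiding "00"
```

The same enumeration idea extends to two failure criteria and a change point by checking both conditions on the two component types, at the cost of the exponential state space the paper's combinatorial approach avoids.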
{"title":"Renaissance of Creative Accounting Due to the Pandemic: New Patterns Explored by Correspondence Analysis","authors":"R. Blazek, P. Durana, Jakub Michulek","doi":"10.3390/stats6010025","DOIUrl":"https://doi.org/10.3390/stats6010025","url":null,"abstract":"The COVID-19 outbreak has rapidly affected global economies and the parties involved. There was a need to ensure the sustainability of corporate finance and avoid bankruptcy. The reactions of individuals were not routine, but covered a wide range of approaches to surviving the crisis. A creative way of accounting was also adopted. This study is primarily concerned with the behavior of businesses in the Visegrad Four countries between 2019 and 2021. The pandemic era was the driving force behind the renaissance of manipulation. Thus, the purpose of the article is to explore how the behavior of enterprises changed during the ongoing pandemic. The Beneish model was applied to reveal creative manipulation in the analyzed samples. Its M-score was calculated for 6113 Slovak, 153 Czech, 585 Polish, and 155 Hungarian enterprises. Increasing numbers of handling enterprises were confirmed in the V4 region. The dependency between the size of the enterprise and the occurrence of creative accounting was also proven. However, the structure of manipulators has been changing. Correspondence analysis specifically showed behavioral changes over time. Correspondence maps demonstrate which enterprises already used creative accounting before the pandemic in 2019. Then, it was noted that enterprises were influenced to modify their patterns in 2020 and 2021. The coronavirus pandemic had a significant potency on the use of creative accounting, not only for individual units, but for businesses of all sizes. In addition, the methodology may be applied for the investigation of individual sectors post-COVID.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47562439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
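The Beneish model named in the abstract is an eight-variable score. The coefficients below are the commonly published 1999 values, and −1.78 is a commonly cited manipulation cutoff; both are stated here as general background, not taken from this paper:

```python
def beneish_m_score(dsri, gmi, aqi, sgi, depi, sgai, tata, lvgi):
    """Eight-variable Beneish M-score (coefficients as commonly published).

    Inputs are the eight financial-ratio indices (e.g. DSRI = days' sales in
    receivables index, TATA = total accruals to total assets). Scores above
    roughly -1.78 are commonly read as suggesting earnings manipulation.
    """
    return (-4.84 + 0.920 * dsri + 0.528 * gmi + 0.404 * aqi
            + 0.892 * sgi + 0.115 * depi - 0.172 * sgai
            + 4.679 * tata - 0.327 * lvgi)

# a "neutral" firm: all indices equal to 1, zero accruals
neutral = beneish_m_score(1, 1, 1, 1, 1, 1, 0, 1)
# neutral is about -2.48, comfortably below the -1.78 cutoff
```

Computing this score firm by firm and tabulating who crosses the cutoff, by country, size, and year, is the kind of input the paper's correspondence analysis then maps over time.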
{"title":"Analytic Error Function and Numeric Inverse Obtained by Geometric Means","authors":"D. Martila, S. Groote","doi":"10.3390/stats6010026","DOIUrl":"https://doi.org/10.3390/stats6010026","url":null,"abstract":"Using geometric considerations, we provided a clear derivation of the integral representation for the error function, known as the Craig formula. We calculated the corresponding power series expansion and proved the convergence. The same geometric means finally assisted in systematically deriving useful formulas that approximated the inverse error function. Our approach could be used for applications in high-speed Monte Carlo simulations, where this function is used extensively.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45796659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
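The Craig-type integral representation discussed can be checked numerically against the library erfc. Writing Craig's formula for the Gaussian Q-function in erfc form gives erfc(x) = (2/π)·∫₀^{π/2} exp(−x²/sin²t) dt for x ≥ 0, which a simple midpoint rule reproduces to high accuracy:

```python
import math

def erfc_craig(x, steps=2000):
    """Complementary error function via the Craig-type integral (x >= 0):
    erfc(x) = (2/pi) * integral_0^{pi/2} exp(-x**2 / sin(t)**2) dt.
    Midpoint rule; the integrand vanishes smoothly as t -> 0, so there is
    no singularity to handle."""
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h                  # midpoint of each subinterval
        total += math.exp(-x * x / math.sin(t) ** 2)
    return (2 / math.pi) * total * h

# agrees with math.erfc to roughly single-float precision for moderate x
```

The finite integration range and bounded integrand are what make this representation attractive in Monte Carlo settings, as the abstract notes.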