The concept of (potential) years of life lost is a measure of premature mortality that can be used to compare the impacts of different specific causes of death. However, interpreting a given number of years of life lost at face value is more problematic because a sensible reference value is lacking. In this paper, we propose three denominators by which to divide an excess in years of life lost, thus obtaining three indicators, called average life lost, increase of life lost, and proportion of life lost, which should facilitate interpretation and comparisons. We study the links between these three indicators and classical mortality indicators, such as life expectancy and the standardized mortality rate, introduce the concept of a weighted standardized mortality rate, and calculate all of them for 30 countries to assess the impact of COVID-19 on mortality in 2020. With each of the three indicators, a significant excess loss is found for both genders in 18 of the 30 countries.
Valentin Rousson, Isabella Locatelli. "Years of Life Lost to COVID-19 and Related Mortality Indicators: An Illustration in 30 Countries." Biometrical Journal, published 2024-07-13, doi:10.1002/bimj.202300386. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300386
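As background for the indicators above, the standard (potential) years-of-life-lost quantity is a sum of deaths weighted by remaining life expectancy. A minimal sketch with made-up numbers (the function name, the data, and the per-death normalization are illustrative and are not the paper's actual denominators):

```python
def years_of_life_lost(deaths_by_age, remaining_le_by_age):
    """Standard YLL: deaths in each age group weighted by the remaining
    life expectancy at that age."""
    return sum(d * e for d, e in zip(deaths_by_age, remaining_le_by_age))

deaths = [10, 25, 40]               # deaths per age group (hypothetical)
remaining_le = [50.0, 30.0, 10.0]   # remaining life expectancy per group
yll = years_of_life_lost(deaths, remaining_le)   # 1650.0 years
per_death = yll / sum(deaths)                    # 22.0 years per death
```

Dividing by the number of deaths, as in the last line, is one natural normalization; the paper studies three such denominators, which are not reproduced here.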
A generalization of Passing–Bablok regression is proposed for comparing multiple measurement methods simultaneously. Possible applications include assay migration studies and interlaboratory trials. When only two methods are compared, the proposal reduces to the usual Passing–Bablok estimator. It is close in spirit to reduced major axis regression, which is, however, not robust. To obtain a robust estimator, the major axis is replaced by the (hyper-)spherical median axis. The technique is applied to compare SARS-CoV-2 serological tests, bilirubin measurements in neonates, and an in vitro diagnostic test run with different instruments, sample preparations, and reagent lots. In addition, plots similar to the well-known Bland–Altman plots are developed to represent the variance structure.
Florian Dufey. "Robust Regression Techniques for Multiple Method Comparison and Transformation." Biometrical Journal, published 2024-07-13, doi:10.1002/bimj.202400027. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202400027
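For the two-method case mentioned above, the classical Passing–Bablok slope is a shifted median of all pairwise slopes. A simplified pure-Python sketch (illustrative function name; it uses a plain median and omits the shift correction for negative slopes that the full estimator applies):

```python
from statistics import median

def passing_bablok_simplified(x, y):
    """Median of pairwise slopes, with the intercept taken as the median
    residual; a robust, simplified stand-in for Passing-Bablok."""
    slopes = []
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[j] - x[i]
            if dx != 0:  # skip ties in x
                slopes.append((y[j] - y[i]) / dx)
    b = median(slopes)
    a = median([yi - b * xi for xi, yi in zip(x, y)])
    return a, b

# two measurement methods on the same four samples (hypothetical values)
a, b = passing_bablok_simplified([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
```

Because both slope and intercept are medians, single outlying samples have bounded influence, which is the robustness property the abstract contrasts with reduced major axis regression.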
Camille Frévent, Mohamed-Salem Ahmed, Sophie Dabo-Niang, Michaël Genin
Spatial scan statistics are well-known methods widely used to detect spatial clusters of events, and several spatial scan statistic models have been applied to the spatial analysis of time-to-event data. However, these models do not take account of potential correlations between the observations of individuals within the same spatial unit, or of potential spatial dependence between spatial units. To overcome this problem, we developed a scan statistic based on a Cox model with shared frailty that takes account of the spatial dependence between spatial units. In simulation studies, we found that (i) conventional spatial scan statistic models for time-to-event data fail to control the type I error in the presence of correlation between the observations of individuals within the same spatial unit and (ii) our model performed well in the presence of such correlation and spatial dependence. We applied our method to epidemiological data, detecting spatial clusters of mortality in patients with end-stage renal disease in northern France.
Camille Frévent, Mohamed-Salem Ahmed, Sophie Dabo-Niang, Michaël Genin. "A Shared-Frailty Spatial Scan Statistic Model for Time-to-Event Data." Biometrical Journal, published 2024-07-11, doi:10.1002/bimj.202300200. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300200
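The statistic in this paper is Cox-model based, but the scoring step of a classical Kulldorff-type Poisson spatial scan illustrates the general mechanism: each candidate cluster is scored by a log-likelihood ratio comparing risk inside and outside the window. A sketch with illustrative names and numbers:

```python
import math

def poisson_scan_llr(c, e, total_c, total_e):
    """Log-likelihood ratio for one candidate cluster with c observed and
    e expected cases, out of total_c observed / total_e expected overall;
    clusters with no excess risk score zero."""
    c_out, e_out = total_c - c, total_e - e
    if c / e <= c_out / e_out:  # only score excess-risk clusters
        return 0.0
    return (c * math.log(c / e)
            + c_out * math.log(c_out / e_out)
            - total_c * math.log(total_c / total_e))

llr = poisson_scan_llr(30, 20, 100, 100)  # 30 cases where 20 were expected
```

In a full scan, this score is maximized over candidate windows, and its null distribution is typically obtained by Monte Carlo replication.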
Aude Allemang-Trivalle, Annabel Maruani, Bruno Giraudeau
In the individual stepped-wedge randomized trial (ISW-RT), subjects are allocated to sequences, each sequence being defined by a control period followed by an experimental period. The total follow-up time is the same for all sequences, but the durations of the control and experimental periods vary among sequences. To our knowledge, unlike for stepped-wedge cluster randomized trials (SW-CRTs), there is no validated sample size calculation formula for ISW-RTs. The objective of this study was to adapt the formula used for SW-CRTs to the case of individual randomization and to validate this adaptation in a Monte Carlo simulation study. The proposed sample size calculation formula for the ISW-RT design yielded satisfactory empirical power for most scenarios, except those with operating characteristic values near the boundary (i.e., the smallest possible number of periods, or a very high or very low autocorrelation coefficient). Overall, the results provide useful insights into sample size calculation for ISW-RTs.
Aude Allemang-Trivalle, Annabel Maruani, Bruno Giraudeau. "Sample Size Calculation for an Individual Stepped-Wedge Randomized Trial." Biometrical Journal, published 2024-07-11, doi:10.1002/bimj.202300167. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300167
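The Monte Carlo validation described above can be caricatured in a few lines: simulate subjects who each switch from the control to the experimental condition at some period, then estimate power empirically. In this sketch all names and parameter values are illustrative, and within-subject mean differences are tested with a normal approximation rather than the mixed model one would use in practice:

```python
import random
from statistics import mean, stdev

def simulate_power(n_subjects=100, n_periods=4, effect=0.5, sd_within=1.0,
                   sd_subject=0.5, n_sims=200, z_crit=1.96, seed=1):
    """Empirical power of a crude ISW-style analysis: each subject has a
    random intercept and a random switch period; the subject's mean
    experimental-minus-control difference cancels the intercept."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        diffs = []
        for _ in range(n_subjects):
            u = rng.gauss(0, sd_subject)            # subject random intercept
            switch = rng.randint(1, n_periods - 1)  # first experimental period
            ctrl = [u + rng.gauss(0, sd_within) for _ in range(switch)]
            expe = [u + effect + rng.gauss(0, sd_within)
                    for _ in range(n_periods - switch)]
            diffs.append(mean(expe) - mean(ctrl))   # intercept cancels here
        z = mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)
        if abs(z) > z_crit:
            hits += 1
    return hits / n_sims

power = simulate_power()
```

Such a simulation, run over a grid of autocorrelation and period settings, is how a closed-form sample size formula can be checked for empirical power.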
Raphael O. Betschart, Cristian Riccio, Domingo Aguilera-Garcia, Stefan Blankenberg, Linlin Guo, Holger Moch, Dagmar Seidl, Hugo Solleder, Felix Thalén, Alexandre Thiéry, Raphael Twerenbold, Tanja Zeller, Martin Zoche, Andreas Ziegler
Rapid advances in high-throughput DNA sequencing technologies have enabled large-scale whole genome sequencing (WGS) studies. Before association analysis between phenotypes and genotypes can be performed, the raw sequence data need to undergo preprocessing and quality control (QC). Because many biostatisticians have not yet worked with WGS data, we first sketch Illumina's short-read sequencing technology. Second, we explain the general preprocessing pipeline for WGS studies. Third, we provide an overview of important QC metrics applied to WGS data at four stages: on the raw data, after mapping and alignment, after variant calling, and after multisample variant calling. Fourth, we illustrate the QC with data from the GENEtic SequencIng Study Hamburg–Davos (GENESIS-HD), a study involving more than 9000 human whole genomes. All samples were sequenced on an Illumina NovaSeq 6000 with an average coverage of 35× using a PCR-free protocol. For QC, one Genome in a Bottle (GIAB) trio was sequenced in four replicates, and one GIAB sample was successfully sequenced 70 times in different runs. Fifth, we provide empirical data on the compression of raw data using the DRAGEN original read archive (ORA). The most important quality metrics in the application were genetic similarity, sample cross-contamination, deviations from the expected Het/Hom ratio, relatedness, and coverage. The compression ratio of the raw files using DRAGEN ORA was 5.6:1, and compression time was linear in genome coverage. In summary, the preprocessing, joint calling, and QC of large WGS studies are feasible within a reasonable time, and efficient QC procedures are readily available.
Raphael O. Betschart, Cristian Riccio, Domingo Aguilera-Garcia, Stefan Blankenberg, Linlin Guo, Holger Moch, Dagmar Seidl, Hugo Solleder, Felix Thalén, Alexandre Thiéry, Raphael Twerenbold, Tanja Zeller, Martin Zoche, Andreas Ziegler. "Biostatistical Aspects of Whole Genome Sequencing Studies: Preprocessing and Quality Control." Biometrical Journal, published 2024-07-11, doi:10.1002/bimj.202300278. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300278
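One of the QC metrics named above, the Het/Hom ratio, is straightforward to compute from genotype calls. A minimal sketch (the function name, genotype encoding, and example calls are illustrative):

```python
def het_hom_ratio(genotypes):
    """Heterozygous / homozygous-alternate ratio, a standard per-sample
    WGS QC metric; genotypes are VCF-style strings such as '0/1' (het)
    or '1/1' (hom-alt)."""
    het = sum(1 for g in genotypes if g in ("0/1", "1/0", "0|1", "1|0"))
    hom_alt = sum(1 for g in genotypes if g in ("1/1", "1|1"))
    return het / hom_alt if hom_alt else float("inf")

calls = ["0/1", "1/1", "0/1", "0/0", "1|1", "0|1"]  # hypothetical calls
ratio = het_hom_ratio(calls)  # 3 het / 2 hom-alt = 1.5
```

In practice the ratio is computed genome-wide per sample, and samples deviating strongly from the expected value are flagged for possible contamination or calling problems.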
Siyuan Guo, Jiajia Zhang, Yichao Wu, Alexander C. McLain, James W. Hardin, Bankole Olatosi, Xiaoming Li
Motivated by improving the prediction of human immunodeficiency virus (HIV) suppression status using electronic health records (EHR) data, we propose a functional multivariable logistic regression model that accounts for longitudinal binary and continuous processes simultaneously. Specifically, the longitudinal measurements of both binary and continuous variables are modeled by functional principal components analysis, and the corresponding functional principal component scores are used to build a logistic regression model for prediction. The longitudinal binary data are linked to underlying Gaussian processes. Estimation is carried out using penalized splines for the longitudinal continuous and binary data. A group lasso is used to select longitudinal processes, and a multivariate functional principal components analysis is proposed to revise the functional principal component scores by accounting for their correlation.
The method is evaluated via comprehensive simulation studies and then applied to predict viral suppression using EHR data for people living with HIV in South Carolina.
Siyuan Guo, Jiajia Zhang, Yichao Wu, Alexander C. McLain, James W. Hardin, Bankole Olatosi, Xiaoming Li. "Functional Multivariable Logistic Regression With an Application to HIV Viral Suppression Prediction." Biometrical Journal, published 2024-07-05, doi:10.1002/bimj.202300081. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300081
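The core idea above, turning each subject's longitudinal trajectory into a few principal component scores that feed a logistic regression, can be sketched in discretized form. A pure-Python illustration of the first component only (function name and data are hypothetical; real functional PCA would smooth the curves and handle irregular sampling):

```python
def first_fpc_scores(curves, n_iter=200):
    """Score each subject's curve on the first principal component:
    center the curves, find the leading eigenvector of their covariance
    by power iteration, and project. Such scores would then enter a
    logistic regression as predictors."""
    n, p = len(curves), len(curves[0])
    means = [sum(c[j] for c in curves) / n for j in range(p)]
    X = [[c[j] - means[j] for j in range(p)] for c in curves]  # centered
    cov = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1)
            for b in range(p)] for a in range(p)]
    v = [1.0] * p
    for _ in range(n_iter):  # power iteration for the leading eigenvector
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]

curves = [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]  # 4 subjects, 3 visits
scores = first_fpc_scores(curves)  # one score per subject
```

Scores from several such processes, selected by a group lasso as in the abstract, would form the covariates of the downstream logistic model.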
Closed testing has recently been shown to be optimal for simultaneous control of the true discovery proportion. It is, however, challenging to construct true discovery guarantee procedures in such a way that power is focused on feature sets chosen by users on the basis of their specific interests or expertise. We propose a procedure that allows users to target power at prespecified feature sets, that is, “focus sets.” The method also allows inference for feature sets chosen post hoc, that is, “nonfocus sets,” for which we deduce a true discovery lower confidence bound by interpolation. Our procedure is built from partial true discovery guarantee procedures combined with Holm's procedure and is a conservative shortcut to the closed testing procedure. A simulation study confirms that the statistical power of our method is relatively high for focus sets, at the cost of power for nonfocus sets, as desired. In addition, we investigate its power properties for sets with specific structures, for example, trees and directed acyclic graphs. We also compare our method with AdaFilter in the context of replicability analysis.
The application of our method is illustrated with a gene ontology analysis in gene expression data.
Ningning Xu, Aldo Solari, Jelle J. Goeman. "Combining Partial True Discovery Guarantee Procedures." Biometrical Journal, published 2024-07-02, doi:10.1002/bimj.202300075. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300075
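Holm's procedure, the building block mentioned above, is short enough to state in code. A standard implementation (this is the classical step-down procedure, not the paper's combined shortcut):

```python
def holm_rejections(pvalues, alpha=0.05):
    """Holm's step-down procedure: compare the i-th smallest p-value
    (0-indexed rank) to alpha / (m - rank) and stop at the first failure.
    Returns the set of rejected hypothesis indices."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = set()
    for rank, idx in enumerate(order):
        if pvalues[idx] <= alpha / (m - rank):
            rejected.add(idx)
        else:
            break  # step-down: once one fails, all larger p-values fail
    return rejected

pvals = [0.001, 0.04, 0.03, 0.2]  # hypothetical p-values
rej = holm_rejections(pvals)      # only the first hypothesis is rejected
```

Holm controls the family-wise error rate under arbitrary dependence, which is why it is a convenient ingredient for conservative shortcuts to closed testing.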
Sören Budig, Klaus Jung, Mario Hasler, Frank Schaarschmidt
In biomedical research, the simultaneous inference of multiple binary endpoints may be of interest. In such cases, an appropriate multiplicity adjustment is required that controls the family-wise error rate, which represents the probability of making incorrect test decisions. In this paper, we investigate two approaches that perform single-step