
Statistics and Its Interface: Latest Articles

Learning conditional dependence graph for concepts via matrix normal graphical model
IF 0.8 Zone 4 (Mathematics) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-02-01 DOI: 10.4310/23-sii784
Jizheng Lai, Jianxin Yin
Conditional dependence relationships among random vectors have been extensively studied and broadly applied. It is less clear, however, how to construct a dependence graph for unstructured data such as concept words or phrases in a text corpus, where the variables (concepts) are not jointly observed under an i.i.d. assumption. Using a global embedding method such as GloVe, we obtain ‘structured’ representation vectors for the concepts. We then assume that all the concept vectors jointly follow a matrix normal distribution with sparse precision matrices. Given the observed word-word co-occurrence matrix and the GloVe construction procedure, this assumption can be tested empirically, and we derive the asymptotic distribution of the test statistic. Another advantage of the matrix normal assumption is that the linearly additive property in word analogy tasks arises naturally. Unlike knowledge graph methods, the conditional dependence graph describes the dependence structure between concepts given all other concepts, which means that concepts (nodes) linked by an edge cannot be separated by other concepts. It therefore represents an essential semantic relationship. There is no need to enumerate all related pairs as head and tail elements of a triplet, as in the knowledge graph regime, and the only relation type in the graph is conditional dependence between concepts. A penalized matrix normal graphical model (MNGM) is then employed to learn the conditional dependence graph for both the concepts and the embedding ‘dimensions’. Because the concept words form the nodes of a very high-dimensional graph, we employ the MDMC optimization method to speed up the glasso algorithm. The algorithm also adapts to the incremental accumulation of new concepts in the text corpus. In addition, we propose a sentence-granularity bootstrap to obtain ‘independent’ replicates of samples to enhance the penalized MNGM algorithm. We name the proposed method Matrix-GloVe.
In simulation studies, we verify that the graph learned by Matrix-GloVe is better suited to Graph Convolutional Networks (GCN) than a correlation graph, i.e., a graph determined by the k-NN method. We apply the proposed method in two real-data scenarios. The first is concept graph learning for concepts in a textbook corpus, where two tasks are studied. One compares the vectors output by GloVe with those of word2vec methods, i.e., CBOW and Skip-Gram, when the vectors are fed to the penalized MNGM. The other is link prediction among the concepts. On both tasks, Matrix-GloVe performs better. In the second scenario, Matrix-GloVe is applied to a downstream method, namely GCN. For node classification on the BBC and BBCSport datasets, both GCN with Matrix-GloVe and GCN with Matrix-GloVe plus DeepWalk outperform GCN with k-NN.
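The link between a sparse precision matrix and a conditional dependence graph can be illustrated with a toy example. The sketch below is not the paper's penalized MNGM or its MDMC solver; it only draws one sample from a matrix normal distribution and reads the graph off a known row precision matrix. The function names, the three-concept chain, and the identity column covariance are all assumptions made for illustration.

```python
import numpy as np

def sample_matrix_normal(mean, row_cov, col_cov, rng):
    """Draw X = M + A Z B^T, where row_cov = A A^T and col_cov = B B^T."""
    a = np.linalg.cholesky(row_cov)
    b = np.linalg.cholesky(col_cov)
    z = rng.standard_normal(mean.shape)
    return mean + a @ z @ b.T

def edges_from_precision(precision, tol=1e-8):
    """List pairs (i, j), i < j, whose off-diagonal precision entry is nonzero."""
    p = precision.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(precision[i, j]) > tol]

# Hypothetical row precision for three "concepts": a chain 0 - 1 - 2.
# The zero in position (0, 2) means concepts 0 and 2 are conditionally
# independent given concept 1, so the graph has no 0 - 2 edge.
row_precision = np.array([[2.0, -1.0, 0.0],
                          [-1.0, 2.0, -1.0],
                          [0.0, -1.0, 2.0]])
row_cov = np.linalg.inv(row_precision)
col_cov = np.eye(2)  # two embedding "dimensions", assumed independent here

rng = np.random.default_rng(0)
x = sample_matrix_normal(np.zeros((3, 2)), row_cov, col_cov, rng)
graph = edges_from_precision(row_precision)
```

Here `graph` contains the edges (0, 1) and (1, 2) only, even though `row_cov[0, 2]` is nonzero: concepts 0 and 2 are marginally correlated but not directly linked, which is the distinction the abstract draws between a conditional dependence graph and a correlation graph.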
Citations: 0
Model-based statistical depth for matrix data
IF 0.8 Zone 4 (Mathematics) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-02-01 DOI: 10.4310/23-sii829
Yue Mu, Guanyu Hu, Wei Wu
The field of matrix data learning has advanced significantly in recent years, encompassing diverse datasets such as medical images, social networks, and personalized recommendation systems. These advances have found widespread application in medicine, biology, public health, engineering, finance, economics, sports analytics, and environmental sciences. While extensive research has been conducted on estimation, inference, prediction, and computation for matrix data, the ranking problem has not received adequate attention. Statistical depth, a measure that provides a center-outward rank for different data types, has been introduced over the past few decades. However, its exploration has been limited by the complexity of second- and higher-order statistics. In this paper, we propose an approach to rank matrix data by employing a model-based depth framework. Our methodology involves estimating the eigen-decomposition of a 4th-order covariance tensor. To enable this process using conventional matrix operations, we specify the tensor product operator between matrices and 4th-order tensors. Furthermore, we introduce a Kronecker product form on the covariance to enhance the robustness and efficiency of the estimation process, effectively reducing the number of parameters in the model. Based on this new framework, we develop an efficient algorithm to estimate the model-based statistical depth. To validate the effectiveness of our proposed method, we conduct simulations and apply it to two real-world applications: field goal attempts of NBA players and global temperature anomalies.
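The parameter reduction from a Kronecker product covariance can be made concrete with a small count. The sketch below assumes the common separable form Cov(vec(X)) = V ⊗ U for a p × q matrix X; the abstract does not state the paper's exact parameterization, and the function names and the example matrices are hypothetical.

```python
import numpy as np

def full_cov_params(p, q):
    """Free parameters in an unstructured covariance of vec(X), X being p x q."""
    d = p * q
    return d * (d + 1) // 2

def kronecker_cov_params(p, q):
    """Free parameters when Cov(vec(X)) = V kron U, with U (p x p) and V (q x q).

    One parameter is subtracted because U and V are identified only up to scale.
    """
    return p * (p + 1) // 2 + q * (q + 1) // 2 - 1

p, q = 10, 10
row_cov = np.eye(p)          # hypothetical U
col_cov = 2.0 * np.eye(q)    # hypothetical V
separable_cov = np.kron(col_cov, row_cov)  # (p*q) x (p*q) covariance of vec(X)

print(full_cov_params(p, q), kronecker_cov_params(p, q))  # prints: 5050 109
```

For a 10 × 10 matrix the separable form needs 109 parameters instead of 5050, which is why such a structure can make estimation more stable when the number of observed matrices is small.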
Citations: 0
Rank-R matrix autoregressive models for modeling spatio-temporal data
IF 0.8 Zone 4 (Mathematics) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-02-01 DOI: 10.4310/23-sii812
Nan-Jung Hsu, Hsin-Cheng Huang, Ruey S. Tsay, Tzu-Chieh Kao
We develop a matrix-variate autoregressive (MAR) model to analyze spatio-temporal data organized on a regular grid in space. The model extends the bilinear MAR spatial model of Hsu, Huang and Tsay [10] (https://doi.org/10.1080/10618600.2021.1938587) to increase its flexibility and applicability in empirical applications. Specifically, we propose to model each autoregressive (AR) coefficient matrix of the MAR model by $R$ bilinear terms, thereby establishing a rank-R model. The extension can be interpreted as decomposing the AR dynamics of the data into $R$ bilinear MAR components. We further incorporate a banded neighborhood structure for the AR coefficient matrices and use a flexible nonstationary low-rank covariance model for the spatial innovation process, leading to a parsimonious model without sacrificing flexibility. We estimate all parameters of the model by maximum likelihood and develop a computationally efficient alternating direction method of multipliers algorithm that involves only closed-form expressions in all steps. Applications to a wind-speed dataset and an employment dataset, as well as two simulation experiments, demonstrate the effectiveness of the proposed method in estimation, model selection, and prediction.
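One way to read "R bilinear terms" is as a first-order recursion in which the conditional mean of X_t is the sum over r of A_r X_{t-1} B_r^T. The sketch below assumes that form (the abstract does not write the equation out) and checks numerically that it equals an ordinary vector AR step whose coefficient matrix is a sum of R Kronecker products. The banded structure, the innovation covariance, and the ADMM estimator are not covered, and all names and dimensions are hypothetical.

```python
import numpy as np

def mar_step(x_prev, a_list, b_list):
    """Conditional mean of X_t: sum over r of A_r X_{t-1} B_r^T."""
    return sum(a @ x_prev @ b.T for a, b in zip(a_list, b_list))

def vectorized_coefficient(a_list, b_list):
    """Coefficient of the equivalent vector AR model: sum over r of B_r kron A_r."""
    return sum(np.kron(b, a) for a, b in zip(a_list, b_list))

rng = np.random.default_rng(1)
m, n, rank = 3, 4, 2  # grid of m x n sites, R = 2 bilinear terms
a_list = [rng.standard_normal((m, m)) for _ in range(rank)]
b_list = [rng.standard_normal((n, n)) for _ in range(rank)]
x_prev = rng.standard_normal((m, n))

# vec() stacks columns, hence order="F".
lhs = mar_step(x_prev, a_list, b_list).flatten(order="F")
rhs = vectorized_coefficient(a_list, b_list) @ x_prev.flatten(order="F")
max_gap = float(np.max(np.abs(lhs - rhs)))

bilinear_params = rank * (m * m + n * n)  # 50 entries in the A_r and B_r
unrestricted_params = (m * n) ** 2        # 144 entries in a full VAR coefficient
```

The gap `max_gap` is at floating-point level, and the parameter counts show the saving: 50 entries for the rank-2 bilinear form against 144 for an unrestricted coefficient matrix on the same 3 × 4 grid.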
Citations: 0
Imaging mediation analysis for longitudinal outcomes: a case study of childhood brain tumor survivorship.
IF 0.7 Zone 4 (Mathematics) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-01-01 Epub Date: 2024-07-19 DOI: 10.4310/23-sii815
Yimei Li, Jade Xiaoqing Wang, Grace Chen Zhou, Heather M Conklin, Arzu Onar-Thomas, Amar Gajjar, Wilburn E Reddick, Cai Li

Aggressive cancer treatments that affect the central nervous system are associated with an increased risk of cognitive deficits. As treatment for pediatric brain tumors has become more effective, there has been a heightened focus on improving cognitive outcomes, which can significantly affect the quality of life for pediatric cancer survivors. This paper is motivated by and applied to a clinical trial for medulloblastoma, the most common malignant brain tumor in children. The trial collects comprehensive data including treatment-related clinical information, neuroimaging, and longitudinal neurocognitive outcomes to enhance our understanding of the responses to treatment and the enduring impacts of radiation therapy on the survivors of medulloblastoma. To this end, we have developed a new mediation model tailored for longitudinal outcomes with high-dimensional imaging mediators. Specifically, we adopt a joint binary Ising-Gaussian Markov random field prior distribution to account for spatial dependency and smoothness of ultra-high-dimensional neuroimaging mediators for enhancing detection power of informative voxels. By exploiting the proposed approach, we identify causal pathways and the corresponding white matter microstructures mediating the negative impact of irradiation on neurodevelopment. The results provide guidance on sparing the brain regions and improving long-term neurodevelopment for pediatric cancer survivors. Simulation studies also confirm the validity of the proposed method.

Statistics and Its Interface, Volume 17, Issue 3, pp. 533-548. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12467661/pdf/
Citations: 0
Latent class proportional hazards regression with heterogeneous survival data
IF 0.8 Zone 4 (Mathematics) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2023-11-27 DOI: 10.4310/23-sii785
Teng Fei, John J. Hanfelt, Limin Peng
Heterogeneous survival data are commonly present in chronic disease studies. Delineating meaningful disease subtypes directly linked to a survival outcome can generate useful scientific implications. In this work, we develop a latent class proportional hazards (PH) regression framework to address such an interest. We propose mixture proportional hazards modeling, which flexibly accommodates class-specific covariate effects while allowing for the baseline hazard function to vary across latent classes. Adapting the strategy of nonparametric maximum likelihood estimation, we derive an Expectation-Maximization (E‑M) algorithm to estimate the proposed model. We establish the theoretical properties of the resulting estimators. Extensive simulation studies are conducted, demonstrating satisfactory finite-sample performance of the proposed method as well as the predictive benefit from accounting for the heterogeneity across latent classes. We further illustrate the practical utility of the proposed method through an application to a mild cognitive impairment (MCI) cohort in the Uniform Data Set.
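The model class described in this abstract can be written schematically as follows. This is only a sketch of a generic latent class proportional hazards mixture: the symbols $K$ (number of latent classes), $C$ (latent class label), $\pi_k$ (class probability, taken as constant here), $\lambda_{0k}$ (class-specific baseline hazard), and $\beta_k$ (class-specific covariate effects) are notation assumed for illustration, and the paper may, for example, let the class probabilities depend on covariates.

```latex
\[
\begin{aligned}
\lambda(t \mid Z, C = k) &= \lambda_{0k}(t)\,\exp\!\left(\beta_k^{\top} Z\right),
  \qquad k = 1, \dots, K, \\
S(t \mid Z) &= \sum_{k=1}^{K} \pi_k \exp\!\left\{-\Lambda_{0k}(t)\, e^{\beta_k^{\top} Z}\right\},
  \qquad \Lambda_{0k}(t) = \int_0^t \lambda_{0k}(s)\,ds,
  \qquad \sum_{k=1}^{K} \pi_k = 1 .
\end{aligned}
\]
```

The first line is the class-specific hazard, in which both the baseline hazard and the covariate effects carry the class index $k$; the second is the implied population survival function, a mixture over the unobserved classes, which is the kind of incomplete-data structure an E-M algorithm is designed for.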
Citations: 0
Frequentist Bayesian compound inference
IF 0.8 Zone 4 (Mathematics) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2023-11-27 DOI: 10.4310/23-sii797
Jinfeng Xu, Ao Yuan
In practice, either the Bayesian or the frequentist method is typically used. Although the two methods are sometimes combined, a formal unified methodology has not yet appeared. Here we first briefly review the two methods and some of their combinations, and then propose a procedure that uses both the frequentist likelihood and the Bayesian posterior loss in parameter estimation and hypothesis testing, as an attempt to unify them. Basic properties of the proposed method are studied, and simulation studies are carried out to evaluate its performance.
Citations: 0
Guiding light: An essay for Professor Lincheng Zhao on the occasion of his 80th birthday
IF 0.8 Zone 4 (Mathematics) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2023-11-27 DOI: 10.4310/22-sii772
Zhidong Bai
Lincheng Zhao was admitted to the Department of Applied Mathematics of the University of Science and Technology of China (USTC) in 1960, three years before me, and then took a year off due to illness and transferred to the entering class of 1961. Neither of us was good at socializing, so although we had been classmates for three years, we did not know each other. We got to know each other in 1978, when we were both admitted to the Department of Mathematics for graduate studies. Since then, we have become friends, helped each other in all aspects of research and life, and served as good mentors to one another. On the occasion of Professor Zhao’s 80th birthday, I would like to recall some of the past events of our acquaintance and friendship to express my gratitude to Academic Elder Brother Zhao.
Citations: 0
A review of nonparametric regression methods for longitudinal data
IF 0.8 Zone 4 (Mathematics) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2023-11-27 DOI: 10.4310/23-sii801
Changxin Yang, Zhongyi Zhu
Longitudinal data, which involve measuring a group of subjects repeatedly over time, frequently arise in many clinical and biomedical applications. To identify the complex patterns of change in the outcome and their association with covariates over time, a sufficiently flexible model is always required. Nonparametric regression, known for being data-adaptive and less restrictive than parametric approaches, becomes a promising tool for handling longitudinal data. This paper reviews various nonparametric regression methods for longitudinal data, including specific traditional nonparametric methods for the univariate case and several representative methods for the multivariate case, among which tree-based techniques are dominant. We summarize their motivations and provide a brief practical performance comparison of these methods in simulations, as well as discuss potential future research directions.
Citations: 1
Copy number variation detection based on constraint least squares
IF 0.8 Zone 4 Mathematics Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2023-11-27 DOI: 10.4310/23-sii814
Xiaopu Wang, Xueqin Wang, Aijun Zhang, Canhong Wen
Copy number variations (CNVs) are a form of structural variation of a DNA sequence, including amplification and deletion of a particular DNA segment on chromosomes. Due to the huge amount of data in every DNA sequence, there is a great need for a computationally fast algorithm that accurately identifies CNVs. In this paper, we formulate the detection of CNVs as a constraint least squares problem and show that circular binary segmentation is a greedy approach to solving this problem. To solve this problem with high accuracy and efficiency, we first derived a necessary optimality condition for its solution based on the alternating minimization technique and then developed a computationally efficient algorithm named AMIAS. The performance of our method was tested on both simulated data and two real-world applications using genomic data from diagnosed primal glioblastoma and the HapMap project. Our proposed method has competitive performance in identifying CNVs with high-throughput genotypic data.
Citations: 0
Aligning sample size calculations with estimands in clinical trials with time-to-event outcomes
IF 0.8 Zone 4 Mathematics Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2023-11-27 DOI: 10.4310/23-sii804
Yixin Fang, Man Jin, Chengqing Wu
The ICH E9(R1) guidance recommended a framework to align planning, design, conduct, analysis, and interpretation of any clinical trial with its objective and estimand. How to handle intercurrent events (ICEs) is one of the five attributes of an estimand, and sample size calculation is a key step in trial planning and design. Therefore, sample size calculation should be aligned with the estimand and, in particular, with how the ICEs are handled. ICH E9(R1) summarized five strategies for handling ICEs, and five approaches have been proposed in the literature for sample size calculation when planning trials with quantitative and binary outcomes. In this paper, we discuss how to apply the five strategies to deal with ICEs in clinical trials with time-to-event outcomes and propose five approaches for sample size calculation that are aligned with the five strategies, respectively.
Citations: 0