首页 > 最新文献

Statistica Sinica最新文献

英文 中文
PENALIZED REGRESSION FOR MULTIPLE TYPES OF MANY FEATURES WITH MISSING DATA. 对数据缺失的多种类型特征进行惩罚回归。
IF 1.4 3区 数学 Q2 Mathematics Pub Date : 2023-04-01 DOI: 10.5705/ss.202020.0401
Kin Yau Wong, Donglin Zeng, D Y Lin

Recent technological advances have made it possible to measure multiple types of many features in biomedical studies. However, some data types or features may not be measured for all study subjects because of cost or other constraints. We use a latent variable model to characterize the relationships across and within data types and to infer missing values from observed data. We develop a penalized-likelihood approach for variable selection and parameter estimation and devise an efficient expectation-maximization algorithm to implement our approach. We establish the asymptotic properties of the proposed estimators when the number of features increases at a polynomial rate of the sample size. Finally, we demonstrate the usefulness of the proposed methods using extensive simulation studies and provide an application to a motivating multi-platform genomics study.

最近的技术进步使得在生物医学研究中测量多种类型的许多特征成为可能。然而,由于成本或其他限制,有些数据类型或特征可能无法对所有研究对象进行测量。我们使用潜变量模型来描述数据类型间和数据类型内的关系,并从观察到的数据中推断缺失值。我们开发了一种用于变量选择和参数估计的惩罚似然法,并设计了一种高效的期望最大化算法来实现我们的方法。当特征数量以样本量的多项式速率增加时,我们建立了所提出的估计器的渐近特性。最后,我们通过大量的模拟研究证明了所提方法的实用性,并将其应用于一项激励性的多平台基因组学研究。
{"title":"PENALIZED REGRESSION FOR MULTIPLE TYPES OF MANY FEATURES WITH MISSING DATA.","authors":"Kin Yau Wong, Donglin Zeng, D Y Lin","doi":"10.5705/ss.202020.0401","DOIUrl":"10.5705/ss.202020.0401","url":null,"abstract":"<p><p>Recent technological advances have made it possible to measure multiple types of many features in biomedical studies. However, some data types or features may not be measured for all study subjects because of cost or other constraints. We use a latent variable model to characterize the relationships across and within data types and to infer missing values from observed data. We develop a penalized-likelihood approach for variable selection and parameter estimation and devise an efficient expectation-maximization algorithm to implement our approach. We establish the asymptotic properties of the proposed estimators when the number of features increases at a polynomial rate of the sample size. Finally, we demonstrate the usefulness of the proposed methods using extensive simulation studies and provide an application to a motivating multi-platform genomics study.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10187615/pdf/nihms-1764514.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9482840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Sieve estimation of a class of partially linear transformation models with interval-censored competing risks data. 利用区间删失竞争风险数据对一类部分线性变换模型进行筛式估计。
IF 1.4 3区 数学 Q2 Mathematics Pub Date : 2023-04-01 DOI: 10.5705/ss.202021.0051
Xuewen Lu, Yan Wang, Dipankar Bandyopadhyay, Giorgos Bakoyannis

In this paper, we consider a class of partially linear transformation models with interval-censored competing risks data. Under a semiparametric generalized odds rate specification for the cause-specific cumulative incidence function, we obtain optimal estimators of the large number of parametric and nonparametric model components via maximizing the likelihood function over a joint B-spline and Bernstein polynomial spanned sieve space. Our specification considers a relatively simpler finite-dimensional parameter space, approximating the infinite-dimensional parameter space as n → ∞, thereby allowing us to study the almost sure consistency, and rate of convergence for all parameters, and the asymptotic distributions and efficiency of the finite-dimensional components. We study the finite sample performance of our method through simulation studies under a variety of scenarios. Furthermore, we illustrate our methodology via application to a dataset on HIV-infected individuals from sub-Saharan Africa.

在本文中,我们考虑了一类具有区间删失竞争风险数据的部分线性变换模型。在特定病因累积发病率函数的半参数广义几率规范下,我们通过最大化 B-样条曲线和伯恩斯坦多项式联合跨筛空间的似然函数,获得了大量参数和非参数模型成分的最优估计值。我们的规范考虑了相对简单的有限维参数空间,近似于 n → ∞ 的无限维参数空间,从而使我们能够研究所有参数的几乎确定的一致性和收敛率,以及有限维成分的渐近分布和效率。我们通过各种情况下的模拟研究,研究了我们方法的有限样本性能。此外,我们还将我们的方法应用于撒哈拉以南非洲地区的 HIV 感染者数据集,以说明我们的方法。
{"title":"Sieve estimation of a class of partially linear transformation models with interval-censored competing risks data.","authors":"Xuewen Lu, Yan Wang, Dipankar Bandyopadhyay, Giorgos Bakoyannis","doi":"10.5705/ss.202021.0051","DOIUrl":"10.5705/ss.202021.0051","url":null,"abstract":"<p><p>In this paper, we consider a class of partially linear transformation models with interval-censored competing risks data. Under a semiparametric generalized odds rate specification for the cause-specific cumulative incidence function, we obtain optimal estimators of the large number of parametric and nonparametric model components via maximizing the likelihood function over a joint B-spline and Bernstein polynomial spanned sieve space. Our specification considers a relatively simpler finite-dimensional parameter space, approximating the infinite-dimensional parameter space as <i>n</i> → ∞, thereby allowing us to study the almost sure consistency, and rate of convergence for all parameters, and the asymptotic distributions and efficiency of the finite-dimensional components. We study the finite sample performance of our method through simulation studies under a variety of scenarios. Furthermore, we illustrate our methodology via application to a dataset on HIV-infected individuals from sub-Saharan Africa.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10208244/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9526092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
HETEROGENEITY ANALYSIS VIA INTEGRATING MULTI-SOURCES HIGH-DIMENSIONAL DATA WITH APPLICATIONS TO CANCER STUDIES. 整合多来源高维数据与癌症研究应用的异质性分析。
IF 1.4 3区 数学 Q2 Mathematics Pub Date : 2023-04-01 DOI: 10.5705/ss.202021.0002
Tingyan Zhong, Qingzhao Zhang, Jian Huang, Mengyun Wu, Shuangge Ma

This study has been motivated by cancer research, in which heterogeneity analysis plays an important role and can be roughly classified as unsupervised or supervised. In supervised heterogeneity analysis, the finite mixture of regression (FMR) technique is used extensively, under which the covariates affect the response differently in subgroups. High-dimensional molecular and, very recently, histopathological imaging features have been analyzed separately and shown to be effective for heterogeneity analysis. For simpler analysis, they have been shown to contain overlapping, but also independent information. In this article, our goal is to conduct the first and more effective FMR-based cancer heterogeneity analysis by integrating high-dimensional molecular and histopathological imaging features. A penalization approach is developed to regularize estimation, select relevant variables, and, equally importantly, promote the identification of independent information. Consistency properties are rigorously established. An effective computational algorithm is developed. A simulation and an analysis of The Cancer Genome Atlas (TCGA) lung cancer data demonstrate the practical effectiveness of the proposed approach. Overall, this study provides a practical and useful new way of conducting supervised cancer heterogeneity analysis.

本研究的动机是癌症研究,异质性分析在癌症研究中起着重要作用,大致可分为无监督和有监督两种。在监督异质性分析中,有限混合回归(FMR)技术被广泛使用,在该技术下,协变量对亚组反应的影响是不同的。高维分子和最近的组织病理学影像学特征被分别分析,并被证明是有效的异质性分析。为了更简单的分析,它们已被证明包含重叠但也独立的信息。在本文中,我们的目标是通过整合高维分子和组织病理成像特征,进行第一次和更有效的基于fmr的癌症异质性分析。一种惩罚方法被开发用于正则化估计,选择相关变量,同样重要的是,促进独立信息的识别。一致性属性是严格建立的。提出了一种有效的计算算法。对癌症基因组图谱(TCGA)肺癌数据的模拟和分析证明了该方法的实际有效性。总之,本研究为监督癌症异质性分析提供了一种实用的新方法。
{"title":"HETEROGENEITY ANALYSIS VIA INTEGRATING MULTI-SOURCES HIGH-DIMENSIONAL DATA WITH APPLICATIONS TO CANCER STUDIES.","authors":"Tingyan Zhong, Qingzhao Zhang, Jian Huang, Mengyun Wu, Shuangge Ma","doi":"10.5705/ss.202021.0002","DOIUrl":"10.5705/ss.202021.0002","url":null,"abstract":"<p><p>This study has been motivated by cancer research, in which heterogeneity analysis plays an important role and can be roughly classified as unsupervised or supervised. In supervised heterogeneity analysis, the finite mixture of regression (FMR) technique is used extensively, under which the covariates affect the response differently in subgroups. High-dimensional molecular and, very recently, histopathological imaging features have been analyzed separately and shown to be effective for heterogeneity analysis. For simpler analysis, they have been shown to contain overlapping, but also independent information. In this article, our goal is to conduct the first and more effective FMR-based cancer heterogeneity analysis by integrating high-dimensional molecular and histopathological imaging features. A penalization approach is developed to regularize estimation, select relevant variables, and, equally importantly, promote the identification of independent information. Consistency properties are rigorously established. An effective computational algorithm is developed. A simulation and an analysis of The Cancer Genome Atlas (TCGA) lung cancer data demonstrate the practical effectiveness of the proposed approach. Overall, this study provides a practical and useful new way of conducting supervised cancer heterogeneity analysis.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10686523/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138463958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Necessary and Sufficient Conditions for Multiple Objective Optimal Regression Designs 多目标最优回归设计的充分必要条件
IF 1.4 3区 数学 Q2 Mathematics Pub Date : 2023-03-08 DOI: 10.5705/ss.202022.0328
Lucy L. Gao, J. Ye, Shangzhi Zeng, Julie Zhou
We typically construct optimal designs based on a single objective function. To better capture the breadth of an experiment's goals, we could instead construct a multiple objective optimal design based on multiple objective functions. While algorithms have been developed to find multi-objective optimal designs (e.g. efficiency-constrained and maximin optimal designs), it is far less clear how to verify the optimality of a solution obtained from an algorithm. In this paper, we provide theoretical results characterizing optimality for efficiency-constrained and maximin optimal designs on a discrete design space. We demonstrate how to use our results in conjunction with linear programming algorithms to verify optimality.
我们通常基于单一目标函数构建最优设计。为了更好地捕捉实验目标的广度,我们可以基于多个目标函数构建一个多目标优化设计。虽然已经开发了算法来寻找多目标优化设计(例如效率约束和最大优化设计),但如何验证从算法中获得的解的最优性远不太清楚。本文给出了离散设计空间上效率约束和最大优化设计的最优性的理论结果。我们演示了如何将我们的结果与线性规划算法结合使用来验证最优性。
{"title":"Necessary and Sufficient Conditions for Multiple Objective Optimal Regression Designs","authors":"Lucy L. Gao, J. Ye, Shangzhi Zeng, Julie Zhou","doi":"10.5705/ss.202022.0328","DOIUrl":"https://doi.org/10.5705/ss.202022.0328","url":null,"abstract":"We typically construct optimal designs based on a single objective function. To better capture the breadth of an experiment's goals, we could instead construct a multiple objective optimal design based on multiple objective functions. While algorithms have been developed to find multi-objective optimal designs (e.g. efficiency-constrained and maximin optimal designs), it is far less clear how to verify the optimality of a solution obtained from an algorithm. In this paper, we provide theoretical results characterizing optimality for efficiency-constrained and maximin optimal designs on a discrete design space. We demonstrate how to use our results in conjunction with linear programming algorithms to verify optimality.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42471367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
HIGH-DIMENSIONAL FACTOR REGRESSION FOR HETEROGENEOUS SUBPOPULATIONS. 异质亚群的高维因子回归。
IF 1.4 3区 数学 Q2 Mathematics Pub Date : 2023-01-01 DOI: 10.5705/ss.202020.0145
Peiyao Wang, Quefeng Li, Dinggang Shen, Yufeng Liu

In modern scientific research, data heterogeneity is commonly observed owing to the abundance of complex data. We propose a factor regression model for data with heterogeneous subpopulations. The proposed model can be represented as a decomposition of heterogeneous and homogeneous terms. The heterogeneous term is driven by latent factors in different subpopulations. The homogeneous term captures common variation in the covariates and shares common regression coefficients across subpopulations. Our proposed model attains a good balance between a global model and a group-specific model. The global model ignores the data heterogeneity, while the group-specific model fits each subgroup separately. We prove the estimation and prediction consistency for our proposed estimators, and show that it has better convergence rates than those of the group-specific and global models. We show that the extra cost of estimating latent factors is asymptotically negligible and the minimax rate is still attainable. We further demonstrate the robustness of our proposed method by studying its prediction error under a mis-specified group-specific model. Finally, we conduct simulation studies and analyze a data set from the Alzheimer's Disease Neuroimaging Initiative and an aggregated microarray data set to further demonstrate the competitiveness and interpretability of our proposed factor regression model.

在现代科学研究中,由于复杂数据的丰富性,通常会观察到数据的异质性。我们为具有异质亚群的数据提出了一个因子回归模型。所提出的模型可以表示为异构项和齐次项的分解。异质性术语是由不同亚群中的潜在因素驱动的。齐次项捕捉了协变量的共同变化,并在子群体中共享共同的回归系数。我们提出的模型在全局模型和特定群体模型之间取得了良好的平衡。全局模型忽略了数据的异质性,而特定于组的模型分别适用于每个子组。我们证明了我们提出的估计量的估计和预测的一致性,并表明它比特定群体和全局模型具有更好的收敛速度。我们证明了估计潜在因素的额外成本是渐近可忽略的,并且极小极大率仍然是可实现的。我们通过研究在错误指定的特定群体模型下的预测误差,进一步证明了我们提出的方法的稳健性。最后,我们进行了模拟研究,并分析了阿尔茨海默病神经成像倡议的数据集和汇总的微阵列数据集,以进一步证明我们提出的因子回归模型的竞争力和可解释性。
{"title":"HIGH-DIMENSIONAL FACTOR REGRESSION FOR HETEROGENEOUS SUBPOPULATIONS.","authors":"Peiyao Wang, Quefeng Li, Dinggang Shen, Yufeng Liu","doi":"10.5705/ss.202020.0145","DOIUrl":"10.5705/ss.202020.0145","url":null,"abstract":"<p><p>In modern scientific research, data heterogeneity is commonly observed owing to the abundance of complex data. We propose a factor regression model for data with heterogeneous subpopulations. The proposed model can be represented as a decomposition of heterogeneous and homogeneous terms. The heterogeneous term is driven by latent factors in different subpopulations. The homogeneous term captures common variation in the covariates and shares common regression coefficients across subpopulations. Our proposed model attains a good balance between a global model and a group-specific model. The global model ignores the data heterogeneity, while the group-specific model fits each subgroup separately. We prove the estimation and prediction consistency for our proposed estimators, and show that it has better convergence rates than those of the group-specific and global models. We show that the extra cost of estimating latent factors is asymptotically negligible and the minimax rate is still attainable. We further demonstrate the robustness of our proposed method by studying its prediction error under a mis-specified group-specific model. Finally, we conduct simulation studies and analyze a data set from the Alzheimer's Disease Neuroimaging Initiative and an aggregated microarray data set to further demonstrate the competitiveness and interpretability of our proposed factor regression model.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10583735/pdf/nihms-1892524.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49684205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Empirical Likelihood Using External Summary Information 使用外部摘要信息的经验似然
IF 1.4 3区 数学 Q2 Mathematics Pub Date : 2023-01-01 DOI: 10.5705/ss.202023.0056
Lyu Ni, Junchao Shao, Jinyi Wang, Lei Wang
: Statistical analysis in modern scientific research nowadays has opportunities to utilize external summary information from similar studies to gain efficiency. However, the population generating data for current study, referred to as internal population, is typically different from the external population for summary information, although they share some common characteristics that make efficiency improvement possible. The existing population heterogeneity is a challenging issue especially when we have only summary statistics but not individual-level external data. In this paper, we apply an empirical likelihood approach to estimating internal population distribution, with external summary information utilized as constraints for efficiency gain under population heterogeneity. We show that our approach produces an asymptotically more efficient estimator of internal population distribution compared with the customary empirical likelihood without using any external information, under the condition that the external information is based on a dataset with size larger than that
现代科学研究中的统计分析有机会利用来自类似研究的外部总结信息来提高效率。然而,为当前研究提供数据的人口,称为内部人口,通常不同于获取摘要信息的外部人口,尽管它们有一些共同的特征,可以提高效率。现有的人口异质性是一个具有挑战性的问题,特别是当我们只有汇总统计而不是个人层面的外部数据时。本文采用经验似然方法估计内部种群分布,并利用外部汇总信息作为种群异质性下效率增益的约束条件。我们表明,在外部信息基于大于该数据集的条件下,我们的方法与传统的经验似然方法相比,在不使用任何外部信息的情况下,产生了一个渐进的更有效的内部人口分布估计。电子邮件:lwangstat@nankai.edu.cn。中国统计:新录用论文(接受作者版本,需英文编辑)
{"title":"Empirical Likelihood Using External Summary Information","authors":"Lyu Ni, Junchao Shao, Jinyi Wang, Lei Wang","doi":"10.5705/ss.202023.0056","DOIUrl":"https://doi.org/10.5705/ss.202023.0056","url":null,"abstract":": Statistical analysis in modern scientific research nowadays has opportunities to utilize external summary information from similar studies to gain efficiency. However, the population generating data for current study, referred to as internal population, is typically different from the external population for summary information, although they share some common characteristics that make efficiency improvement possible. The existing population heterogeneity is a challenging issue especially when we have only summary statistics but not individual-level external data. In this paper, we apply an empirical likelihood approach to estimating internal population distribution, with external summary information utilized as constraints for efficiency gain under population heterogeneity. We show that our approach produces an asymptotically more efficient estimator of internal population distribution compared with the customary empirical likelihood without using any external information, under the condition that the external information is based on a dataset with size larger than that","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70939888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Nonparametric Bayesian Two-Level Clustering for Subject-Level Single-Cell Expression Data 主题级单细胞表达数据的非参数贝叶斯两级聚类
3区 数学 Q2 Mathematics Pub Date : 2023-01-01 DOI: 10.5705/ss.202020.0337
Qiuyu Wu, Xiangyu Luo
The advent of single-cell sequencing opens new avenues for personalized treatment. In this paper, we address a two-level clustering problem of simultaneous subject subgroup discovery (subject level) and cell type detection (cell level) for single-cell expression data from multiple subjects. However, current statistical approaches either cluster cells without considering the subject heterogeneity or group subjects without using the single-cell information. To bridge the gap between cell clustering and subject grouping, we develop a nonparametric Bayesian model, Subject and Cell clustering for Single-Cell expression data (SCSC) model, to achieve subject and cell grouping simultaneously. SCSC does not need to prespecify the subject subgroup number or the cell type number. It automatically induces subject subgroup structures and matches cell types across subjects. Moreover, it directly models the single-cell raw count data by deliberately considering the data's dropouts, library sizes, and over-dispersion. A blocked Gibbs sampler is proposed for the posterior inference. Simulation studies and the application to a multi-subject iPSC scRNA-seq dataset validate the ability of SCSC to simultaneously cluster subjects and cells.
单细胞测序的出现为个性化治疗开辟了新的途径。在本文中,我们解决了同时发现主题子组(主题级)和细胞类型检测(细胞级)的两级聚类问题,该问题适用于来自多个主题的单细胞表达数据。然而,目前的统计方法要么是不考虑受试者异质性的集群细胞,要么是不使用单细胞信息的分组受试者。为了弥合细胞聚类和主体分组之间的差距,我们开发了一种非参数贝叶斯模型,即单细胞表达数据(SCSC)模型的主体和细胞聚类,以同时实现主体和细胞分组。SCSC不需要预先指定主题子组号或细胞类型号。它自动诱导主题子组结构,并在主题之间匹配细胞类型。此外,它通过刻意考虑数据的丢失、库大小和过度分散,直接对单单元原始计数数据进行建模。提出了一种闭塞的Gibbs采样器用于后验推理。模拟研究和多主体iPSC scRNA-seq数据集的应用验证了SCSC同时聚类主体和细胞的能力。
{"title":"Nonparametric Bayesian Two-Level Clustering for Subject-Level Single-Cell Expression Data","authors":"Qiuyu Wu, Xiangyu Luo","doi":"10.5705/ss.202020.0337","DOIUrl":"https://doi.org/10.5705/ss.202020.0337","url":null,"abstract":"The advent of single-cell sequencing opens new avenues for personalized treatment. In this paper, we address a two-level clustering problem of simultaneous subject subgroup discovery (subject level) and cell type detection (cell level) for single-cell expression data from multiple subjects. However, current statistical approaches either cluster cells without considering the subject heterogeneity or group subjects without using the single-cell information. To bridge the gap between cell clustering and subject grouping, we develop a nonparametric Bayesian model, Subject and Cell clustering for Single-Cell expression data (SCSC) model, to achieve subject and cell grouping simultaneously. SCSC does not need to prespecify the subject subgroup number or the cell type number. It automatically induces subject subgroup structures and matches cell types across subjects. Moreover, it directly models the single-cell raw count data by deliberately considering the data's dropouts, library sizes, and over-dispersion. A blocked Gibbs sampler is proposed for the posterior inference. Simulation studies and the application to a multi-subject iPSC scRNA-seq dataset validate the ability of SCSC to simultaneously cluster subjects and cells.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135181000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Sparse Sliced Inverse Regression via Cholesky Matrix Penalization 基于Cholesky矩阵惩罚的稀疏切片逆回归
3区 数学 Q2 Mathematics Pub Date : 2023-01-01 DOI: 10.5705/ss.202020.0406
Linh Nghiem, Francis K.C. Hui, Samuel Mueller, A.H.Welsh
We introduce a new sparse sliced inverse regression estimator called Cholesky matrix penalization and its adaptive version for achieving sparsity in estimating the dimensions of the central subspace. The new estimators use the Cholesky decomposition of the covariance matrix of the covariates and include a regularization term in the objective function to achieve sparsity in a computationally efficient manner. We establish the theoretical values of the tuning parameters that achieve estimation and variable selection consistency for the central subspace. Furthermore, we propose a new projection information criterion to select the tuning parameter for our proposed estimators and prove that the new criterion facilitates selection consistency. The Cholesky matrix penalization estimator inherits the strength of the Matrix Lasso and the Lasso sliced inverse regression estimator; it has superior performance in numerical studies and can be adapted to other sufficient dimension methods in the literature.
我们引入了一种新的稀疏切片逆回归估计器,称为Cholesky矩阵惩罚及其自适应版本,用于在估计中心子空间的维度时实现稀疏性。新的估计器使用协变量协方差矩阵的Cholesky分解,并在目标函数中包含正则化项,以计算效率高的方式实现稀疏性。建立了实现中心子空间估计和变量选择一致性的调谐参数的理论值。此外,我们提出了一种新的投影信息准则来选择我们所提出的估计器的调优参数,并证明了新准则有助于选择的一致性。Cholesky矩阵惩罚估计器继承了矩阵Lasso和Lasso切片逆回归估计器的强度;它在数值研究中具有优越的性能,并可适用于文献中其他的充分维数方法。
{"title":"Sparse Sliced Inverse Regression via Cholesky Matrix Penalization","authors":"Linh Nghiem, Francis K.C. Hui, Samuel Mueller, A.H.Welsh","doi":"10.5705/ss.202020.0406","DOIUrl":"https://doi.org/10.5705/ss.202020.0406","url":null,"abstract":"We introduce a new sparse sliced inverse regression estimator called Cholesky matrix penalization and its adaptive version for achieving sparsity in estimating the dimensions of the central subspace. The new estimators use the Cholesky decomposition of the covariance matrix of the covariates and include a regularization term in the objective function to achieve sparsity in a computationally efficient manner. We establish the theoretical values of the tuning parameters that achieve estimation and variable selection consistency for the central subspace. Furthermore, we propose a new projection information criterion to select the tuning parameter for our proposed estimators and prove that the new criterion facilitates selection consistency. The Cholesky matrix penalization estimator inherits the strength of the Matrix Lasso and the Lasso sliced inverse regression estimator; it has superior performance in numerical studies and can be adapted to other sufficient dimension methods in the literature.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135181008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
That Prasad-Rao is Robust: Estimation of Mean Squared Prediction Error of Observed Best Predictor under Potential Model Misspecification Prasad-Rao的鲁棒性:潜在模型错配下观测最佳预测器预测误差的均方估计
3区 数学 Q2 Mathematics Pub Date : 2023-01-01 DOI: 10.5705/ss.202020.0325
Xiaohui Liu, Haiqiang Ma, Jiming Jiang
{"title":"That Prasad-Rao is Robust: Estimation of Mean Squared Prediction Error of Observed Best Predictor under Potential Model Misspecification","authors":"Xiaohui Liu, Haiqiang Ma, Jiming Jiang","doi":"10.5705/ss.202020.0325","DOIUrl":"https://doi.org/10.5705/ss.202020.0325","url":null,"abstract":"","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135181024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Statistical Inference for Functional Time Series 函数时间序列的统计推断
3区 数学 Q2 Mathematics Pub Date : 2023-01-01 DOI: 10.5705/ss.202021.0107
Jie Li, Lijian Yang
{"title":"Statistical Inference for Functional Time Series","authors":"Jie Li, Lijian Yang","doi":"10.5705/ss.202021.0107","DOIUrl":"https://doi.org/10.5705/ss.202021.0107","url":null,"abstract":"","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135182931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
期刊
Statistica Sinica
全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1