Statistica Sinica: Latest Articles

Statistical Inference for Functional Time Series

Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistica Sinica

Pub Date : 2023-01-01 DOI: 10.5705/ss.202021.0107

Jie Li, Lijian Yang

Citations: 1

On Construction of Nonregular Two-Level Factorial Designs With Maximum Generalized Resolutions

Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistica Sinica

Pub Date : 2023-01-01 DOI: 10.5705/ss.202021.0024

Chenlu Shi, Boxin Tang

Citations: 1

Nonparametric Bayesian Two-Level Clustering for Subject-Level Single-Cell Expression Data

Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistica Sinica

Pub Date : 2023-01-01 DOI: 10.5705/ss.202020.0337

Qiuyu Wu, Xiangyu Luo

The advent of single-cell sequencing opens new avenues for personalized treatment. In this paper, we address a two-level clustering problem of simultaneous subject subgroup discovery (subject level) and cell type detection (cell level) for single-cell expression data from multiple subjects. However, current statistical approaches either cluster cells without considering the subject heterogeneity or group subjects without using the single-cell information. To bridge the gap between cell clustering and subject grouping, we develop a nonparametric Bayesian model, Subject and Cell clustering for Single-Cell expression data (SCSC) model, to achieve subject and cell grouping simultaneously. SCSC does not need to prespecify the subject subgroup number or the cell type number. It automatically induces subject subgroup structures and matches cell types across subjects. Moreover, it directly models the single-cell raw count data by deliberately considering the data's dropouts, library sizes, and over-dispersion. A blocked Gibbs sampler is proposed for the posterior inference. Simulation studies and the application to a multi-subject iPSC scRNA-seq dataset validate the ability of SCSC to simultaneously cluster subjects and cells.


Citations: 1

High-Dimensional Factor Regression for Heterogeneous Subpopulations

IF 1.4 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistica Sinica

Pub Date : 2023-01-01 DOI: 10.5705/ss.202020.0145

Peiyao Wang, Quefeng Li, Dinggang Shen, Yufeng Liu

In modern scientific research, data heterogeneity is commonly observed owing to the abundance of complex data. We propose a factor regression model for data with heterogeneous subpopulations. The proposed model can be represented as a decomposition of heterogeneous and homogeneous terms. The heterogeneous term is driven by latent factors in different subpopulations. The homogeneous term captures common variation in the covariates and shares common regression coefficients across subpopulations. Our proposed model attains a good balance between a global model and a group-specific model. The global model ignores the data heterogeneity, while the group-specific model fits each subgroup separately. We prove the estimation and prediction consistency for our proposed estimators, and show that it has better convergence rates than those of the group-specific and global models. We show that the extra cost of estimating latent factors is asymptotically negligible and the minimax rate is still attainable. We further demonstrate the robustness of our proposed method by studying its prediction error under a mis-specified group-specific model. Finally, we conduct simulation studies and analyze a data set from the Alzheimer's Disease Neuroimaging Initiative and an aggregated microarray data set to further demonstrate the competitiveness and interpretability of our proposed factor regression model.

Volume 33, Issue 1, pp. 27-53. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10583735/pdf/nihms-1892524.pdf

Citations: 0

Empirical Likelihood Using External Summary Information

IF 1.4 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistica Sinica

Pub Date : 2023-01-01 DOI: 10.5705/ss.202023.0056

Lyu Ni, Junchao Shao, Jinyi Wang, Lei Wang

Statistical analysis in modern scientific research nowadays has opportunities to utilize external summary information from similar studies to gain efficiency. However, the population generating data for current study, referred to as internal population, is typically different from the external population for summary information, although they share some common characteristics that make efficiency improvement possible. The existing population heterogeneity is a challenging issue especially when we have only summary statistics but not individual-level external data. In this paper, we apply an empirical likelihood approach to estimating internal population distribution, with external summary information utilized as constraints for efficiency gain under population heterogeneity. We show that our approach produces an asymptotically more efficient estimator of internal population distribution compared with the customary empirical likelihood without using any external information, under the condition that the external information is based on a dataset with size larger than that

Email: lwangstat@nankai.edu.cn. Statistica Sinica: newly accepted paper (accepted author version, subject to English editing).

Citations: 0

Interval Estimation for Operating Characteristic of Continuous Biomarkers With Controlled Sensitivity or Specificity

IF 1.4 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistica Sinica

Pub Date : 2023-01-01 DOI: 10.5705/ss.202021.0020

Yijian Huang, Isaac Parakati, Dattatraya H Patil, Martin G Sanda

The receiver operating characteristic (ROC) curve provides a comprehensive performance assessment of a continuous biomarker over the full threshold spectrum. Nevertheless, a medical test often dictates to operate at a certain high level of sensitivity or specificity. A diagnostic accuracy metric directly targeting the clinical utility is specificity at the controlled sensitivity level, or vice versa. While the empirical point estimation is readily adopted in practice, the nonparametric interval estimation is challenged by the fact that the variance involves density functions due to estimated threshold. In addition, even with a fixed threshold, many standard confidence intervals including the Wald interval for binomial proportion could have erratic behaviors. In this article, we are motivated by the superior performance of the score interval for binomial proportion and propose a novel extension for the biomarker problem. Meanwhile, we develop exact bootstrap and establish consistency of the bootstrap variance estimator. Both single-biomarker evaluation and two-biomarker comparison are investigated. Extensive simulation studies were conducted, demonstrating competitive performance of our proposals. An illustration with aggressive prostate cancer diagnosis is provided.
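The contrast the abstract draws between the Wald interval and the score interval for a binomial proportion can be illustrated with a minimal sketch. It covers only the classical fixed-threshold case, using the standard Wilson score formula; it is not the authors' extension to an estimated threshold, and the function names and the 20-out-of-20 data are hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist


def wald_interval(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_interval(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval, obtained by inverting the score test for p."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical data: all 20 non-diseased subjects fall below the threshold,
    # so the empirical specificity is exactly 1.
    print("Wald:  ", wald_interval(20, 20))
    print("Wilson:", wilson_interval(20, 20))
```

With an empirical proportion of exactly 1, the Wald interval collapses to a single point because its estimated variance is zero, whereas the score interval still has positive width. This is one simple instance of the erratic behavior the abstract mentions.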

Volume 33, Issue 1, pp. 193-214. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181819/pdf/

Citations: 0

High-Dimensional Behaviour of Some Two-Sample Tests Based on Ball Divergence

IF 1.4 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistica Sinica

Pub Date : 2023-01-01 DOI: 10.5705/ss.202023.0069

Bilol Banerjee, A. Ghosh

Citations: 0

Joint Modeling of Change-Point Identification and Dependent Dynamic Community Detection

IF 1.4 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistica Sinica

Pub Date : 2023-01-01 DOI: 10.5705/ss.202021.0182

Diqing Li, Yubai Yuan, Xinsheng Zhang, Annie Qu

The field of dynamic network analysis has recently seen a surge of interest in community detection and evolution. However, existing methods for dynamic community detection do not consider dependencies between edges, which could lead to a loss of information when detecting community structures. In this study, we investigate the problem of identifying a change-point with abrupt changes in the community structure of a network. To do so, we propose an approximate likelihood approach for the change-point estimator and for identifying node membership that integrates marginal information and dependencies of network connectivities. We propose an expectation-maximization-type algorithm that maximizes the approximate likelihood jointly over change-point and community membership evolution. From a theoretical viewpoint, we establish estimation consistency under the regularity condition, and show that the proposed estimators achieve a higher convergence rate than those of their marginal likelihood counterparts, which do not incorporate dependencies between edges. We demonstrate the validity of the proposed method by applying it to the ADHD-200 data set to detect brain functional community changes over time.


Citations: 0

Automated Estimation of Heavy-Tailed Vector Error Correction Models

Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistica Sinica

Pub Date : 2023-01-01 DOI: 10.5705/ss.202020.0177

Feifei Guo, Shiqing Ling, Zichuan Mi

Citations: 1

The Identifiability of Copula Models for Dependent Competing Risks Data With Exponentially Distributed Margins

Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistica Sinica

Pub Date : 2023-01-01 DOI: 10.5705/ss.202020.0520

Antai Wang

Citations: 1
