
Latest publications in Statistics and Its Interface

Default Bayesian testing for the zero-inflated Poisson distribution
IF 0.8 · CAS Tier 4 (Mathematics) · JCR Q4 (Mathematical & Computational Biology) · Pub Date: 2024-07-19 · DOI: 10.4310/22-sii750
Yewon Han, Haewon Hwang, Hon Keung Ng, Seong Kim
In Bayesian model selection and hypothesis testing, the choice of prior distribution is an important problem that users should approach with care. More often than not, objective Bayesian analyses utilize noninformative priors such as Jeffreys priors. However, since these noninformative priors are often improper, the Bayes factor associated with them is not well-defined. To circumvent this indeterminacy, the Bayes factor can be corrected by intrinsic and fractional methods. These adjusted Bayes factors are asymptotically equivalent to ordinary Bayes factors calculated with proper priors, called intrinsic priors. In this article, we derive intrinsic priors for testing a point null hypothesis under a zero-inflated Poisson distribution. Extensive simulation studies are performed to support the theoretical results on asymptotic equivalence, and two real datasets are analyzed to illustrate the methodology developed in this paper.
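The intrinsic-prior construction itself does not fit in a short snippet, but the objects involved are easy to sketch. The following minimal Python example (an illustration, not the authors' method) computes an ordinary Bayes factor for a point null on the Poisson rate of a zero-inflated Poisson model, using an assumed proper Gamma prior under the alternative and a crude grid approximation of the marginal likelihoods:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def zip_logpmf(y, p, lam):
    """Log-pmf of a zero-inflated Poisson: P(Y=0) = p + (1-p)exp(-lam)."""
    return np.where(y == 0,
                    np.log(p + (1.0 - p) * np.exp(-lam)),
                    np.log1p(-p) + stats.poisson.logpmf(y, lam))

# simulate n = 300 observations with zero-inflation 0.3 and rate 2.5
n = 300
is_zero = rng.random(n) < 0.3
y = rng.poisson(2.5, size=n)
y[is_zero] = 0

# Bayes factor BF01 for H0: lam = lam0 vs H1: lam ~ Gamma(2, 1),
# with p ~ Uniform(0, 1) under both hypotheses, via a coarse grid.
lam0 = 1.0
p_grid = np.linspace(0.005, 0.995, 99)
lam_grid = np.linspace(0.05, 10.0, 200)

def logmeanexp(a):
    m = np.max(a)
    return m + np.log(np.mean(np.exp(a - m)))

ll1 = np.array([[zip_logpmf(y, p, l).sum() for l in lam_grid] for p in p_grid])
log_m1 = logmeanexp(ll1 + stats.gamma.logpdf(lam_grid, a=2, scale=1.0)) \
         + np.log(lam_grid[-1] - lam_grid[0])   # integral ~ grid mean x width
ll0 = np.array([zip_logpmf(y, p, lam0).sum() for p in p_grid])
log_m0 = logmeanexp(ll0)                        # the p-interval has width ~1

log_bf01 = log_m0 - log_m1
print(log_bf01)   # strongly negative: the data favor H1 over lam = 1
```

With a proper prior the Bayes factor is well-defined; the paper's contribution is deriving intrinsic priors that play this role automatically when one starts from improper defaults.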
Citations: 0
Estimating extreme value index by subsampling for massive datasets with heavy-tailed distributions
IF 0.8 · CAS Tier 4 (Mathematics) · JCR Q4 (Mathematical & Computational Biology) · Pub Date: 2024-07-19 · DOI: 10.4310/22-sii749
Yongxin Li, Liujun Chen, Deyuan Li, Hansheng Wang
Modern statistical analyses often encounter datasets with massive sizes and heavy-tailed distributions. For datasets with massive sizes, traditional estimation methods can hardly be used to estimate the extreme value index directly. To address this issue, we propose a subsampling-based method. Specifically, multiple subsamples are drawn from the whole dataset by simple random subsampling with replacement. Based on each subsample, an approximate maximum likelihood estimator can be computed. The resulting estimators are then averaged to form a more accurate one. Under appropriate regularity conditions, we show theoretically that the proposed estimator is consistent and asymptotically normal. With the help of the estimated extreme value index, we can estimate high-level quantiles and tail probabilities of a heavy-tailed random variable consistently. Extensive simulation experiments are provided to demonstrate the promising performance of our method. A real data analysis is also presented for illustration purposes.
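The divide-and-average scheme can be sketched in a few lines. Here the classical Hill estimator stands in for the "approximate maximum likelihood estimator" (the paper's exact estimator may differ), and the subsample size `n`, number of subsamples `B`, and tail fraction `k` are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def hill(x, k):
    """Hill estimator of the extreme value index from the k largest order stats."""
    top = np.sort(x)[-(k + 1):]                # k+1 largest values, ascending
    return np.mean(np.log(top[1:]) - np.log(top[0]))

# a "massive" Pareto sample with true extreme value index 0.5
N, gamma_true = 1_000_000, 0.5
x = (1.0 - rng.random(N)) ** (-gamma_true)     # Pareto(alpha = 2)

# draw B random subsamples with replacement, estimate on each, then average
B, n, k = 20, 10_000, 500
est = np.mean([hill(rng.choice(x, size=n), k) for _ in range(B)])
print(est)   # close to 0.5
```

Averaging over subsamples reduces the variance of the single-subsample estimator while keeping each fit cheap, which is the point of the subsampling scheme for massive data.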
Citations: 0
A random projection method for large-scale community detection
IF 0.8 · CAS Tier 4 (Mathematics) · JCR Q4 (Mathematical & Computational Biology) · Pub Date: 2024-02-01 · DOI: 10.4310/22-sii752
Haobo Qi, Hansheng Wang, Xuening Zhu
In this work, we consider a random projection method for a large-scale community detection task. We introduce a random Gaussian matrix that generates several projections on the column space of the network adjacency matrix. The $k$-means algorithm is then applied to the low-dimensional projected matrix. The computational complexity is much lower than that of classic spectral clustering methods. Furthermore, the algorithm is easy to implement and well suited to privacy preservation. We theoretically establish a strong consistency result for the algorithm under the stochastic block model. Extensive numerical studies are conducted to verify the theoretical findings and illustrate the usefulness of the proposed method.
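A minimal sketch of the projection-then-cluster recipe, under assumed parameters (two balanced communities, $d = 10$ Gaussian projections, a bare-bones $k$-means) — an illustration, not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)

# two-community stochastic block model (within-prob 0.5, between-prob 0.05)
n, K, d = 400, 2, 10
z = np.repeat([0, 1], n // 2)                       # true community labels
P = np.where(z[:, None] == z[None, :], 0.5, 0.05)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1) + np.triu(A, 1).T                 # symmetric, no self-loops

# project the columns of A onto d random Gaussian directions
G = rng.normal(size=(n, d)) / np.sqrt(d)
X = A @ G                                           # n x d, much smaller than A

def kmeans(X, K, iters=50):
    C = X[rng.choice(len(X), K, replace=False)]     # random initial centroids
    for _ in range(iters):
        lab = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for k in range(K):                          # keep old centroid if empty
            if np.any(lab == k):
                C[k] = X[lab == k].mean(axis=0)
    return lab

lab = kmeans(X, K)
acc = max(np.mean(lab == z), np.mean(lab != z))     # up to label switching
print(acc)
```

The projected matrix has only $n \times d$ entries, so the clustering step never touches the full $n \times n$ adjacency matrix again, which is where the computational savings over eigendecomposition-based spectral clustering come from.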
Citations: 0
Correlated Wishart matrices classification via an expectation-maximization composite likelihood-based algorithm
IF 0.8 · CAS Tier 4 (Mathematics) · JCR Q4 (Mathematical & Computational Biology) · Pub Date: 2024-02-01 · DOI: 10.4310/22-sii770
Zhou Lan
Positive-definite matrix-variate data are becoming popular in computer vision. Computer vision data descriptors in the form of Region Covariance Descriptors (RCDs) are positive definite matrices that extract the key features of images, and RCDs are extensively used in image set classification. Several classification methods treating RCDs as Wishart-distributed random matrices have been proposed. However, the majority of current methods ignore the potential correlation among RCDs caused by so-called auxiliary information (e.g., subjects' ages, nose widths, etc.). Modeling correlated Wishart matrices is difficult because their joint density function is hard to obtain. In this paper, we propose an expectation-maximization composite likelihood-based algorithm for Wishart matrices to tackle this issue. In numerical studies based on synthetic data and real data (the Chicago face dataset), our proposed algorithm performs better than alternative methods that do not account for the correlation induced by the auxiliary information.
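As a baseline illustration of Wishart-likelihood classification — deliberately the independent-Wishart setting that the paper's composite-likelihood EM algorithm improves on — one can classify matrices by comparing Wishart log-densities under each class. The scale matrices and degrees of freedom below are assumptions for the toy example:

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(7)

d, df, n = 4, 10, 100
Sigma0 = np.eye(d)                                   # class-0 scale matrix
Sigma1 = 0.6 * np.eye(d) + 0.4 * np.ones((d, d))     # class-1: correlation 0.4

# simulate descriptors as Wishart matrices, scaled so that E[X] = Sigma
X0 = wishart.rvs(df, Sigma0 / df, size=n, random_state=rng)
X1 = wishart.rvs(df, Sigma1 / df, size=n, random_state=rng)

def classify(X):
    """Assign each matrix to the class with the higher Wishart log-density."""
    l0 = np.array([wishart.logpdf(x, df, Sigma0 / df) for x in X])
    l1 = np.array([wishart.logpdf(x, df, Sigma1 / df) for x in X])
    return (l1 > l0).astype(int)

acc = 0.5 * (np.mean(classify(X0) == 0) + np.mean(classify(X1) == 1))
print(acc)
```

The paper's point is precisely that when the matrices are correlated through auxiliary information, a classifier built on independent Wishart densities like this one leaves accuracy on the table.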
Citations: 0
Robust and covariance-assisted tensor response regression
IF 0.8 · CAS Tier 4 (Mathematics) · JCR Q4 (Mathematical & Computational Biology) · Pub Date: 2024-02-01 · DOI: 10.4310/sii.2024.v17.n2.a10
Ning Wang, Xin Zhang
Tensor data analysis is gaining increasing popularity in modern multivariate statistics. When analyzing real-world tensor data, many existing tensor estimation approaches are sensitive to heavy-tailed data and outliers, in addition to the apparent high-dimensionality. In this article, we develop a robust and covariance-assisted tensor response regression model based on a recently proposed tensor t‑distribution to address these issues in tensor data. This model assumes that the tensor regression coefficient has a low-rank structure that can be learned more effectively using the additional covariance information. This enables a fast and robust decomposition-based estimation method. Theoretical analysis and numerical experiments demonstrate the superior performance of our approach. By addressing the heavy-tail, high-order, and high-dimensional issues, our work contributes to robust and effective estimation methods for tensor response regression, with broad applicability in various domains.
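The full covariance-assisted tensor model is involved, but the robustness ingredient — replacing Gaussian errors with a heavy-tailed $t$ model fitted by EM-style reweighting — can be sketched in the vector case (an illustration only; the paper works with tensor responses and a tensor t-distribution):

```python
import numpy as np

rng = np.random.default_rng(8)

# linear regression with heavy-tailed (Student-t) noise
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_t(df=2, size=n)   # outlier-prone errors

# EM / iteratively reweighted least squares under a t(nu) error model
nu = 3.0
b = np.linalg.lstsq(X, y, rcond=None)[0]           # OLS starting value
s2 = np.var(y - X @ b)
for _ in range(50):
    r = y - X @ b
    w = (nu + 1.0) / (nu + r ** 2 / s2)            # E-step: downweight outliers
    b = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    s2 = np.mean(w * (y - X @ b) ** 2)
print(b)   # close to beta_true despite infinite-variance noise
```

The E-step weights shrink the influence of large residuals, which is what makes $t$-based estimators stable under the heavy tails and outliers that break least-squares tensor fits.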
Citations: 0
Bayesian tensor-on-tensor regression with efficient computation
IF 0.8 · CAS Tier 4 (Mathematics) · JCR Q4 (Mathematical & Computational Biology) · Pub Date: 2024-02-01 · DOI: 10.4310/23-sii786
Kunbo Wang, Yanxun Xu
We propose a Bayesian tensor-on-tensor regression approach to predict a multidimensional array (tensor) of arbitrary dimensions from another tensor of arbitrary dimensions, building upon the Tucker decomposition of the regression coefficient tensor. Traditional tensor regression methods making use of the Tucker decomposition either assume the dimension of the core tensor to be known or estimate it via cross-validation or some model selection criteria. However, no existing method can simultaneously estimate the model dimension (the dimension of the core tensor) and other model parameters. To fill this gap, we develop an efficient Markov Chain Monte Carlo (MCMC) algorithm to estimate both the model dimension and parameters for posterior inference. Besides the MCMC sampler, we also develop an ultra-fast optimization-based computing algorithm wherein the maximum a posteriori estimators for parameters are computed, and the model dimension is optimized via a simulated annealing algorithm. The proposed Bayesian framework provides a natural way for uncertainty quantification. Through extensive simulation studies, we evaluate the proposed Bayesian tensor-on-tensor regression model and show its superior performance compared to alternative methods. We also demonstrate its practical effectiveness by applying it to two real-world datasets, including facial imaging data and 3D motion data.
Citations: 0
Density-convoluted tensor support vector machines
IF 0.8 · CAS Tier 4 (Mathematics) · JCR Q4 (Mathematical & Computational Biology) · Pub Date: 2024-02-01 · DOI: 10.4310/23-sii796
Boxiang Wang, Le Zhou, Jian Yang, Qing Mai
With the emergence of tensor data (also known as multi-dimensional arrays) in many modern applications such as image processing and digital marketing, tensor classification is gaining increasing attention. Although there is a rich toolbox of classification methods for vector-based data, these traditional methods may not be adequate for tensor data classification. In this paper, we propose a new classifier called the density-convoluted tensor support vector machine (DCT‑SVM). This method is motivated by applying kernel density convolution to the SVM loss to induce a new family of classification loss functions. To establish the theoretical foundation of DCT‑SVM, the probabilistic order of magnitude of its excess risk is systematically studied. For efficient computation of DCT‑SVM, we develop a fast monotone accelerated proximal gradient descent algorithm and show its convergence. With simulation studies, we demonstrate the superior performance of DCT‑SVM over many popular classification methods. We further demonstrate the real potential of DCT‑SVM using a modern data application for online advertising.
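The density-convolution idea can be made concrete in the vector case: convolving the hinge loss with a Gaussian density has a closed form, yielding a smooth surrogate amenable to gradient methods. The sketch below (assumed bandwidth `s`, plain gradient descent rather than the paper's accelerated proximal algorithm, and vector predictors standing in for tensors) trains a linear classifier with this smoothed loss:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def smoothed_hinge(u, s=0.5):
    """Hinge loss max(0, 1-u) convolved with a N(0, s^2) density (closed form)."""
    t = 1.0 - u
    return t * norm.cdf(t / s) + s * norm.pdf(t / s)

def grad_smoothed_hinge(u, s=0.5):
    # d/du [t*Phi(t/s) + s*phi(t/s)] with t = 1-u equals -Phi(t/s)
    return -norm.cdf((1.0 - u) / s)

# toy linearly separable data
n, p = 200, 5
X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = np.sign(X @ w_true)

w = np.zeros(p)
lr, lam = 0.1, 1e-3
for _ in range(300):
    u = y * (X @ w)                                   # margins
    g = X.T @ (y * grad_smoothed_hinge(u)) / n + lam * w
    w -= lr * g

loss = smoothed_hinge(y * (X @ w)).mean()
acc = np.mean(np.sign(X @ w) == y)
print(loss, acc)
```

Because the convolved loss is differentiable everywhere (unlike the kinked hinge), first-order methods with provable convergence rates become available, which is what the paper's accelerated algorithm exploits.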
Citations: 0
Multi-way overlapping clustering by Bayesian tensor decomposition
IF 0.8 · CAS Tier 4 (Mathematics) · JCR Q4 (Mathematical & Computational Biology) · Pub Date: 2024-02-01 · DOI: 10.4310/23-sii790
Zhuofan Wang, Fangting Zhou, Kejun He, Yang Ni
The development of modern sequencing technologies provides great opportunities to measure gene expression of multiple tissues from different individuals. The three-way variation across genes, tissues, and individuals makes statistical inference a challenging task. In this paper, we propose a Bayesian multi-way clustering approach to cluster genes, tissues, and individuals simultaneously. The proposed model adaptively trichotomizes the observed data into three latent categories and uses a Bayesian hierarchical construction to further decompose the latent variables into lower-dimensional features, which can be interpreted as overlapping clusters. With a Bayesian nonparametric prior, i.e., the Indian buffet process, our method determines the cluster number automatically. The utility of our approach is demonstrated through simulation studies and an application to the Genotype-Tissue Expression (GTEx) RNA-seq data. The clustering result reveals some interesting findings about depression-related genes in human brain, which are also consistent with biological domain knowledge. The detailed algorithm and some numerical results are available in the online Supplementary Material at https://intlpress.com/site/pub/files/supp/sii/2024/0017/0002/sii-2024-0017-0002-s001.pdf.
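The Indian buffet process prior that drives the automatic selection of the number of overlapping clusters can be simulated directly. The following sketch draws a binary feature-allocation matrix using the standard "customers and dishes" construction (`alpha = 3` is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_ibp(n, alpha, rng):
    """Draw a binary feature-allocation matrix from an Indian buffet process."""
    Z = np.zeros((n, 0), dtype=int)
    for i in range(n):
        counts = Z.sum(axis=0)                      # dish popularity so far
        # customer i+1 takes an existing dish k with probability counts[k]/(i+1)
        take = (rng.random(Z.shape[1]) < counts / (i + 1)).astype(int)
        # ... and samples Poisson(alpha/(i+1)) brand-new dishes
        k_new = rng.poisson(alpha / (i + 1))
        Z = np.hstack([Z, np.zeros((n, k_new), dtype=int)])
        Z[i] = np.concatenate([take, np.ones(k_new, dtype=int)])
    return Z

Z = sample_ibp(50, alpha=3.0, rng=rng)
print(Z.shape)   # the number of columns (overlapping clusters) is random
```

Rows are objects and columns are features/clusters; because each row can activate several columns, the prior naturally encodes overlapping cluster membership, and the number of active columns is inferred rather than fixed in advance.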
Citations: 0
Community detection in temporal citation network via a tensor-based approach
IF 0.8 · CAS Tier 4 (Mathematics) · JCR Q4 (Mathematical & Computational Biology) · Pub Date: 2024-02-01 · DOI: 10.4310/22-sii751
Tianchen Gao, Rui Pan, Junfei Zhang, Hansheng Wang
In the era of big data, network analysis has attracted widespread attention. Detecting and tracking community evolution in temporal networks can uncover important and interesting behaviors. In this paper, we analyze a temporal citation network constructed by publications collected from 44 statistical journals between 2001 and 2018. We propose an approach named Tensor-based Directed Spectral Clustering On Ratios of Eigenvectors (TD-SCORE) which can correct for degree heterogeneity to detect the community structure of the temporal citation network. We first explore the characteristics of the temporal network via in-degree distribution and visualization of different snapshots, and we find that both the community structure and the key nodes change over time. Then, we apply the TD-SCORE method to the core network of our temporal citation network. Seven communities are identified, including variable selection, Bayesian analysis, functional data analysis, and many others. Finally, we track the evolution of the above communities and reach some conclusions.
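The SCORE idea behind TD-SCORE — taking entrywise ratios of leading eigenvectors so that degree heterogeneity cancels — can be sketched on an undirected degree-corrected block model (the paper's tensor-based, directed, temporal version is more general):

```python
import numpy as np

rng = np.random.default_rng(5)

# degree-corrected block model: two communities, heterogeneous degrees
n = 400
z = np.repeat([0, 1], n // 2)
theta = rng.uniform(0.3, 1.0, size=n)               # node-level degree weights
B = np.array([[0.9, 0.2], [0.2, 0.9]])
P = np.outer(theta, theta) * B[z][:, z]             # edge probabilities (< 1)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1) + np.triu(A, 1).T                 # symmetric, no self-loops

# SCORE: entrywise ratios of leading eigenvectors cancel the degree effect
vals, vecs = np.linalg.eigh(A)
order = np.argsort(np.abs(vals))[::-1]
v1, v2 = vecs[:, order[0]], vecs[:, order[1]]
ratio = v2 / v1                                     # theta_i divides out
lab = (ratio > np.median(ratio)).astype(int)        # 1-d split in lieu of k-means
acc = max(np.mean(lab == z), np.mean(lab != z))     # up to label switching
print(acc)
```

Because each node's degree parameter multiplies both leading eigenvector entries by (approximately) the same factor, the ratio statistic depends only on community membership, which is why SCORE-type methods are robust to hubs and low-degree nodes alike.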
在大数据时代,网络分析受到广泛关注。检测和跟踪时态网络中的社群演化可以发现重要而有趣的行为。在本文中,我们分析了由 2001 年至 2018 年间从 44 种统计期刊中收集的出版物构建的时态引文网络。我们提出了一种名为 "基于特征向量比的张量定向谱聚类(TD-SCORE)"的方法,该方法可以校正度异质性,从而检测时空引文网络的群落结构。我们首先通过内度分布和不同快照的可视化来探索时空网络的特征,发现群落结构和关键节点都会随时间发生变化。然后,我们将 TD-SCORE 方法应用于时空引文网络的核心网络。我们发现了七个社群,包括变量选择、贝叶斯分析、功能数据分析等。最后,我们跟踪了上述群体的演变,并得出了一些结论。
{"title":"Community detection in temporal citation network via a tensor-based approach","authors":"Tianchen Gao, Rui Pan, Junfei Zhang, Hansheng Wang","doi":"10.4310/22-sii751","DOIUrl":"https://doi.org/10.4310/22-sii751","url":null,"abstract":"In the era of big data, network analysis has attracted widespread attention. Detecting and tracking community evolution in temporal networks can uncover important and interesting behaviors. In this paper, we analyze a temporal citation network constructed by publications collected from 44 statistical journals between 2001 and 2018. We propose an approach named Tensor-based Directed Spectral Clustering On Ratios of Eigenvectors (TD-SCORE) which can correct for degree heterogeneity to detect the community structure of the temporal citation network. We first explore the characteristics of the temporal network via in-degree distribution and visualization of different snapshots, and we find that both the community structure and the key nodes change over time. Then, we apply the TD-SCORE method to the core network of our temporal citation network. Seven communities are identified, including variable selection, Bayesian analysis, functional data analysis, and many others. Finally, we track the evolution of the above communities and reach some conclusions.","PeriodicalId":51230,"journal":{"name":"Statistics and Its Interface","volume":"26 1","pages":""},"PeriodicalIF":0.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139659443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Bayesian methods in tensor analysis
IF 0.8, Mathematics (CAS Tier 4), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-02-01, DOI: 10.4310/23-sii802
Shi Yiyao, Shen Weining
Tensors, also known as multidimensional arrays, are useful data structures in machine learning and statistics. In recent years, Bayesian methods have emerged as a popular direction for analyzing tensor-valued data since they provide a convenient way to introduce sparsity into the model and conduct uncertainty quantification. In this article, we provide an overview of frequentist and Bayesian methods for solving tensor completion and regression problems, with a focus on Bayesian methods. We review common Bayesian tensor approaches including model formulation, prior assignment, posterior computation, and theoretical properties. We also discuss potential future directions in this field.
Tensors, also known as multidimensional arrays, are useful data structures in machine learning and statistics. In recent years, Bayesian methods have become a popular direction for analyzing tensor-valued data, since they offer a convenient way to introduce sparsity into the model and to quantify uncertainty. In this article, we survey frequentist and Bayesian methods for tensor completion and tensor regression problems, with a focus on the Bayesian ones, and review common Bayesian tensor approaches, including model formulation, prior assignment, posterior computation, and theoretical properties.
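The completion and regression methods the abstract surveys build on the CP (CANDECOMP/PARAFAC) decomposition. Below is a minimal frequentist CP-ALS sketch in NumPy (the names `cp_als` and `khatri_rao` and the synthetic demo are mine, not from the paper); Bayesian treatments keep the same factorization but place priors on the factor matrices and sample them instead of solving least-squares updates.

```python
import numpy as np

def khatri_rao(X, Y):
    """Columnwise Khatri-Rao product; rows indexed by (row of X, row of Y)."""
    return (X[:, None, :] * Y[None, :, :]).reshape(-1, X.shape[1])

def cp_als(T, rank, n_iter=100, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor via alternating least
    squares: fix two factors, solve a linear system for the third, and cycle.
    """
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A, B, C = (rng.standard_normal((d, rank)) for d in (I, J, K))
    T1 = T.reshape(I, -1)                     # mode-1 unfolding
    T2 = np.moveaxis(T, 1, 0).reshape(J, -1)  # mode-2 unfolding
    T3 = np.moveaxis(T, 2, 0).reshape(K, -1)  # mode-3 unfolding
    for _ in range(n_iter):
        A = np.linalg.lstsq(khatri_rao(B, C), T1.T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), T2.T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), T3.T, rcond=None)[0].T
    return A, B, C

# Demo: recover a synthetic noiseless rank-2 tensor.
rng = np.random.default_rng(1)
I, J, K, r = 4, 5, 6, 2
T = np.einsum('ir,jr,kr->ijk',
              rng.standard_normal((I, r)),
              rng.standard_normal((J, r)),
              rng.standard_normal((K, r)))
A, B, C = cp_als(T, rank=r)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
rel_err = np.linalg.norm(T_hat - T) / np.linalg.norm(T)
```

Each ALS update is an ordinary least-squares problem because, with two factors fixed, the unfolded tensor is linear in the remaining factor; this is exactly the conditional structure that Gibbs-style Bayesian samplers exploit.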
{"title":"Bayesian methods in tensor analysis","authors":"Shi Yiyao, Shen Weining","doi":"10.4310/23-sii802","DOIUrl":"https://doi.org/10.4310/23-sii802","url":null,"abstract":"Tensors, also known as multidimensional arrays, are useful data structures in machine learning and statistics. In recent years, Bayesian methods have emerged as a popular direction for analyzing tensor-valued data since they provide a convenient way to introduce sparsity into the model and conduct uncertainty quantification. In this article, we provide an overview of frequentist and Bayesian methods for solving tensor completion and regression problems, with a focus on Bayesian methods. We review common Bayesian tensor approaches including model formulation, prior assignment, posterior computation, and theoretical properties.We also discuss potential future directions in this field.","PeriodicalId":51230,"journal":{"name":"Statistics and Its Interface","volume":"49 1","pages":""},"PeriodicalIF":0.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139659387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0