Zuheng (David) Xu, Moksh Jain, Ali Denton, Shawn Whitfield, Aniket Didolkar, Berton Earnshaw, Jason Hartford
Pairwise interactions between perturbations to a system can provide evidence for the causal dependencies of the system's underlying mechanisms. When observations are low-dimensional, hand-crafted measurements, detecting interactions amounts to running simple statistical tests, but it is not obvious how to detect interactions between perturbations that affect latent variables. We derive two interaction tests based on pairwise interventions and show how these tests can be integrated into an active learning pipeline to efficiently discover pairwise interactions between perturbations. We illustrate the value of these tests in the context of biology, where pairwise perturbation experiments are frequently used to reveal interactions that are not observable from any single perturbation. Our tests can be run on unstructured data, such as the pixels of an image, which enables a more general notion of interaction than typical cell-viability experiments and allows the tests to be run on cheaper experimental assays. We validate our tests on several synthetic and real biological experiments, showing that they identify interacting pairs effectively. We also evaluate our approach on a real biological experiment in which we knocked out 50 pairs of genes and measured the effect with microscopy images, and we show that it recovers significantly more known biological interactions than random search and standard active learning baselines.
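As a rough illustration of what such a test can look like on unstructured readouts (a hedged sketch under our own assumptions, not the specific tests derived in the paper), one can embed the observations, ask whether the double-perturbation effect deviates from the additive combination of the two single-perturbation effects, and calibrate the deviation with a heuristic permutation null.

```python
import numpy as np

def interaction_score(ctrl, pert_a, pert_b, pert_ab):
    """Deviation of the joint effect from additivity in embedding space.

    Each argument is an (n_samples, d) array of embeddings (e.g. image
    features) for control, single perturbations, and the double perturbation.
    """
    delta_a = pert_a.mean(0) - ctrl.mean(0)    # effect of perturbation A
    delta_b = pert_b.mean(0) - ctrl.mean(0)    # effect of perturbation B
    delta_ab = pert_ab.mean(0) - ctrl.mean(0)  # effect of the pair
    return np.linalg.norm(delta_ab - (delta_a + delta_b))

def permutation_pvalue(ctrl, pert_a, pert_b, pert_ab, n_perm=1000, seed=0):
    """Heuristic null: relabel perturbed wells as the 'A+B' condition."""
    rng = np.random.default_rng(seed)
    observed = interaction_score(ctrl, pert_a, pert_b, pert_ab)
    pooled = np.vstack([pert_a, pert_b, pert_ab])
    n_ab = len(pert_ab)
    null = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        fake_ab = pooled[idx[:n_ab]]
        null.append(interaction_score(ctrl, pert_a, pert_b, fake_ab))
    return (1 + np.sum(np.array(null) >= observed)) / (n_perm + 1)
```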
{"title":"Automated Discovery of Pairwise Interactions from Unstructured Data","authors":"ZuhengDavid, Xu, Moksh Jain, Ali Denton, Shawn Whitfield, Aniket Didolkar, Berton Earnshaw, Jason Hartford","doi":"arxiv-2409.07594","DOIUrl":"https://doi.org/arxiv-2409.07594","url":null,"abstract":"Pairwise interactions between perturbations to a system can provide evidence\u0000for the causal dependencies of the underlying underlying mechanisms of a\u0000system. When observations are low dimensional, hand crafted measurements,\u0000detecting interactions amounts to simple statistical tests, but it is not\u0000obvious how to detect interactions between perturbations affecting latent\u0000variables. We derive two interaction tests that are based on pairwise\u0000interventions, and show how these tests can be integrated into an active\u0000learning pipeline to efficiently discover pairwise interactions between\u0000perturbations. We illustrate the value of these tests in the context of\u0000biology, where pairwise perturbation experiments are frequently used to reveal\u0000interactions that are not observable from any single perturbation. Our tests\u0000can be run on unstructured data, such as the pixels in an image, which enables\u0000a more general notion of interaction than typical cell viability experiments,\u0000and can be run on cheaper experimental assays. We validate on several synthetic\u0000and real biological experiments that our tests are able to identify interacting\u0000pairs effectively. We evaluate our approach on a real biological experiment\u0000where we knocked out 50 pairs of genes and measured the effect with microscopy\u0000images. We show that we are able to recover significantly more known biological\u0000interactions than random search and standard active learning baselines.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study a continuous-time approximation of the stochastic gradient descent process for minimizing the expected loss in learning problems. The main results establish general sufficient conditions for convergence, extending the results of Chatterjee (2022) established for (nonstochastic) gradient descent. We show how the main result can be applied to the case of overparametrized linear neural network training.
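In schematic form (our notation; the paper's precise scaling and assumptions may differ), the continuous-time surrogate replaces the discrete recursion with a stochastic differential equation whose diffusion term carries the gradient-noise covariance:

```latex
% Discrete SGD with step size \eta and unbiased stochastic gradients g_k:
\[
  \theta_{k+1} = \theta_k - \eta\, g_k(\theta_k),
  \qquad \mathbb{E}\bigl[g_k(\theta)\bigr] = \nabla L(\theta).
\]
% Continuous-time surrogate, with \Sigma(\theta) the gradient-noise covariance:
\[
  d\theta_t = -\nabla L(\theta_t)\, dt
              + \sqrt{\eta}\, \Sigma(\theta_t)^{1/2}\, dW_t .
\]
```

Setting the noise term to zero recovers the (nonstochastic) gradient flow setting of Chatterjee (2022).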
{"title":"Convergence of continuous-time stochastic gradient descent with applications to linear deep neural networks","authors":"Gabor Lugosi, Eulalia Nualart","doi":"arxiv-2409.07401","DOIUrl":"https://doi.org/arxiv-2409.07401","url":null,"abstract":"We study a continuous-time approximation of the stochastic gradient descent\u0000process for minimizing the expected loss in learning problems. The main results\u0000establish general sufficient conditions for the convergence, extending the\u0000results of Chatterjee (2022) established for (nonstochastic) gradient descent.\u0000We show how the main result can be applied to the case of overparametrized\u0000linear neural network training.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhuohang Li, Andrew Lowy, Jing Liu, Toshiaki Koike-Akino, Bradley Malin, Kieran Parsons, Ye Wang
We explore user-level gradient inversion as a new attack surface in distributed learning. We first investigate the ability of existing attacks to make inferences about private information beyond training-data reconstruction. Motivated by the low reconstruction quality of existing methods, we propose a novel gradient inversion attack that applies a denoising diffusion model as a strong image prior in order to enhance recovery in the large-batch setting. Unlike traditional attacks, which aim to reconstruct individual samples and suffer at large batch and image sizes, our approach instead aims to recover a representative image that captures the sensitive shared semantic information corresponding to the underlying user. Our experiments with face images demonstrate the ability of our methods to recover realistic facial images along with private user attributes.
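In generic form (our notation, not necessarily the paper's exact formulation), gradient inversion recovers data consistent with an observed user gradient by solving an inverse problem of the type below, where the image prior would here be instantiated by a denoising diffusion model rather than a hand-crafted regularizer.

```latex
% \bar{g}: observed user gradient; f_\theta: shared model; R_{\mathrm{prior}}: image prior
\[
  \hat{x} \;\in\; \arg\min_{x}\;
  \bigl\| \nabla_{\theta}\, \ell\bigl(f_{\theta}(x),\, y\bigr) - \bar{g} \bigr\|_2^2
  \;+\; \lambda\, R_{\mathrm{prior}}(x).
\]
```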
{"title":"Exploring User-level Gradient Inversion with a Diffusion Prior","authors":"Zhuohang Li, Andrew Lowy, Jing Liu, Toshiaki Koike-Akino, Bradley Malin, Kieran Parsons, Ye Wang","doi":"arxiv-2409.07291","DOIUrl":"https://doi.org/arxiv-2409.07291","url":null,"abstract":"We explore user-level gradient inversion as a new attack surface in\u0000distributed learning. We first investigate existing attacks on their ability to\u0000make inferences about private information beyond training data reconstruction.\u0000Motivated by the low reconstruction quality of existing methods, we propose a\u0000novel gradient inversion attack that applies a denoising diffusion model as a\u0000strong image prior in order to enhance recovery in the large batch setting.\u0000Unlike traditional attacks, which aim to reconstruct individual samples and\u0000suffer at large batch and image sizes, our approach instead aims to recover a\u0000representative image that captures the sensitive shared semantic information\u0000corresponding to the underlying user. Our experiments with face images\u0000demonstrate the ability of our methods to recover realistic facial images along\u0000with private user attributes.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The performance of the standard Online Robust Principal Component Analysis (OR-PCA) technique depends on the optimal tuning of explicit regularizers, and this tuning is dataset-sensitive. We aim to remove the dependency on these tuning parameters by using implicit regularization. We propose to use the implicit regularization effect of various modified gradient descents to make OR-PCA tuning-free. Our method incorporates three different versions of modified gradient descent that separately but naturally encourage sparsity and low-rank structure in the data. The proposed method performs comparably to or better than tuned OR-PCA on both simulated and real-world datasets. Tuning-free OR-PCA is more scalable to large datasets since it does not require dataset-dependent parameter tuning.
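For intuition, a minimal sketch of how implicit regularization can replace explicit penalties in a robust-PCA-style decomposition (an illustrative parametrization of ours, not necessarily the three modified gradient descent variants used in the paper): over-parametrize the low-rank part as a matrix product and the sparse part as a difference of Hadamard squares, then run plain gradient descent from a small initialization.

```python
import numpy as np

def implicit_rpca(M, rank=10, lr=0.01, init_scale=1e-3, n_iter=2000, seed=0):
    """Sketch of robust PCA via implicitly regularized gradient descent.

    Low-rank part parametrized as L = U @ V.T (small init biases gradient
    descent toward low rank); sparse part as S = a*a - b*b (Hadamard
    parametrization biases it toward sparsity). No explicit nuclear-norm or
    l1 penalties; step size and iteration count may need tuning to M's scale.
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = init_scale * rng.standard_normal((m, rank))
    V = init_scale * rng.standard_normal((n, rank))
    a = init_scale * np.ones((m, n))
    b = init_scale * np.ones((m, n))
    for _ in range(n_iter):
        S = a * a - b * b
        R = U @ V.T + S - M                      # residual of the fit
        U, V = U - lr * R @ V, V - lr * R.T @ U  # gradient step on factors
        a, b = a - lr * 2 * R * a, b + lr * 2 * R * b
    return U @ V.T, a * a - b * b
```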
{"title":"Tuning-Free Online Robust Principal Component Analysis through Implicit Regularization","authors":"Lakshmi Jayalal, Gokularam Muthukrishnan, Sheetal Kalyani","doi":"arxiv-2409.07275","DOIUrl":"https://doi.org/arxiv-2409.07275","url":null,"abstract":"The performance of the standard Online Robust Principal Component Analysis\u0000(OR-PCA) technique depends on the optimum tuning of the explicit regularizers\u0000and this tuning is dataset sensitive. We aim to remove the dependency on these\u0000tuning parameters by using implicit regularization. We propose to use the\u0000implicit regularization effect of various modified gradient descents to make\u0000OR-PCA tuning free. Our method incorporates three different versions of\u0000modified gradient descent that separately but naturally encourage sparsity and\u0000low-rank structures in the data. The proposed method performs comparable or\u0000better than the tuned OR-PCA for both simulated and real-world datasets.\u0000Tuning-free ORPCA makes it more scalable for large datasets since we do not\u0000require dataset-dependent parameter tuning.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"203 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
António Farinhas, Haau-Sing Li, André F. T. Martins
To ensure large language models (LLMs) are used safely, one must reduce their propensity to hallucinate or to generate unacceptable answers. A simple and often used strategy is to first let the LLM generate multiple hypotheses and then employ a reranker to choose the best one. In this paper, we draw a parallel between this strategy and the use of redundancy to decrease the error rate in noisy communication channels. We conceptualize the generator as a sender transmitting multiple descriptions of a message through parallel noisy channels. The receiver decodes the message by ranking the (potentially corrupted) descriptions and selecting the one found to be most reliable. We provide conditions under which this protocol is asymptotically error-free (i.e., yields an acceptable answer almost surely) even in scenarios where the reranker is imperfect (governed by Mallows or Zipf-Mandelbrot models) and the channel distributions are statistically dependent. We use our framework to obtain reranking laws which we validate empirically on two real-world tasks using LLMs: text-to-code generation with DeepSeek-Coder 7B and machine translation of medical data with TowerInstruct 13B.
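A toy numerical illustration of the protocol (ours; the paper's reranking laws are derived analytically, with imperfect rerankers modeled by Mallows or Zipf-Mandelbrot distributions): generate N hypotheses that are each acceptable with some probability, let a noisy reranker score them, and track how often the top-ranked hypothesis is unacceptable as N grows.

```python
import numpy as np

def rerank_error_rate(n_hyp, p_accept=0.3, noise=1.0, n_trials=5000, seed=0):
    """Probability the reranker's top choice is unacceptable, vs. ensemble size."""
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(n_trials):
        acceptable = rng.random(n_hyp) < p_accept                    # latent quality
        scores = acceptable.astype(float) + noise * rng.standard_normal(n_hyp)
        errors += not acceptable[np.argmax(scores)]                  # pick best-scored
    return errors / n_trials

for n in (1, 2, 4, 8, 16, 32):
    print(n, rerank_error_rate(n))
```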
{"title":"Reranking Laws for Language Generation: A Communication-Theoretic Perspective","authors":"António Farinhas, Haau-Sing Li, André F. T. Martins","doi":"arxiv-2409.07131","DOIUrl":"https://doi.org/arxiv-2409.07131","url":null,"abstract":"To ensure large language models (LLMs) are used safely, one must reduce their\u0000propensity to hallucinate or to generate unacceptable answers. A simple and\u0000often used strategy is to first let the LLM generate multiple hypotheses and\u0000then employ a reranker to choose the best one. In this paper, we draw a\u0000parallel between this strategy and the use of redundancy to decrease the error\u0000rate in noisy communication channels. We conceptualize the generator as a\u0000sender transmitting multiple descriptions of a message through parallel noisy\u0000channels. The receiver decodes the message by ranking the (potentially\u0000corrupted) descriptions and selecting the one found to be most reliable. We\u0000provide conditions under which this protocol is asymptotically error-free\u0000(i.e., yields an acceptable answer almost surely) even in scenarios where the\u0000reranker is imperfect (governed by Mallows or Zipf-Mandelbrot models) and the\u0000channel distributions are statistically dependent. We use our framework to\u0000obtain reranking laws which we validate empirically on two real-world tasks\u0000using LLMs: text-to-code generation with DeepSeek-Coder 7B and machine\u0000translation of medical data with TowerInstruct 13B.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zehao Dou, Subhodh Kotekal, Zhehao Xu, Harrison H. Zhou
The recent, impressive advances in algorithmic generation of high-fidelity image, audio, and video are largely due to great successes in score-based diffusion models. A key implementation step is score matching, that is, the estimation of the score function of the forward diffusion process from training data. As shown in earlier literature, the total variation distance between the law of a sample generated from the trained diffusion model and the ground truth distribution can be controlled by the score matching risk. Despite the widespread use of score-based diffusion models, basic theoretical questions concerning exact optimal statistical rates for score estimation and its application to density estimation remain open. We establish the sharp minimax rate of score estimation for smooth, compactly supported densities. Formally, given \(n\) i.i.d. samples from an unknown \(\alpha\)-Hölder density \(f\) supported on \([-1, 1]\), we prove that the minimax rate of estimating the score function of the diffused distribution \(f * \mathcal{N}(0, t)\) with respect to the score matching loss is \(\frac{1}{nt^2} \wedge \frac{1}{nt^{3/2}} \wedge \bigl(t^{\alpha-1} + n^{-2(\alpha-1)/(2\alpha+1)}\bigr)\) for all \(\alpha > 0\) and \(t \ge 0\). As a consequence, the law \(\hat{f}\) of a sample generated from the diffusion model achieves the sharp minimax rate \(\mathbb{E}\bigl(d_{\mathrm{TV}}(\hat{f}, f)^2\bigr) \lesssim n^{-2\alpha/(2\alpha+1)}\) for all \(\alpha > 0\), without any extraneous logarithmic terms, which are prevalent in the literature, and without the need for early stopping, which, to the best of our knowledge, has been required for all existing procedures.
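For readability, the estimand and loss referred to above are the score of the Gaussian-smoothed density and the corresponding squared-error score matching risk (our phrasing of the standard definitions; the paper may use a weighted or time-integrated variant):

```latex
\[
  f_t = f * \mathcal{N}(0, t), \qquad
  s_t(x) = \nabla_x \log f_t(x), \qquad
  \mathcal{R}(\hat{s}) = \mathbb{E}_{X \sim f_t}\bigl[ \| \hat{s}(X) - s_t(X) \|^2 \bigr].
\]
```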
{"title":"From optimal score matching to optimal sampling","authors":"Zehao Dou, Subhodh Kotekal, Zhehao Xu, Harrison H. Zhou","doi":"arxiv-2409.07032","DOIUrl":"https://doi.org/arxiv-2409.07032","url":null,"abstract":"The recent, impressive advances in algorithmic generation of high-fidelity\u0000image, audio, and video are largely due to great successes in score-based\u0000diffusion models. A key implementing step is score matching, that is, the\u0000estimation of the score function of the forward diffusion process from training\u0000data. As shown in earlier literature, the total variation distance between the\u0000law of a sample generated from the trained diffusion model and the ground truth\u0000distribution can be controlled by the score matching risk. Despite the widespread use of score-based diffusion models, basic theoretical\u0000questions concerning exact optimal statistical rates for score estimation and\u0000its application to density estimation remain open. We establish the sharp\u0000minimax rate of score estimation for smooth, compactly supported densities.\u0000Formally, given (n) i.i.d. samples from an unknown (alpha)-H\"{o}lder\u0000density (f) supported on ([-1, 1]), we prove the minimax rate of estimating\u0000the score function of the diffused distribution (f * mathcal{N}(0, t)) with\u0000respect to the score matching loss is (frac{1}{nt^2} wedge\u0000frac{1}{nt^{3/2}} wedge (t^{alpha-1} + n^{-2(alpha-1)/(2alpha+1)})) for\u0000all (alpha > 0) and (t ge 0). As a consequence, it is shown the law\u0000(hat{f}) of a sample generated from the diffusion model achieves the sharp\u0000minimax rate (bE(dTV(hat{f}, f)^2) lesssim n^{-2alpha/(2alpha+1)}) for\u0000all (alpha > 0) without any extraneous logarithmic terms which are prevalent\u0000in the literature, and without the need for early stopping which has been\u0000required for all existing procedures to the best of our knowledge.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Training-free guidance methods for continuous data have seen an explosion of interest because they enable foundation diffusion models to be paired with interchangeable guidance models. Currently, equivalent guidance methods for discrete diffusion models are unknown. We present a framework for applying training-free guidance to discrete data and demonstrate its utility on molecular graph generation tasks using the discrete diffusion model architecture of DiGress. We pair this model with guidance functions that return the proportion of heavy atoms that are of a specific atom type and the molecular weight of the heavy atoms, and we demonstrate our method's ability to guide the data generation.
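One generic way training-free guidance can enter a discrete reverse step (a sketch under our own assumptions, not the specific scheme proposed in the paper): exponentially tilt the model's categorical probabilities by the guidance function and renormalize.

```python
import numpy as np

def guide_categorical(probs, guidance_values, strength=1.0):
    """Tilt per-category probabilities toward high guidance values.

    probs           : (n_nodes, n_categories) model probabilities at this step
    guidance_values : (n_nodes, n_categories) score of each candidate category
                      (e.g. how much it moves a molecular property toward target)
    """
    tilted = probs * np.exp(strength * guidance_values)
    return tilted / tilted.sum(axis=-1, keepdims=True)

# toy usage: 3 nodes, 4 atom types; push toward type 2
probs = np.full((3, 4), 0.25)
guidance = np.zeros((3, 4))
guidance[:, 2] = 1.0
print(guide_categorical(probs, guidance, strength=2.0))
```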
{"title":"Training-Free Guidance for Discrete Diffusion Models for Molecular Generation","authors":"Thomas J. Kerby, Kevin R. Moon","doi":"arxiv-2409.07359","DOIUrl":"https://doi.org/arxiv-2409.07359","url":null,"abstract":"Training-free guidance methods for continuous data have seen an explosion of\u0000interest due to the fact that they enable foundation diffusion models to be\u0000paired with interchangable guidance models. Currently, equivalent guidance\u0000methods for discrete diffusion models are unknown. We present a framework for\u0000applying training-free guidance to discrete data and demonstrate its utility on\u0000molecular graph generation tasks using the discrete diffusion model\u0000architecture of DiGress. We pair this model with guidance functions that return\u0000the proportion of heavy atoms that are a specific atom type and the molecular\u0000weight of the heavy atoms and demonstrate our method's ability to guide the\u0000data generation.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"62 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fengzhe Zhang, Jiajun He, Laurence I. Midgley, Javier Antorán, José Miguel Hernández-Lobato
Diffusion models have shown promising potential for advancing Boltzmann Generators. However, two critical challenges persist: (1) inherent errors in samples due to model imperfections, and (2) the requirement of hundreds of function evaluations (NFEs) to achieve high-quality samples. While existing solutions like importance sampling and distillation address these issues separately, they are often incompatible, as most distillation models lack the density information necessary for importance sampling. This paper introduces a novel sampling method that effectively combines Consistency Models (CMs) with importance sampling. We evaluate our approach on both synthetic energy functions and equivariant n-body particle systems. Our method produces unbiased samples using only 6-25 NFEs while achieving an Effective Sample Size (ESS) comparable to Denoising Diffusion Probabilistic Models (DDPMs) that require approximately 100 NFEs.
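For reference, the importance-sampling side of such a pipeline in its generic self-normalized form (a sketch that assumes the proposal log-density is available; the abstract notes that most distilled samplers lack exactly this density information):

```python
import numpy as np

def snis(samples, log_p_unnorm, log_q):
    """Self-normalized importance weights and effective sample size.

    samples      : (n, d) draws from the proposal q (e.g. a fast sampler)
    log_p_unnorm : callable, unnormalized log target (here -energy(x))
    log_q        : callable, log proposal density at the samples
    """
    log_w = log_p_unnorm(samples) - log_q(samples)
    log_w = log_w - log_w.max()          # stabilize before exponentiating
    w = np.exp(log_w)
    w = w / w.sum()
    ess = 1.0 / np.sum(w ** 2)
    return w, ess

# toy check: standard-normal proposal vs. slightly shifted Gaussian target
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 1))
w, ess = snis(x,
              lambda s: -0.5 * ((s - 0.5) ** 2).sum(1),
              lambda s: -0.5 * (s ** 2).sum(1))
print(ess)
```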
{"title":"Efficient and Unbiased Sampling of Boltzmann Distributions via Consistency Models","authors":"Fengzhe Zhang, Jiajun He, Laurence I. Midgley, Javier Antorán, José Miguel Hernández-Lobato","doi":"arxiv-2409.07323","DOIUrl":"https://doi.org/arxiv-2409.07323","url":null,"abstract":"Diffusion models have shown promising potential for advancing Boltzmann\u0000Generators. However, two critical challenges persist: (1) inherent errors in\u0000samples due to model imperfections, and (2) the requirement of hundreds of\u0000functional evaluations (NFEs) to achieve high-quality samples. While existing\u0000solutions like importance sampling and distillation address these issues\u0000separately, they are often incompatible, as most distillation models lack the\u0000necessary density information for importance sampling. This paper introduces a\u0000novel sampling method that effectively combines Consistency Models (CMs) with\u0000importance sampling. We evaluate our approach on both synthetic energy\u0000functions and equivariant n-body particle systems. Our method produces unbiased\u0000samples using only 6-25 NFEs while achieving a comparable Effective Sample Size\u0000(ESS) to Denoising Diffusion Probabilistic Models (DDPMs) that require\u0000approximately 100 NFEs.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding how real data is distributed in high-dimensional spaces is key to many tasks in machine learning. We provide a natural geometric structure on the space of data by employing a deep ReLU neural network trained as a classifier. Through the data information matrix (DIM), a variation of the Fisher information matrix, the model discerns a singular foliation structure on the space of data. We show that the singular points of such a foliation are contained in a measure-zero set and that a local regular foliation exists almost everywhere. Experiments show that the data is correlated with leaves of this foliation. Moreover, we show the potential of our approach for knowledge transfer by analyzing the spectrum of the DIM to measure distances between datasets.
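A hedged sketch of the kind of object involved (our reading of a Fisher-information-style matrix taken with respect to the input; the paper's exact definition of the DIM may differ): for a trained classifier, accumulate outer products of input-gradients of the class log-probabilities, weighted by the predicted class probabilities.

```python
import torch
import torch.nn.functional as F

def data_information_matrix(model, x):
    """Fisher-information-style matrix w.r.t. the input x (sketch).

    model : classifier returning logits
    x     : single input of shape (d,)
    Returns a (d, d) matrix sum_c p_c(x) g_c g_c^T with g_c = grad_x log p_c(x).
    """
    x = x.clone().requires_grad_(True)
    log_probs = F.log_softmax(model(x), dim=-1)
    probs = log_probs.exp().detach()
    info = torch.zeros(x.numel(), x.numel())
    for c in range(log_probs.numel()):
        g = torch.autograd.grad(log_probs[c], x, retain_graph=True)[0].reshape(-1)
        info += probs[c] * torch.outer(g, g)
    return info

# toy usage: a small ReLU network on a 5-dimensional input
net = torch.nn.Sequential(torch.nn.Linear(5, 16), torch.nn.ReLU(), torch.nn.Linear(16, 3))
print(data_information_matrix(net, torch.randn(5)).shape)
```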
{"title":"Manifold Learning via Foliations and Knowledge Transfer","authors":"E. Tron, E. Fioresi","doi":"arxiv-2409.07412","DOIUrl":"https://doi.org/arxiv-2409.07412","url":null,"abstract":"Understanding how real data is distributed in high dimensional spaces is the\u0000key to many tasks in machine learning. We want to provide a natural geometric\u0000structure on the space of data employing a deep ReLU neural network trained as\u0000a classifier. Through the data information matrix (DIM), a variation of the\u0000Fisher information matrix, the model will discern a singular foliation\u0000structure on the space of data. We show that the singular points of such\u0000foliation are contained in a measure zero set, and that a local regular\u0000foliation exists almost everywhere. Experiments show that the data is\u0000correlated with leaves of such foliation. Moreover we show the potential of our\u0000approach for knowledge transfer by analyzing the spectrum of the DIM to measure\u0000distances between datasets.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We develop hard clustering based on likelihood rather than distance and prove convergence. We also provide simulations and real data examples.
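As a rough illustration of likelihood-based hard clustering (a generic sketch, not necessarily the k-MLE, k-Bregman, or k-VARs procedures analyzed in the paper): alternate between assigning each point to the component under which it is most likely and refitting each component's maximum-likelihood parameters, in the manner of k-means but with negative log-likelihood in place of squared distance.

```python
import numpy as np

def k_mle_gaussian(X, k, n_iter=50, seed=0):
    """Hard clustering by per-component Gaussian likelihood (diagonal covariance)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, k, replace=False)]
    vars_ = np.ones((k, d))
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # assignment step: maximize log N(x | mean_j, diag(var_j)) over j
        ll = -0.5 * (((X[:, None, :] - means[None]) ** 2 / vars_[None]).sum(-1)
                     + np.log(vars_).sum(-1)[None])
        labels = ll.argmax(1)
        # refit step: MLE of each component on its assigned points
        for j in range(k):
            pts = X[labels == j]
            if len(pts) > 0:
                means[j] = pts.mean(0)
                vars_[j] = pts.var(0) + 1e-6
    return labels, means, vars_

# toy usage: two well-separated Gaussian blobs
X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 4])
labels, means, _ = k_mle_gaussian(X, 2)
print(means)
```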
{"title":"k-MLE, k-Bregman, k-VARs: Theory, Convergence, Computation","authors":"Zuogong Yue, Victor Solo","doi":"arxiv-2409.06938","DOIUrl":"https://doi.org/arxiv-2409.06938","url":null,"abstract":"We develop hard clustering based on likelihood rather than distance and prove\u0000convergence. We also provide simulations and real data examples.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}