We propose PieClam (Prior Inclusive Exclusive Cluster Affiliation Model): a probabilistic graph model for representing any graph as overlapping generalized communities. Our method can be interpreted as a graph autoencoder: nodes are embedded into a code space by an algorithm that maximizes the log-likelihood of the decoded graph given the input graph. PieClam is a community affiliation model that extends well-known methods like BigClam in two main ways. First, rather than defining the decoder solely via pairwise interactions between nodes in the code space, we also incorporate a learned prior on the distribution of nodes in the code space, turning our method into a graph generative model. Second, we generalize the notion of communities by allowing not only sets of nodes with strong connectivity, which we call inclusive communities, but also sets of nodes with strong disconnection, which we call exclusive communities. To model both types of communities, we propose a new type of decoder based on the Lorentz inner product, which we prove to be much more expressive than standard decoders based on standard inner products or norm distances. By introducing a new graph similarity measure, which we call the log cut distance, we show that PieClam is a universal autoencoder, able to uniformly approximately reconstruct any graph. Our method obtains competitive performance in graph anomaly detection benchmarks.
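The Lorentz-style decoder described above can be sketched in a few lines: a minimal illustration assuming a BigClam-style link function and an indefinite inner product whose last q coordinates (the "exclusive" communities) carry a minus sign. The shapes, names, and signature split are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def lorentz_inner(x, y, q):
    """Indefinite inner product: the last q coordinates ("exclusive"
    communities) enter with a minus sign, the rest ("inclusive") with
    a plus sign. The signature split is an illustrative assumption."""
    s = x * y
    return s[:-q].sum() - s[-q:].sum() if q > 0 else s.sum()

def edge_prob(x, y, q):
    """BigClam-style link function applied to the indefinite product;
    clipping the product at zero keeps the probability in [0, 1)."""
    return 1.0 - np.exp(-max(lorentz_inner(x, y, q), 0.0))

x = np.array([1.0, 0.5, 0.2])
y = np.array([0.8, 0.4, 0.9])
p_incl = edge_prob(x, y, q=0)   # all communities inclusive
p_mixed = edge_prob(x, y, q=1)  # last coordinate treated as exclusive
```

Note how marking a coordinate as exclusive lowers the edge probability between nodes that both load on it, which is the intended "strong disconnection" behavior.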
"PieClam: A Universal Graph Autoencoder Based on Overlapping Inclusive and Exclusive Communities", by Daniel Zilberg, Ron Levie. arXiv:2409.11618 (arXiv - STAT - Machine Learning, 2024-09-18).
Sequential models such as recurrent neural networks and transformer-based models have become de facto tools for probabilistic multivariate time series forecasting, with applications to a wide range of domains such as finance, biology, and medicine. Despite their adeptness at capturing dependencies, assessing prediction uncertainty, and training efficiently, challenges emerge in modeling high-dimensional complex distributions and cross-feature dependencies. To tackle these issues, recent works delve into generative modeling by employing diffusion or flow-based models. Notably, the integration of stochastic differential equations or probability flow successfully extends these methods to probabilistic time series imputation and forecasting. However, scalability issues necessitate a computationally friendly framework for large-scale generative model-based predictions. This work proposes a novel approach that blends the computational efficiency of recurrent neural networks with the high-quality probabilistic modeling of diffusion models, addressing these challenges and advancing the application of generative models in time series forecasting. Our method builds on the foundation of stochastic interpolants, extended to a broader conditional generation framework with additional control features, offering insights for future developments in this dynamic field.
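The stochastic interpolant backbone mentioned above can be sketched in a few lines. The linear form and the coefficient gamma(t) = sqrt(2 t (1 - t)) are one common choice from the interpolant literature, assumed here for illustration and not necessarily the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def interpolant(x0, x1, t, z):
    """Linear stochastic interpolant x_t = (1-t) x0 + t x1 + gamma(t) z,
    with gamma(t) = sqrt(2 t (1-t)) so the injected noise vanishes at
    both endpoints t=0 and t=1."""
    gamma = np.sqrt(2.0 * t * (1.0 - t))
    return (1.0 - t) * x0 + t * x1 + gamma * z

x0 = rng.normal(size=4)   # e.g. a summary of the past observation window
x1 = rng.normal(size=4)   # e.g. the future values to be generated
z = rng.normal(size=4)    # auxiliary Gaussian noise
xt_start = interpolant(x0, x1, 0.0, z)
xt_end = interpolant(x0, x1, 1.0, z)
```

A model trained on such interpolants learns the drift that transports x0 to x1; conditioning on control features would enter through x0 or the drift network.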
"Recurrent Interpolants for Probabilistic Time Series Prediction", by Yu Chen, Marin Biloš, Sarthak Mittal, Wei Deng, Kashif Rasul, Anderson Schneider. arXiv:2409.11684 (arXiv - STAT - Machine Learning, 2024-09-18).
We examine a special case of the multilevel factor model, with covariance given by a multilevel low-rank (MLR) matrix [Parshakova et al., 2023]. We develop a novel, fast implementation of the expectation-maximization (EM) algorithm, tailored for multilevel factor models, to maximize the likelihood of the observed data. This method accommodates any hierarchical structure and maintains linear time and storage complexities per iteration. This is achieved through a new efficient technique for computing the inverse of the positive definite MLR matrix. We show that the inverse of an invertible positive semidefinite MLR matrix is also an MLR matrix with the same sparsity in factors, and we use the recursive Sherman-Morrison-Woodbury matrix identity to obtain the factors of the inverse. Additionally, we present an algorithm that computes the Cholesky factorization of an expanded matrix with linear time and space complexities, yielding the covariance matrix as its Schur complement. This paper is accompanied by an open-source package that implements the proposed methods.
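The core trick, inverting diag(d) + F F^T in O(n r^2) via the Sherman-Morrison-Woodbury identity rather than O(n^3) directly, can be illustrated for a single factor level (the paper applies the identity recursively across the hierarchy):

```python
import numpy as np

def woodbury_inv(d, F):
    """Invert Sigma = diag(d) + F F^T via Sherman-Morrison-Woodbury:
    Sigma^{-1} = D^{-1} - D^{-1} F (I + F^T D^{-1} F)^{-1} F^T D^{-1},
    costing O(n r^2) instead of O(n^3) for n x r factors F."""
    Dinv = 1.0 / d                    # diagonal inverse, elementwise
    DinvF = Dinv[:, None] * F         # D^{-1} F, shape (n, r)
    r = F.shape[1]
    core = np.eye(r) + F.T @ DinvF    # small (r, r) system
    return np.diag(Dinv) - DinvF @ np.linalg.solve(core, DinvF.T)

rng = np.random.default_rng(1)
n, r = 50, 3
d = rng.uniform(1.0, 2.0, size=n)     # positive diagonal (noise variances)
F = rng.normal(size=(n, r))           # factor loadings
Sigma = np.diag(d) + F @ F.T
Sigma_inv = woodbury_inv(d, F)
```

This is a one-level sketch; the MLR structure nests such low-rank blocks hierarchically, which is where the recursive application comes in.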
"Fitting Multilevel Factor Models", by Tetiana Parshakova, Trevor Hastie, Stephen Boyd. arXiv:2409.12067 (arXiv - STAT - Machine Learning, 2024-09-18).
Eliot Tron, Rita Fioresi, Nicolas Couellan, Stéphane Puechmorel
The purpose of this paper is to employ the language of Cartan moving frames to study the geometry of data manifolds and their Riemannian structure, via the data information metric and its curvature at data points. Using this framework, and through experiments, explanations of the response of a neural network are given by pointing out the output classes that are easily reachable from a given input. This emphasizes how the proposed mathematical relationship between the output of the network and the geometry of its inputs can be exploited as an explainable artificial intelligence tool.
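As a rough, hypothetical stand-in for a metric at data points, one can pull the Euclidean metric on the output space back through the network, G(x) = J(x)^T J(x): input directions with small G-eigenvalues barely change the network output, while large-eigenvalue directions make other output classes "easily reachable". This only illustrates the idea of an input-space metric induced by the network, not the paper's data information metric.

```python
import numpy as np

def jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f at x."""
    m, n = f(x).size, x.size
    J = np.zeros((m, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        J[:, i] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

def pullback_metric(f, x):
    """Pull the output-space Euclidean metric back through f:
    G(x) = J(x)^T J(x), a symmetric positive semidefinite matrix."""
    J = jacobian(f, x)
    return J.T @ J

# toy two-layer "network"; the weights are arbitrary illustrations
W1 = np.array([[1.0, -0.5], [0.3, 0.8], [0.2, 0.1]])
W2 = np.array([[0.5, -0.2, 0.7], [0.1, 0.9, -0.3]])
f = lambda x: np.tanh(W2 @ np.tanh(W1 @ x))
G = pullback_metric(f, np.array([0.4, -0.2]))
```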
"Cartan moving frames and the data manifolds", by Eliot Tron, Rita Fioresi, Nicolas Couellan, Stéphane Puechmorel. arXiv:2409.12057 (arXiv - STAT - Machine Learning, 2024-09-18).
Ashwin Samudre, Mircea Petrache, Brian D. Nord, Shubhendu Trivedi
There has been much recent interest in designing symmetry-aware neural networks (NNs) exhibiting relaxed equivariance. Such NNs aim to interpolate between being exactly equivariant and being fully flexible, affording consistent performance benefits. In a separate line of work, certain structured parameter matrices -- those with displacement structure, characterized by low displacement rank (LDR) -- have been used to design small-footprint NNs. Displacement structure enables fast function and gradient evaluation, but permits accurate approximations via compression primarily to classical convolutional neural networks (CNNs). In this work, we propose a general framework -- based on a novel construction of symmetry-based structured matrices -- to build approximately equivariant NNs with significantly reduced parameter counts. Our framework integrates the two aforementioned lines of work via the use of so-called Group Matrices (GMs), a forgotten precursor to the modern notion of regular representations of finite groups. GMs allow the design of structured matrices -- resembling LDR matrices -- which generalize the linear operations of a classical CNN from cyclic groups to general finite groups and their homogeneous spaces. We show that GMs can be employed to extend all the elementary operations of CNNs to general discrete groups. Further, the theory of structured matrices based on GMs generalizes LDR theory, which is focused on matrices with cyclic structure, providing a tool for implementing approximate equivariance for discrete groups. We test GM-based architectures on a variety of tasks in the presence of relaxed symmetry. We report that our framework consistently performs competitively compared to approximately equivariant NNs and other structured matrix-based compression frameworks, sometimes with a parameter count one to two orders of magnitude lower.
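The group matrix construction, M[g, h] = w[g^{-1} h] for a weight vector w indexed by group elements, can be sketched directly; for the cyclic group Z_n it reduces to a circulant matrix, i.e. the weight matrix of a 1-D circular convolution. The API below is an illustrative sketch, not the paper's implementation.

```python
import numpy as np

def group_matrix(elements, inverse, op, w):
    """Group matrix M[g, h] = w[op(inverse(g), h)] for a finite group
    specified by its element list, inverse map, and composition op.
    One weight per group element parameterizes the whole matrix."""
    n = len(elements)
    M = np.zeros((n, n))
    for i, g in enumerate(elements):
        for j, h in enumerate(elements):
            M[i, j] = w[op(inverse(g), h)]
    return M

# cyclic group Z_4: the group matrix is a circulant, so multiplying by
# it performs a circular convolution, recovering the classical CNN case
n = 4
w = np.array([1.0, 2.0, 0.0, -1.0])
M = group_matrix(range(n), lambda g: (-g) % n, lambda a, b: (a + b) % n, w)
```

Swapping in a non-abelian group's elements and composition law generalizes the same linear operation beyond circular convolution, which is the point of the construction.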
"Symmetry-Based Structured Matrices for Efficient Approximately Equivariant Networks", by Ashwin Samudre, Mircea Petrache, Brian D. Nord, Shubhendu Trivedi. arXiv:2409.11772 (arXiv - STAT - Machine Learning, 2024-09-18).
We study the problem of system identification for stochastic continuous-time dynamics, based on a single finite-length state trajectory. We present a method for estimating the possibly unstable open-loop matrix by employing properly randomized control inputs. Then, we establish theoretical performance guarantees showing that the estimation error decays with trajectory length, a measure of excitability, and the signal-to-noise ratio, while it grows with dimension. Numerical illustrations showcasing the rates of learning the dynamics are provided as well. To perform the theoretical analysis, we develop new technical tools that are of independent interest, including non-asymptotic stochastic bounds for highly non-stationary martingales and generalized laws of the iterated logarithm.
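A toy version of the estimation step, least squares on a single discretized trajectory driven by randomized inputs, might look like the following. The Euler discretization, noise scales, and input distribution are assumptions for illustration; the paper's estimator and its guarantees are more refined.

```python
import numpy as np

rng = np.random.default_rng(2)

# ground-truth open-loop matrix (mildly unstable: eigenvalues 0.15 +/- i)
A_true = np.array([[0.2, -1.0], [1.0, 0.1]])
dt, T = 1e-3, 20000

# simulate one finite-length trajectory with randomized control inputs
x = np.zeros(2)
X, dX = [], []
for _ in range(T):
    u = rng.normal(size=2)                      # randomized excitation
    dx = (A_true @ x + u) * dt + 0.1 * np.sqrt(dt) * rng.normal(size=2)
    X.append(x.copy())
    dX.append(dx - u * dt)                      # subtract the known input
    x = x + dx

# least squares on increments: dX - u dt ~ A x dt + noise
X, dX = np.array(X), np.array(dX)
A_hat = np.linalg.lstsq(X * dt, dX, rcond=None)[0].T
```

Because the regressor x_t predates the noise increment at step t, the least-squares estimate is unbiased up to discretization error, and its accuracy improves with trajectory length and excitation, mirroring the rates discussed above.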
"Learning Unstable Continuous-Time Stochastic Linear Control Systems", by Reza Sadeghi Hafshejani, Mohamad Kazem Shirani Fradonbeh. arXiv:2409.11327 (arXiv - STAT - Machine Learning, 2024-09-17).
Priscilla Ong, Manuel Haußmann, Otto Lönnroth, Harri Lähdesmäki
Modelling longitudinal data is an important yet challenging task. These datasets can be high-dimensional and contain non-linear effects and time-varying covariates. Gaussian process (GP) prior-based variational autoencoders (VAEs) have emerged as a promising approach due to their ability to model time-series data. However, they are costly to train and struggle to fully exploit the rich covariates characteristic of longitudinal data, making them difficult for practitioners to use effectively. In this work, we leverage linear mixed models (LMMs) and amortized variational inference to provide conditional priors for VAEs, and propose LMM-VAE, a scalable, interpretable, and identifiable model. We highlight theoretical connections between it and GP-based techniques, providing a unified framework for this class of methods. Our proposal performs competitively compared to existing approaches across simulated and real-world datasets.
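An LMM-style conditional prior can be sketched as a mean structure combining fixed effects, shared across subjects, with subject-specific random effects. All names and shapes here are illustrative stand-ins, not the paper's parameterization.

```python
import numpy as np

rng = np.random.default_rng(3)

def lmm_prior_mean(X, Z, beta, b):
    """Conditional prior mean for one subject's latent trajectory:
    fixed effects X @ beta (population-level) plus random effects
    Z @ b (subject-specific deviations)."""
    return X @ beta + Z @ b

t = np.linspace(0.0, 1.0, 5)             # observation times
X = np.stack([np.ones_like(t), t], 1)    # intercept + time covariates
Z = X.copy()                             # random intercept + random slope
beta = np.array([0.5, 2.0])              # population-level coefficients
b = rng.normal(scale=0.3, size=2)        # this subject's deviations

mu = lmm_prior_mean(X, Z, beta, b)       # prior mean of the latent code
```

In a VAE this mean (with an associated covariance) would replace the standard normal prior, letting covariates such as time or treatment shape the latent space.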
"Latent mixed-effect models for high-dimensional longitudinal data", by Priscilla Ong, Manuel Haußmann, Otto Lönnroth, Harri Lähdesmäki. arXiv:2409.11008 (arXiv - STAT - Machine Learning, 2024-09-17).
One essential goal of constructing coarse-grained molecular dynamics (CGMD) models is to accurately predict non-equilibrium processes beyond the atomistic scale. While a CG model can be constructed by projecting the full dynamics onto a set of resolved variables, the dynamics of the CG variables can recover the full dynamics only when the conditional distribution of the unresolved variables is close to the one associated with the particular projection operator. In particular, the model's applicability to various non-equilibrium processes is generally unwarranted due to the inconsistency in the conditional distribution. Here, we present a data-driven approach for constructing CGMD models that retain a degree of generalization ability for non-equilibrium processes. Unlike conventional CG models based on pre-selected CG variables (e.g., the center of mass), the present CG model seeks a set of auxiliary CG variables, based on time-lagged independent component analysis, that minimize the entropy contribution of the unresolved variables. This ensures that the distribution of the unresolved variables under a broad range of non-equilibrium conditions approaches the equilibrium one.
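Time-lagged independent component analysis, used above to select the auxiliary CG variables, solves a generalized eigenproblem between the instantaneous and time-lagged covariances; eigenvalues near 1 flag slow collective variables. A minimal sketch, using the symmetrized estimator and Cholesky whitening:

```python
import numpy as np

def tica(X, lag):
    """TICA: solve C_tau v = lambda C_0 v via whitening. Eigenvectors
    with eigenvalues near 1 correspond to the slowest components."""
    X = X - X.mean(0)
    A, B = X[:-lag], X[lag:]
    C0 = (A.T @ A + B.T @ B) / (2 * len(A))   # instantaneous covariance
    Ct = (A.T @ B + B.T @ A) / (2 * len(A))   # symmetrized lagged covariance
    L = np.linalg.cholesky(C0)
    Linv = np.linalg.inv(L)
    vals, vecs = np.linalg.eigh(Linv @ Ct @ Linv.T)  # whitened problem
    order = np.argsort(vals)[::-1]
    return vals[order], Linv.T @ vecs[:, order]

# toy series: one slow AR(1) coordinate mixed with fast white noise
rng = np.random.default_rng(4)
n = 5000
slow = np.zeros(n)
for i in range(1, n):
    slow[i] = 0.99 * slow[i - 1] + 0.1 * rng.normal()
data = np.stack([slow + 0.01 * rng.normal(size=n), rng.normal(size=n)], 1)
vals, vecs = tica(data, lag=10)
```

The leading eigenvalue isolates the slow coordinate; in the CGMD setting, these slow directions are the candidate auxiliary variables.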
"On the generalization ability of coarse-grained molecular dynamics models for non-equilibrium processes", by Liyao Lyu, Huan Lei. arXiv:2409.11519 (arXiv - STAT - Machine Learning, 2024-09-17).
This paper introduces a novel family of outlier detection algorithms based on Cluster Catch Digraphs (CCDs), specifically tailored to address the challenges of high dimensionality and varying cluster shapes, which deteriorate the performance of most traditional outlier detection methods. We propose the Uniformity-Based CCD with Mutual Catch Graph (U-MCCD), the Uniformity- and Neighbor-Based CCD with Mutual Catch Graph (UN-MCCD), and their shape-adaptive variants (SU-MCCD and SUN-MCCD), which are designed to detect outliers in data sets with arbitrary cluster shapes and high dimensions. We present the advantages and shortcomings of these algorithms and motivate the need for each. Through comprehensive Monte Carlo simulations, we assess their performance and demonstrate the robustness and effectiveness of our algorithms across various settings and contamination levels. We also illustrate the use of our algorithms on various real-life data sets. The U-MCCD algorithm efficiently identifies outliers while maintaining high true negative rates, and the SU-MCCD algorithm shows substantial improvement in handling non-uniform clusters. Additionally, the UN-MCCD and SUN-MCCD algorithms address the limitations of existing methods in high-dimensional spaces by utilizing Nearest Neighbor Distances (NND) for clustering and outlier detection. Our results indicate that these novel algorithms offer substantial advancements in the accuracy and adaptability of outlier detection, providing a valuable tool for various real-world applications. Keywords: Outlier detection, Graph-based clustering, Cluster catch digraphs, $k$-nearest-neighborhood, Mutual catch graphs, Nearest neighbor distance.
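As a point of reference for NND-based scoring, a plain k-nearest-neighbor distance outlier score can be sketched in a few lines. This is a generic baseline in the same spirit as the NND components mentioned above, not the CCD-based algorithms themselves.

```python
import numpy as np

def knn_outlier_scores(X, k=3):
    """Score each point by its distance to its k-th nearest neighbor;
    isolated points receive large scores. O(n^2) pairwise distances,
    fine for a small illustration."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D_sorted = np.sort(D, axis=1)   # column 0 is the self-distance 0
    return D_sorted[:, k]           # distance to the k-th neighbor

rng = np.random.default_rng(5)
cluster = rng.normal(size=(40, 2))       # one dense cluster near the origin
outlier = np.array([[8.0, 8.0]])         # a single far-away point
X = np.vstack([cluster, outlier])
scores = knn_outlier_scores(X, k=3)      # the last point scores highest
```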
"Outlier Detection with Cluster Catch Digraphs", by Rui Shi, Nedret Billor, Elvan Ceyhan. arXiv:2409.11596 (arXiv - STAT - Machine Learning, 2024-09-17).
The growing demand for accurate, efficient, and scalable solutions in computational mechanics highlights the need for advanced operator learning algorithms that can efficiently handle large datasets while providing reliable uncertainty quantification. This paper introduces a novel Gaussian Process (GP) based neural operator for solving parametric differential equations. The proposed approach leverages the expressive capability of deterministic neural operators and the uncertainty awareness of conventional GPs. In particular, we propose a "neural operator-embedded kernel," wherein the GP kernel is formulated in the latent space learned by a neural operator. Further, we exploit a stochastic dual descent (SDD) algorithm to simultaneously train the neural operator parameters and the GP hyperparameters. Our approach addresses (a) the resolution dependence and (b) the cubic complexity of traditional GP models, allowing for input-resolution independence and scalability in high-dimensional and non-linear parametric systems, such as those encountered in computational mechanics. We apply our method to a range of non-linear parametric partial differential equations (PDEs) and demonstrate its superiority in both computational efficiency and accuracy compared to standard GP models and wavelet neural operators.
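The "neural operator-embedded kernel" idea, evaluating a standard kernel in a learned latent space rather than on the raw inputs, can be sketched with a fixed random feature map standing in for the neural operator (which the paper trains jointly with the GP). All names and shapes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

def encoder(X, W):
    """Stand-in for the neural operator's latent map; here a fixed
    random feature map, whereas the paper learns it end to end."""
    return np.tanh(X @ W)

def embedded_kernel(X1, X2, W, lengthscale=1.0):
    """RBF kernel evaluated in the latent space: because the encoder
    maps inputs of any discretization to a fixed latent dimension,
    the kernel becomes resolution independent."""
    H1, H2 = encoder(X1, W), encoder(X2, W)
    d2 = ((H1[:, None, :] - H2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

X = rng.normal(size=(8, 5))    # 8 training inputs of dimension 5
W = rng.normal(size=(5, 3))    # encoder weights (illustrative)
K = embedded_kernel(X, X, W)   # a valid GP covariance matrix
eigvals = np.linalg.eigvalsh(K)
```

Since the RBF kernel of the embedded points is still positive semidefinite, K can be used directly as a GP covariance; training would backpropagate the GP objective through the encoder.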
"Towards Gaussian Process for operator learning: an uncertainty aware resolution independent operator learning algorithm for computational mechanics", by Sawan Kumar, Rajdip Nayek, Souvik Chakraborty. arXiv:2409.10972 (arXiv - STAT - Machine Learning, 2024-09-17).