Anna Emilie J. Wedenborg, Michael Alexander Harborg, Andreas Bigom, Oliver Elmgreen, Marcus Presutti, Andreas Råskov, Fumiko Kano Glückstad, Mikkel Schmidt, Morten Mørup
This paper introduces a novel framework for Archetypal Analysis (AA) tailored to ordinal data, particularly from questionnaires. Unlike existing methods, the proposed method, Ordinal Archetypal Analysis (OAA), bypasses the two-step process of transforming ordinal data into continuous scales and operates directly on the ordinal data. We extend traditional AA methods to handle the subjective nature of questionnaire-based data, acknowledging individual differences in scale perception. We introduce the Response Bias Ordinal Archetypal Analysis (RBOAA), which learns individualized scales for each subject during optimization. The effectiveness of these methods is demonstrated on synthetic data and the European Social Survey dataset, highlighting their potential to provide deeper insights into human behavior and perception. The study underscores the importance of considering response bias in cross-national research and offers a principled approach to analyzing ordinal data through Archetypal Analysis.
{"title":"Modeling Human Responses by Ordinal Archetypal Analysis","authors":"Anna Emilie J. Wedenborg, Michael Alexander Harborg, Andreas Bigom, Oliver Elmgreen, Marcus Presutti, Andreas Råskov, Fumiko Kano Glückstad, Mikkel Schmidt, Morten Mørup","doi":"arxiv-2409.07934","DOIUrl":"https://doi.org/arxiv-2409.07934","url":null,"abstract":"This paper introduces a novel framework for Archetypal Analysis (AA) tailored\u0000to ordinal data, particularly from questionnaires. Unlike existing methods, the\u0000proposed method, Ordinal Archetypal Analysis (OAA), bypasses the two-step\u0000process of transforming ordinal data into continuous scales and operates\u0000directly on the ordinal data. We extend traditional AA methods to handle the\u0000subjective nature of questionnaire-based data, acknowledging individual\u0000differences in scale perception. We introduce the Response Bias Ordinal\u0000Archetypal Analysis (RBOAA), which learns individualized scales for each\u0000subject during optimization. The effectiveness of these methods is demonstrated\u0000on synthetic data and the European Social Survey dataset, highlighting their\u0000potential to provide deeper insights into human behavior and perception. The\u0000study underscores the importance of considering response bias in cross-national\u0000research and offers a principled approach to analyzing ordinal data through\u0000Archetypal Analysis.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Song Hao, Wentao Fu, Xuanze Chen, Chengxiang Jin, Jiajun Zhou, Shanqing Yu, Qi Xuan
Traditional anomalous traffic detection methods rely on single-view analysis, which has clear limitations when dealing with complex attacks and encrypted communications. To address this, we propose a Multi-view Feature Fusion (MuFF) method for network anomaly traffic detection. MuFF models the temporal and interactive relationships of packets in network traffic from temporal and interactive viewpoints respectively, learning a set of features for each view; these features are then fused for anomaly detection. Extensive experiments on six real traffic datasets show that MuFF performs excellently in network anomalous traffic detection, compensating for the shortcomings of detection from a single perspective.
{"title":"Network Anomaly Traffic Detection via Multi-view Feature Fusion","authors":"Song Hao, Wentao Fu, Xuanze Chen, Chengxiang Jin, Jiajun Zhou, Shanqing Yu, Qi Xuan","doi":"arxiv-2409.08020","DOIUrl":"https://doi.org/arxiv-2409.08020","url":null,"abstract":"Traditional anomalous traffic detection methods are based on single-view\u0000analysis, which has obvious limitations in dealing with complex attacks and\u0000encrypted communications. In this regard, we propose a Multi-view Feature\u0000Fusion (MuFF) method for network anomaly traffic detection. MuFF models the\u0000temporal and interactive relationships of packets in network traffic based on\u0000the temporal and interactive viewpoints respectively. It learns temporal and\u0000interactive features. These features are then fused from different perspectives\u0000for anomaly traffic detection. Extensive experiments on six real traffic\u0000datasets show that MuFF has excellent performance in network anomalous traffic\u0000detection, which makes up for the shortcomings of detection under a single\u0000perspective.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lorenzo Loconte, Antonio Mari, Gennaro Gala, Robert Peharz, Cassio de Campos, Erik Quaeghebeur, Gennaro Vessio, Antonio Vergari
This paper establishes a rigorous connection between circuit representations and tensor factorizations, two seemingly distinct yet fundamentally related areas. By connecting these fields, we highlight a series of opportunities that can benefit both communities. Our work generalizes popular tensor factorizations within the circuit language, and unifies various circuit learning algorithms under a single, generalized hierarchical factorization framework. Specifically, we introduce a modular "Lego block" approach to build tensorized circuit architectures. This, in turn, allows us to systematically construct and explore various circuit and tensor factorization models while maintaining tractability. This connection not only clarifies similarities and differences in existing models, but also enables the development of a comprehensive pipeline for building and optimizing new circuit/tensor factorization architectures. We show the effectiveness of our framework through extensive empirical evaluations, and highlight new research opportunities for tensor factorizations in probabilistic modeling.
{"title":"What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?","authors":"Lorenzo Loconte, Antonio Mari, Gennaro Gala, Robert Peharz, Cassio de Campos, Erik Quaeghebeur, Gennaro Vessio, Antonio Vergari","doi":"arxiv-2409.07953","DOIUrl":"https://doi.org/arxiv-2409.07953","url":null,"abstract":"This paper establishes a rigorous connection between circuit representations\u0000and tensor factorizations, two seemingly distinct yet fundamentally related\u0000areas. By connecting these fields, we highlight a series of opportunities that\u0000can benefit both communities. Our work generalizes popular tensor\u0000factorizations within the circuit language, and unifies various circuit\u0000learning algorithms under a single, generalized hierarchical factorization\u0000framework. Specifically, we introduce a modular \"Lego block\" approach to build\u0000tensorized circuit architectures. This, in turn, allows us to systematically\u0000construct and explore various circuit and tensor factorization models while\u0000maintaining tractability. This connection not only clarifies similarities and\u0000differences in existing models, but also enables the development of a\u0000comprehensive pipeline for building and optimizing new circuit/tensor\u0000factorization architectures. We show the effectiveness of our framework through\u0000extensive empirical evaluations, and highlight new research opportunities for\u0000tensor factorizations in probabilistic modeling.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Geigh Zollicoffer, Minh Vu, Ben Nebgen, Juan Castorena, Boian Alexandrov, Manish Bhattarai
This work presents an information-theoretic examination of diffusion-based purification methods, the state-of-the-art adversarial defenses that utilize diffusion models to remove malicious perturbations from adversarial examples. By theoretically characterizing the inherent purification errors associated with Markov-based diffusion purification, we introduce LoRID, a novel Low-Rank Iterative Diffusion purification method designed to remove adversarial perturbations with low intrinsic purification error. LoRID centers around a multi-stage purification process that leverages multiple rounds of diffusion-denoising loops at the early time-steps of the diffusion models, together with Tucker decomposition, an extension of matrix factorization, to remove adversarial noise in high-noise regimes. Consequently, LoRID increases the effective diffusion time-steps and withstands strong adversarial attacks, achieving superior robustness on the CIFAR-10/100, CelebA-HQ, and ImageNet datasets under both white-box and black-box settings.
{"title":"LoRID: Low-Rank Iterative Diffusion for Adversarial Purification","authors":"Geigh Zollicoffer, Minh Vu, Ben Nebgen, Juan Castorena, Boian Alexandrov, Manish Bhattarai","doi":"arxiv-2409.08255","DOIUrl":"https://doi.org/arxiv-2409.08255","url":null,"abstract":"This work presents an information-theoretic examination of diffusion-based\u0000purification methods, the state-of-the-art adversarial defenses that utilize\u0000diffusion models to remove malicious perturbations in adversarial examples. By\u0000theoretically characterizing the inherent purification errors associated with\u0000the Markov-based diffusion purifications, we introduce LoRID, a novel Low-Rank\u0000Iterative Diffusion purification method designed to remove adversarial\u0000perturbation with low intrinsic purification errors. LoRID centers around a\u0000multi-stage purification process that leverages multiple rounds of\u0000diffusion-denoising loops at the early time-steps of the diffusion models, and\u0000the integration of Tucker decomposition, an extension of matrix factorization,\u0000to remove adversarial noise at high-noise regimes. Consequently, LoRID\u0000increases the effective diffusion time-steps and overcomes strong adversarial\u0000attacks, achieving superior robustness performance in CIFAR-10/100, CelebA-HQ,\u0000and ImageNet datasets under both white-box and black-box settings.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhenhao Zhao, Minhong Zhu, Chen Wang, Sijia Wang, Jiqiang Zhang, Li Chen, Weiran Cai
Graph Contrastive Learning (GCL) seeks to learn nodal or graph representations that contain maximal consistent information from graph-structured data. While node-level contrasting modes dominate, some efforts have begun to explore consistency across different scales; yet these tend to lose consistent information and to be contaminated by disturbing features. Here, we introduce MUX-GCL, a novel cross-scale contrastive learning paradigm that utilizes multiplex representations as effective patches. While this learning mode minimizes contaminating noise, a commensurate contrasting strategy using positional affinities further avoids information loss by correcting false negative pairs across scales. Extensive downstream experiments demonstrate that MUX-GCL yields multiple state-of-the-art results on public datasets. Our theoretical analysis further guarantees that the new objective function is a stricter lower bound on the mutual information between raw input features and output embeddings, which rationalizes this paradigm. Code is available at https://github.com/MUX-GCL/Code.
{"title":"Multiplex Graph Contrastive Learning with Soft Negatives","authors":"Zhenhao Zhao, Minhong Zhu, Chen Wang, Sijia Wang, Jiqiang Zhang, Li Chen, Weiran Cai","doi":"arxiv-2409.08010","DOIUrl":"https://doi.org/arxiv-2409.08010","url":null,"abstract":"Graph Contrastive Learning (GCL) seeks to learn nodal or graph\u0000representations that contain maximal consistent information from\u0000graph-structured data. While node-level contrasting modes are dominating, some\u0000efforts commence to explore consistency across different scales. Yet, they tend\u0000to lose consistent information and be contaminated by disturbing features.\u0000Here, we introduce MUX-GCL, a novel cross-scale contrastive learning paradigm\u0000that utilizes multiplex representations as effective patches. While this\u0000learning mode minimizes contaminating noises, a commensurate contrasting\u0000strategy using positional affinities further avoids information loss by\u0000correcting false negative pairs across scales. Extensive downstream experiments\u0000demonstrate that MUX-GCL yields multiple state-of-the-art results on public\u0000datasets. Our theoretical analysis further guarantees the new objective\u0000function as a stricter lower bound of mutual information of raw input features\u0000and output embeddings, which rationalizes this paradigm. Code is available at\u0000https://github.com/MUX-GCL/Code.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kaizhe Fan, Quanjun Li
Graph representation learning has emerged as a powerful tool for preserving graph topology when mapping nodes to vector representations, enabling various downstream tasks such as node classification and community detection. However, most current graph neural network models face the challenge of requiring extensive labeled data, which limits their practical applicability in real-world scenarios where labeled data is scarce. To address this challenge, researchers have explored Graph Contrastive Learning (GCL), which leverages enhanced graph data and contrastive learning techniques. While promising, existing GCL methods often struggle with effectively capturing both local and global graph structures, and with balancing the trade-off between node-level and graph-level representations. In this work, we propose Graph Representation Embedding Enhanced via Multidimensional Contrastive Learning (GRE2-MDCL). Our model introduces a novel triple network architecture with a multi-head attention GNN as the core. GRE2-MDCL first globally and locally augments the input graph using SVD and LAGNN techniques. It then constructs a multidimensional contrastive loss, incorporating cross-network, cross-view, and neighbor contrast, to optimize the model. Extensive experiments on the benchmark datasets Cora, Citeseer, and PubMed demonstrate that GRE2-MDCL achieves state-of-the-art performance, with average accuracies of 82.5%, 72.5%, and 81.6%, respectively. Visualizations further show tighter intra-cluster aggregation and clearer inter-cluster boundaries, highlighting the effectiveness of our framework in improving upon baseline GCL models.
{"title":"GRE^2-MDCL: Graph Representation Embedding Enhanced via Multidimensional Contrastive Learning","authors":"Kaizhe Fan, Quanjun Li","doi":"arxiv-2409.07725","DOIUrl":"https://doi.org/arxiv-2409.07725","url":null,"abstract":"Graph representation learning has emerged as a powerful tool for preserving\u0000graph topology when mapping nodes to vector representations, enabling various\u0000downstream tasks such as node classification and community detection. However,\u0000most current graph neural network models face the challenge of requiring\u0000extensive labeled data, which limits their practical applicability in\u0000real-world scenarios where labeled data is scarce. To address this challenge,\u0000researchers have explored Graph Contrastive Learning (GCL), which leverages\u0000enhanced graph data and contrastive learning techniques. While promising,\u0000existing GCL methods often struggle with effectively capturing both local and\u0000global graph structures, and balancing the trade-off between nodelevel and\u0000graph-level representations. In this work, we propose Graph Representation\u0000Embedding Enhanced via Multidimensional Contrastive Learning (GRE2-MDCL). Our\u0000model introduces a novel triple network architecture with a multi-head\u0000attention GNN as the core. GRE2-MDCL first globally and locally augments the\u0000input graph using SVD and LAGNN techniques. It then constructs a\u0000multidimensional contrastive loss, incorporating cross-network, cross-view, and\u0000neighbor contrast, to optimize the model. Extensive experiments on benchmark\u0000datasets Cora, Citeseer, and PubMed demonstrate that GRE2-MDCL achieves\u0000state-of-the-art performance, with average accuracies of 82.5%, 72.5%, and\u000081.6% respectively. Visualizations further show tighter intra-cluster\u0000aggregation and clearer inter-cluster boundaries, highlighting the\u0000effectiveness of our framework in improving upon baseline GCL models.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhizheng Lai, Yufei Zhou, Peijia Zheng, Lin Chen
The recently proposed Kolmogorov-Arnold Networks (KANs) offer enhanced interpretability and greater model expressiveness. However, KANs also present challenges related to privacy leakage during inference. Homomorphic encryption (HE) facilitates privacy-preserving inference for deep learning models, enabling resource-limited users to benefit from deep learning services while ensuring data security. Yet, the complex structure of KANs, incorporating nonlinear elements like the SiLU activation function and B-spline functions, renders existing privacy-preserving inference techniques inadequate. To address this issue, we propose an accurate and efficient privacy-preserving inference scheme tailored for KANs. Our approach introduces a task-specific polynomial approximation for the SiLU activation function, dynamically adjusting the approximation range to ensure high accuracy on real-world datasets. Additionally, we develop an efficient method for computing B-spline functions within the HE domain, leveraging techniques such as repeat packing, lazy combination, and comparison functions. We evaluate the effectiveness of our privacy-preserving KAN inference scheme on both symbolic formula evaluation and image classification. The experimental results show that our model achieves accuracy comparable to plaintext KANs across various datasets and outperforms plaintext MLPs. Additionally, on the CIFAR-10 dataset, our scheme achieves a more than 7x speedup in inference latency compared to the naive method.
{"title":"Efficient Privacy-Preserving KAN Inference Using Homomorphic Encryption","authors":"Zhizheng Lai, Yufei Zhou, Peijia Zheng, Lin Chen","doi":"arxiv-2409.07751","DOIUrl":"https://doi.org/arxiv-2409.07751","url":null,"abstract":"The recently proposed Kolmogorov-Arnold Networks (KANs) offer enhanced\u0000interpretability and greater model expressiveness. However, KANs also present\u0000challenges related to privacy leakage during inference. Homomorphic encryption\u0000(HE) facilitates privacy-preserving inference for deep learning models,\u0000enabling resource-limited users to benefit from deep learning services while\u0000ensuring data security. Yet, the complex structure of KANs, incorporating\u0000nonlinear elements like the SiLU activation function and B-spline functions,\u0000renders existing privacy-preserving inference techniques inadequate. To address\u0000this issue, we propose an accurate and efficient privacy-preserving inference\u0000scheme tailored for KANs. Our approach introduces a task-specific polynomial\u0000approximation for the SiLU activation function, dynamically adjusting the\u0000approximation range to ensure high accuracy on real-world datasets.\u0000Additionally, we develop an efficient method for computing B-spline functions\u0000within the HE domain, leveraging techniques such as repeat packing, lazy\u0000combination, and comparison functions. We evaluate the effectiveness of our\u0000privacy-preserving KAN inference scheme on both symbolic formula evaluation and\u0000image classification. The experimental results show that our model achieves\u0000accuracy comparable to plaintext KANs across various datasets and outperforms\u0000plaintext MLPs. Additionally, on the CIFAR-10 dataset, our inference latency\u0000achieves over 7 times speedup compared to the naive method.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sheng Shen, Rabih Younes
This paper introduces Kolmogorov-Arnold Networks (KAN) as an enhancement to the traditional linear probing method in transfer learning. Linear probing, often applied to the final layer of pre-trained models, is limited by its inability to model complex relationships in data. To address this, we propose substituting the linear probing layer with KAN, which leverages spline-based representations to approximate intricate functions. In this study, we integrate KAN with a ResNet-50 model pre-trained on ImageNet and evaluate its performance on the CIFAR-10 dataset. We perform a systematic hyperparameter search, focusing on grid size and spline degree (k), to optimize KAN's flexibility and accuracy. Our results demonstrate that KAN consistently outperforms traditional linear probing, achieving significant improvements in accuracy and generalization across a range of configurations. These findings indicate that KAN offers a more powerful and adaptable alternative to conventional linear probing techniques in transfer learning.
{"title":"Reimagining Linear Probing: Kolmogorov-Arnold Networks in Transfer Learning","authors":"Sheng Shen, Rabih Younes","doi":"arxiv-2409.07763","DOIUrl":"https://doi.org/arxiv-2409.07763","url":null,"abstract":"This paper introduces Kolmogorov-Arnold Networks (KAN) as an enhancement to\u0000the traditional linear probing method in transfer learning. Linear probing,\u0000often applied to the final layer of pre-trained models, is limited by its\u0000inability to model complex relationships in data. To address this, we propose\u0000substituting the linear probing layer with KAN, which leverages spline-based\u0000representations to approximate intricate functions. In this study, we integrate\u0000KAN with a ResNet-50 model pre-trained on ImageNet and evaluate its performance\u0000on the CIFAR-10 dataset. We perform a systematic hyperparameter search,\u0000focusing on grid size and spline degree (k), to optimize KAN's flexibility and\u0000accuracy. Our results demonstrate that KAN consistently outperforms traditional\u0000linear probing, achieving significant improvements in accuracy and\u0000generalization across a range of configurations. These findings indicate that\u0000KAN offers a more powerful and adaptable alternative to conventional linear\u0000probing techniques in transfer learning.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Supratim Das, Mahdie Rafie, Paula Kammer, Søren T. Skou, Dorte T. Grønne, Ewa M. Roos, André Hajek, Hans-Helmut König, Md Shihab Ullah, Niklas Probul, Jan Baumbach, Linda Baumbach
Background: Patient-reported survey data are used to train prognostic models aimed at improving healthcare. However, such data are typically distributed across multiple centers and, for privacy reasons, cannot easily be centralized in one data repository. Models trained locally are less accurate, robust, and generalizable. We present and apply privacy-preserving federated machine learning techniques for prognostic model building, where local survey data never leave the legally safe harbors of the medical centers. Methods: We used centralized, local, and federated learning techniques on two healthcare datasets (GLA:D data from the five health regions of Denmark and international SHARE data from 27 countries) to predict two different health outcomes. We compared linear regression, random forest regression, and random forest classification models trained on local data with those trained on the entire data in a centralized and in a federated fashion. Results: On GLA:D data, federated linear regression (R2 0.34, RMSE 18.2) and federated random forest regression (R2 0.34, RMSE 18.3) models outperform their local counterparts (R2 0.32, RMSE 18.6 and R2 0.30, RMSE 18.8, respectively) with statistical significance. We also found that centralized models (R2 0.34, RMSE 18.2 and R2 0.32, RMSE 18.5, respectively) did not perform significantly better than the federated models. On SHARE, the federated model (AC 0.78, AUROC 0.71) and centralized model (AC 0.84, AUROC 0.66) perform significantly better than the local models (AC 0.74, AUROC 0.69). Conclusion: Federated learning enables the training of prognostic models from multi-center surveys without compromising privacy and with minimal or no loss of model performance.
{"title":"Privacy-preserving federated prediction of pain intensity change based on multi-center survey data","authors":"Supratim Das, Mahdie Rafie, Paula Kammer, Søren T. Skou, Dorte T. Grønne, Ewa M. Roos, André Hajek, Hans-Helmut König, Md Shihab Ullaha, Niklas Probul, Jan Baumbacha, Linda Baumbach","doi":"arxiv-2409.07997","DOIUrl":"https://doi.org/arxiv-2409.07997","url":null,"abstract":"Background: Patient-reported survey data are used to train prognostic models\u0000aimed at improving healthcare. However, such data are typically available\u0000multi-centric and, for privacy reasons, cannot easily be centralized in one\u0000data repository. Models trained locally are less accurate, robust, and\u0000generalizable. We present and apply privacy-preserving federated machine\u0000learning techniques for prognostic model building, where local survey data\u0000never leaves the legally safe harbors of the medical centers. Methods: We used\u0000centralized, local, and federated learning techniques on two healthcare\u0000datasets (GLA:D data from the five health regions of Denmark and international\u0000SHARE data of 27 countries) to predict two different health outcomes. We\u0000compared linear regression, random forest regression, and random forest\u0000classification models trained on local data with those trained on the entire\u0000data in a centralized and in a federated fashion. Results: In GLA:D data,\u0000federated linear regression (R2 0.34, RMSE 18.2) and federated random forest\u0000regression (R2 0.34, RMSE 18.3) models outperform their local counterparts\u0000(i.e., R2 0.32, RMSE 18.6, R2 0.30, RMSE 18.8) with statistical significance.\u0000We also found that centralized models (R2 0.34, RMSE 18.2, R2 0.32, RMSE 18.5,\u0000respectively) did not perform significantly better than the federated models.\u0000In SHARE, the federated model (AC 0.78, AUROC: 0.71) and centralized model (AC\u00000.84, AUROC: 0.66) perform significantly better than the local models (AC:\u00000.74, AUROC: 0.69). Conclusion: Federated learning enables the training of\u0000prognostic models from multi-center surveys without compromising privacy and\u0000with only minimal or no compromise regarding model performance.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhifeng Hu, Chong Han, Wolfgang Gerstacker, Ian F. Akyildiz
Terahertz (THz) space communications (Tera-SpaceCom) is envisioned as a promising technology to enable various space science and communication applications. The Tera-SpaceCom realm mainly consists of THz sensing for space exploration, data centers in space providing cloud services for space exploration tasks, and a low earth orbit (LEO) mega-constellation relaying these tasks to ground stations (GSs) or data centers via THz links. Moreover, to reduce the computational burden on data centers as well as resource consumption and latency in the relaying process, the LEO mega-constellation provides satellite edge computing (SEC) services to directly compute space exploration tasks without relaying them to data centers. The LEO satellites that receive space exploration tasks offload (i.e., distribute) partial tasks to their neighboring LEO satellites, to further reduce their computational burden. However, efficient joint communication resource allocation and computing task offloading for the Tera-SpaceCom SEC network is an NP-hard mixed-integer nonlinear programming problem (MINLP), due to the discrete nature of space exploration tasks and sub-arrays as well as the continuous nature of transmit power. To tackle this challenge, a graph neural network (GNN)-deep reinforcement learning (DRL)-based joint resource allocation and task offloading (GRANT) algorithm is proposed, targeting long-term resource efficiency (RE). In particular, the GNNs learn relationships among satellites from their connectivity information, and multi-agent and multi-task mechanisms cooperatively train the task offloading and resource allocation policies. Compared with benchmark solutions, GRANT not only achieves the highest RE with relatively low latency but also requires the fewest trainable parameters and the shortest running time.
{"title":"Tera-SpaceCom: GNN-based Deep Reinforcement Learning for Joint Resource Allocation and Task Offloading in TeraHertz Band Space Networks","authors":"Zhifeng Hu, Chong Han, Wolfgang Gerstacker, Ian F. Akyildiz","doi":"arxiv-2409.07911","DOIUrl":"https://doi.org/arxiv-2409.07911","url":null,"abstract":"Terahertz (THz) space communications (Tera-SpaceCom) is envisioned as a\u0000promising technology to enable various space science and communication\u0000applications. Mainly, the realm of Tera-SpaceCom consists of THz sensing for\u0000space exploration, data centers in space providing cloud services for space\u0000exploration tasks, and a low earth orbit (LEO) mega-constellation relaying\u0000these tasks to ground stations (GSs) or data centers via THz links. Moreover,\u0000to reduce the computational burden on data centers as well as resource\u0000consumption and latency in the relaying process, the LEO mega-constellation\u0000provides satellite edge computing (SEC) services to directly compute space\u0000exploration tasks without relaying these tasks to data centers. The LEO\u0000satellites that receive space exploration tasks offload (i.e., distribute)\u0000partial tasks to their neighboring LEO satellites, to further reduce their\u0000computational burden. However, efficient joint communication resource\u0000allocation and computing task offloading for the Tera-SpaceCom SEC network is\u0000an NP-hard mixed-integer nonlinear programming problem (MINLP), due to the\u0000discrete nature of space exploration tasks and sub-arrays as well as the\u0000continuous nature of transmit power. To tackle this challenge, a graph neural\u0000network (GNN)-deep reinforcement learning (DRL)-based joint resource allocation\u0000and task offloading (GRANT) algorithm is proposed with the target of long-term\u0000resource efficiency (RE). Particularly, GNNs learn relationships among\u0000different satellites from their connectivity information. Furthermore,\u0000multi-agent and multi-task mechanisms cooperatively train task offloading and\u0000resource allocation. Compared with benchmark solutions, GRANT not only achieves\u0000the highest RE with relatively low latency, but realizes the fewest trainable\u0000parameters and the shortest running time.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}