FedHide: Federated Learning by Hiding in the Neighbors
Hyunsin Park, Sungrack Yun (arXiv:2409.07808, 12 Sep 2024)

We propose a prototype-based federated learning method designed for embedding networks in classification or verification tasks. Our focus is on scenarios where each client has data from a single class. The main challenge is to develop an embedding network that can distinguish between different classes while adhering to privacy constraints: sharing true class prototypes with the server or other clients could compromise sensitive information. To tackle this issue, we propose sharing a proxy class prototype among clients instead of the true class prototype. Our approach generates proxy class prototypes by linearly combining the true class prototype with its nearest neighbors, concealing the true prototype while still enabling clients to learn discriminative embedding networks. We compare our method to alternative techniques, such as adding random Gaussian noise and random selection with cosine similarity constraints. Furthermore, we evaluate the robustness of our approach against gradient inversion attacks and introduce a prototype leakage measure that quantifies the extent of private information revealed when sharing the proposed proxy class prototype. Moreover, we provide a theoretical analysis of the convergence properties of our approach. Empirical results on three benchmark datasets (CIFAR-100, VoxCeleb1, and VGGFace2) demonstrate the effectiveness of our method when training federated embedding networks from scratch.
{"title":"FedHide: Federated Learning by Hiding in the Neighbors","authors":"Hyunsin Park, Sungrack Yun","doi":"arxiv-2409.07808","DOIUrl":"https://doi.org/arxiv-2409.07808","url":null,"abstract":"We propose a prototype-based federated learning method designed for embedding\u0000networks in classification or verification tasks. Our focus is on scenarios\u0000where each client has data from a single class. The main challenge is to\u0000develop an embedding network that can distinguish between different classes\u0000while adhering to privacy constraints. Sharing true class prototypes with the\u0000server or other clients could potentially compromise sensitive information. To\u0000tackle this issue, we propose a proxy class prototype that will be shared among\u0000clients instead of the true class prototype. Our approach generates proxy class\u0000prototypes by linearly combining them with their nearest neighbors. This\u0000technique conceals the true class prototype while enabling clients to learn\u0000discriminative embedding networks. We compare our method to alternative\u0000techniques, such as adding random Gaussian noise and using random selection\u0000with cosine similarity constraints. Furthermore, we evaluate the robustness of\u0000our approach against gradient inversion attacks and introduce a measure for\u0000prototype leakage. This measure quantifies the extent of private information\u0000revealed when sharing the proposed proxy class prototype. Moreover, we provide\u0000a theoretical analysis of the convergence properties of our approach. Our\u0000proposed method for federated learning from scratch demonstrates its\u0000effectiveness through empirical results on three benchmark datasets: CIFAR-100,\u0000VoxCeleb1, and VGGFace2.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Attack End-to-End Autonomous Driving through Module-Wise Noise
Lu Wang, Tianyuan Zhang, Yikai Han, Muyang Fang, Ting Jin, Jiaqi Kang (arXiv:2409.07706, 12 Sep 2024)
With recent breakthroughs in deep neural networks, numerous tasks within autonomous driving have exhibited remarkable performance. However, deep learning models are susceptible to adversarial attacks, presenting significant security risks to autonomous driving systems. End-to-end architectures have emerged as the predominant solution for autonomous driving, owing to their collaborative nature across different tasks, yet the implications of adversarial attacks on such models remain relatively unexplored. In this paper, we conduct the first comprehensive adversarial security study of modular end-to-end autonomous driving models. We analyze the potential vulnerabilities in the model inference process and design a universal attack scheme based on module-wise noise injection. Large-scale experiments on a full-stack autonomous driving model demonstrate that our attack outperforms previous attack methods. We hope this research offers fresh insights into ensuring the safety and reliability of autonomous driving systems.
{"title":"Attack End-to-End Autonomous Driving through Module-Wise Noise","authors":"Lu Wang, Tianyuan Zhang, Yikai Han, Muyang Fang, Ting Jin, Jiaqi Kang","doi":"arxiv-2409.07706","DOIUrl":"https://doi.org/arxiv-2409.07706","url":null,"abstract":"With recent breakthroughs in deep neural networks, numerous tasks within\u0000autonomous driving have exhibited remarkable performance. However, deep\u0000learning models are susceptible to adversarial attacks, presenting significant\u0000security risks to autonomous driving systems. Presently, end-to-end\u0000architectures have emerged as the predominant solution for autonomous driving,\u0000owing to their collaborative nature across different tasks. Yet, the\u0000implications of adversarial attacks on such models remain relatively\u0000unexplored. In this paper, we conduct comprehensive adversarial security\u0000research on the modular end-to-end autonomous driving model for the first time.\u0000We thoroughly consider the potential vulnerabilities in the model inference\u0000process and design a universal attack scheme through module-wise noise\u0000injection. We conduct large-scale experiments on the full-stack autonomous\u0000driving model and demonstrate that our attack method outperforms previous\u0000attack methods. We trust that our research will offer fresh insights into\u0000ensuring the safety and reliability of autonomous driving systems.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph Laplacian-based Bayesian Multi-fidelity Modeling
Orazio Pinti, Jeremy M. Budd, Franca Hoffmann, Assad A. Oberai (arXiv:2409.08211, 12 Sep 2024)
We present a novel probabilistic approach for generating multi-fidelity data while accounting for errors inherent in both low- and high-fidelity data. In this approach, a graph Laplacian constructed from the low-fidelity data is used to define a multivariate Gaussian prior density for the coordinates of the true data points. In addition, a few high-fidelity data points are used to construct a conjugate likelihood term. Bayes' rule is then applied to derive an explicit expression for the posterior density, which is also multivariate Gaussian. The maximum a posteriori (MAP) estimate of this density is selected as the optimal multi-fidelity estimate. We show that the MAP estimate and the covariance of the posterior density can be determined by solving linear systems of equations, and we develop two methods, one based on spectral truncation and another on a low-rank approximation, to solve these systems efficiently. The multi-fidelity approach is tested on a variety of problems in solid and fluid mechanics, with data representing vectors of quantities of interest and discretized spatial fields in one and two dimensions. The results demonstrate that by utilizing a small fraction of high-fidelity data, the multi-fidelity approach can significantly improve the accuracy of a large collection of low-fidelity data points.
{"title":"Graph Laplacian-based Bayesian Multi-fidelity Modeling","authors":"Orazio Pinti, Jeremy M. Budd, Franca Hoffmann, Assad A. Oberai","doi":"arxiv-2409.08211","DOIUrl":"https://doi.org/arxiv-2409.08211","url":null,"abstract":"We present a novel probabilistic approach for generating multi-fidelity data\u0000while accounting for errors inherent in both low- and high-fidelity data. In\u0000this approach a graph Laplacian constructed from the low-fidelity data is used\u0000to define a multivariate Gaussian prior density for the coordinates of the true\u0000data points. In addition, few high-fidelity data points are used to construct a\u0000conjugate likelihood term. Thereafter, Bayes rule is applied to derive an\u0000explicit expression for the posterior density which is also multivariate\u0000Gaussian. The maximum textit{a posteriori} (MAP) estimate of this density is\u0000selected to be the optimal multi-fidelity estimate. It is shown that the MAP\u0000estimate and the covariance of the posterior density can be determined through\u0000the solution of linear systems of equations. Thereafter, two methods, one based\u0000on spectral truncation and another based on a low-rank approximation, are\u0000developed to solve these equations efficiently. The multi-fidelity approach is\u0000tested on a variety of problems in solid and fluid mechanics with data that\u0000represents vectors of quantities of interest and discretized spatial fields in\u0000one and two dimensions. The results demonstrate that by utilizing a small\u0000fraction of high-fidelity data, the multi-fidelity approach can significantly\u0000improve the accuracy of a large collection of low-fidelity data points.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning Causally Invariant Reward Functions from Diverse Demonstrations
Ivan Ovinnikov, Eugene Bykovets, Joachim M. Buhmann (arXiv:2409.08012, 12 Sep 2024)
Inverse reinforcement learning (IRL) methods aim to retrieve the reward function of a Markov decision process from a dataset of expert demonstrations. The common scarcity and heterogeneous sources of such demonstrations can lead the learned reward function to absorb spurious correlations in the data; consequently, a policy trained on the resulting reward function often overfits behaviourally to the expert dataset and degrades under distribution shift of the environment dynamics. In this work, we explore a novel regularization approach for IRL methods based on the causal invariance principle, with the goal of improved reward function generalization. By applying this regularization to both exact and approximate formulations of the learning task, we demonstrate superior policy performance when training on the recovered reward functions in a transfer setting.
{"title":"Learning Causally Invariant Reward Functions from Diverse Demonstrations","authors":"Ivan Ovinnikov, Eugene Bykovets, Joachim M. Buhmann","doi":"arxiv-2409.08012","DOIUrl":"https://doi.org/arxiv-2409.08012","url":null,"abstract":"Inverse reinforcement learning methods aim to retrieve the reward function of\u0000a Markov decision process based on a dataset of expert demonstrations. The\u0000commonplace scarcity and heterogeneous sources of such demonstrations can lead\u0000to the absorption of spurious correlations in the data by the learned reward\u0000function. Consequently, this adaptation often exhibits behavioural overfitting\u0000to the expert data set when a policy is trained on the obtained reward function\u0000under distribution shift of the environment dynamics. In this work, we explore\u0000a novel regularization approach for inverse reinforcement learning methods\u0000based on the causal invariance principle with the goal of improved reward\u0000function generalization. By applying this regularization to both exact and\u0000approximate formulations of the learning task, we demonstrate superior policy\u0000performance when trained using the recovered reward functions in a transfer\u0000setting","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taylor-Sensus Network: Embracing Noise to Enlighten Uncertainty for Scientific Data
Guangxuan Song, Dongmei Fu, Zhongwei Qiu, Jintao Meng, Dawei Zhang (arXiv:2409.07942, 12 Sep 2024)

Uncertainty estimation is crucial when applying machine learning to scientific data. Current uncertainty estimation methods mainly focus on the model's inherent uncertainty while neglecting explicit modeling of noise in the data. Furthermore, noise estimation methods typically rely on temporal or spatial dependencies, which poses a significant challenge for structured scientific data where such dependencies among samples are often absent. To address these challenges, we propose the Taylor-Sensus Network (TSNet). TSNet uses a Taylor series expansion to model complex, heteroscedastic noise and proposes a deep Taylor block that makes the network aware of the noise distribution. TSNet includes a noise-aware contrastive learning module and a data density perception module for aleatoric and epistemic uncertainty, respectively. Additionally, an uncertainty combination operator integrates these uncertainties, and the network is trained with a novel heteroscedastic mean square error loss. TSNet outperforms mainstream and state-of-the-art methods in our experiments, highlighting its noise robustness and its potential for scientific research. The code will be open-sourced to support the "AI for Science" community.
{"title":"Taylor-Sensus Network: Embracing Noise to Enlighten Uncertainty for Scientific Data","authors":"Guangxuan Song, Dongmei Fu, Zhongwei Qiu, Jintao Meng, Dawei Zhang","doi":"arxiv-2409.07942","DOIUrl":"https://doi.org/arxiv-2409.07942","url":null,"abstract":"Uncertainty estimation is crucial in scientific data for machine learning.\u0000Current uncertainty estimation methods mainly focus on the model's inherent\u0000uncertainty, while neglecting the explicit modeling of noise in the data.\u0000Furthermore, noise estimation methods typically rely on temporal or spatial\u0000dependencies, which can pose a significant challenge in structured scientific\u0000data where such dependencies among samples are often absent. To address these\u0000challenges in scientific research, we propose the Taylor-Sensus Network\u0000(TSNet). TSNet innovatively uses a Taylor series expansion to model complex,\u0000heteroscedastic noise and proposes a deep Taylor block for aware noise\u0000distribution. TSNet includes a noise-aware contrastive learning module and a\u0000data density perception module for aleatoric and epistemic uncertainty.\u0000Additionally, an uncertainty combination operator is used to integrate these\u0000uncertainties, and the network is trained using a novel heteroscedastic mean\u0000square error loss. TSNet demonstrates superior performance over mainstream and\u0000state-of-the-art methods in experiments, highlighting its potential in\u0000scientific research and noise resistance. It will be open-source to facilitate\u0000the community of \"AI for Science\".","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CliquePH: Higher-Order Information for Graph Neural Networks through Persistent Homology on Clique Graphs
Davide Buffelli, Farzin Soleymani, Bastian Rieck (arXiv:2409.08217, 12 Sep 2024)

Graph neural networks have become the default choice of practitioners for graph learning tasks such as graph classification and node classification. Nevertheless, popular graph neural network models still struggle to capture higher-order information, i.e., information that goes beyond pairwise interactions. Recent work has shown that persistent homology, a tool from topological data analysis, can enrich graph neural networks with topological information that they otherwise could not capture. Calculating such features is efficient for dimension 0 (connected components) and dimension 1 (cycles), but it does not scale well to higher-order structures, with a complexity of $O(n^d)$, where $n$ is the number of nodes and $d$ is the order of the structures. In this work, we introduce a novel method that extracts information about higher-order structures in the graph while still using the efficient low-dimensional persistent homology algorithm. On standard benchmark datasets, we show that our method can lead to up to 31% improvements in test accuracy.
{"title":"CliquePH: Higher-Order Information for Graph Neural Networks through Persistent Homology on Clique Graphs","authors":"Davide Buffelli, Farzin Soleymani, Bastian Rieck","doi":"arxiv-2409.08217","DOIUrl":"https://doi.org/arxiv-2409.08217","url":null,"abstract":"Graph neural networks have become the default choice by practitioners for\u0000graph learning tasks such as graph classification and node classification.\u0000Nevertheless, popular graph neural network models still struggle to capture\u0000higher-order information, i.e., information that goes emph{beyond} pairwise\u0000interactions. Recent work has shown that persistent homology, a tool from\u0000topological data analysis, can enrich graph neural networks with topological\u0000information that they otherwise could not capture. Calculating such features is\u0000efficient for dimension 0 (connected components) and dimension 1 (cycles).\u0000However, when it comes to higher-order structures, it does not scale well, with\u0000a complexity of $O(n^d)$, where $n$ is the number of nodes and $d$ is the order\u0000of the structures. In this work, we introduce a novel method that extracts\u0000information about higher-order structures in the graph while still using the\u0000efficient low-dimensional persistent homology algorithm. On standard benchmark\u0000datasets, we show that our method can lead to up to $31%$ improvements in test\u0000accuracy.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"85 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhanced Online Grooming Detection Employing Context Determination and Message-Level Analysis
Jake Street, Isibor Ihianle, Funminiyi Olajide, Ahmad Lotfi (arXiv:2409.07958, 12 Sep 2024)
Online Grooming (OG) is a prevalent threat facing predominantly children online, with groomers using deceptive methods to prey on the vulnerability of children on social media and messaging platforms. These attacks can have severe psychological and physical impacts, including a tendency towards revictimization. Current technical measures are inadequate, especially with the advent of end-to-end encryption, which hampers message monitoring. Existing solutions focus on signature analysis of child abuse media, which does not effectively address real-time OG detection. This paper argues that OG attacks are complex and require identifying specific communication patterns between adults and children. It introduces a novel approach that leverages advanced models such as BERT and RoBERTa for message-level analysis, together with a context determination approach for classifying actor interactions, including the introduction of Actor Significance Thresholds and Message Significance Thresholds. The proposed method aims to improve accuracy and robustness in detecting OG by accounting for the dynamic and multi-faceted nature of these attacks. Cross-dataset experiments evaluate the robustness and versatility of our approach. This paper's contributions include improved detection methodologies and the potential for application in various scenarios, addressing gaps in current literature and practices.
{"title":"Enhanced Online Grooming Detection Employing Context Determination and Message-Level Analysis","authors":"Jake Street, Isibor Ihianle, Funminiyi Olajide, Ahmad Lotfi","doi":"arxiv-2409.07958","DOIUrl":"https://doi.org/arxiv-2409.07958","url":null,"abstract":"Online Grooming (OG) is a prevalent threat facing predominately children\u0000online, with groomers using deceptive methods to prey on the vulnerability of\u0000children on social media/messaging platforms. These attacks can have severe\u0000psychological and physical impacts, including a tendency towards\u0000revictimization. Current technical measures are inadequate, especially with the\u0000advent of end-to-end encryption which hampers message monitoring. Existing\u0000solutions focus on the signature analysis of child abuse media, which does not\u0000effectively address real-time OG detection. This paper proposes that OG attacks\u0000are complex, requiring the identification of specific communication patterns\u0000between adults and children. It introduces a novel approach leveraging advanced\u0000models such as BERT and RoBERTa for Message-Level Analysis and a Context\u0000Determination approach for classifying actor interactions, including the\u0000introduction of Actor Significance Thresholds and Message Significance\u0000Thresholds. The proposed method aims to enhance accuracy and robustness in\u0000detecting OG by considering the dynamic and multi-faceted nature of these\u0000attacks. Cross-dataset experiments evaluate the robustness and versatility of\u0000our approach. This paper's contributions include improved detection\u0000methodologies and the potential for application in various scenarios,\u0000addressing gaps in current literature and practices.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Role of Deep Learning Regularizations on Actors in Offline RL
Denis Tarasov, Anja Surina, Caglar Gulcehre (arXiv:2409.07606, 11 Sep 2024)

Deep learning regularization techniques, such as dropout, layer normalization, and weight decay, are widely adopted in the construction of modern artificial neural networks, often resulting in more robust training processes and improved generalization. In Reinforcement Learning (RL), however, the application of these techniques has been limited, usually to value function estimators (Hiraoka et al., 2021; Smith et al., 2022), and may result in detrimental effects. This issue is even more pronounced in offline RL settings, which bear greater similarity to supervised learning but have received less attention. Recent work in continuous offline RL has demonstrated that while we can build sufficiently powerful critic networks, the generalization of actor networks remains a bottleneck. In this study, we empirically show that applying standard regularization techniques to actor networks in offline RL actor-critic algorithms yields improvements of 6% on average across two algorithms and three different continuous D4RL domains.
{"title":"The Role of Deep Learning Regularizations on Actors in Offline RL","authors":"Denis Tarasov, Anja Surina, Caglar Gulcehre","doi":"arxiv-2409.07606","DOIUrl":"https://doi.org/arxiv-2409.07606","url":null,"abstract":"Deep learning regularization techniques, such as emph{dropout}, emph{layer\u0000normalization}, or emph{weight decay}, are widely adopted in the construction\u0000of modern artificial neural networks, often resulting in more robust training\u0000processes and improved generalization capabilities. However, in the domain of\u0000emph{Reinforcement Learning} (RL), the application of these techniques has\u0000been limited, usually applied to value function estimators\u0000citep{hiraoka2021dropout, smith2022walk}, and may result in detrimental\u0000effects. This issue is even more pronounced in offline RL settings, which bear\u0000greater similarity to supervised learning but have received less attention.\u0000Recent work in continuous offline RL has demonstrated that while we can build\u0000sufficiently powerful critic networks, the generalization of actor networks\u0000remains a bottleneck. In this study, we empirically show that applying standard\u0000regularization techniques to actor networks in offline RL actor-critic\u0000algorithms yields improvements of 6% on average across two algorithms and\u0000three different continuous D4RL domains.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What to align in multimodal contrastive learning?
Benoit Dufumier, Javiera Castillo-Navarro, Devis Tuia, Jean-Philippe Thiran (arXiv:2409.07402, 11 Sep 2024)
Humans perceive the world through multisensory integration, blending the information of different modalities to adapt their behavior. Contrastive learning offers an appealing solution for multimodal self-supervised learning: by considering each modality as a different view of the same entity, it learns to align features of different modalities in a shared representation space. However, this approach is intrinsically limited, as it only captures information that is shared or redundant between modalities, while multimodal interactions can arise in other ways. In this work, we introduce CoMM, a Contrastive MultiModal learning strategy that enables communication between modalities in a single multimodal space. Instead of imposing cross- or intra-modality constraints, we propose to align multimodal representations by maximizing the mutual information between augmented versions of these multimodal features. Our theoretical analysis shows that shared, synergistic, and unique terms of information naturally emerge from this formulation, allowing us to estimate multimodal interactions beyond redundancy. We test CoMM both in controlled and in real-world settings: in the former, we demonstrate that CoMM effectively captures redundant, unique, and synergistic information between modalities; in the latter, CoMM learns complex multimodal interactions and achieves state-of-the-art results on six multimodal benchmarks.
{"title":"What to align in multimodal contrastive learning?","authors":"Benoit Dufumier, Javiera Castillo-Navarro, Devis Tuia, Jean-Philippe Thiran","doi":"arxiv-2409.07402","DOIUrl":"https://doi.org/arxiv-2409.07402","url":null,"abstract":"Humans perceive the world through multisensory integration, blending the\u0000information of different modalities to adapt their behavior. Contrastive\u0000learning offers an appealing solution for multimodal self-supervised learning.\u0000Indeed, by considering each modality as a different view of the same entity, it\u0000learns to align features of different modalities in a shared representation\u0000space. However, this approach is intrinsically limited as it only learns shared\u0000or redundant information between modalities, while multimodal interactions can\u0000arise in other ways. In this work, we introduce CoMM, a Contrastive MultiModal\u0000learning strategy that enables the communication between modalities in a single\u0000multimodal space. Instead of imposing cross- or intra- modality constraints, we\u0000propose to align multimodal representations by maximizing the mutual\u0000information between augmented versions of these multimodal features. Our\u0000theoretical analysis shows that shared, synergistic and unique terms of\u0000information naturally emerge from this formulation, allowing us to estimate\u0000multimodal interactions beyond redundancy. We test CoMM both in a controlled\u0000and in a series of real-world settings: in the former, we demonstrate that CoMM\u0000effectively captures redundant, unique and synergistic information between\u0000modalities. In the latter, CoMM learns complex multimodal interactions and\u0000achieves state-of-the-art results on the six multimodal benchmarks.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Continual and Incremental Learning Approach for TinyML On-device Training Using Dataset Distillation and Model Size Adaption
Marcus Rüb, Philipp Tuchel, Axel Sikora, Daniel Mueller-Gritschneder (arXiv:2409.07114, 11 Sep 2024)
We present a new algorithm for incremental learning in the context of Tiny Machine Learning (TinyML), optimized for low-performance, energy-efficient embedded devices. TinyML is an emerging field that deploys machine learning models on resource-constrained devices such as microcontrollers, enabling intelligent applications like voice recognition, anomaly detection, predictive maintenance, and sensor data processing in environments where traditional machine learning models are not feasible. The algorithm addresses the challenge of catastrophic forgetting by using knowledge distillation to create a small, distilled dataset. The novelty of the method is that the size of the model can be adjusted dynamically, so that its complexity can be adapted to the requirements of the task. This offers a solution for incremental learning in resource-constrained environments, where both model size and computational efficiency are critical factors. Results show that the proposed algorithm is a promising approach for TinyML incremental learning on embedded devices. The algorithm was tested on five datasets: CIFAR10, MNIST, CORE50, HAR, and Speech Commands. Despite using only 43% of the floating point operations (FLOPs) of a larger fixed model, the algorithm showed a negligible accuracy loss of just 1%. In addition, the presented method is memory efficient: while state-of-the-art incremental learning is usually very memory intensive, our method requires only 1% of the original dataset.
{"title":"A Continual and Incremental Learning Approach for TinyML On-device Training Using Dataset Distillation and Model Size Adaption","authors":"Marcus Rüb, Philipp Tuchel, Axel Sikora, Daniel Mueller-Gritschneder","doi":"arxiv-2409.07114","DOIUrl":"https://doi.org/arxiv-2409.07114","url":null,"abstract":"A new algorithm for incremental learning in the context of Tiny Machine\u0000learning (TinyML) is presented, which is optimized for low-performance and\u0000energy efficient embedded devices. TinyML is an emerging field that deploys\u0000machine learning models on resource-constrained devices such as\u0000microcontrollers, enabling intelligent applications like voice recognition,\u0000anomaly detection, predictive maintenance, and sensor data processing in\u0000environments where traditional machine learning models are not feasible. The\u0000algorithm solve the challenge of catastrophic forgetting through the use of\u0000knowledge distillation to create a small, distilled dataset. The novelty of the\u0000method is that the size of the model can be adjusted dynamically, so that the\u0000complexity of the model can be adapted to the requirements of the task. This\u0000offers a solution for incremental learning in resource-constrained\u0000environments, where both model size and computational efficiency are critical\u0000factors. Results show that the proposed algorithm offers a promising approach\u0000for TinyML incremental learning on embedded devices. The algorithm was tested\u0000on five datasets including: CIFAR10, MNIST, CORE50, HAR, Speech Commands. The\u0000findings indicated that, despite using only 43% of Floating Point Operations\u0000(FLOPs) compared to a larger fixed model, the algorithm experienced a\u0000negligible accuracy loss of just 1%. In addition, the presented method is\u0000memory efficient. While state-of-the-art incremental learning is usually very\u0000memory intensive, the method requires only 1% of the original data set.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"112 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}