
arXiv - CS - Neural and Evolutionary Computing: Latest Publications

Improved Differential Evolution based Feature Selection through Quantum, Chaos, and Lasso
Pub Date: 2024-08-20 · arXiv:2408.10693
Yelleti Vivek, Sri Krishna Vadlamani, Vadlamani Ravi, P. Radha Krishna
Modern deep learning continues to achieve outstanding performance on an astounding variety of high-dimensional tasks. In practice, this is obtained by fitting deep neural models to all the input data with minimal feature engineering, thus sacrificing interpretability in many cases. However, in applications such as medicine, where interpretability is crucial, feature subset selection becomes an important problem. Metaheuristics such as Binary Differential Evolution are a popular approach to feature selection, and the research literature continues to introduce novel ideas, drawn from quantum computing and chaos theory, for instance, to improve them. In this paper, we demonstrate that introducing chaos-generated variables, generated from considerations of the Lyapunov time, in place of random variables in quantum-inspired metaheuristics significantly improves their performance on high-dimensional medical classification tasks and outperforms other approaches. We show that this chaos-induced improvement is a general phenomenon by demonstrating it for multiple varieties of underlying quantum-inspired metaheuristics. Performance is further enhanced through Lasso-assisted feature pruning. At the implementation level, we vastly speed up our algorithms through a scalable island-based computing cluster parallelization technique.
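The core idea of swapping chaos-generated values for uniform randoms inside a differential evolution operator can be sketched as follows. The logistic map is used here only as a stand-in for the paper's Lyapunov-time-based generator, and the crossover routine is a hypothetical simplification:

```python
import numpy as np

def logistic_stream(x0, n):
    # chaotic surrogate for uniform random draws: x_{t+1} = 4 x_t (1 - x_t),
    # which stays in (0, 1) for x0 in (0, 1) away from the fixed points
    out = np.empty(n)
    x = x0
    for i in range(n):
        x = 4.0 * x * (1.0 - x)
        out[i] = x
    return out

def de_trial_vector(target, donor, cr, x0=0.37):
    # binomial DE crossover where the per-gene draws come from the logistic
    # map instead of np.random.rand (hypothetical simplification)
    draws = logistic_stream(x0, target.size)
    return np.where(draws < cr, donor, target)
```

In a full binary DE loop, the same stream would also replace the randoms used for donor-index selection; only the crossover step is shown here.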
Citations: 0
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Pub Date: 2024-08-20 · arXiv:2408.10920
Róbert Csordás, Christopher Potts, Christopher D. Manning, Atticus Geiger
The Linear Representation Hypothesis (LRH) states that neural networks learn to encode concepts as directions in activation space, and a strong version of the LRH states that models learn only such encodings. In this paper, we present a counterexample to this strong LRH: when trained to repeat an input token sequence, gated recurrent neural networks (RNNs) learn to represent the token at each position with a particular order of magnitude, rather than a direction. These representations have layered features that are impossible to locate in distinct linear subspaces. To show this, we train interventions to predict and manipulate tokens by learning the scaling factor corresponding to each sequence position. These interventions indicate that the smallest RNNs find only this magnitude-based solution, while larger RNNs have linear representations. These findings strongly indicate that interpretability research should not be confined by the LRH.
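A toy, hand-constructed illustration of a magnitude-based (rather than direction-based) code — not the paper's trained GRU: each sequence position occupies its own order of magnitude of a shared vector, and a per-position scaling factor recovers the token stored there:

```python
import numpy as np

GAMMA = 0.1  # one magnitude band per sequence position (assumed base)

def encode(tokens, vocab_size):
    # position i writes a one-hot of its token scaled by GAMMA**i, so the
    # whole sequence lives in a single vector, separated by magnitude
    state = np.zeros(vocab_size)
    for i, t in enumerate(tokens):
        state[t] += GAMMA ** i
    return state

def decode(state, length):
    # reading off position i amounts to rescaling by 10**i and keeping the
    # resulting digit -- the "scaling factor" intervention in miniature
    tokens = []
    for i in range(length):
        digits = np.round(state * 10 ** i).astype(int) % 10
        tokens.append(int(np.argmax(digits == 1)))
    return tokens
```

Because every position writes exactly one token, each rescaling exposes exactly one unit digit, which is why the decode step is unambiguous for short sequences.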
Citations: 0
Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm
Pub Date: 2024-08-20 · arXiv:2408.10488
Xiao Wang, Yao Rong, Fuling Wang, Jianing Li, Lin Zhu, Bo Jiang, Yaowei Wang
Sign Language Translation (SLT) is a core task in the field of AI-assisted disability. Unlike traditional SLT based on visible light videos, which is easily affected by factors such as lighting, rapid hand movements, and privacy breaches, this paper proposes the use of high-definition Event streams for SLT, effectively mitigating the aforementioned issues. This is primarily because Event streams have a high dynamic range and dense temporal signals, which can withstand low illumination and motion blur well. Additionally, due to their sparsity in space, they effectively protect the privacy of the target person. More specifically, we propose a new high-resolution Event stream sign language dataset, termed Event-CSL, which effectively fills the data gap in this area of research. It contains 14,827 videos, 14,821 glosses, and 2,544 Chinese words in the text vocabulary. These samples are collected in a variety of indoor and outdoor scenes, encompassing multiple angles, light intensities, and camera movements. We have benchmarked existing mainstream SLT works to enable fair comparison for future efforts. Based on this dataset and several other large-scale datasets, we propose a novel baseline method that fully leverages the Mamba model's ability to integrate temporal information of CNN features, resulting in improved sign language translation outcomes. Both the benchmark dataset and source code will be released on https://github.com/Event-AHU/OpenESL
Citations: 0
Evaluation Framework for AI-driven Molecular Design of Multi-target Drugs: Brain Diseases as a Case Study
Pub Date: 2024-08-20 · arXiv:2408.10482
Arthur Cerveira, Frederico Kremer, Darling de Andrade Lourenço, Ulisses B Corrêa
The widespread application of Artificial Intelligence (AI) techniques has significantly influenced the development of new therapeutic agents. These computational methods can be used to design and predict the properties of generated molecules. Multi-target Drug Discovery (MTDD) is an emerging paradigm for discovering drugs against complex disorders that do not respond well to more traditional target-specific treatments, such as central nervous system, immune system, and cardiovascular diseases. Still, there is yet to be an established benchmark suite for assessing the effectiveness of AI tools for designing multi-target compounds. Standardized benchmarks allow for comparing existing techniques and promote rapid research progress. Hence, this work proposes an evaluation framework for molecule generation techniques in MTDD scenarios, considering brain diseases as a case study. Our methodology involves using large language models to select the appropriate molecular targets, gathering and preprocessing the bioassay datasets, training quantitative structure-activity relationship models to predict target modulation, and assessing other essential drug-likeness properties for implementing the benchmarks. Additionally, this work will assess the performance of four deep generative models and evolutionary algorithms over our benchmark suite. In our findings, both evolutionary algorithms and generative models can achieve competitive results across the proposed benchmarks.
Citations: 0
Mutation Strength Adaptation of the $(μ/μ_I, λ)$-ES for Large Population Sizes on the Sphere Function
Pub Date: 2024-08-19 · arXiv:2408.09761
Amir Omeradzic, Hans-Georg Beyer
The mutation strength adaptation properties of a multi-recombinative $(\mu/\mu_I, \lambda)$-ES are studied for isotropic mutations. To this end, standard implementations of cumulative step-size adaptation (CSA) and mutative self-adaptation ($\sigma$SA) are investigated experimentally and theoretically by assuming large population sizes ($\mu$) in relation to the search space dimensionality ($N$). The adaptation is characterized in terms of the scale-invariant mutation strength on the sphere in relation to its maximum achievable value for positive progress. The results show how the different $\sigma$-adaptation variants behave as $\mu$ and $N$ are varied. Standard CSA-variants show notably different adaptation properties and progress rates on the sphere, becoming slower or faster as $\mu$ or $N$ are varied. This is shown by investigating common choices for the cumulation and damping parameters. Standard $\sigma$SA-variants (with default learning parameter settings) can achieve faster adaptation and larger progress rates compared to the CSA. However, it is shown how self-adaptation affects the progress rate levels negatively. Furthermore, differences regarding the adaptation and stability of $\sigma$SA with log-normal and normal mutation sampling are elaborated.
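For reference, one iteration of textbook CSA looks as follows; the cumulation parameter c and damping d shown are common default choices (assumptions here, and exactly the kind of "common choices" whose effect on adaptation speed the paper examines), not necessarily the variant analyzed in the paper:

```python
import numpy as np

def csa_step(p, z_mean, sigma, mu, n, c=None, d=None):
    # one cumulative step-size adaptation update (textbook form):
    # accumulate the recombined mutation direction into an evolution path p,
    # then grow sigma if ||p|| exceeds its expectation under random selection
    c = c if c is not None else 1.0 / np.sqrt(n)   # cumulation parameter
    d = d if d is not None else np.sqrt(n)         # damping parameter
    chi_n = np.sqrt(n) * (1.0 - 1.0 / (4 * n) + 1.0 / (21 * n**2))  # E||N(0,I)||
    p = (1.0 - c) * p + np.sqrt(c * (2.0 - c) * mu) * z_mean
    sigma = sigma * np.exp((np.linalg.norm(p) / chi_n - 1.0) * c / d)
    return p, sigma
```

Here `z_mean` is the recombinant of the selected mutation vectors; a path shorter than its random-selection expectation shrinks the mutation strength, a longer one grows it.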
Citations: 0
Liquid Fourier Latent Dynamics Networks for fast GPU-based numerical simulations in computational cardiology
Pub Date: 2024-08-19 · arXiv:2408.09818
Matteo Salvador, Alison L. Marsden
Scientific Machine Learning (ML) is gaining momentum as a cost-effective alternative to physics-based numerical solvers in many engineering applications. In fact, scientific ML is currently being used to build accurate and efficient surrogate models starting from high-fidelity numerical simulations, effectively encoding the parameterized temporal dynamics underlying Ordinary Differential Equations (ODEs), or even the spatio-temporal behavior underlying Partial Differential Equations (PDEs), in appropriately designed neural networks. We propose an extension of Latent Dynamics Networks (LDNets), namely Liquid Fourier LDNets (LFLDNets), to create parameterized space-time surrogate models for multiscale and multiphysics sets of highly nonlinear differential equations on complex geometries. LFLDNets employ a neurologically-inspired, sparse, liquid neural network for temporal dynamics, relaxing the requirement of a numerical solver for time advancement and leading to superior performance in terms of tunable parameters, accuracy, efficiency and learned trajectories with respect to neural ODEs based on feedforward fully-connected neural networks. Furthermore, in our implementation of LFLDNets, we use a Fourier embedding with a tunable kernel in the reconstruction network to learn high-frequency functions better and faster than using space coordinates directly as input. We challenge LFLDNets in the framework of computational cardiology and evaluate their capabilities on two 3-dimensional test cases arising from multiscale cardiac electrophysiology and cardiovascular hemodynamics. This paper illustrates the capability to run Artificial Intelligence-based numerical simulations on single or multiple GPUs in a matter of minutes and represents a significant step forward in the development of physics-informed digital twins.
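The Fourier embedding mentioned above can be sketched as random Fourier features with a tunable kernel bandwidth; the function name, Gaussian projection matrix, and bandwidth parameterization below are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

def fourier_embedding(x, n_features, bandwidth, seed=0):
    # gamma(x) = [sin(2*pi*Bx), cos(2*pi*Bx)] with B ~ N(0, bandwidth^2);
    # the bandwidth plays the role of the tunable kernel scale that lets the
    # reconstruction network fit high-frequency functions
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, bandwidth, size=(n_features, x.shape[-1]))
    proj = 2.0 * np.pi * x @ b.T
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)
```

Feeding `fourier_embedding(coords, ...)` instead of raw space coordinates into an MLP is the standard way such embeddings counteract the low-frequency bias of coordinate networks.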
Citations: 0
A More Accurate Approximation of Activation Function with Few Spikes Neurons
Pub Date: 2024-08-19 · arXiv:2409.00044
Dayena Jeong, Jaewoo Park, Jeonghee Jo, Jongkil Park, Jaewook Kim, Hyun Jae Jang, Suyoun Lee, Seongsik Park
Recent deep neural networks (DNNs), such as diffusion models [1], have faced high computational demands. Thus, spiking neural networks (SNNs) have attracted lots of attention as energy-efficient neural networks. However, conventional spiking neurons, such as leaky integrate-and-fire neurons, cannot accurately represent complex non-linear activation functions, such as Swish [2]. To approximate activation functions with spiking neurons, few-spikes (FS) neurons were proposed [3], but the approximation performance was limited due to the lack of training methods considering the neurons. Thus, we propose tendency-based parameter initialization (TBPI), which enhances the approximation of activation functions with FS neurons by exploiting temporal dependencies when initializing the training parameters.
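The few-spikes neuron of [3] can be sketched as follows. Its threshold, reset, and output parameters are normally trained; the hand-set binary-expansion values below, which approximate the identity on [0, 1), are illustrative assumptions only:

```python
def fs_neuron(x, thresholds, resets, outputs):
    # simplified few-spikes (FS) neuron: over K time steps the neuron spikes
    # when the membrane potential v crosses T[t], subtracts h[t] from v, and
    # adds d[t] to the output -- so K spikes approximate a non-linear activation
    v, y = x, 0.0
    for t, h, d in zip(thresholds, resets, outputs):
        s = 1.0 if v >= t else 0.0
        y += d * s
        v -= h * s
    return y

# binary-expansion parameters: approximate f(x) = x on [0, 1) with 4 spikes,
# giving a worst-case error below the smallest step (0.0625)
params = [0.5, 0.25, 0.125, 0.0625]
```

Choosing other (T, h, d) triples lets the same mechanism approximate curved activations such as Swish, which is precisely where a good initialization of these parameters matters.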
Citations: 0
TBA: Faster Large Language Model Training Using SSD-Based Activation Offloading
Pub Date: 2024-08-19 · arXiv:2408.10013
Kun Wu, Jeongmin Brian Park, Xiaofan Zhang, Mert Hidayetoğlu, Vikram Sharma Mailthody, Sitao Huang, Steven Sam Lumetta, Wen-mei Hwu
The growth rate of the GPU memory capacity has not been able to keep up with that of the size of large language models (LLMs), hindering the model training process. In particular, activations -- the intermediate tensors produced during forward propagation and reused in backward propagation -- dominate the GPU memory use. To address this challenge, we propose TBA to efficiently offload activations to high-capacity NVMe SSDs. This approach reduces GPU memory usage without impacting performance by adaptively overlapping data transfers with computation. TBA is compatible with popular deep learning frameworks like PyTorch, Megatron, and DeepSpeed, and it employs techniques such as tensor deduplication, forwarding, and adaptive offloading to further enhance efficiency. We conduct extensive experiments on GPT, BERT, and T5. Results demonstrate that TBA effectively reduces 47% of the activation peak memory usage. At the same time, TBA perfectly overlaps the I/O with the computation and incurs negligible performance overhead. We introduce the recompute-offload-keep (ROK) curve to compare TBA offloading with two other tensor placement strategies: keeping activations in memory and layerwise full recomputation. We find that TBA achieves better memory savings than layerwise full recomputation while retaining the performance of keeping the activations in memory.
引用次数: 0
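The core idea in the TBA abstract above, hiding transfer latency by overlapping activation offloading with the next layer's computation, can be sketched in a few lines. This is a toy illustration, not TBA's implementation: a background thread stands in for the asynchronous NVMe write, and `offload_fn` is a hypothetical callback supplied by the caller.

```python
import queue
import threading

def train_step(layers, x, offload_fn):
    """Run a forward pass, handing each activation to offload_fn on a
    background thread while the main thread keeps computing the next layer."""
    q = queue.Queue()
    done = []

    def writer():
        # Drains the queue in the background; stands in for an SSD write path.
        while True:
            item = q.get()
            if item is None:
                break
            offload_fn(item)
            done.append(item)

    t = threading.Thread(target=writer)
    t.start()
    act = x
    for layer in layers:
        act = layer(act)   # compute the next layer on the main thread
        q.put(act)         # hand the activation just produced to the offloader
    q.put(None)            # sentinel: no more activations
    t.join()
    return act, done
```

A usage example with three dummy layers: `train_step([lambda v: v + 1] * 3, 0, stored.append)` returns the final activation `3` and the list of offloaded activations `[1, 2, 3]`. A real system would keep the handoff asynchronous across steps instead of joining at the end.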
Enhancing Population-based Search with Active Inference
Pub Date : 2024-08-18 DOI: arxiv-2408.09548
Nassim Dehouche, Daniel Friedman
The Active Inference framework models perception and action as a unified process, where agents use probabilistic models to predict and actively minimize sensory discrepancies. In complement and contrast, traditional population-based metaheuristics rely on reactive environmental interactions without anticipatory adaptation. This paper proposes the integration of Active Inference into these metaheuristics to enhance performance through anticipatory environmental adaptation. We demonstrate this approach specifically with Ant Colony Optimization (ACO) on the Travelling Salesman Problem (TSP). Experimental results indicate that Active Inference can yield some improved solutions with only a marginal increase in computational cost, with interesting patterns of performance that relate to the number and topology of nodes in the graph. Further work will characterize where and when different types of Active Inference augmentation of population metaheuristics may be efficacious.
Citations: 0
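The baseline metaheuristic the paper augments, Ant Colony Optimization on the TSP, can be sketched compactly. This is a textbook ACO with the standard parameter names (alpha, beta, rho, Q), not the paper's Active Inference-augmented variant, and the parameter defaults are illustrative, not the paper's settings.

```python
import math
import random

def aco_tsp(dist, n_ants=10, n_iters=30, alpha=1.0, beta=2.0, rho=0.5, Q=1.0, seed=0):
    """Minimal Ant Colony Optimization for the TSP on a distance matrix."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]          # pheromone trails
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            start = rng.randrange(n)
            tour, unvisited = [start], set(range(n)) - {start}
            while unvisited:
                i = tour[-1]
                # transition weights: pheromone^alpha * (1/distance)^beta
                weights = [(j, (tau[i][j] ** alpha) * ((1.0 / dist[i][j]) ** beta))
                           for j in unvisited]
                total = sum(w for _, w in weights)
                r, acc = rng.random() * total, 0.0
                for j, w in weights:              # roulette-wheel selection
                    acc += w
                    if acc >= r:
                        tour.append(j)
                        unvisited.remove(j)
                        break
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # evaporate, then deposit pheromone inversely proportional to tour length
        tau = [[(1 - rho) * t for t in row] for row in tau]
        for tour, length in tours:
            for k in range(n):
                i, j = tour[k], tour[(k + 1) % n]
                tau[i][j] += Q / length
                tau[j][i] += Q / length
    return best_tour, best_len
```

On four cities at the corners of a unit square, the optimal tour is the perimeter of length 4.0, which this sketch finds easily. An Active Inference augmentation, as the abstract describes it, would add an anticipatory model of the environment on top of this reactive pheromone loop.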
On the Improvement of Generalization and Stability of Forward-Only Learning via Neural Polarization
Pub Date : 2024-08-17 DOI: arxiv-2408.09210
Erik B. Terres-Escudero, Javier Del Ser, Pablo Garcia-Bringas
Forward-only learning algorithms have recently gained attention as alternatives to gradient backpropagation, replacing the backward step of this latter solver with an additional contrastive forward pass. Among these approaches, the so-called Forward-Forward Algorithm (FFA) has been shown to achieve competitive levels of performance in terms of generalization and complexity. Networks trained using FFA learn to contrastively maximize a layer-wise defined goodness score when presented with real data (denoted as positive samples) and to minimize it when processing synthetic data (corr. negative samples). However, this algorithm still faces weaknesses that negatively affect the model accuracy and training stability, primarily due to a gradient imbalance between positive and negative samples. To overcome this issue, in this work we propose a novel implementation of the FFA algorithm, denoted as Polar-FFA, which extends the original formulation by introducing a neural division (polarization) between positive and negative instances. Neurons in each of these groups aim to maximize their goodness when presented with their respective data type, thereby creating a symmetric gradient behavior. To empirically gauge the improved learning capabilities of our proposed Polar-FFA, we perform several systematic experiments using different activation and goodness functions over image classification datasets. Our results demonstrate that Polar-FFA outperforms FFA in terms of accuracy and convergence speed. Furthermore, its lower reliance on hyperparameters reduces the need for hyperparameter tuning to guarantee optimal generalization capabilities, thereby allowing for a broader range of neural network configurations.
Citations: 0
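The layer-wise goodness objective that FFA, and hence Polar-FFA, builds on can be written down compactly. The threshold `theta` and the logistic link below follow Hinton's original Forward-Forward formulation; they are included for illustration only and are not the paper's Polar-FFA code.

```python
import math

def goodness(activations):
    """Layer-wise goodness score: sum of squared activations."""
    return sum(a * a for a in activations)

def ffa_prob_positive(activations, theta=2.0):
    """Probability the layer assigns to 'this input is real (positive)':
    a sigmoid of the goodness relative to the threshold theta."""
    return 1.0 / (1.0 + math.exp(-(goodness(activations) - theta)))

def ffa_layer_loss(activations, is_positive, theta=2.0):
    """Per-sample contrastive loss in softplus form,
    log(1 + exp(-sign * (goodness - theta))):
    positive samples are pushed above theta, negative samples below it."""
    sign = 1.0 if is_positive else -1.0
    z = sign * (goodness(activations) - theta)
    return math.log1p(math.exp(-z))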