
arXiv - CS - Machine Learning: Latest Publications

Reinforcement Learning as an Improvement Heuristic for Real-World Production Scheduling
Pub Date : 2024-09-18 DOI: arxiv-2409.11933
Arthur Müller, Lukas Vollenkemper
The integration of Reinforcement Learning (RL) with heuristic methods is an emerging trend for solving optimization problems, which leverages RL's ability to learn from the data generated during the search process. One promising approach is to train an RL agent as an improvement heuristic, starting with a suboptimal solution that is iteratively improved by applying small changes. We apply this approach to a real-world multiobjective production scheduling problem. Our approach utilizes a network architecture that includes Transformer encoding to learn the relationships between jobs. Afterwards, a probability matrix is generated from which pairs of jobs are sampled and then swapped to improve the solution. We benchmarked our approach against other heuristics using real data from our industry partner, demonstrating its superior performance.
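As a rough illustration of the sampled-swap move described above, the sketch below applies one improvement step to a job sequence. The names (improvement_step, swap_probs) are made up here, and the probability matrix is a random stand-in for the output of the Transformer-based policy, so this is only an assumed illustration, not the paper's implementation.

import numpy as np

def improvement_step(schedule, swap_probs, rng):
    """One improvement move: sample a job pair from a probability matrix and swap it.

    schedule:   list of job ids in processing order
    swap_probs: (n, n) nonnegative matrix scoring how promising swapping positions i and j is
    """
    n = len(schedule)
    flat = swap_probs.flatten() / swap_probs.sum()       # normalise to a distribution
    idx = rng.choice(n * n, p=flat)                      # sample one (i, j) pair
    i, j = divmod(idx, n)
    schedule[i], schedule[j] = schedule[j], schedule[i]  # apply the swap (i == j is a harmless no-op)
    return schedule

rng = np.random.default_rng(0)
probs = rng.random((5, 5))                               # stand-in for the policy's output
print(improvement_step(list(range(5)), probs, rng))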
Citations: 0
An Explainable Machine Learning Approach to Traffic Accident Fatality Prediction
Pub Date : 2024-09-18 DOI: arxiv-2409.11929
Md. Asif Khan Rifat, Ahmedul Kabir, Armana Sabiha Huq
Road traffic accidents (RTA) pose a significant public health threat worldwide, leading to considerable loss of life and economic burdens. This is particularly acute in developing countries like Bangladesh. Building reliable models to forecast crash outcomes is crucial for implementing effective preventive measures. To aid in developing targeted safety interventions, this study presents a machine learning-based approach for classifying fatal and non-fatal road accident outcomes using data from the Dhaka metropolitan traffic crash database from 2017 to 2022. Our framework utilizes a range of machine learning classification algorithms, comprising Logistic Regression, Support Vector Machines, Naive Bayes, Random Forest, Decision Tree, Gradient Boosting, LightGBM, and Artificial Neural Network. We prioritize model interpretability by employing the SHAP (SHapley Additive exPlanations) method, which elucidates the key factors influencing accident fatality. Our results demonstrate that LightGBM outperforms other models, achieving a ROC-AUC score of 0.72. The global, local, and feature dependency analyses are conducted to acquire deeper insights into the behavior of the model. SHAP analysis reveals that casualty class, time of accident, location, vehicle type, and road type play pivotal roles in determining fatality risk. These findings offer valuable insights for policymakers and road safety practitioners in developing countries, enabling the implementation of evidence-based strategies to reduce traffic crash fatalities.
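A minimal sketch of the kind of LightGBM-plus-SHAP pipeline the abstract describes, using synthetic data in place of the Dhaka crash database; the feature set, model settings, and preprocessing are assumptions, not the study's.

import numpy as np
import lightgbm as lgb
import shap
from sklearn.datasets import make_classification

# synthetic stand-in for the crash data (binary label: fatal vs. non-fatal)
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

model = lgb.LGBMClassifier(n_estimators=200).fit(X, y)

# SHAP attributes each prediction to the input features
explainer = shap.TreeExplainer(model)
sv = explainer.shap_values(X)
if isinstance(sv, list):          # some shap versions return one array per class
    sv = sv[-1]
sv = np.asarray(sv)
if sv.ndim == 3:                  # (samples, features, classes) in newer versions
    sv = sv[..., -1]
print("mean |SHAP| per feature:", np.abs(sv).mean(axis=0))  # global importance ranking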
Citations: 0
Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features
Pub Date : 2024-09-18 DOI: arxiv-2409.12135
Jiuqi Wang, Shangtong Zhang
Temporal difference (TD) learning with linear function approximation, abbreviated as linear TD, is a classic and powerful prediction algorithm in reinforcement learning. While it is well understood that linear TD converges almost surely to a unique point, this convergence traditionally requires the assumption that the features used by the approximator are linearly independent. However, this linear independence assumption does not hold in many practical scenarios. This work is the first to establish the almost sure convergence of linear TD without requiring linearly independent features. In fact, we do not make any assumptions on the features. We prove that the approximated value function converges to a unique point and the weight iterates converge to a set. We also establish a notion of local stability of the weight iterates. Importantly, we do not need to introduce any other additional assumptions and do not need to make any modification to the linear TD algorithm. Key to our analysis is a novel characterization of bounded invariant sets of the mean ODE of linear TD.
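For concreteness, the linear TD(0) update whose almost sure convergence is at stake has the standard form below (generic notation, not tied to this paper's exact presentation):

$$ w_{t+1} = w_t + \alpha_t \left( R_{t+1} + \gamma\, w_t^\top \phi(S_{t+1}) - w_t^\top \phi(S_t) \right) \phi(S_t), $$

where $\phi(s)$ is the feature vector of state $s$ and $\alpha_t$ is the step size; the paper drops the usual requirement that the features $\{\phi(s)\}$ be linearly independent.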
Citations: 0
An efficient wavelet-based physics-informed neural networks for singularly perturbed problems
Pub Date : 2024-09-18 DOI: arxiv-2409.11847
Himanshu Pandey, Anshima Singh, Ratikanta Behera
Physics-informed neural networks (PINNs) are a class of deep learning models that utilize physics as differential equations to address complex problems, including ones that may involve limited data availability. However, tackling solutions of differential equations with oscillations or singular perturbations and shock-like structures becomes challenging for PINNs. Considering these challenges, we designed an efficient wavelet-based PINNs (W-PINNs) model to solve singularly perturbed differential equations. Here, we represent the solution in wavelet space using a family of smooth, compactly supported wavelets. This framework represents the solution of a differential equation with significantly fewer degrees of freedom while still capturing, identifying, and analyzing the local structure of complex physical phenomena. The architecture allows the training process to search for a solution within wavelet space, making the process faster and more accurate. The proposed model does not rely on automatic differentiation for derivatives involved in differential equations and does not require any prior information regarding the behavior of the solution, such as the location of abrupt features. Thus, through a strategic fusion of wavelets with PINNs, W-PINNs excel at capturing localized nonlinear information, making them well-suited for problems showing abrupt behavior in certain regions, such as singularly perturbed problems. The efficiency and accuracy of the proposed neural network model are demonstrated on various test problems, i.e., highly singularly perturbed nonlinear differential equations, the FitzHugh-Nagumo (FHN) model, and predator-prey interaction models. The proposed design model exhibits impressive comparisons with traditional PINNs and the recently developed wavelet-based PINNs, which use wavelets as an activation function for solving nonlinear differential equations.
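To fix ideas, a generic compactly supported wavelet expansion of the approximate solution might take the form below; the specific wavelet family, resolution levels, and parameterization used by W-PINNs are not spelled out here, so this is only an assumed illustration:

$$ u_\theta(x) \approx \sum_{j,k} c_{j,k}\, \psi_{j,k}(x), \qquad \psi_{j,k}(x) = 2^{j/2}\, \psi\!\left(2^{j}x - k\right), $$

with the coefficients $c_{j,k}$ treated as the trainable degrees of freedom searched over during training, so that the PDE residual is minimized in wavelet space rather than directly over a dense set of collocation points.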
Citations: 0
Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview
Pub Date : 2024-09-18 DOI: arxiv-2409.11650
Yanshu Wang, Tong Yang, Xiyan Liang, Guoan Wang, Hanning Lu, Xu Zhe, Yaoming Li, Li Weitao
This paper provides a comprehensive overview of the principles, challenges, and methodologies associated with quantizing large-scale neural network models. As neural networks have evolved towards larger and more complex architectures to address increasingly sophisticated tasks, the computational and energy costs have escalated significantly. We explore the necessity and impact of model size growth, highlighting the performance benefits as well as the computational challenges and environmental considerations. The core focus is on model quantization as a fundamental approach to mitigate these challenges by reducing model size and improving efficiency without substantially compromising accuracy. We delve into various quantization techniques, including both post-training quantization (PTQ) and quantization-aware training (QAT), and analyze several state-of-the-art algorithms such as LLM-QAT, PEQA (L4Q), ZeroQuant, SmoothQuant, and others. Through comparative analysis, we examine how these methods address issues like outliers, importance weighting, and activation quantization, ultimately contributing to more sustainable and accessible deployment of large-scale models.
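As a baseline reference for the PTQ methods surveyed, a minimal symmetric, per-tensor post-training quantizer looks like the sketch below. This is the textbook scheme, not LLM-QAT, ZeroQuant, or SmoothQuant, and the bit width and rounding choices are assumptions.

import numpy as np

def quantize_symmetric(w, num_bits=8):
    """Minimal symmetric, per-tensor post-training quantization (PTQ)."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for int8
    scale = np.max(np.abs(w)) / qmax        # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_symmetric(w)
print("max abs reconstruction error:", np.max(np.abs(w - dequantize(q, s))))

Per-channel scales, outlier handling, and activation quantization are exactly the refinements the surveyed algorithms add on top of this baseline.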
Citations: 0
Monomial Matrix Group Equivariant Neural Functional Networks
Pub Date : 2024-09-18 DOI: arxiv-2409.11697
Hoang V. Tran, Thieu N. Vo, Tho H. Tran, An T. Nguyen, Tan Minh Nguyen
Neural functional networks (NFNs) have recently gained significant attention due to their diverse applications, ranging from predicting network generalization and network editing to classifying implicit neural representations. Previous NFN designs often depend on permutation symmetries in neural networks' weights, which traditionally arise from the unordered arrangement of neurons in hidden layers. However, these designs do not take into account the weight scaling symmetries of $\operatorname{ReLU}$ networks, and the weight sign flipping symmetries of $\operatorname{sin}$ or $\operatorname{tanh}$ networks. In this paper, we extend the study of the group action on the network weights from the group of permutation matrices to the group of monomial matrices by incorporating scaling/sign-flipping symmetries. Particularly, we encode these scaling/sign-flipping symmetries by designing our corresponding equivariant and invariant layers. We name our new family of NFNs the Monomial Matrix Group Equivariant Neural Functional Networks (Monomial-NFN). Because of the expansion of the symmetries, Monomial-NFN has far fewer independent trainable parameters compared to the baseline NFNs in the literature, thus enhancing the model's efficiency. Moreover, for fully connected and convolutional neural networks, we theoretically prove that all groups that leave these networks invariant while acting on their weight spaces are some subgroups of the monomial matrix group. We provide empirical evidence to demonstrate the advantages of our model over existing baselines, achieving competitive performance and efficiency.
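The weight scaling symmetry mentioned above follows from the positive homogeneity of $\operatorname{ReLU}$; for a two-layer block it reads (a standard identity, not notation from the paper):

$$ \operatorname{ReLU}(\lambda z) = \lambda \operatorname{ReLU}(z)\ \ (\lambda > 0) \quad\Longrightarrow\quad W_2 \operatorname{ReLU}(W_1 x) = \left(W_2 \Lambda^{-1}\right) \operatorname{ReLU}\left(\Lambda W_1 x\right), $$

where $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_d)$ with all $\lambda_i > 0$. Combining such scalings (or sign flips for $\operatorname{sin}$/$\operatorname{tanh}$, since those are odd functions) with hidden-neuron permutations yields the monomial matrix group acting on the weight space.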
Citations: 0
Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers
Pub Date : 2024-09-18 DOI: arxiv-2409.11859
Ekaterina Grishina, Mikhail Gorbunov, Maxim Rakhuba
Controlling the spectral norm of the Jacobian matrix, which is related to the convolution operation, has been shown to improve generalization, training stability, and robustness in CNNs. Existing methods for computing the norm either tend to overestimate it or their performance may deteriorate quickly as the input and kernel sizes increase. In this paper, we demonstrate that the tensor version of the spectral norm of a four-dimensional convolution kernel, up to a constant factor, serves as an upper bound for the spectral norm of the Jacobian matrix associated with the convolution operation. This new upper bound is independent of the input image resolution, differentiable, and can be efficiently calculated during training. Through experiments, we demonstrate how this new bound can be used to improve the performance of convolutional architectures.
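For comparison, the exact spectral norm of a convolution at a fixed input resolution can be estimated numerically by power iteration on the convolution operator and its adjoint; this is the kind of reference value such a bound is meant to upper-bound cheaply, not the paper's tensor-norm method itself, and the stride/padding setup below is an assumption.

import torch
import torch.nn.functional as F

def conv_spectral_norm(weight, input_shape, n_iters=50, padding=1):
    # Power iteration on K^T K, where K x = conv2d(x, weight), at a fixed input resolution.
    x = torch.randn(1, *input_shape)
    for _ in range(n_iters):
        y = F.conv2d(x, weight, padding=padding)
        x = F.conv_transpose2d(y, weight, padding=padding)  # adjoint of the stride-1 convolution
        x = x / x.norm()
    return F.conv2d(x, weight, padding=padding).norm().item()  # ||K x|| with ||x|| = 1

w = torch.randn(8, 3, 3, 3) / 3.0                  # 8 output channels, 3 input channels, 3x3 kernel
print(conv_spectral_norm(w, input_shape=(3, 32, 32)))

Note that this estimate depends on the chosen resolution and requires many operator applications, which is precisely the cost a resolution-independent, differentiable bound avoids.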
Citations: 0
FedLF: Adaptive Logit Adjustment and Feature Optimization in Federated Long-Tailed Learning
Pub Date : 2024-09-18 DOI: arxiv-2409.12105
Xiuhua Lu, Peng Li, Xuefeng Jiang
Federated learning offers a paradigm for addressing the challenge of preserving privacy in distributed machine learning. However, datasets distributed across each client in the real world are inevitably heterogeneous, and if the datasets can be globally aggregated, they tend to be long-tailed distributed, which greatly affects the performance of the model. The traditional approach to federated learning primarily addresses the heterogeneity of data among clients, yet it fails to address the phenomenon of class-wise bias in global long-tailed data. This results in the trained model focusing on the head classes while neglecting the equally important tail classes. Consequently, it is essential to develop a methodology that considers classes holistically. To address the above problems, we propose a new method, FedLF, which introduces three modifications in the local training phase: adaptive logit adjustment, continuous class-centred optimization, and feature decorrelation. We compare seven state-of-the-art methods with varying degrees of data heterogeneity and long-tailed distribution. Extensive experiments on the benchmark datasets CIFAR-10-LT and CIFAR-100-LT demonstrate that our approach effectively mitigates the problem of model performance degradation due to data heterogeneity and long-tailed distribution. Our code is available at https://github.com/18sym/FedLF.
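The logit adjustment ingredient can be illustrated in its generic, non-federated form: shift the logits by the log class priors before the cross-entropy loss so that tail classes are not drowned out. This is only the textbook version; FedLF's adaptive variant and its other two components are not reproduced here, and the class counts below are made up.

import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, targets, class_counts, tau=1.0):
    """Generic logit adjustment for long-tailed data: add tau * log(prior) to the logits."""
    prior = class_counts / class_counts.sum()
    adjusted = logits + tau * torch.log(prior)   # broadcasts over the batch dimension
    return F.cross_entropy(adjusted, targets)

logits = torch.randn(16, 10)
targets = torch.randint(0, 10, (16,))
counts = torch.tensor([5000., 2000., 1000., 500., 200., 100., 50., 20., 10., 5.])  # long-tailed counts
print(logit_adjusted_loss(logits, targets, counts).item())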
Citations: 0
Recent Advances in OOD Detection: Problems and Approaches
Pub Date : 2024-09-18 DOI: arxiv-2409.11884
Shuo Lu, YingSheng Wang, LuJun Sheng, AiHua Zheng, LinXiao He, Jian Liang
Out-of-distribution (OOD) detection aims to detect test samples outside the training category space, which is an essential component in building reliable machine learning systems. Existing reviews on OOD detection primarily focus on method taxonomy, surveying the field by categorizing various approaches. However, many recent works concentrate on non-traditional OOD detection scenarios, such as test-time adaptation, multi-modal data sources and other novel contexts. In this survey, we uniquely review recent advances in OOD detection from the problem scenario perspective for the first time. According to whether the training process is completely controlled, we divide OOD detection methods into training-driven and training-agnostic. Besides, considering the rapid development of pre-trained models, large pre-trained model-based OOD detection is also regarded as an important category and discussed separately. Furthermore, we provide a discussion of the evaluation scenarios, a variety of applications, and several future research directions. We believe this survey with new taxonomy will benefit the proposal of new methods and the expansion of more practical scenarios. A curated list of related papers is provided in the Github repository: https://github.com/shuolucs/Awesome-Out-Of-Distribution-Detection
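As a concrete anchor for what a training-agnostic OOD score looks like, the classic maximum softmax probability (MSP) baseline from the surveyed literature can be sketched as follows; the random logits and the 0.5 threshold are placeholders, not recommendations from the survey.

import torch
import torch.nn.functional as F

def msp_ood_score(logits):
    """Maximum softmax probability: low confidence suggests the sample is out-of-distribution."""
    return F.softmax(logits, dim=-1).max(dim=-1).values

logits = torch.randn(4, 1000)           # e.g. outputs of a pretrained classifier on test images
scores = msp_ood_score(logits)
is_ood = scores < 0.5                   # threshold would be chosen on held-out ID/OOD data
print(scores, is_ood)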
Citations: 0
Less Memory Means smaller GPUs: Backpropagation with Compressed Activations
Pub Date : 2024-09-18 DOI: arxiv-2409.11902
Daniel Barley, Holger Fröning
The ever-growing scale of deep neural networks (DNNs) has led to an equally rapid growth in computational resource requirements. Many recent architectures, most prominently Large Language Models, have to be trained using supercomputers with thousands of accelerators, such as GPUs or TPUs. Next to the vast number of floating point operations, the memory footprint of DNNs is also exploding. In contrast, GPU architectures are notoriously short on memory. Even comparatively small architectures like some EfficientNet variants cannot be trained on a single consumer-grade GPU at reasonable mini-batch sizes. During training, intermediate input activations have to be stored until backpropagation for gradient calculation. These make up the vast majority of the memory footprint. In this work we therefore consider compressing activation maps for the backward pass using pooling, which can reduce both the memory footprint and the amount of data movement. The forward computation remains uncompressed. We empirically show convergence and study effects on feature detection using the example of the common vision architecture ResNet. With this approach we are able to reduce the peak memory consumption by 29% at the cost of a longer training schedule, while maintaining prediction accuracy compared to an uncompressed baseline.
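A rough sketch of the general idea (compress what is stashed for the backward pass while leaving the forward output untouched) is given below for a ReLU layer in PyTorch; the paper's actual choice of what to pool and how the gradient is reconstructed may differ, so treat this only as an assumed illustration.

import torch
import torch.nn.functional as F

class PooledSaveReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = F.relu(x)
        # keep only a 2x2 average-pooled copy of the activation for the backward pass
        ctx.save_for_backward(F.avg_pool2d(y, kernel_size=2))
        ctx.full_size = y.shape[-2:]
        return y                                   # forward result itself is uncompressed

    @staticmethod
    def backward(ctx, grad_out):
        (pooled,) = ctx.saved_tensors
        # upsample the compressed activation and use it as an approximate ReLU mask
        approx = F.interpolate(pooled, size=ctx.full_size, mode="nearest")
        return grad_out * (approx > 0).to(grad_out.dtype)

x = torch.randn(2, 8, 16, 16, requires_grad=True)
PooledSaveReLU.apply(x).sum().backward()
print(x.grad.shape)   # gradient computed from the compressed (lossy) activation

The gradient mask is only approximate where pooling smears zero and nonzero activations together, which is the accuracy/memory trade-off the abstract refers to.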
Citations: 0