
arXiv - CS - Machine Learning: Latest Publications

Reinforcement Learning as an Improvement Heuristic for Real-World Production Scheduling
Pub Date: 2024-09-18 | DOI: arxiv-2409.11933
Arthur Müller, Lukas Vollenkemper
The integration of Reinforcement Learning (RL) with heuristic methods is an emerging trend for solving optimization problems, which leverages RL's ability to learn from the data generated during the search process. One promising approach is to train an RL agent as an improvement heuristic, starting with a suboptimal solution that is iteratively improved by applying small changes. We apply this approach to a real-world multiobjective production scheduling problem. Our approach utilizes a network architecture that includes Transformer encoding to learn the relationships between jobs. Afterwards, a probability matrix is generated from which pairs of jobs are sampled and then swapped to improve the solution. We benchmarked our approach against other heuristics using real data from our industry partner, demonstrating its superior performance.
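As an illustration of the improvement step described above, a minimal Python sketch, assuming a learned probability matrix over job pairs; the matrix is replaced here by random logits, since the paper's Transformer policy is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)
n_jobs = 6
schedule = list(range(n_jobs))                 # current (suboptimal) job order

# Stand-in for the policy output: in the paper this matrix would come
# from the Transformer encoder; here it is random for illustration.
logits = rng.normal(size=(n_jobs, n_jobs))
probs = np.exp(logits) / np.exp(logits).sum()  # distribution over all (i, j) pairs

flat = rng.choice(n_jobs * n_jobs, p=probs.ravel())
i, j = divmod(flat, n_jobs)                    # sampled pair of schedule positions
schedule[i], schedule[j] = schedule[j], schedule[i]
print(f"swapped positions {i} and {j}: {schedule}")
```

Repeating this sample-and-swap step, with the policy rewarded for improvements to the multiobjective cost, is the improvement-heuristic loop the abstract describes.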
Citations: 0
An Explainable Machine Learning Approach to Traffic Accident Fatality Prediction
Pub Date: 2024-09-18 | DOI: arxiv-2409.11929
Md. Asif Khan Rifat, Ahmedul Kabir, Armana Sabiha Huq
Road traffic accidents (RTA) pose a significant public health threat worldwide, leading to considerable loss of life and economic burdens. This is particularly acute in developing countries like Bangladesh. Building reliable models to forecast crash outcomes is crucial for implementing effective preventive measures. To aid in developing targeted safety interventions, this study presents a machine learning-based approach for classifying fatal and non-fatal road accident outcomes using data from the Dhaka metropolitan traffic crash database from 2017 to 2022. Our framework utilizes a range of machine learning classification algorithms, comprising Logistic Regression, Support Vector Machines, Naive Bayes, Random Forest, Decision Tree, Gradient Boosting, LightGBM, and Artificial Neural Network. We prioritize model interpretability by employing the SHAP (SHapley Additive exPlanations) method, which elucidates the key factors influencing accident fatality. Our results demonstrate that LightGBM outperforms other models, achieving a ROC-AUC score of 0.72. The global, local, and feature dependency analyses are conducted to acquire deeper insights into the behavior of the model. SHAP analysis reveals that casualty class, time of accident, location, vehicle type, and road type play pivotal roles in determining fatality risk. These findings offer valuable insights for policymakers and road safety practitioners in developing countries, enabling the implementation of evidence-based strategies to reduce traffic crash fatalities.
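A hedged sketch of this kind of pipeline on synthetic data: a LightGBM classifier explained with SHAP. The features and labels are illustrative stand-ins, not the Dhaka crash data:

```python
import numpy as np
import lightgbm as lgb
import shap
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))      # stand-ins for features like time, location, vehicle type
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=1000) > 0).astype(int)  # fatal / non-fatal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = lgb.LGBMClassifier(n_estimators=200).fit(X_tr, y_tr)
print("ROC-AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

explainer = shap.TreeExplainer(model)           # tree-specific SHAP attributions
shap_values = explainer.shap_values(X_te)       # per-sample, per-feature contributions
```

Aggregating `shap_values` across samples yields the global feature-importance view the abstract refers to; individual rows give the local explanations.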
Citations: 0
An efficient wavelet-based physics-informed neural networks for singularly perturbed problems
Pub Date: 2024-09-18 | DOI: arxiv-2409.11847
Himanshu Pandey, Anshima Singh, Ratikanta Behera
Physics-informed neural networks (PINNs) are a class of deep learning models that utilize physics as differential equations to address complex problems, including ones that may involve limited data availability. However, tackling solutions of differential equations with oscillations or singular perturbations and shock-like structures becomes challenging for PINNs. Considering these challenges, we designed an efficient wavelet-based PINNs (W-PINNs) model to solve singularly perturbed differential equations. Here, we represent the solution in wavelet space using a family of smooth, compactly supported wavelets. This framework represents the solution of a differential equation with significantly fewer degrees of freedom while still capturing, identifying, and analyzing the local structure of complex physical phenomena. The architecture allows the training process to search for a solution within wavelet space, making the process faster and more accurate. The proposed model does not rely on automatic differentiation for derivatives involved in differential equations and does not require any prior information regarding the behavior of the solution, such as the location of abrupt features. Thus, through a strategic fusion of wavelets with PINNs, W-PINNs excel at capturing localized nonlinear information, making them well-suited for problems showing abrupt behavior in certain regions, such as singularly perturbed problems. The efficiency and accuracy of the proposed neural network model are demonstrated on various test problems, i.e., highly singularly perturbed nonlinear differential equations, the FitzHugh-Nagumo (FHN) model, and predator-prey interaction models. The proposed model compares favorably with traditional PINNs and with recently developed wavelet-based PINNs, which use wavelets as an activation function for solving nonlinear differential equations.
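To illustrate why compactly supported wavelets suit such problems, a small PyWavelets sketch showing that a singularly perturbed boundary-layer profile needs only a few active wavelet coefficients; the W-PINN itself would learn such coefficients rather than threshold them, so this is only a plausibility check:

```python
import numpy as np
import pywt

eps = 1e-3
x = np.linspace(0.0, 1.0, 1024)
u = np.exp(-x / np.sqrt(eps))              # sharp boundary layer near x = 0

coeffs = pywt.wavedec(u, "db4", level=6)   # smooth, compactly supported wavelets
flat, slices = pywt.coeffs_to_array(coeffs)
keep = np.abs(flat) > 1e-4 * np.abs(flat).max()
print("active coefficients:", keep.sum(), "of", flat.size)

flat[~keep] = 0.0                          # drop the near-zero coefficients
u_hat = pywt.waverec(pywt.array_to_coeffs(flat, slices, output_format="wavedec"), "db4")
print("max reconstruction error:", np.abs(u_hat[: u.size] - u).max())
```

The sparse representation is what gives the "significantly fewer degrees of freedom" claimed above.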
Citations: 0
Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features
Pub Date: 2024-09-18 | DOI: arxiv-2409.12135
Jiuqi Wang, Shangtong Zhang
Temporal difference (TD) learning with linear function approximation, abbreviated as linear TD, is a classic and powerful prediction algorithm in reinforcement learning. While it is well understood that linear TD converges almost surely to a unique point, this convergence traditionally requires the assumption that the features used by the approximator are linearly independent. However, this linear independence assumption does not hold in many practical scenarios. This work is the first to establish the almost sure convergence of linear TD without requiring linearly independent features. In fact, we do not make any assumptions on the features. We prove that the approximated value function converges to a unique point and the weight iterates converge to a set. We also establish a notion of local stability of the weight iterates. Importantly, we do not need to introduce any other additional assumptions and do not need to make any modification to the linear TD algorithm. Key to our analysis is a novel characterization of bounded invariant sets of the mean ODE of linear TD.
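For context, the standard linear TD(0) iterate and its mean ODE in textbook notation; the paper studies exactly these iterates when the feature matrix $\Phi$ need not have full column rank:

```latex
% Linear TD(0) update with step sizes \alpha_t, features \phi, discount \gamma:
\delta_t = R_{t+1} + \gamma\,\phi(S_{t+1})^{\top} w_t - \phi(S_t)^{\top} w_t,
\qquad
w_{t+1} = w_t + \alpha_t\,\delta_t\,\phi(S_t).
% Associated mean ODE (D_\mu: diagonal stationary-distribution matrix,
% P: transition matrix, r: expected rewards):
\dot{w} = b - A w,
\qquad
A = \Phi^{\top} D_{\mu} (I - \gamma P)\,\Phi,
\qquad
b = \Phi^{\top} D_{\mu}\, r.
```

When $\Phi$ has linearly independent columns, $A$ is positive definite and $w_t$ converges to the unique solution $A^{-1}b$; the paper addresses the degenerate case where $A$ may be singular, so the weights can only converge to a set.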
Citations: 0
Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview
Pub Date: 2024-09-18 | DOI: arxiv-2409.11650
Yanshu Wang, Tong Yang, Xiyan Liang, Guoan Wang, Hanning Lu, Xu Zhe, Yaoming Li, Li Weitao
This paper provides a comprehensive overview of the principles, challenges, and methodologies associated with quantizing large-scale neural network models. As neural networks have evolved towards larger and more complex architectures to address increasingly sophisticated tasks, the computational and energy costs have escalated significantly. We explore the necessity and impact of model size growth, highlighting the performance benefits as well as the computational challenges and environmental considerations. The core focus is on model quantization as a fundamental approach to mitigate these challenges by reducing model size and improving efficiency without substantially compromising accuracy. We delve into various quantization techniques, including both post-training quantization (PTQ) and quantization-aware training (QAT), and analyze several state-of-the-art algorithms such as LLM-QAT, PEQA (L4Q), ZeroQuant, SmoothQuant, and others. Through comparative analysis, we examine how these methods address issues like outliers, importance weighting, and activation quantization, ultimately contributing to more sustainable and accessible deployment of large-scale models.
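For readers new to the topic, a minimal PTQ sketch: symmetric uniform int8 quantization of a weight tensor, the basic primitive that surveyed methods such as SmoothQuant and ZeroQuant refine with smarter scale selection and outlier handling:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization to int8."""
    scale = np.abs(w).max() / 127.0                        # map max |w| to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(256, 256)).astype(np.float32)
q, s = quantize_int8(w)
print(f"max abs quantization error: {np.abs(dequantize(q, s) - w).max():.4f}")
```

Per-channel scales, clipping thresholds, and activation-aware rescaling are the main levers the surveyed algorithms add on top of this baseline.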
Citations: 0
Monomial Matrix Group Equivariant Neural Functional Networks
Pub Date: 2024-09-18 | DOI: arxiv-2409.11697
Hoang V. Tran, Thieu N. Vo, Tho H. Tran, An T. Nguyen, Tan Minh Nguyen
Neural functional networks (NFNs) have recently gained significant attention due to their diverse applications, ranging from predicting network generalization and network editing to classifying implicit neural representation. Previous NFN designs often depend on permutation symmetries in neural networks' weights, which traditionally arise from the unordered arrangement of neurons in hidden layers. However, these designs do not take into account the weight scaling symmetries of $\operatorname{ReLU}$ networks, and the weight sign flipping symmetries of $\operatorname{sin}$ or $\operatorname{tanh}$ networks. In this paper, we extend the study of the group action on the network weights from the group of permutation matrices to the group of monomial matrices by incorporating scaling/sign-flipping symmetries. In particular, we encode these scaling/sign-flipping symmetries by designing corresponding equivariant and invariant layers. We name our new family of NFNs the Monomial Matrix Group Equivariant Neural Functional Networks (Monomial-NFN). Because of the expansion of the symmetries, Monomial-NFN has far fewer independent trainable parameters compared to the baseline NFNs in the literature, thus enhancing the model's efficiency. Moreover, for fully connected and convolutional neural networks, we theoretically prove that all groups that leave these networks invariant while acting on their weight spaces are some subgroups of the monomial matrix group. We provide empirical evidence to demonstrate the advantages of our model over existing baselines, achieving competitive performance and efficiency.
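The scaling symmetry underlying the monomial group action can be checked numerically: for a two-layer $\operatorname{ReLU}$ network, scaling a hidden unit's incoming weights by $c > 0$ and its outgoing weights by $1/c$ leaves the computed function unchanged, so an NFN should treat the two weight configurations identically. A small sketch (not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2 = rng.normal(size=(1, 4))
relu = lambda z: np.maximum(z, 0.0)
f = lambda x, W1, b1, W2: W2 @ relu(W1 @ x + b1)

c = 2.5                                    # positive diagonal (monomial) action
W1s, b1s, W2s = W1.copy(), b1.copy(), W2.copy()
W1s[0], b1s[0] = c * W1[0], c * b1[0]      # scale incoming weights of unit 0
W2s[:, 0] = W2[:, 0] / c                   # undo the scale on the outgoing weight

x = rng.normal(size=3)
print(np.allclose(f(x, W1, b1, W2), f(x, W1s, b1s, W2s)))  # True
```

This works because $\operatorname{ReLU}(cz) = c\,\operatorname{ReLU}(z)$ for $c > 0$; the sign-flip analogue for $\operatorname{sin}$/$\operatorname{tanh}$ networks uses oddness instead.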
Citations: 0
Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers
Pub Date: 2024-09-18 | DOI: arxiv-2409.11859
Ekaterina Grishina, Mikhail Gorbunov, Maxim Rakhuba
Controlling the spectral norm of the Jacobian matrix, which is related to the convolution operation, has been shown to improve generalization, training stability and robustness in CNNs. Existing methods for computing the norm either tend to overestimate it or their performance may deteriorate quickly as the input and kernel sizes increase. In this paper, we demonstrate that the tensor version of the spectral norm of a four-dimensional convolution kernel, up to a constant factor, serves as an upper bound for the spectral norm of the Jacobian matrix associated with the convolution operation. This new upper bound is independent of the input image resolution, differentiable and can be efficiently calculated during training. Through experiments, we demonstrate how this new bound can be used to improve the performance of convolutional architectures.
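For context, the quantity being bounded can be computed exactly in the simplest setting: for a single-channel convolution with circular padding, the Jacobian is circulant and its spectral norm is the largest magnitude in the kernel's 2-D DFT (Sedghi et al.). The sketch below compares that exact value with a simple resolution-independent $\ell_1$ bound; the paper's bound is a tighter, differentiable alternative for general multi-channel kernels:

```python
import numpy as np

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))
n = 32                                        # input resolution

padded = np.zeros((n, n))                     # embed kernel into the n x n grid
padded[:3, :3] = kernel
sigma_max = np.abs(np.fft.fft2(padded)).max() # exact spectral norm (circular conv)
print(f"spectral norm of the {n}x{n} conv Jacobian: {sigma_max:.4f}")

# Crude resolution-independent upper bound, by the triangle inequality:
print("L1 bound (sum of |kernel entries|):", np.abs(kernel).sum())
```

Note how the exact value depends on the resolution `n`, while the bound does not; resolution independence is exactly the property the abstract claims for the proposed bound.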
Citations: 0
Topological Deep Learning with State-Space Models: A Mamba Approach for Simplicial Complexes
Pub Date: 2024-09-18 | DOI: arxiv-2409.12033
Marco Montagna, Simone Scardapane, Lev Telyatnikov
Graph Neural Networks based on the message-passing (MP) mechanism are a dominant approach for handling graph-structured data. However, they are inherently limited to modeling only pairwise interactions, making it difficult to explicitly capture the complexity of systems with $n$-body relations. To address this, topological deep learning has emerged as a promising field for studying and modeling higher-order interactions using various topological domains, such as simplicial and cellular complexes. While these new domains provide powerful representations, they introduce new challenges, such as effectively modeling the interactions among higher-order structures through higher-order MP. Meanwhile, structured state-space sequence models have proven to be effective for sequence modeling and have recently been adapted for graph data by encoding the neighborhood of a node as a sequence, thereby avoiding the MP mechanism. In this work, we propose a novel architecture designed to operate with simplicial complexes, utilizing the Mamba state-space model as its backbone. Our approach generates sequences for the nodes based on the neighboring cells, enabling direct communication between all higher-order structures, regardless of their rank. We extensively validate our model, demonstrating that it achieves competitive performance compared to state-of-the-art models developed for simplicial complexes.
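A loose sketch of the sequence-building idea: gather the higher-order cells containing a node and order them by rank, yielding a sequence a state-space model such as Mamba could consume. The ordering and feature handling in the paper are more involved; this only illustrates the data flow:

```python
# Toy simplicial complex: nodes, edges, and one triangle, each cell a
# tuple of node ids. Hypothetical representation, not the paper's format.
node = 0
complex_cells = [
    (0,), (1,), (2,), (3,),          # 0-cells (nodes)
    (0, 1), (0, 2), (1, 2), (2, 3),  # 1-cells (edges)
    (0, 1, 2),                       # 2-cell (triangle)
]

sequence = sorted(
    (cell for cell in complex_cells if node in cell and len(cell) > 1),
    key=len,                         # rank-ordered: edges first, then triangles
)
print(sequence)                      # [(0, 1), (0, 2), (0, 1, 2)]
```

Feeding such per-node sequences to a state-space backbone is what lets cells of every rank interact directly, without rank-by-rank message passing.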
Citations: 0
An Efficient Model-Agnostic Approach for Uncertainty Estimation in Data-Restricted Pedometric Applications
Pub Date: 2024-09-18 | DOI: arxiv-2409.11985
Viacheslav Barkov, Jonas Schmidinger, Robin Gebbers, Martin Atzmueller
This paper introduces a model-agnostic approach designed to enhance uncertainty estimation in the predictive modeling of soil properties, a crucial factor for advancing pedometrics and the practice of digital soil mapping. To address the typical challenge of data scarcity in soil studies, we present an improved technique for uncertainty estimation. This method is based on the transformation of regression tasks into classification problems, which not only allows for the production of reliable uncertainty estimates but also enables the application of established machine learning algorithms with competitive performance that have not yet been utilized in pedometrics. Empirical results from datasets collected from two German agricultural fields showcase the practical application of the proposed methodology. Our results and findings suggest that the proposed approach has the potential to provide better uncertainty estimation than the models commonly used in pedometrics.
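A hedged sketch of the core trick on synthetic data: discretize the continuous target into bins, train an off-the-shelf classifier, and read the predicted bin distribution as an uncertainty estimate (here via its entropy). The binning scheme and classifier are illustrative choices, not necessarily the paper's:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                 # stand-ins for soil covariates
y = X[:, 0] + 0.3 * rng.normal(size=300)      # continuous soil property

bins = np.quantile(y, np.linspace(0, 1, 11)[1:-1])
y_cls = np.digitize(y, bins)                  # regression -> 10-class problem

clf = RandomForestClassifier(random_state=0).fit(X, y_cls)
proba = clf.predict_proba(X[:5])              # distribution over target bins
entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
print(entropy)                                # per-sample uncertainty score
```

The predicted distribution can also be summarized by its mean (as a point prediction) or by quantiles, which is what makes the approach model-agnostic.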
Citations: 0
Extended Deep Submodular Functions
Pub Date: 2024-09-18 | DOI: arxiv-2409.12053
Seyed Mohammad Hosseini, Arash Jamshid, Seyed Mahdi Noormousavi, Mahdi Jafari Siavoshani, Naeimeh Omidvar
We introduce a novel category of set functions called Extended Deep Submodular functions (EDSFs), which are neural network-representable. EDSFs serve as an extension of Deep Submodular Functions (DSFs), inheriting crucial properties from DSFs while addressing innate limitations. It is known that DSFs can represent a limiting subset of submodular functions. In contrast, through an analysis of polymatroid properties, we establish that EDSFs possess the capability to represent all monotone submodular functions, a notable enhancement compared to DSFs. Furthermore, our findings demonstrate that EDSFs can represent any monotone set function, indicating the family of EDSFs is equivalent to the family of all monotone set functions. Additionally, we prove that EDSFs maintain the concavity inherent in DSFs when the components of the input vector are non-negative real numbers, an essential feature in certain combinatorial optimization problems. Through extensive experiments, we illustrate that EDSFs exhibit significantly lower empirical generalization error than DSFs in the learning of coverage functions. This suggests that EDSFs present a promising advancement in the representation and learning of set functions with improved generalization capabilities.
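A small numerical illustration of the function class involved: a coverage function on a toy ground set, verified by brute force to be monotone and submodular (diminishing returns):

```python
from itertools import chain, combinations

covers = {"a": {1, 2}, "b": {2, 3}, "c": {4}}     # element -> items it covers

def f(S):
    """Coverage function: number of items covered by the elements in S."""
    covered = set()
    for e in S:
        covered |= covers[e]
    return len(covered)

ground = list(covers)
subsets = list(chain.from_iterable(combinations(ground, r) for r in range(len(ground) + 1)))

monotone = all(f(S) <= f(S + (e,)) for S in subsets for e in ground if e not in S)
submodular = all(
    f(S + (e,)) - f(S) >= f(T + (e,)) - f(T)      # marginal gains shrink as the set grows
    for S in subsets for T in subsets if set(S) <= set(T)
    for e in ground if e not in T
)
print(monotone, submodular)                        # True True
```

Coverage functions like this are the benchmark family on which the abstract reports lower generalization error for EDSFs than for DSFs.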
Citations: 0