The integration of Reinforcement Learning (RL) with heuristic methods is an emerging trend for solving optimization problems, which leverages RL's ability to learn from the data generated during the search process. One promising approach is to train an RL agent as an improvement heuristic, starting with a suboptimal solution that is iteratively improved by applying small changes. We apply this approach to a real-world multiobjective production scheduling problem. Our approach utilizes a network architecture that includes Transformer encoding to learn the relationships between jobs. Afterwards, a probability matrix is generated from which pairs of jobs are sampled and then swapped to improve the solution. We benchmarked our approach against other heuristics using real data from our industry partner, demonstrating its superior performance.
{"title":"Reinforcement Learning as an Improvement Heuristic for Real-World Production Scheduling","authors":"Arthur Müller, Lukas Vollenkemper","doi":"arxiv-2409.11933","DOIUrl":"https://doi.org/arxiv-2409.11933","url":null,"PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
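The swap-based improvement loop described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the single total-tardiness objective stands in for the multiobjective problem, the uniform matrix `uniform` stands in for the probability matrix that the trained network would produce, and the greedy acceptance rule and all function names are hypothetical.

```python
import random


def total_tardiness(order, proc, due):
    """Sum of max(0, completion time - due date) for jobs run in `order`."""
    t, cost = 0, 0
    for j in order:
        t += proc[j]
        cost += max(0, t - due[j])
    return cost


def sample_pair(prob, rng):
    """Sample positions (i, j) with i < j, weighted by the upper triangle of `prob`."""
    n = len(prob)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    weights = [prob[i][j] for i, j in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0]


def improve(order, proc, due, prob, steps=500, seed=0):
    """Repeatedly swap a sampled pair of positions; keep swaps that do not worsen the cost."""
    rng = random.Random(seed)
    best = list(order)
    best_cost = total_tardiness(best, proc, due)
    for _ in range(steps):
        i, j = sample_pair(prob, rng)
        cand = list(best)
        cand[i], cand[j] = cand[j], cand[i]
        cost = total_tardiness(cand, proc, due)
        if cost <= best_cost:  # assumption: greedy acceptance, not a learned policy
            best, best_cost = cand, cost
    return best, best_cost


# Hypothetical toy instance: processing times and due dates for five jobs.
proc = [4, 2, 6, 3, 5]
due = [6, 4, 20, 8, 12]
start = [2, 0, 4, 1, 3]                      # deliberately poor initial schedule
uniform = [[1.0] * 5 for _ in range(5)]      # placeholder for the learned matrix
best, best_cost = improve(start, proc, due, uniform)
print(best, best_cost)
```

In a learned variant, the entries of `prob` would come from the network and would change after every swap; the loop structure stays the same.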
Road traffic accidents (RTA) pose a significant public health threat worldwide, leading to considerable loss of life and economic burdens. This is particularly acute in developing countries like Bangladesh. Building reliable models to forecast crash outcomes is crucial for implementing effective preventive measures. To aid in developing targeted safety interventions, this study presents a machine learning-based approach for classifying fatal and non-fatal road accident outcomes using data from the Dhaka metropolitan traffic crash database from 2017 to 2022. Our framework utilizes a range of machine learning classification algorithms, comprising Logistic Regression, Support Vector Machines, Naive Bayes, Random Forest, Decision Tree, Gradient Boosting, LightGBM, and Artificial Neural Network. We prioritize model interpretability by employing the SHAP (SHapley Additive exPlanations) method, which elucidates the key factors influencing accident fatality. Our results demonstrate that LightGBM outperforms other models, achieving a ROC-AUC score of 0.72. The global, local, and feature dependency analyses are conducted to acquire deeper insights into the behavior of the model. SHAP analysis reveals that casualty class, time of accident, location, vehicle type, and road type play pivotal roles in determining fatality risk. These findings offer valuable insights for policymakers and road safety practitioners in developing countries, enabling the implementation of evidence-based strategies to reduce traffic crash fatalities.
{"title":"An Explainable Machine Learning Approach to Traffic Accident Fatality Prediction","authors":"Md. Asif Khan Rifat, Ahmedul Kabir, Armana Sabiha Huq","doi":"arxiv-2409.11929","DOIUrl":"https://doi.org/arxiv-2409.11929","url":null,"PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
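To make the ROC-AUC figure quoted in the abstract above easier to interpret, the sketch below computes the metric directly from its pairwise definition: the probability that a randomly chosen positive (fatal) case receives a higher score than a randomly chosen negative one. The labels and scores are made-up illustrative values, not data from the study, and `roc_auc` is a hypothetical helper rather than the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = fatal, 0 = non-fatal.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Under this reading, a score of 0.72 means that about 72% of fatal/non-fatal pairs are ordered correctly, compared with 50% for a constant or random scorer.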
Temporal difference (TD) learning with linear function approximation, abbreviated as linear TD, is a classic and powerful prediction algorithm in reinforcement learning. While it is well understood that linear TD converges almost surely to a unique point, this convergence traditionally requires the assumption that the features used by the approximator are linearly independent. However, this linear independence assumption does not hold in many practical scenarios. This work is the first to establish the almost sure convergence of linear TD without requiring linearly independent features. In fact, we do not make any assumptions on the features. We prove that the approximated value function converges to a unique point and the weight iterates converge to a set. We also establish a notion of local stability of the weight iterates. Importantly, we do not need to introduce any other additional assumptions and do not need to make any modification to the linear TD algorithm. Key to our analysis is a novel characterization of bounded invariant sets of the mean ODE of linear TD.
{"title":"Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features","authors":"Jiuqi Wang, Shangtong Zhang","doi":"arxiv-2409.12135","DOIUrl":"https://doi.org/arxiv-2409.12135","url":null,"PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
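The distinction drawn in the abstract above, a unique limit for the approximated value function but only a set of limits for the weights, can be seen on a toy problem. The sketch below runs linear TD(0) with a duplicated feature column, so the features are linearly dependent. The two-state deterministic chain, the constant step size, and all names are assumptions made for illustration; the paper's setting is more general.

```python
GAMMA = 0.5
REWARD = [1.0, 0.0]        # reward received when leaving state 0 / state 1
NEXT = [1, 0]              # deterministic cycle 0 -> 1 -> 0 -> ...
PHI = [[1.0, 0.0, 1.0],    # features of state 0 (columns 1 and 3 are identical)
       [0.0, 1.0, 0.0]]    # features of state 1


def value(w, s):
    """Approximate value of state s under weights w."""
    return sum(wi * xi for wi, xi in zip(w, PHI[s]))


def linear_td0(w0, alpha=0.1, steps=4000):
    """Run linear TD(0) along the chain, starting in state 0 with weights w0."""
    w, s = list(w0), 0
    for _ in range(steps):
        s_next = NEXT[s]
        delta = REWARD[s] + GAMMA * value(w, s_next) - value(w, s)  # TD error
        w = [wi + alpha * delta * xi for wi, xi in zip(w, PHI[s])]
        s = s_next
    return w


w_a = linear_td0([0.0, 0.0, 0.0])
w_b = linear_td0([1.0, 0.0, -1.0])
print([round(value(w_a, s), 4) for s in (0, 1)], [round(x, 4) for x in w_a])
print([round(value(w_b, s), 4) for s in (0, 1)], [round(x, 4) for x in w_b])
```

Both runs reach the same values, which solve v(0) = 1 + 0.5 v(1) and v(1) = 0.5 v(0), while the weight vectors differ because the duplicated column leaves the difference between the first and third weight unchanged by every update.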
Physics-informed neural networks (PINNs) are a class of deep learning models that incorporate physics in the form of differential equations to address complex problems, including ones with limited data availability. However, PINNs struggle with differential equations whose solutions exhibit oscillations, singular perturbations, or shock-like structures. Considering these challenges, we designed an efficient wavelet-based PINNs (W-PINNs) model to solve singularly perturbed differential equations. Here, we represent the solution in wavelet space using a family of smooth, compactly supported wavelets. This framework represents the solution of a differential equation with significantly fewer degrees of freedom while still capturing, identifying, and analyzing the local structure of complex physical phenomena. The architecture allows the training process to search for a solution within wavelet space, making the process faster and more accurate. The proposed model does not rely on automatic differentiation for the derivatives involved in the differential equations and does not require any prior information regarding the behavior of the solution, such as the location of abrupt features. Thus, through a strategic fusion of wavelets with PINNs, W-PINNs excel at capturing localized nonlinear information, making them well-suited for problems showing abrupt behavior in certain regions, such as singularly perturbed problems. The efficiency and accuracy of the proposed neural network model are demonstrated on various test problems, namely highly singularly perturbed nonlinear differential equations, the FitzHugh-Nagumo (FHN) model, and predator-prey interaction models. The proposed model compares favorably with traditional PINNs and with the recently developed wavelet-based PINNs that use wavelets as an activation function for solving nonlinear differential equations.
{"title":"An efficient wavelet-based physics-informed neural networks for singularly perturbed problems","authors":"Himanshu Pandey, Anshima Singh, Ratikanta Behera","doi":"arxiv-2409.11847","DOIUrl":"https://doi.org/arxiv-2409.11847","url":null,"PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper provides a comprehensive overview of the principles, challenges, and methodologies associated with quantizing large-scale neural network models. As neural networks have evolved towards larger and more complex architectures to address increasingly sophisticated tasks, the computational and energy costs have escalated significantly. We explore the necessity and impact of model size growth, highlighting the performance benefits as well as the computational challenges and environmental considerations. The core focus is on model quantization as a fundamental approach to mitigate these challenges by reducing model size and improving efficiency without substantially compromising accuracy. We delve into various quantization techniques, including both post-training quantization (PTQ) and quantization-aware training (QAT), and analyze several state-of-the-art algorithms such as LLM-QAT, PEQA(L4Q), ZeroQuant, SmoothQuant, and others. Through comparative analysis, we examine how these methods address issues like outliers, importance weighting, and activation quantization, ultimately contributing to more sustainable and accessible deployment of large-scale models.
{"title":"Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview","authors":"Yanshu Wang, Tong Yang, Xiyan Liang, Guoan Wang, Hanning Lu, Xu Zhe, Yaoming Li, Li Weitao","doi":"arxiv-2409.11650","DOIUrl":"https://doi.org/arxiv-2409.11650","url":null,"PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
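As a concrete reference point for the quantization techniques surveyed above, the sketch below shows the simplest form of post-training quantization: symmetric per-tensor rounding to 8-bit integers. It is a generic textbook scheme, not any of the named algorithms (LLM-QAT, ZeroQuant, SmoothQuant), and the weight values and function names are hypothetical. It also illustrates why outliers matter: a single large value inflates the scale and coarsens the grid for every other weight.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate real values."""
    return [qi * scale for qi in q]


# Illustrative weights (not taken from any model).
weights = [0.02, -0.51, 0.33, 1.27, -0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

# The same weights plus one outlier: the step size grows for all entries.
q_out, scale_out = quantize_int8(weights + [50.0])
print(scale, max_err, scale_out)
```

The round-trip error of each weight is at most half a quantization step, so a larger `scale` directly means a larger worst-case error for the ordinary weights.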
Neural functional networks (NFNs) have recently gained significant attention due to their diverse applications, ranging from predicting network generalization and network editing to classifying implicit neural representations. Previous NFN designs often depend on permutation symmetries in neural networks' weights, which traditionally arise from the unordered arrangement of neurons in hidden layers. However, these designs do not take into account the weight scaling symmetries of $\operatorname{ReLU}$ networks, and the weight sign-flipping symmetries of $\operatorname{sin}$ or $\operatorname{tanh}$ networks. In this paper, we extend the study of the group action on the network weights from the group of permutation matrices to the group of monomial matrices by incorporating scaling/sign-flipping symmetries. In particular, we encode these scaling/sign-flipping symmetries by designing corresponding equivariant and invariant layers. We name our new family of NFNs the Monomial Matrix Group Equivariant Neural Functional Networks (Monomial-NFN). Because of the expansion of the symmetries, Monomial-NFN has far fewer independent trainable parameters than the baseline NFNs in the literature, thus enhancing the model's efficiency. Moreover, for fully connected and convolutional neural networks, we theoretically prove that all groups that leave these networks invariant while acting on their weight spaces are subgroups of the monomial matrix group. We provide empirical evidence to demonstrate the advantages of our model over existing baselines, achieving competitive performance and efficiency.
{"title":"Monomial Matrix Group Equivariant Neural Functional Networks","authors":"Hoang V. Tran, Thieu N. Vo, Tho H. Tran, An T. Nguyen, Tan Minh Nguyen","doi":"arxiv-2409.11697","DOIUrl":"https://doi.org/arxiv-2409.11697","url":null,"PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
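The weight symmetries mentioned in the abstract above can be checked numerically on a tiny ReLU network. The sketch below applies a positive rescaling and a permutation to the hidden units, which together amount to acting with a monomial matrix, and confirms that the network output is unchanged. It only illustrates the symmetry itself, not the Monomial-NFN layers; the network, its weights, and the helper names are hypothetical.

```python
def relu(x):
    return x if x > 0 else 0.0


def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer ReLU network with plain Python lists."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]


def rescale_hidden(W1, b1, W2, scales):
    """Scale unit i's incoming weights and bias by scales[i] > 0, outgoing by 1/scales[i]."""
    W1s = [[w * c for w in row] for row, c in zip(W1, scales)]
    b1s = [b * c for b, c in zip(b1, scales)]
    W2s = [[w / c for w, c in zip(row, scales)] for row in W2]
    return W1s, b1s, W2s


def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units: new unit k is old unit perm[k]."""
    W1p = [W1[p] for p in perm]
    b1p = [b1[p] for p in perm]
    W2p = [[row[p] for p in perm] for row in W2]
    return W1p, b1p, W2p


# Hypothetical weights: 2 inputs, 3 hidden units, 1 output.
W1 = [[1.0, -2.0], [0.5, 0.25], [-1.0, 1.5]]
b1 = [0.1, -0.2, 0.3]
W2 = [[2.0, -1.0, 0.5]]
b2 = [0.05]
x = [0.7, -0.4]

original = mlp(x, W1, b1, W2, b2)
W1t, b1t, W2t = rescale_hidden(W1, b1, W2, [2.0, 0.5, 4.0])
W1t, b1t, W2t = permute_hidden(W1t, b1t, W2t, [2, 0, 1])
transformed = mlp(x, W1t, b1t, W2t, b2)
print(original, transformed)
```

The rescaling works because ReLU is positively homogeneous, relu(c z) = c relu(z) for c > 0; for sin or tanh networks the analogous symmetry is a sign flip, since both functions are odd.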
Controlling the spectral norm of the Jacobian matrix associated with the convolution operation has been shown to improve generalization, training stability and robustness in CNNs. Existing methods for computing the norm either tend to overestimate it, or their performance may deteriorate quickly as the input and kernel sizes increase. In this paper, we demonstrate that the tensor version of the spectral norm of a four-dimensional convolution kernel, up to a constant factor, serves as an upper bound for the spectral norm of the Jacobian matrix associated with the convolution operation. This new upper bound is independent of the input image resolution, differentiable and can be efficiently calculated during training. Through experiments, we demonstrate how this new bound can be used to improve the performance of convolutional architectures.
{"title":"Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers","authors":"Ekaterina Grishina, Mikhail Gorbunov, Maxim Rakhuba","doi":"arxiv-2409.11859","DOIUrl":"https://doi.org/arxiv-2409.11859","url":null,"PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
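To illustrate what "the spectral norm of the Jacobian of a convolution" refers to in the abstract above, the sketch below treats the simplest possible case: a single-channel 1D circular convolution, whose Jacobian is a circulant matrix. It computes the norm in two ways, by power iteration on the dense matrix and from the discrete Fourier transform of the kernel. This is a standard property of circulant matrices and is not the tensor-norm bound proposed in the paper; the kernel, the input length, and all function names are assumptions.

```python
import cmath
import math


def conv_jacobian(kernel, n):
    """Dense n x n Jacobian of y[i] = sum_j kernel[j] * x[(i - j) mod n]."""
    padded = list(kernel) + [0.0] * (n - len(kernel))
    return [[padded[(i - m) % n] for m in range(n)] for i in range(n)]


def spectral_norm_power(A, iters=300):
    """Largest singular value of A via power iteration on A^T A."""
    n = len(A)
    v = [float(i + 1) for i in range(n)]  # fixed, generic start vector
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]  # u = A v
        v = [sum(A[i][j] * u[i] for i in range(n)) for j in range(n)]  # v = A^T u
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(x * x for x in u))


def spectral_norm_dft(kernel, n):
    """Largest magnitude of the length-n DFT of the zero-padded kernel."""
    return max(
        abs(sum(k * cmath.exp(-2j * cmath.pi * f * j / n)
                for j, k in enumerate(kernel)))
        for f in range(n)
    )


kernel = [1.0, 2.0, 1.0]   # illustrative smoothing kernel
n = 8                      # illustrative input length (must be >= len(kernel))
norm_power = spectral_norm_power(conv_jacobian(kernel, n))
norm_dft = spectral_norm_dft(kernel, n)
print(norm_power, norm_dft)
```

The dense approach scales with the square of the input length, which hints at why bounds that depend only on the kernel are attractive for multi-channel 2D layers.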
Federated learning offers a paradigm for addressing the challenge of preserving privacy in distributed machine learning. However, datasets distributed across clients in the real world are inevitably heterogeneous, and if the datasets could be globally aggregated, they would tend to follow a long-tailed distribution, which greatly affects the performance of the model. The traditional approach to federated learning primarily addresses the heterogeneity of data among clients, yet it fails to address the phenomenon of class-wise bias in global long-tailed data. This results in the trained model focusing on the head classes while neglecting the equally important tail classes. Consequently, it is essential to develop a methodology that considers classes holistically. To address the above problems, we propose a new method, FedLF, which introduces three modifications in the local training phase: adaptive logit adjustment, continuous class centred optimization, and feature decorrelation. We compare our method with seven state-of-the-art methods under varying degrees of data heterogeneity and long-tailed distribution. Extensive experiments on the benchmark datasets CIFAR-10-LT and CIFAR-100-LT demonstrate that our approach effectively mitigates the problem of model performance degradation due to data heterogeneity and long-tailed distribution. Our code is available at https://github.com/18sym/FedLF.
{"title":"FedLF: Adaptive Logit Adjustment and Feature Optimization in Federated Long-Tailed Learning","authors":"Xiuhua Lu, Peng Li, Xuefeng Jiang","doi":"arxiv-2409.12105","DOIUrl":"https://doi.org/arxiv-2409.12105","url":null,"PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
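For readers unfamiliar with logit adjustment, one of the three local-training modifications named in the abstract above, the sketch below shows the plain, non-adaptive form of the idea: the logarithm of each class prior is added to the logits before the cross-entropy loss, so mistakes on rare classes cost more. The adaptive variant used by FedLF is not specified in the abstract and is not reproduced here; the class counts, the temperature `tau`, and the function names are all assumptions.

```python
import math


def class_priors(counts):
    """Empirical class frequencies from per-class sample counts."""
    total = sum(counts)
    return [c / total for c in counts]


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def logit_adjusted_loss(logits, label, priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(prior) per class."""
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, priors)]
    return cross_entropy(adjusted, label)


# Hypothetical long-tailed client: class 0 is the head, class 2 the tail.
counts = [700, 200, 100]
priors = class_priors(counts)
logits = [0.0, 0.0, 0.0]   # an uninformative prediction

plain_tail = cross_entropy(logits, 2)
adjusted_tail = logit_adjusted_loss(logits, 2, priors)
adjusted_head = logit_adjusted_loss(logits, 0, priors)
print(plain_tail, adjusted_tail, adjusted_head)
```

With uninformative logits, the adjusted loss equals the negative log prior of the true class, so the tail class is penalized far more heavily than the head class, pushing the model toward larger margins on rare classes.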
Out-of-distribution (OOD) detection aims to detect test samples outside the training category space, which is an essential component in building reliable machine learning systems. Existing reviews on OOD detection primarily focus on method taxonomy, surveying the field by categorizing various approaches. However, many recent works concentrate on non-traditional OOD detection scenarios, such as test-time adaptation, multi-modal data sources and other novel contexts. In this survey, we review, for the first time, recent advances in OOD detection from the problem-scenario perspective. According to whether the training process is completely controlled, we divide OOD detection methods into training-driven and training-agnostic. Besides, considering the rapid development of pre-trained models, large pre-trained model-based OOD detection is also regarded as an important category and discussed separately. Furthermore, we provide a discussion of the evaluation scenarios, a variety of applications, and several future research directions. We believe this survey with its new taxonomy will benefit the proposal of new methods and the expansion of more practical scenarios. A curated list of related papers is provided in the GitHub repository: https://github.com/shuolucs/Awesome-Out-Of-Distribution-Detection
{"title":"Recent Advances in OOD Detection: Problems and Approaches","authors":"Shuo Lu, YingSheng Wang, LuJun Sheng, AiHua Zheng, LinXiao He, Jian Liang","doi":"arxiv-2409.11884","DOIUrl":"https://doi.org/arxiv-2409.11884","url":null,"PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The ever-growing scale of deep neural networks (DNNs) has led to an equally rapid growth in computational resource requirements. Many recent architectures, most prominently Large Language Models, have to be trained using supercomputers with thousands of accelerators, such as GPUs or TPUs. In addition to the vast number of floating-point operations, the memory footprint of DNNs is also exploding. In contrast, GPU architectures are notoriously short on memory. Even comparatively small architectures like some EfficientNet variants cannot be trained on a single consumer-grade GPU at reasonable mini-batch sizes. During training, intermediate input activations have to be stored until backpropagation for gradient calculation. These make up the vast majority of the memory footprint. In this work we therefore consider compressing activation maps for the backward pass using pooling, which can reduce both the memory footprint and the amount of data movement. The forward computation remains uncompressed. We empirically show convergence and study effects on feature detection using the example of the common vision architecture ResNet. With this approach we are able to reduce the peak memory consumption by 29% at the cost of a longer training schedule, while maintaining prediction accuracy compared to an uncompressed baseline.
{"title":"Less Memory Means smaller GPUs: Backpropagation with Compressed Activations","authors":"Daniel Barley, Holger Fröning","doi":"arxiv-2409.11902","DOIUrl":"https://doi.org/arxiv-2409.11902","url":null,"PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
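The idea in the abstract above, an uncompressed forward pass with pooled activations stored for the backward pass, can be sketched on a single linear layer. The code below is a toy illustration under stated assumptions, not the authors' implementation: it uses 1D average pooling with window `pool`, nearest-neighbour upsampling to rebuild the activation, plain Python lists instead of tensors, and hypothetical function names. The input length is assumed to be divisible by `pool`.

```python
def avg_pool(x, k):
    """Average non-overlapping windows of size k (len(x) must be divisible by k)."""
    return [sum(x[i:i + k]) / k for i in range(0, len(x), k)]


def upsample(x, k):
    """Repeat every entry k times (nearest-neighbour upsampling)."""
    return [v for v in x for _ in range(k)]


def linear_forward(W, x, pool=2):
    """Compute y = W x exactly, but keep only a pooled copy of x for backward."""
    y = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    saved = avg_pool(x, pool)
    return y, saved


def linear_backward_weight(grad_y, saved, pool=2):
    """Approximate dL/dW = outer(grad_y, x) using the upsampled pooled activation."""
    x_hat = upsample(saved, pool)
    return [[g * xi for xi in x_hat] for g in grad_y]


# Hypothetical layer and upstream gradient.
W = [[0.5, -1.0, 2.0, 0.0], [1.0, 0.0, 0.0, 1.0]]
grad_y = [1.0, -2.0]

# Piecewise-constant input: pooling loses nothing, the gradient is exact.
x_flat = [1.0, 1.0, 3.0, 3.0]
y_flat, saved_flat = linear_forward(W, x_flat)
grad_flat = linear_backward_weight(grad_y, saved_flat)
exact_flat = [[g * xi for xi in x_flat] for g in grad_y]

# Varying input: half the storage, but the weight gradient is only approximate.
x_var = [1.0, 2.0, 3.0, 5.0]
y_var, saved_var = linear_forward(W, x_var)
grad_var = linear_backward_weight(grad_y, saved_var)
exact_var = [[g * xi for xi in x_var] for g in grad_y]
print(len(saved_var), grad_var, exact_var)
```

The approximation error in the weight gradient is the price paid for the smaller stored activation, which is consistent with the abstract's note that the memory savings come with a longer training schedule.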