Mehroush Banday, Sherin Zafar, Parul Agarwal, M Afshar Alam, Abubeker K M
Coronary heart disease (CHD) is a severe cardiac disease, so early diagnosis is essential: it improves treatment outcomes and reduces the cost of medical care. Ongoing developments in quantum computing and machine learning (ML) may bring practical improvements to the performance of CHD diagnosis. Quantum machine learning (QML) is attracting tremendous interest across disciplines because of its performance and capabilities. A quantum leap in the healthcare industry would increase processing power and help optimise multiple models. QML techniques have the potential to forecast cardiac disease and support early detection. To predict the risk of coronary heart disease, this paper presents a hybrid approach that uses an ensemble machine learning model built on QML classifiers. By fusing quantum and classical ML algorithms in a multi-step inferential framework, the approach handles multidimensional healthcare data and remains robust. The marked rise in heart disease and death rates affects human health worldwide and the global economy, and reducing cardiac morbidity and mortality requires early detection of heart disease. In this research, the hybrid approach uses quantum computing capabilities to tackle complex problems that are not amenable to conventional machine learning algorithms and to minimize computational expense. The proposed method was developed on the Raspberry Pi 5 Graphics Processing Unit (GPU) platform and tested on a broad dataset that integrates clinical and imaging data from patients with CHD and healthy controls. Compared to classical machine learning models, the proposed hybrid QML model achieves substantially higher accuracy, sensitivity, F1 score, and specificity for CHD.
{"title":"Early Detection of Coronary Heart Disease Using Hybrid Quantum Machine Learning Approach","authors":"Mehroush Banday, Sherin Zafar, Parul Agarwal, M Afshar Alam, Abubeker K M","doi":"arxiv-2409.10932","DOIUrl":"https://doi.org/arxiv-2409.10932","url":null,"abstract":"Coronary heart disease (CHD) is a severe cardiac disease, and hence, its\u0000early diagnosis is essential as it improves treatment results and saves money\u0000on medical care. The prevailing development of quantum computing and machine\u0000learning (ML) technologies may bring practical improvement to the performance\u0000of CHD diagnosis. Quantum machine learning (QML) is receiving tremendous\u0000interest in various disciplines due to its higher performance and capabilities.\u0000A quantum leap in the healthcare industry will increase processing power and\u0000optimise multiple models. Techniques for QML have the potential to forecast\u0000cardiac disease and help in early detection. To predict the risk of coronary\u0000heart disease, a hybrid approach utilizing an ensemble machine learning model\u0000based on QML classifiers is presented in this paper. Our approach, with its\u0000unique ability to address multidimensional healthcare data, reassures the\u0000method's robustness by fusing quantum and classical ML algorithms in a\u0000multi-step inferential framework. The marked rise in heart disease and death\u0000rates impacts worldwide human health and the global economy. Reducing cardiac\u0000morbidity and mortality requires early detection of heart disease. In this\u0000research, a hybrid approach utilizes techniques with quantum computing\u0000capabilities to tackle complex problems that are not amenable to conventional\u0000machine learning algorithms and to minimize computational expenses. The\u0000proposed method has been developed in the Raspberry Pi 5 Graphics Processing\u0000Unit (GPU) platform and tested on a broad dataset that integrates clinical and\u0000imaging data from patients suffering from CHD and healthy controls. Compared to\u0000classical machine learning models, the accuracy, sensitivity, F1 score, and\u0000specificity of the proposed hybrid QML model used with CHD are manifold higher.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intraoperative hypotension (IOH) prediction using Mean Arterial Pressure (MAP) is a critical research area with significant implications for patient outcomes during surgery. However, existing approaches predominantly employ static modeling paradigms that overlook the dynamic nature of physiological signals. In this paper, we introduce a novel Hybrid Multi-Factor (HMF) framework that reformulates IOH prediction as a blood pressure forecasting task. Our framework leverages a Transformer encoder designed to capture the temporal evolution of MAP series through a patch-based input representation, which segments the input physiological series into informative patches for accurate analysis. To address the challenge of distribution shift in physiological series, our approach incorporates two key innovations: (1) symmetric normalization and de-normalization, which mitigate drift in the statistical properties of the input and thereby ensure the model's robustness across varying conditions, and (2) sequence decomposition, which disaggregates the input series into trend and seasonal components, allowing more precise modeling of inherent sequence dependencies. Extensive experiments on two real-world datasets demonstrate the superior performance of our approach compared to competitive baselines, particularly in capturing the nuanced variations in input series that are crucial for accurate IOH prediction.
{"title":"HMF: A Hybrid Multi-Factor Framework for Dynamic Intraoperative Hypotension Prediction","authors":"Mingyue Cheng, Jintao Zhang, Zhiding Liu, Chunli Liu, Yanhu Xie","doi":"arxiv-2409.11064","DOIUrl":"https://doi.org/arxiv-2409.11064","url":null,"abstract":"Intraoperative hypotension (IOH) prediction using Mean Arterial Pressure\u0000(MAP) is a critical research area with significant implications for patient\u0000outcomes during surgery. However, existing approaches predominantly employ\u0000static modeling paradigms that overlook the dynamic nature of physiological\u0000signals. In this paper, we introduce a novel Hybrid Multi-Factor (HMF)\u0000framework that reformulates IOH prediction as a blood pressure forecasting\u0000task. Our framework leverages a Transformer encoder, specifically designed to\u0000effectively capture the temporal evolution of MAP series through a patch-based\u0000input representation, which segments the input physiological series into\u0000informative patches for accurate analysis. To address the challenges of\u0000distribution shift in physiological series, our approach incorporates two key\u0000innovations: (1) Symmetric normalization and de-normalization processes help\u0000mitigate distributional drift in statistical properties, thereby ensuring the\u0000model's robustness across varying conditions, and (2) Sequence decomposition,\u0000which disaggregates the input series into trend and seasonal components,\u0000allowing for a more precise modeling of inherent sequence dependencies.\u0000Extensive experiments conducted on two real-world datasets demonstrate the\u0000superior performance of our approach compared to competitive baselines,\u0000particularly in capturing the nuanced variations in input series that are\u0000crucial for accurate IOH prediction.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiaqi Ding, Tingting Dan, Ziquan Wei, Hyuna Cho, Paul J. Laurienti, Won Hwa Kim, Guorong Wu
An unprecedented amount of existing functional Magnetic Resonance Imaging (fMRI) data provides a new opportunity to understand the relationship between functional fluctuation and human cognition/behavior using a data-driven approach. To that end, tremendous efforts have been made in machine learning to predict cognitive states from evolving volumetric images of blood-oxygen-level-dependent (BOLD) signals. Due to the complex nature of brain function, however, evaluations of learning performance and the resulting discoveries are often inconsistent across current state-of-the-art (SOTA) methods. By capitalizing on large-scale existing neuroimaging data (34,887 data samples from six public databases), we seek to establish a well-founded empirical guideline for designing deep models for functional neuroimages by linking the underlying methodology with knowledge from the neuroscience domain. Specifically, we put the spotlight on three questions: (1) What is the current SOTA performance in cognitive task recognition and disease diagnosis using fMRI? (2) What are the limitations of current deep models? (3) What is the general guideline for selecting a suitable machine learning backbone for new neuroimaging applications? We have conducted a comprehensive evaluation and statistical analysis, in various settings, to answer these outstanding questions.
{"title":"Machine Learning on Dynamic Functional Connectivity: Promise, Pitfalls, and Interpretations","authors":"Jiaqi Ding, Tingting Dan, Ziquan Wei, Hyuna Cho, Paul J. Laurienti, Won Hwa Kim, Guorong Wu","doi":"arxiv-2409.11377","DOIUrl":"https://doi.org/arxiv-2409.11377","url":null,"abstract":"An unprecedented amount of existing functional Magnetic Resonance Imaging\u0000(fMRI) data provides a new opportunity to understand the relationship between\u0000functional fluctuation and human cognition/behavior using a data-driven\u0000approach. To that end, tremendous efforts have been made in machine learning to\u0000predict cognitive states from evolving volumetric images of\u0000blood-oxygen-level-dependent (BOLD) signals. Due to the complex nature of brain\u0000function, however, the evaluation on learning performance and discoveries are\u0000not often consistent across current state-of-the-arts (SOTA). By capitalizing\u0000on large-scale existing neuroimaging data (34,887 data samples from six public\u0000databases), we seek to establish a well-founded empirical guideline for\u0000designing deep models for functional neuroimages by linking the methodology\u0000underpinning with knowledge from the neuroscience domain. Specifically, we put\u0000the spotlight on (1) What is the current SOTA performance in cognitive task\u0000recognition and disease diagnosis using fMRI? (2) What are the limitations of\u0000current deep models? and (3) What is the general guideline for selecting the\u0000suitable machine learning backbone for new neuroimaging applications? We have\u0000conducted a comprehensive evaluation and statistical analysis, in various\u0000settings, to answer the above outstanding questions.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Edvin Listo Zec, Tom Hagander, Eric Ihre-Thomason, Sarunas Girdzijauskas
Decentralized Learning (DL) enables privacy-preserving collaboration among organizations or users to enhance the performance of local deep learning models. However, model aggregation becomes challenging when client data is heterogeneous, and identifying compatible collaborators without direct data exchange remains a pressing issue. In this paper, we investigate the effectiveness of various similarity metrics in DL for identifying peers for model merging, conducting an empirical analysis across multiple datasets with distribution shifts. Our research provides insights into the performance of these metrics, examining their role in facilitating effective collaboration. By exploring the strengths and limitations of these metrics, we contribute to the development of robust DL methods.
{"title":"On the effects of similarity metrics in decentralized deep learning under distributional shift","authors":"Edvin Listo Zec, Tom Hagander, Eric Ihre-Thomason, Sarunas Girdzijauskas","doi":"arxiv-2409.10720","DOIUrl":"https://doi.org/arxiv-2409.10720","url":null,"abstract":"Decentralized Learning (DL) enables privacy-preserving collaboration among\u0000organizations or users to enhance the performance of local deep learning\u0000models. However, model aggregation becomes challenging when client data is\u0000heterogeneous, and identifying compatible collaborators without direct data\u0000exchange remains a pressing issue. In this paper, we investigate the\u0000effectiveness of various similarity metrics in DL for identifying peers for\u0000model merging, conducting an empirical analysis across multiple datasets with\u0000distribution shifts. Our research provides insights into the performance of\u0000these metrics, examining their role in facilitating effective collaboration. By\u0000exploring the strengths and limitations of these metrics, we contribute to the\u0000development of robust DL methods.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large Language Models (LLMs) have been widely adopted to process long-context tasks. However, the large memory overhead of the key-value (KV) cache poses significant challenges in long-context scenarios. Existing training-free KV cache compression methods typically focus on quantization and token pruning, which have compression limits, and excessive sparsity can lead to severe performance degradation. Other methods design new architectures with less KV overhead but require significant training overhead. To address the above two drawbacks, we further explore the redundancy in the channel dimension and apply an architecture-level design with minor training costs. Therefore, we introduce CSKV, a training-efficient Channel Shrinking technique for KV cache compression: (1) We first analyze the singular value distribution of the KV cache, revealing significant redundancy and compression potential along the channel dimension. Based on this observation, we propose using low-rank decomposition for key and value layers and storing the low-dimension features. (2) To preserve model performance, we introduce a bi-branch KV cache, including a window-based full-precision KV cache and a low-precision compressed KV cache. (3) To reduce the training costs, we minimize the layer-wise reconstruction loss for the compressed KV cache instead of retraining the entire LLMs. Extensive experiments show that CSKV can reduce the memory overhead of the KV cache by 80% while maintaining the model's long-context capability. Moreover, we show that our method can be seamlessly combined with quantization to further reduce the memory overhead, achieving a compression ratio of up to 95%.
{"title":"CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios","authors":"Luning Wang, Shiyao Li, Xuefei Ning, Zhihang Yuan, Shengen Yan, Guohao Dai, Yu Wang","doi":"arxiv-2409.10593","DOIUrl":"https://doi.org/arxiv-2409.10593","url":null,"abstract":"Large Language Models (LLMs) have been widely adopted to process long-context\u0000tasks. However, the large memory overhead of the key-value (KV) cache poses\u0000significant challenges in long-context scenarios. Existing training-free KV\u0000cache compression methods typically focus on quantization and token pruning,\u0000which have compression limits, and excessive sparsity can lead to severe\u0000performance degradation. Other methods design new architectures with less KV\u0000overhead but require significant training overhead. To address the above two\u0000drawbacks, we further explore the redundancy in the channel dimension and apply\u0000an architecture-level design with minor training costs. Therefore, we introduce\u0000CSKV, a training-efficient Channel Shrinking technique for KV cache\u0000compression: (1) We first analyze the singular value distribution of the KV\u0000cache, revealing significant redundancy and compression potential along the\u0000channel dimension. Based on this observation, we propose using low-rank\u0000decomposition for key and value layers and storing the low-dimension features.\u0000(2) To preserve model performance, we introduce a bi-branch KV cache, including\u0000a window-based full-precision KV cache and a low-precision compressed KV cache.\u0000(3) To reduce the training costs, we minimize the layer-wise reconstruction\u0000loss for the compressed KV cache instead of retraining the entire LLMs.\u0000Extensive experiments show that CSKV can reduce the memory overhead of the KV\u0000cache by 80% while maintaining the model's long-context capability. Moreover,\u0000we show that our method can be seamlessly combined with quantization to further\u0000reduce the memory overhead, achieving a compression ratio of up to 95%.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning compact and meaningful latent space representations has been shown to be very useful in generative modeling tasks for visual data. One particular example is applying Vector Quantization (VQ) in variational autoencoders (VQ-VAEs, VQ-GANs, etc.), which has demonstrated state-of-the-art performance in many modern generative modeling applications. Quantizing the latent space has been justified by the assumption that the data themselves are inherently discrete in the latent space (like pixel values). In this paper, we propose an alternative representation of the latent space by relaxing the structural assumption made in the VQ formulation. Specifically, we assume that the latent space can be approximated by a union-of-subspaces model corresponding to a dictionary-based representation under a sparsity constraint. The dictionary is learned/updated during the training process. We apply this approach to two models: Dictionary Learning Variational Autoencoders (DL-VAEs) and DL-VAEs with Generative Adversarial Networks (DL-GANs). We show empirically that our latent space is more expressive and leads to better representations than the VQ approach in terms of reconstruction quality, at the expense of a small computational overhead for the latent space computation. Our results thus suggest that the true benefit of the VQ approach might not be the discretization of the latent space, but rather the lossy compression of the latent space. We confirm this hypothesis by showing that our sparse representations also address the codebook collapse issue commonly found in VQ-family models.
{"title":"LASERS: LAtent Space Encoding for Representations with Sparsity for Generative Modeling","authors":"Xin Li, Anand Sarwate","doi":"arxiv-2409.11184","DOIUrl":"https://doi.org/arxiv-2409.11184","url":null,"abstract":"Learning compact and meaningful latent space representations has been shown\u0000to be very useful in generative modeling tasks for visual data. One particular\u0000example is applying Vector Quantization (VQ) in variational autoencoders\u0000(VQ-VAEs, VQ-GANs, etc.), which has demonstrated state-of-the-art performance\u0000in many modern generative modeling applications. Quantizing the latent space\u0000has been justified by the assumption that the data themselves are inherently\u0000discrete in the latent space (like pixel values). In this paper, we propose an\u0000alternative representation of the latent space by relaxing the structural\u0000assumption than the VQ formulation. Specifically, we assume that the latent\u0000space can be approximated by a union of subspaces model corresponding to a\u0000dictionary-based representation under a sparsity constraint. The dictionary is\u0000learned/updated during the training process. We apply this approach to look at\u0000two models: Dictionary Learning Variational Autoencoders (DL-VAEs) and DL-VAEs\u0000with Generative Adversarial Networks (DL-GANs). We show empirically that our\u0000more latent space is more expressive and has leads to better representations\u0000than the VQ approach in terms of reconstruction quality at the expense of a\u0000small computational overhead for the latent space computation. Our results thus\u0000suggest that the true benefit of the VQ approach might not be from\u0000discretization of the latent space, but rather the lossy compression of the\u0000latent space. We confirm this hypothesis by showing that our sparse\u0000representations also address the codebook collapse issue as found common in\u0000VQ-family models.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rachel Pfeifer, Sudip Vhaduri, Mark Wilson, Julius Keller
While researchers have been trying to understand the stress and fatigue among pilots, especially pilot trainees, and to develop stress/fatigue models to automate the process of detecting stress/fatigue, they often do not consider biases such as sex in those models. However, in a critical profession like aviation, where the demographic distribution is disproportionately skewed to one sex, it is urgent to mitigate biases for fair and safe model predictions. In this work, we investigate the perceived stress/fatigue of 69 college students, including 40 pilot trainees, of whom around 63% are male. We construct decision-tree models, first without bias mitigation and then with bias mitigation using a threshold optimizer under demographic parity and equalized odds constraints, repeating each experiment 30 times with random instances. Using bias mitigation, we achieve improvements of 88.31% (demographic parity difference) and 54.26% (equalized odds difference), which are also found to be statistically significant.
{"title":"Toward Mitigating Sex Bias in Pilot Trainees' Stress and Fatigue Modeling","authors":"Rachel Pfeifer, Sudip Vhaduri, Mark Wilson, Julius Keller","doi":"arxiv-2409.10676","DOIUrl":"https://doi.org/arxiv-2409.10676","url":null,"abstract":"While researchers have been trying to understand the stress and fatigue among\u0000pilots, especially pilot trainees, and to develop stress/fatigue models to\u0000automate the process of detecting stress/fatigue, they often do not consider\u0000biases such as sex in those models. However, in a critical profession like\u0000aviation, where the demographic distribution is disproportionately skewed to\u0000one sex, it is urgent to mitigate biases for fair and safe model predictions.\u0000In this work, we investigate the perceived stress/fatigue of 69 college\u0000students, including 40 pilot trainees with around 63% male. We construct models\u0000with decision trees first without bias mitigation and then with bias mitigation\u0000using a threshold optimizer with demographic parity and equalized odds\u0000constraints 30 times with random instances. Using bias mitigation, we achieve\u0000improvements of 88.31% (demographic parity difference) and 54.26% (equalized\u0000odds difference), which are also found to be statistically significant.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aron Distelzweig, Eitan Kosman, Andreas Look, Faris Janjoš, Denesh K. Manivannan, Abhinav Valada
Forecasting the future trajectories of surrounding agents is crucial for autonomous vehicles to ensure safe, efficient, and comfortable route planning. While model ensembling has improved prediction accuracy in various fields, its application in trajectory prediction is limited due to the multi-modal nature of the predictions. In this paper, we propose a novel sampling method for trajectory prediction based on the predictions of multiple models. We first show that conventional sampling based on predicted probabilities can degrade performance due to a lack of alignment between models. To address this problem, we introduce a new method that generates optimal trajectories from a set of neural networks, framing it as a risk minimization problem with a variable loss function. By using state-of-the-art models as base learners, our approach constructs diverse and effective ensembles for optimal trajectory sampling. Extensive experiments on the nuScenes prediction dataset demonstrate that our method surpasses current state-of-the-art techniques, achieving top ranks on the leaderboard. We also provide a comprehensive empirical study of ensembling strategies, offering insights into their effectiveness. Our findings highlight the potential of advanced ensembling techniques in trajectory prediction, significantly improving predictive performance and paving the way for more reliable predicted trajectories.
{"title":"Motion Forecasting via Model-Based Risk Minimization","authors":"Aron Distelzweig, Eitan Kosman, Andreas Look, Faris Janjoš, Denesh K. Manivannan, Abhinav Valada","doi":"arxiv-2409.10585","DOIUrl":"https://doi.org/arxiv-2409.10585","url":null,"abstract":"Forecasting the future trajectories of surrounding agents is crucial for\u0000autonomous vehicles to ensure safe, efficient, and comfortable route planning.\u0000While model ensembling has improved prediction accuracy in various fields, its\u0000application in trajectory prediction is limited due to the multi-modal nature\u0000of predictions. In this paper, we propose a novel sampling method applicable to\u0000trajectory prediction based on the predictions of multiple models. We first\u0000show that conventional sampling based on predicted probabilities can degrade\u0000performance due to missing alignment between models. To address this problem,\u0000we introduce a new method that generates optimal trajectories from a set of\u0000neural networks, framing it as a risk minimization problem with a variable loss\u0000function. By using state-of-the-art models as base learners, our approach\u0000constructs diverse and effective ensembles for optimal trajectory sampling.\u0000Extensive experiments on the nuScenes prediction dataset demonstrate that our\u0000method surpasses current state-of-the-art techniques, achieving top ranks on\u0000the leaderboard. We also provide a comprehensive empirical study on ensembling\u0000strategies, offering insights into their effectiveness. Our findings highlight\u0000the potential of advanced ensembling techniques in trajectory prediction,\u0000significantly improving predictive performance and paving the way for more\u0000reliable predicted trajectories.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jesse van Remmerden, Zaharah Bukhsh, Yingqian Zhang
The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. There has been growing interest in using online Reinforcement Learning (RL) for JSSP. While online RL can quickly find acceptable solutions, especially for larger problems, it produces lower-quality results than traditional methods like Constraint Programming (CP). A significant downside of online RL is that it cannot learn from existing data, such as solutions generated by CP; models must instead be trained from scratch, which is sample-inefficient and prevents learning from more optimal examples. We introduce Offline Reinforcement Learning for Learning to Dispatch (Offline-LD), a novel approach for JSSP that addresses these limitations. Offline-LD adapts two CQL-based Q-learning methods (mQRDQN and discrete mSAC) for maskable action spaces, introduces a new entropy bonus modification for discrete SAC, and exploits reward normalization through preprocessing. Our experiments show that Offline-LD outperforms online RL on both generated and benchmark instances. By introducing noise into the dataset, we achieve similar or better results than those obtained from the expert dataset, indicating that a more diverse training set is preferable because it contains counterfactual information.
{"title":"Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling","authors":"Jesse van Remmerden, Zaharah Bukhsh, Yingqian Zhang","doi":"arxiv-2409.10589","DOIUrl":"https://doi.org/arxiv-2409.10589","url":null,"abstract":"The Job Shop Scheduling Problem (JSSP) is a complex combinatorial\u0000optimization problem. There has been growing interest in using online\u0000Reinforcement Learning (RL) for JSSP. While online RL can quickly find\u0000acceptable solutions, especially for larger problems, it produces lower-quality\u0000results than traditional methods like Constraint Programming (CP). A\u0000significant downside of online RL is that it cannot learn from existing data,\u0000such as solutions generated from CP, requiring them to train from scratch,\u0000leading to sample inefficiency and making them unable to learn from more\u0000optimal examples. We introduce Offline Reinforcement Learning for Learning to\u0000Dispatch (Offline-LD), a novel approach for JSSP that addresses these\u0000limitations. Offline-LD adapts two CQL-based Q-learning methods (mQRDQN and\u0000discrete mSAC) for maskable action spaces, introduces a new entropy bonus\u0000modification for discrete SAC, and exploits reward normalization through\u0000preprocessing. Our experiments show that Offline-LD outperforms online RL on\u0000both generated and benchmark instances. By introducing noise into the dataset,\u0000we achieve similar or better results than those obtained from the expert\u0000dataset, indicating that a more diverse training set is preferable because it\u0000contains counterfactual information.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Smart Grid (SG) is a critical energy infrastructure that collects real-time electricity usage data to forecast future energy demands using information and communication technologies (ICT). Due to growing concerns about data security and privacy in SGs, federated learning (FL) has emerged as a promising training framework. FL offers a balance between privacy, efficiency, and accuracy in SGs by enabling collaborative model training without sharing private data from IoT devices. In this survey, we thoroughly review recent advancements in designing FL-based SG systems across three stages: generation, transmission and distribution, and consumption. Additionally, we explore potential vulnerabilities that may arise when implementing FL in these stages. Finally, we discuss the gap between state-of-the-art FL research and its practical applications in SGs and propose future research directions. These focus on potential attack and defense strategies for FL-based SG systems and the need to build a robust FL-based SG infrastructure. Unlike traditional surveys that address security issues in centralized machine learning methods for SG systems, this survey specifically examines the applications and security concerns in FL-based SG systems for the first time. Our aim is to inspire further research into applications and improvements in the robustness of FL-based SG systems.
{"title":"Federated Learning for Smart Grid: A Survey on Applications and Potential Vulnerabilities","authors":"Zikai Zhang, Suman Rath, Jiaohao Xu, Tingsong Xiao","doi":"arxiv-2409.10764","DOIUrl":"https://doi.org/arxiv-2409.10764","url":null,"abstract":"The Smart Grid (SG) is a critical energy infrastructure that collects\u0000real-time electricity usage data to forecast future energy demands using\u0000information and communication technologies (ICT). Due to growing concerns about\u0000data security and privacy in SGs, federated learning (FL) has emerged as a\u0000promising training framework. FL offers a balance between privacy, efficiency,\u0000and accuracy in SGs by enabling collaborative model training without sharing\u0000private data from IoT devices. In this survey, we thoroughly review recent\u0000advancements in designing FL-based SG systems across three stages: generation,\u0000transmission and distribution, and consumption. Additionally, we explore\u0000potential vulnerabilities that may arise when implementing FL in these stages.\u0000Finally, we discuss the gap between state-of-the-art FL research and its\u0000practical applications in SGs and propose future research directions. These\u0000focus on potential attack and defense strategies for FL-based SG systems and\u0000the need to build a robust FL-based SG infrastructure. Unlike traditional\u0000surveys that address security issues in centralized machine learning methods\u0000for SG systems, this survey specifically examines the applications and security\u0000concerns in FL-based SG systems for the first time. Our aim is to inspire\u0000further research into applications and improvements in the robustness of\u0000FL-based SG systems.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}