
Machine learning with applications: latest articles

Machine learning approaches to traffic accident severity prediction: Addressing class imbalance
IF 4.9 · Pub Date: 2025-11-07 · DOI: 10.1016/j.mlwa.2025.100792
Mohammad Amin Amiri, Saeid Afshari, Ali Soltani
Road traffic injuries continue to pose a significant public health challenge in Australia, with pedestrians representing one of the most vulnerable road user groups. Accurate prediction of injury severity, particularly fatal outcomes, is essential for improving road safety interventions and resource allocation. This study applies advanced machine learning techniques to predict pedestrian crash severity using national hospitalization and mortality data collected from 2011 to 2021. The analysis focuses on addressing class imbalance, a common issue in injury data, by evaluating the impact of several data balancing methods, including SMOTE, ADASYN, Random Oversampling (ROS), and Threshold Moving. We implement and compare four supervised learning algorithms: Logistic Regression, Support Vector Machine (SVM), Decision Tree, and XGBoost. Model performance is assessed using F1-score and macro-accuracy, with a focus on the minority (fatality) class. Results show that XGBoost combined with Threshold Moving achieves the highest performance, yielding an F1-score of 72% for fatality classification and a macro-accuracy of 84%. Additionally, feature importance analysis using SHAP values reveals age, gender, road user type, and crash location as key predictors of injury severity. The study highlights the critical role of data balancing strategies in enhancing predictive accuracy for rare but high-impact outcomes. These findings provide actionable insights for transport authorities and policymakers seeking to develop data-driven, targeted safety measures to protect pedestrians and reduce the severity of crash outcomes.
Machine learning with applications, Volume 22, Article 100792 (Journal Article)
Citations: 0
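The threshold-moving idea named in the abstract above can be illustrated with a minimal sketch. It assumes a classifier has already produced fatality probabilities on a validation set; the labels and scores below are hypothetical, and the helper names `f1_at_threshold` and `best_threshold` are introduced here for illustration only, not taken from the paper.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1-score of the positive (minority) class when predicting 1 for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, scores):
    """Try every distinct score as a cut-off and keep the one with the highest F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))


# Hypothetical validation data: 2 fatal cases (label 1) among 10 records.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.12, 0.20, 0.22, 0.25, 0.30, 0.45, 0.35, 0.60]

default_f1 = f1_at_threshold(y_true, scores, 0.5)   # conventional 0.5 cut-off
tuned = best_threshold(y_true, scores)              # cut-off chosen for minority-class F1
tuned_f1 = f1_at_threshold(y_true, scores, tuned)
print(default_f1, tuned, tuned_f1)
```

Unlike SMOTE, ADASYN, or ROS, this leaves the training data untouched and only changes the decision rule, so in practice the cut-off would need to be chosen on data not used for the final evaluation.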
Deep learning-based 3D reconstruction of dentate nuclei in Friedreich’s ataxia from T2*weighted MR images
IF 4.9 · Pub Date: 2025-11-07 · DOI: 10.1016/j.mlwa.2025.100790
Trushal Sardhara, Ravi Dadsena, Roland C. Aydin, Ralf-Dieter Hilgers, Leon Horn, Jörg B. Schulz, Kathrin Reetz, Sandro Romanzetti, Imis Dogan
Dentate nucleus (DN) degeneration is a key neuropathological feature in Friedreich’s ataxia (FRDA), and its accurate quantification is critical for understanding disease progression. However, its visualization and volumetry require iron-sensitive imaging techniques and time-consuming segmentation procedures, posing challenges for conventional ML approaches due to small datasets typical of rare diseases. We present a transfer learning–based machine learning pipeline for automated DN segmentation that directly uses standard T2*-weighted Magnetic Resonance Imaging (MRI), which highlights the DN without additional processing, and is designed to perform robustly with limited annotated data. Using 38 manually labeled subjects (18 FRDA, 20 controls), the model was validated via five-fold cross-validation and an independent hold-out test set, achieving Dice scores of 0.81–0.87 and outperforming classical atlas-based methods. Pretraining improved performance by ∼10% in patients and >5% in controls. Applied to 181 longitudinal scans from 33 FRDA patients and 33 controls, the model revealed significantly reduced DN volumes in FRDA, with reductions correlating with disease duration and clinical severity over time. Our approach provides a scalable and reproducible segmentation framework, requiring minimal annotated data and no preprocessing, while demonstrating robust performance across cross-validation and independent testing. Additionally, it enables the first longitudinal volumetric analysis of DN in FRDA using standard T2*-weighted MRI, demonstrating its practical utility for monitoring neurodegenerative changes. Overall, this work illustrates how transfer learning can overcome data scarcity in rare diseases and provides a robust methodology for automated MRI segmentation in both research and clinical applications.
Machine learning with applications, Volume 22, Article 100790 (Journal Article)
Citations: 0
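The Dice scores reported in the abstract above measure overlap between a predicted and a manually labeled mask. A minimal sketch of the metric follows, assuming both masks are flattened to 0/1 sequences; the two toy masks are hypothetical.

```python
def dice_score(pred, truth):
    """Dice = 2|A n B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2 * intersection / total


# Hypothetical 8-voxel masks: the prediction is shifted by one voxel.
truth = [0, 1, 1, 1, 1, 0, 0, 0]
pred = [0, 0, 1, 1, 1, 1, 0, 0]
score = dice_score(pred, truth)
print(score)
```

Here three of the four labeled voxels overlap, so the score is 2 * 3 / (4 + 4) = 0.75.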
Machine learning-driven predictive modeling of temperature-dependent mechanical properties in austenitic stainless steels
IF 4.9 · Pub Date: 2025-11-06 · DOI: 10.1016/j.mlwa.2025.100786
Movaffaq Kateb, Sahar Safarian
This work demonstrates that modern tree‑based models can effectively model complex, temperature-dependent mechanical responses, including highly nonlinear and even non-monotonic trends, in austenitic stainless steel and highlights limitations of composition‑only empirical models. To ensure robust model evaluation, we employed multiple validation strategies including repeated random train and test partitions and leave-one-out cross-validation. While one might assume that steel grade is fully captured by its composition, local assessments within narrower compositional ranges reveal different feature importance rankings than those observed in the full dataset. Grade-specific (AISI 304, 316, 321 and 347) feature importance analysis offered deeper insights into local alloy behavior and demonstrated the advantage of disaggregated modeling in avoiding misleading conclusions. Clustering and SHAP analyses further revealed a temperature-sensitive role of nitrogen, which strengthens the alloy through interstitial and fine precipitate mechanisms at lower temperatures but loses effectiveness at elevated temperatures due to precipitate coarsening. This highlights how data-driven methods can uncover metallurgically consistent, temperature-dependent strengthening behaviors not captured by simpler models. Our results confirm that temperature governs the mechanical performance of austenitic stainless steels, with other features contributing marginally, particularly for UTS. Additionally, the model achieved a notably high score for elongation, highlighting the critical role of testing temperature in addressing the long-standing challenge of poor elongation predictions in composition-only or composition-processing models. This suggests that low accuracy in previous studies is more likely due to dataset limitations rather than shortcomings of tree-based models.
Machine learning with applications, Volume 22, Article 100786 (Journal Article)
Citations: 0
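Leave-one-out cross-validation, one of the validation strategies mentioned in the abstract above, can be sketched as follows. This is an illustration under stated assumptions: a nearest-neighbour predictor stands in for the paper's tree-based models, and the temperature and strength values are hypothetical, not data from the study.

```python
def loocv_mae(xs, ys, fit_predict):
    """Leave-one-out CV: hold out each sample once, train on the rest, average the absolute error."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        errors.append(abs(fit_predict(train_x, train_y, xs[i]) - ys[i]))
    return sum(errors) / len(errors)


def nearest_neighbour(train_x, train_y, x):
    """Stand-in model: return the target of the closest training temperature."""
    j = min(range(len(train_x)), key=lambda k: abs(train_x[k] - x))
    return train_y[j]


# Hypothetical (test temperature in deg C, yield strength in MPa) pairs.
temps = [20, 100, 200, 300, 400, 500]
strength = [290, 250, 220, 200, 190, 180]
mae = loocv_mae(temps, strength, nearest_neighbour)
print(mae)
```

Because every sample is held out exactly once, the estimate uses all of a small dataset for testing, at the cost of one model fit per sample.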
Production scheduling optimisation using mixed integer programming with machine learning dilution prediction capabilities for underground open stoping operations
IF 4.9 · Pub Date: 2025-11-05 · DOI: 10.1016/j.mlwa.2025.100776
Prosper Chimunhu, Erkan Topal, Mohammad Waqar Ali Asad, Roohollah Shirani Faradonbeh, Ajak Duany Ajak
For decades, Mixed Integer Programming (MIP) has been successfully utilised to optimise production schedules in underground mining, with increasingly notable results reported. However, recurrent inconsistencies between schedule forecasts and actual production due to imprecise input assumptions, such as mining dilution factors, subtly impair the robustness of optimal solutions, with detrimental hierarchical effects on the business’s cashflow projections and profitability. To address this, this study leverages emerging applications of Machine Learning (ML) and adjacent technologies that are revolutionising intelligent prediction of dilution in underground mining operations. The study proposes a synergistic nexus between MIP and ML models using ML-predicted dilution on a per-stope granularity instead of the traditional single dilution factor to improve the schedule’s forecasting accuracy. A sample of 61 stopes from an underground open-stoping operation was used to create and optimise schedules based on empirically determined and ML-predicted dilution factors. Study findings revealed a 3.1% higher net present value (NPV) for MIP-optimised schedules over manual schedules for the same dilution factor (empirical). Further, ML-predicted dilution at 74% accuracy on a per-stope granularity enhanced the MIP-optimised schedules’ tonnage forecast precision by at least 4% and the NPV by at least 2% compared to MIP-optimised schedules using the single dilution factor over a 16-month period. Additionally, results revealed that MIP schedules augmented with ML-predicted dilution demonstrated greater flexibility in navigating schedule constraints, leading to better schedule responsiveness and granularity on forecasts. Thus, the study improves optimal solutions’ robustness, reliability and production scheduling efficacy.
Machine learning with applications, Volume 22, Article 100776 (Journal Article)
Citations: 0
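The difference between a single dilution factor and per-stope dilution, as contrasted in the abstract above, comes down to simple arithmetic on the tonnage forecast. The sketch below assumes the convention that delivered tonnes equal planned ore tonnes times (1 + dilution); the stope names, tonnages, and dilution values are all hypothetical, and the MIP scheduling step itself is not shown.

```python
def forecast_tonnes(planned, dilution):
    """Tonnes delivered if waste equal to dilution x planned ore is mined with it (assumed convention)."""
    return planned * (1 + dilution)


# Hypothetical stopes: planned ore tonnes and a per-stope predicted dilution fraction.
stopes = {
    "S1": (10_000, 0.05),
    "S2": (20_000, 0.15),
    "S3": (15_000, 0.25),
}
single_factor = 0.10  # hypothetical mine-wide dilution assumption

single = sum(forecast_tonnes(t, single_factor) for t, _ in stopes.values())
per_stope = sum(forecast_tonnes(t, d) for t, d in stopes.values())
print(round(single), round(per_stope))
```

In this toy case the single factor understates the total by 2,750 tonnes, and the error is concentrated in the high-dilution stope, which is the kind of per-stope discrepancy a schedule built on one global factor cannot see.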
Beyond single-run metrics with CP-fuse: A rigorous multi-cohort evaluation of clinico-pathological fusion for improved survival prediction in TCGA
IF 4.9 · Pub Date: 2025-11-04 · DOI: 10.1016/j.mlwa.2025.100789
Juan Duran, Yujing Zou, Martin Vallières, Shirin A. Enger
Accurate prediction of progression-free survival (PFS) is critical for precision oncology. However, most existing multimodal survival studies rely on single fusion strategies, one-off cross-validation runs, and focus solely on discrimination metrics, leaving gaps in systematic evaluation and calibration. We evaluated multimodal fusion approaches combining histopathology whole-slide images (via Hierarchical Image Pyramid Transformer) and clinical variables (via Feature Tokenizer-Transformer) across five TCGA cohorts: bladder cancer (BLCA), uterine corpus endometrial carcinoma (UCEC), lung adenocarcinoma (LUAD), breast cancer (BRCA), and head and neck squamous cell carcinoma (HNSC) (N=2,984). Three intermediate (marginal, cross-attention, Variational Autoencoder or VAE) and two late fusion strategies (trainable-weight, meta-learning) were trained end-to-end with DeepSurv. Our 100-repetition 10-fold cross-validation (CV) framework mitigates the variance overlooked in single-run CV evaluations. VAE fusion achieved superior PFS prediction (Concordance-index) in BLCA (0.739±0.019), UCEC (0.770±0.021), LUAD (0.683±0.018), and BRCA (0.760±0.021), while meta-learning was best for HNSC (0.686±0.022). However, Integrated Brier Score values (0.066–0.142) revealed calibration variability. Our findings highlight the importance of multimodal fusion, combined discrimination and calibration metrics, and rigorous validation for clinically meaningful survival modeling.
Machine learning with applications, Volume 22, Article 100789 (Journal Article)
Citations: 0
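The concordance index used as the discrimination metric in the abstract above can be sketched in a few lines. This is a plain implementation of Harrell's C for right-censored data, written for illustration; the follow-up times, event flags, and risk scores are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs in which the higher-risk subject progresses first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event before subject j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable


# Hypothetical cohort: months to progression, event flag (0 = censored), predicted risk.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)
print(c_index)
```

Four of the five comparable pairs are ordered correctly, giving 0.8. The metric only checks ranking, which is why the abstract pairs it with a calibration measure such as the Integrated Brier Score.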
Forecasting stock market anomalies in emerging markets: An OPTUNA-optimized isolation forest and K-means approach
IF 4.9 · Pub Date: 2025-11-04 · DOI: 10.1016/j.mlwa.2025.100770
Seyed Pendar Toufighi, Amir Mohammad Khani, Arman Rezasoltani, Iman Ghasemian Sahebi, Jan Vang
Forecasting financial anomalies in emerging markets is critical for informed investment and risk management. This study proposes a novel machine learning framework that integrates an OPTUNA-optimized Isolation Forest algorithm with K-Means clustering to detect and classify stock market anomalies in Iran Khodro, one of Iran’s largest automotive firms. Leveraging daily stock data from 2001 to 2022, the model enhances anomaly detection accuracy by tuning hyperparameters through Bayesian optimization, significantly reducing false positives compared to standard implementations. The K-Means clustering algorithm further segments the detected anomalies into meaningful behavioral categories based on price and trading volume dynamics. Results reveal distinct periods of market disruption aligned with major political and economic events, including sanctions, currency volatility, and the COVID-19 pandemic. This hybrid approach demonstrates a robust, efficient, and interpretable method for forecasting abnormal market behavior in high-volatility, low-transparency environments. The framework holds promise for broader application in forecasting stock anomalies across other emerging financial markets.
Machine learning with applications, Volume 22, Article 100770 (Journal Article)
Citations: 0
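The isolation principle behind the anomaly detector named in the abstract above is that unusual points are separated by fewer random splits than typical ones. The sketch below is a simplified, from-scratch scorer written for illustration; it is not scikit-learn's `IsolationForest`, it omits the Optuna tuning and the K-Means step, and the daily price and volume changes are hypothetical.

```python
import math
import random


def path_length(point, data, rng, depth=0, max_depth=8):
    """Number of random axis-aligned splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))
    low = min(row[feature] for row in data)
    high = max(row[feature] for row in data)
    if low == high:
        return depth
    split = rng.uniform(low, high)
    # Keep only the rows that fall on the same side of the split as `point`.
    same_side = [row for row in data if (row[feature] < split) == (point[feature] < split)]
    return path_length(point, same_side, rng, depth + 1, max_depth)


def anomaly_scores(data, n_trees=100, seed=0):
    """Score in (0, 1); shorter average path length gives a higher score.

    Simplified: a full isolation forest builds trees on subsamples and adds a leaf-size correction.
    """
    rng = random.Random(seed)
    n = len(data)
    # Average path length of an unsuccessful binary-search-tree lookup, used for normalisation.
    c = 2 * (math.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
    scores = []
    for point in data:
        mean_depth = sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees
        scores.append(2 ** (-mean_depth / c))
    return scores


# Hypothetical daily (price change %, volume change %) pairs plus one extreme day.
normal_days = [((i % 5 - 2) * 0.3, (i % 7 - 3) * 0.5) for i in range(30)]
data = normal_days + [(9.0, 12.0)]
scores = anomaly_scores(data)
most_anomalous = max(range(len(data)), key=scores.__getitem__)
print(most_anomalous)
```

In a tuned pipeline the number of trees, subsample size, and contamination rate would be the hyperparameters searched, and the flagged days would then be passed to a clustering step.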
Condition monitoring for pattern recognition in manufacturing
IF 4.9 · Pub Date: 2025-11-04 · DOI: 10.1016/j.mlwa.2025.100787
Marco Piangerelli, Vincenzo Nucci, Flavio Corradini, Luca Giulioni, Barbara Re
Condition monitoring techniques stand as essential instruments for evaluating the health and performance of machinery and systems, serving as a foundational element of modern engineering. However, many existing techniques, including advanced approaches, are often tailored to specific domains, limiting their flexibility and adaptability. This paper introduces the COndition moNitoring Detection via cORrelation-based norms (CONDOR), a fully unsupervised, system-agnostic, and multiscale method that leverages matrix norms and correlation matrices derived from time series data recorded by sensors during machine operation. Designed for real-time application, the approach is particularly effective in manufacturing environments characterized by cyclic processes, where consistent inputs yield predictable behaviors. The methodology was validated on both synthetic and real-world datasets, successfully identifying operational patterns that align with common manufacturing system behaviors. Importantly, patterns identified in synthetic data were consistently detected in real-world scenarios, underscoring CONDOR’s robustness and reliability. Comparisons with state-of-the-art algorithms further highlight its superior ability to detect patterns and establish stable clusters, making it a promising tool for condition monitoring in diverse industrial contexts.
Citations: 0
LapSDNMF: Label propagation assisted soft-constrained deep non-negative matrix factorisation for semi-supervised multi-view clustering
IF 4.9 Pub Date : 2025-11-03 DOI: 10.1016/j.mlwa.2025.100783
Sohan Dinusha Liyana Gunawardena, Khanh Luong, Thirunavukarasu Balasubramaniam, Richi Nayak
Semi-supervised methods based on non-negative matrix factorisation have emerged as a popular approach for clustering. However, the pressing challenge of capturing complex non-linear relationships within multi-view data is seldom considered in the semi-supervised context.
This study introduces a fundamentally novel framework: Label Propagation Assisted Soft-constrained Deep Non-negative Matrix Factorisation for Semi-supervised Multi-view Clustering (LapSDNMF).
LapSDNMF innovatively integrates deep hierarchical modelling with label propagation and soft constraint to jointly exploit the non-linear representation learning and extract accurate latent features from limited labelled data. By embedding a predictive membership matrix as a soft constraint, it enables similarly labelled samples to be projected into shared regions, better reflecting real-world data structures. The incorporation of graph-based regularisation within the deep architecture facilitates effective label propagation while preserving the manifold structure at each layer. LapSDNMF unifies deep learning and graph-theoretic techniques within a coherent optimisation framework. We also develop a novel, efficient algorithm based on multiplicative update rules to solve the resulting optimisation problem.
LapSDNMF significantly outperforms state-of-the-art multi-view clustering methods across five diverse real-world datasets. Specifically, it achieves improvements in F-score of 10.2%, 7.2%, 8.8%, 1.4%, and 6.1% on the Yale, Reuters-MinMax, Caltech7, 3-Sources, and Caltech20 datasets, respectively, compared with the best-performing baseline method.
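The abstract mentions an algorithm based on multiplicative update rules but does not give the rules themselves. As background only, here is a minimal sketch of the classic single-layer, single-view multiplicative updates for non-negative matrix factorisation under a squared-error objective. It is not the LapSDNMF algorithm: there is no deep hierarchy, no label propagation, no soft constraint, and no graph regularisation. The function names, the random initialisation, and the small `eps` guard are assumptions.

```python
import random


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factorise non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h
```

Because every update multiplies by a ratio of non-negative quantities, the factors stay non-negative without any explicit projection step, which is the main appeal of this family of rules.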
Citations: 0
Implementation of knowledge distillation for onboard defect detection on an Unmanned Aircraft System for light aircraft general visual inspections
IF 4.9 Pub Date : 2025-11-03 DOI: 10.1016/j.mlwa.2025.100782
Luke Connolly, James Garland, Diarmuid O’Gorman, Edmond F. Tobin
Visual inspections of aircraft are a vital part of routine procedures for maintenance personnel in the aviation industry. However, these inspections take up a considerable amount of time to perform and are susceptible to human error. To mitigate this, utilising image classification for detecting defects is proposed, leveraging transfer learning and knowledge distillation within MATLAB to develop an efficient and deployable model. Transfer learning is applied to a ResNet-50 model, adapting it to classify aircraft defects using a curated dataset. This fine-tuned model is then utilised as a teacher in the knowledge distillation process, where a compact SqueezeNet model (the student) learns from both hard and soft labels to replicate its performance while significantly reducing computational demands. This allows for optimising deep-learning models for deployment on smaller hardware, making the student model suitable for use on an Unmanned Aircraft System (UAS) to filter out images that do not contain a defect, reducing workload for ground personnel. The proposed method offers a solution for improving the efficiency and accuracy of defect detection during a general visual inspection in the aviation industry. Targeted defects here are damaged_skin, missing_or_damaged_rivets, and panel_missing alongside a class denoting no_defect. The knowledge-distilled SqueezeNet model achieves 95.37% validation accuracy and 90.72% inference accuracy, with a 96.9% reduction in model size compared to ResNet-50. The teacher model has a size of 85.77 MB, while the student model is significantly smaller at 2.66 MB, making it ideal for deployment on embedded systems with limited resources.
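The abstract says the student learns from both hard and soft labels but does not state the loss, temperature, or weighting, and the authors worked in MATLAB. The sketch below therefore uses the common textbook formulation (cross-entropy on the true class plus a temperature-scaled KL term against the teacher) purely as an illustration. The `temperature` and `alpha` defaults and all function names are assumptions. The last helper simply reproduces the size-reduction arithmetic from the reported model sizes.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_class,
                      temperature=4.0, alpha=0.5):
    """alpha * hard-label cross-entropy + (1 - alpha) * T^2 * KL(teacher || student)."""
    hard = -math.log(softmax(student_logits)[true_class])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


def size_reduction_percent(teacher_mb, student_mb):
    """Relative size reduction of the student model, in percent."""
    return 100.0 * (1.0 - student_mb / teacher_mb)
```

With the sizes quoted in the abstract, `size_reduction_percent(85.77, 2.66)` evaluates to about 96.9, matching the reported reduction.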
Citations: 0
Multi-task learning for audio scene source counting and analysis
IF 4.9 Pub Date : 2025-11-03 DOI: 10.1016/j.mlwa.2025.100785
Michael Nigro, Sridhar Krishnan
Audio source counting is a fundamental task of audio scene analysis related to other audio tasks such as speaker diarization and sound event detection. It is also a relatively unexplored audio task that presents a complex challenge. In particular, source counting performance is poor when the source count range is large, limiting its potential applications. This paper presents a novel approach to improving audio source counting through multi-task learning. We present a first-of-its-kind empirical study on the hierarchical nature of audio source counting, introducing the coarse source counting task and a hierarchical multi-task learning framework, in order to better understand and investigate the audio source counting task through several case study scenarios. We perform multi-task learning with a ResNet architecture and improve audio source counting accuracy by up to 6% over the previous best result on the SARdBScene dataset. We also perform multi-task learning of audio source counting and acoustic scene classification as a step forward for robust audio scene analysis. These experimental results show improvements of up to 6% in source counting accuracy over state-of-the-art baselines, particularly in high source count scenarios. Our findings highlight that multi-task learning not only enhances accuracy, but also improves efficiency by replacing multiple task-specific models with a single robust network.
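The abstract introduces a coarse source counting task alongside the exact count but does not define the coarse classes or the loss weighting. The sketch below shows one plausible way such a hierarchical pairing could be set up: each exact count is mapped to a coarse bin, and the two cross-entropy terms are summed. The bin boundaries in `COARSE_BINS`, the `coarse_weight` default, and the function names are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical coarse bins as inclusive (low, high) count ranges.
COARSE_BINS = [(0, 0), (1, 2), (3, 5), (6, 10)]


def coarse_label(count):
    """Index of the coarse bin that contains an exact source count."""
    for index, (low, high) in enumerate(COARSE_BINS):
        if low <= count <= high:
            return index
    raise ValueError(f"count {count} is outside the supported range")


def cross_entropy(probs, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(probs[target])


def multitask_loss(fine_probs, coarse_probs, count, coarse_weight=0.5):
    """Exact-count loss plus a weighted coarse-count loss from a second head."""
    fine = cross_entropy(fine_probs, count)
    coarse = cross_entropy(coarse_probs, coarse_label(count))
    return fine + coarse_weight * coarse
```

In a shared-backbone network, `fine_probs` and `coarse_probs` would come from two output heads on the same features, so the coarse task acts as an auxiliary signal when the exact count range is large.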
Citations: 0