
Machine learning with applications: latest articles

Automatic discovery of robust risk groups from limited survival data across biomedical modalities
IF 4.9 Pub Date : 2025-12-08 DOI: 10.1016/j.mlwa.2025.100814
Ethar Alzaid , George Wright , Mark Eastwood , Piotr Keller , Fayyaz Minhas
Survival prediction from medical data is often constrained by scarce labels, which limits the effectiveness of fully supervised models. In addition, most existing approaches produce deterministic risk scores without conveying reliability, which hinders interpretability and clinical trustworthiness. To address these challenges, we introduce T-SURE, a transductive survival ranking and risk-stratification framework that learns jointly from labeled and unlabeled patients to reduce dependence on large annotated cohorts. It also estimates a rejection score that identifies high-uncertainty cases, enabling selective abstention when confidence is low. T-SURE generates a single risk score that enables (1) patient ranking based on survival risk, (2) automatic assignment to risk groups, and (3) optional rejection of uncertain predictions. We extensively evaluated the model on pan-cancer datasets from The Cancer Genome Atlas (TCGA), using gene expression profiles, whole slide images, pathology reports, and clinical information. The model outperformed existing approaches in both ranking and risk stratification, especially in the limited-labeled-data regime. Its performance also improved consistently as uncertain samples were rejected, while stratification remained statistically significant across datasets. T-SURE can be integrated as a reliable component of computational pathology pipelines, guiding risk-specific therapeutic and monitoring decisions and flagging ambiguous or rare cases for further investigation via a high rejection score. To support reproducibility, the full implementation of T-SURE is publicly available at: (Anonymized).
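As a rough illustration of the ranking and rejection ideas in this abstract, the sketch below computes Harrell's concordance index, a common measure of survival ranking quality, and filters out cases with a high rejection score. The abstract does not name its ranking metric or describe how the rejection score is computed, so the choice of metric, the function names, and the thresholding rule are all assumptions made for illustration, not the T-SURE implementation.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event (event flag 1). Higher risk should mean shorter
    survival. Ties in risk count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times or censored earlier patient: skip the pair
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def reject_uncertain(samples, rejection_scores, threshold):
    """Keep samples whose (hypothetical) rejection score is below the threshold."""
    return [s for s, r in zip(samples, rejection_scores) if r < threshold]
```

With this sketch, one could recompute the concordance index on the retained patients after each increase in the rejection rate to see whether ranking quality improves as uncertain cases are dropped.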
Machine learning with applications, volume 23, Article 100814.
Citations: 0
Leveraging multimodal large language models to extract mechanistic insights from biomedical visuals: A case study on COVID-19 and neurodegenerative diseases
IF 4.9 Pub Date : 2025-12-06 DOI: 10.1016/j.mlwa.2025.100816
Elizaveta Popova , Marc Jacobs , Martin Hofmann-Apitius , Negin Sadat Babaiha

Background

The COVID-19 pandemic has intensified concerns about its long-term neurological impact, with evidence linking SARS-CoV-2 infection to neurodegenerative diseases (NDDs) such as Alzheimer’s (AD) and Parkinson’s (PD). Patients with these conditions face higher risk of severe COVID-19 outcomes and may experience accelerated cognitive or motor decline following infection. Proposed mechanisms—including neuroinflammation, blood–brain barrier (BBB) disruption, and abnormal protein aggregation—closely mirror core features of neurodegenerative pathology. However, current knowledge remains fragmented across text, figures, and pathway diagrams, limiting integration into computational models that could reveal systemic patterns.

Results

To address this gap, we applied GPT-4 Omni (GPT-4o), a multimodal large language model (LLM), to extract mechanistic insights from biomedical figures. Over 10,000 images were retrieved through targeted searches on COVID-19 and neurodegeneration; after automated and manual filtering, a curated subset was analyzed. GPT-4o extracted biological relationships as semantic triples, grouped into six mechanistic categories—including microglial activation and barrier disruption—using ontology-guided similarity and assembled into a Neo4j knowledge graph (KG). Accuracy was evaluated against a gold-standard dataset of expert-annotated images using Biomedical Bidirectional Encoder Representations from Transformers (BioBERT)–based semantic matching. This evaluation enabled prompt tuning, threshold optimization, and hyperparameter assessment. Results show that GPT-4o successfully recovers both established and novel mechanisms, yielding interpretable outputs that illuminate complex biological links between SARS-CoV-2 and neurodegeneration.

Conclusions

This study demonstrates the potential of multimodal LLMs to mine biomedical visual data at scale. By complementing text mining and integrating figure-derived knowledge, our framework advances understanding of COVID-19–related neurodegeneration and supports future translational research.
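The evaluation step described in the Results, matching extracted triples against expert-annotated ones by semantic similarity and a tuned threshold, can be sketched as follows. This is only an outline under stated assumptions: the `embed` function is a toy bag-of-words stand-in for a BioBERT embedding, and the threshold of 0.8, the function names, and the example triples are hypothetical rather than taken from the study.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a BioBERT sentence embedding."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match_triples(predicted, gold, threshold=0.8):
    """Count predicted triples whose best similarity to a gold triple meets the threshold.

    Returns the number of hits and the resulting precision.
    """
    gold_vecs = [embed(" ".join(t)) for t in gold]
    hits = 0
    for triple in predicted:
        vec = embed(" ".join(triple))
        best = max((cosine(vec, g) for g in gold_vecs), default=0.0)
        if best >= threshold:
            hits += 1
    precision = hits / len(predicted) if predicted else 0.0
    return hits, precision
```

Sweeping `threshold` over a grid and recording precision at each value is one simple way to carry out the kind of threshold optimization the abstract mentions.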
Machine learning with applications, volume 23, Article 100816.
Citations: 0
Estimation of the remaining charge retention time of an electric vehicle battery
IF 4.9 Pub Date : 2025-12-05 DOI: 10.1016/j.mlwa.2025.100813
Chourik Fousseni , Martin Otis , Khaled Ziane
Accurately estimating the remaining driving time (RDT) of an electric vehicle (EV) battery is essential for optimizing energy management and enhancing user experience. However, traditional estimation methods do not adequately account for the influence of temperature, driving characteristics and vehicle driving time, leading to less accurate predictions and suboptimal range management. To address these limitations, this study presents a method for estimating the remaining charge retention time by integrating temperature and driving characteristics, which refines predictions and improves model reliability. Data from the National Big Data Alliance for New Energy Vehicles (NDANEV) were used to develop a predictive model based on machine learning (ML). The ML models compared in this study are Linear Regression, LSTM, Random Forest (RF), Prophet, LightGBM, and XGBoost. Model performance was evaluated using mean absolute error (MAE), root mean square error (RMSE), the coefficient of determination (R²), and the prediction runtime. The results show that the R² values for Prophet, Random Forest, LSTM, XGBoost, and LightGBM are 0.91, 0.94, 0.95, 0.94, and 0.94 respectively. This suggests that XGBoost outperforms the other models, providing the most accurate estimate of the remaining driving time. In addition, the results confirm that considering driving characteristics and ambient temperature improves the reliability and robustness of estimations. These advancements contribute to more efficient energy management and optimized charging strategies.
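For readers less familiar with the three accuracy metrics named in this abstract, the following minimal sketch computes MAE, RMSE, and R² from their standard definitions. The function name and the sample values in the usage note are illustrative assumptions and are unrelated to the NDANEV data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)      # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot               # undefined if y_true is constant
    return {"mae": mae, "rmse": rmse, "r2": r2}
```

For example, `regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])` gives an MAE of 0.5, an RMSE of 1.0, and an R² of about 0.2, which shows how a single large error affects RMSE and R² more strongly than MAE.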
Machine learning with applications, volume 23, Article 100813.
Citations: 0
Advancing marine mammal monitoring: Large-scale UAV delphinidae datasets and robust motion tracking for group size estimation
IF 4.9 Pub Date : 2025-12-04 DOI: 10.1016/j.mlwa.2025.100808
Leonardo Viegas Filipe , João Canelas , Mário Vieira , Francisco Correia da Fonseca , André Cid , Joana Castro , Inês Machado
Reliable estimates of dolphin abundance are essential for conservation and impact assessment, yet manual analysis of aerial surveys is time-consuming and difficult to scale. This paper presents an end-to-end pipeline for automatic dolphin counting from unmanned aerial vehicle (UAV) video that combines modern object detection and multi-object tracking. We construct a large detection dataset of 64,705 images with 225,305 dolphin bounding boxes and a tracking dataset of 54,274 frames with 207,850 boxes and 603 unique tracks, derived from UAV line-transect surveys. Using these data, we train a YOLO11-based detector that achieves a precision of approximately 0.93 across a range of sea states. For tracking, we adopt BoT-SORT and tune its parameters with a genetic algorithm using a multi-metric objective, reducing ID fragmentation by about 29% relative to default settings. Recent YOLO-based cetacean detectors trained on UAV imagery of beluga whales report precision/recall around 0.92/0.92 for adults and 0.94/0.89 for calves, but rely on DeepSORT tracking whose MOTA remains below 0.5 and must be boosted to roughly 0.7 with post-hoc trajectory post-processing. In this context, our pipeline offers competitive detection performance, substantially larger and fully documented detection and tracking benchmarks, and GA-optimized tracking without manual post-processing. Applied to dolphin group counting, the full pipeline attains a mean absolute error of 1.24 on a held-out validation set, demonstrating that UAV-based automated counting can support robust, scalable monitoring of coastal dolphin populations.
Machine learning with applications, volume 23, Article 100808.
Citations: 0
Predictive modeling of error categories in English-Slovak machine translation using automatic evaluation metrics
IF 4.9 Pub Date : 2025-12-04 DOI: 10.1016/j.mlwa.2025.100810
Dasa Munkova , Lucia Benkova , Michal Munk , Lubomir Benko , Petr Hajek
This paper presents a language-specific adaptation for automatic identification of machine translation (MT) errors using a comprehensive set of open-source evaluation metrics. The approach focuses on the English–Slovak translation direction, addressing challenges posed by Slovak’s highly inflectional and low-resource nature. Predictive models were developed for five key error categories (Predication, Modal and communication sentence framework, Syntactic-semantic correlativeness, Compound/complex sentences, and Lexical semantics) by employing forward stepwise regression and validated through bootstrapping techniques. The models estimate the probability of error occurrence in MT segments, demonstrating stable and comparable performance across training and test datasets, as measured by Somers’ D. While human expert evaluation remains essential for verifying flagged segments, the proposed approach significantly reduces evaluator workload by prioritizing likely error-containing segments. This methodology offers a scalable and adaptable framework for MT quality assessment across languages and text styles, with potential to improve automated translation evaluation and post-editing processes.
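Somers' D, the measure this abstract uses to compare training and test performance, can be computed for a binary outcome as sketched below. The sketch assumes the common pairwise form for a binary error label and a predicted probability, where ties count in the denominator; the abstract does not state which variant the authors used, and the function name is hypothetical.

```python
def somers_d(labels, scores):
    """Somers' D for binary labels (1 = error present) and predicted scores.

    Every (positive, negative) pair is concordant if the positive case has
    the higher score, discordant if lower, and tied otherwise.
    D = (concordant - discordant) / number of pairs, i.e. 2 * AUC - 1.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    concordant = discordant = ties = 0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1
            elif p < n:
                discordant += 1
            else:
                ties += 1
    total = concordant + discordant + ties
    return (concordant - discordant) / total if total else float("nan")
```

A value near 1 means segments containing an error almost always receive higher predicted probabilities than clean ones, while a value near 0 means the ranking is no better than chance.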
Machine learning with applications, volume 23, Article 100810.
Citations: 0
Enhancing skin cancer diagnosis using late discrete wavelet transform and new swarm-based optimizers
IF 4.9 Pub Date : 2025-12-03 DOI: 10.1016/j.mlwa.2025.100811
Ramin Mousa , Saeed Chamani , Mohammad Morsali , Mohammad Kazzazi , Parsa Hatami , Soroush Sarabi
Skin cancer (SC) is a life-threatening disease where early diagnosis is critical for effective treatment and survival. While deep learning (DL) has advanced skin cancer diagnosis (SCD), current methods generally yield suboptimal accuracy and efficiency due to challenges in extracting multiscale features from dermoscopic images and in optimizing complex model parameters through efficient exploration of the hyperparameter space. To address this, we propose an approach integrating late Discrete Wavelet Transform (DWT) with pre-trained convolutional neural networks (CNNs) and swarm-based optimization. The late DWT decomposes CNN-extracted feature maps into low- and high-frequency components to improve the detection of subtle lesion patterns, while a self-attention mechanism further refines this by weighing feature importance, focusing on relevant diagnostic information. To refine hyperparameters, three novel swarm-based optimizers, Modified Gorilla Troops Optimizer (MGTO), Improved Gray Wolf Optimization (IGWO), and Fox Optimization (FOX), are employed to search the hyperparameter space and fine-tune the model for superior performance. In comparison to existing methods, experiments on the ISIC-2016 and ISIC-2017 datasets show enhanced classification performance, with an accuracy gain of at least 1%. Thus, the suggested framework offers a reliable and effective way to diagnose skin cancer automatically.
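To make the decomposition of a feature map into low- and high-frequency components concrete, here is a minimal one-level 2D Haar transform in plain Python. The abstract does not say which wavelet family or library the authors used, so the Haar filter, the orthonormal scaling, and the sub-band names below are assumptions chosen for simplicity.

```python
def haar_dwt2(feature_map):
    """One-level 2D Haar DWT of a 2D list with even height and width.

    Returns four half-resolution sub-bands: the low-frequency approximation
    and three high-frequency detail bands (column, row, and diagonal
    differences). Scaling by 1/2 keeps the transform orthonormal.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")
    low, col_detail, row_detail, diag_detail = [], [], [], []
    for r in range(0, rows, 2):
        low_row, cd_row, rd_row, dd_row = [], [], [], []
        for c in range(0, cols, 2):
            a, b = feature_map[r][c], feature_map[r][c + 1]
            d, e = feature_map[r + 1][c], feature_map[r + 1][c + 1]
            low_row.append((a + b + d + e) / 2)  # local average (low frequency)
            cd_row.append((a - b + d - e) / 2)   # change between the two columns
            rd_row.append((a + b - d - e) / 2)   # change between the two rows
            dd_row.append((a - b - d + e) / 2)   # diagonal change
        low.append(low_row)
        col_detail.append(cd_row)
        row_detail.append(rd_row)
        diag_detail.append(dd_row)
    return low, col_detail, row_detail, diag_detail
```

In a late-DWT design like the one described, a function of this kind would be applied per channel to CNN feature maps rather than to the input image; a flat region yields zero detail coefficients, so the detail bands respond only to local changes such as lesion edges.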
Machine learning with applications, volume 23, Article 100811.
Citations: 0
ISO-DeTr: A novel detection transformer for industrial small object detection
IF 4.9 Pub Date : 2025-12-02 DOI: 10.1016/j.mlwa.2025.100809
Faisal Saeed , Anand Paul
Effectively detecting and assessing real-time structural and ecological parameters in contemporary manufacturing environments poses significant challenges, particularly in identifying minute objects within product images. The swift evolution of the industrial sector underscores the necessity for intelligent manufacturing environments to uphold stringent product quality standards. However, running production processes at high speeds heightens the risk of defective products. This research addresses the challenges inherent in small object detection within industrial contexts, proposing an innovative detection transformer model tailored to modern manufacturing environments. The proposed model integrates a feature-enhanced multi-head self-attention block (FEMSA), which merges a cross-channel communication network and multiple multi-head self-attention (MSA) components to refine image features. A query proposal network is also introduced within the detection transformer framework to discern high-ranking proposals using Intersection over Union (IoU) and Non-Maximum Suppression (NMS) algorithms. Through extensive experimentation on custom industrial small objects, our proposed model demonstrates superior performance compared to existing models based on Non-Maximum Suppression and transformers. By tackling the challenges associated with small object detection, our model contributes to the dynamic synchronization between virtual and physical manufacturing realms, enhancing quality control in industrial production.
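The two standard building blocks named in this abstract, Intersection over Union and Non-Maximum Suppression, can be sketched generically as follows. This is the textbook greedy form, not the paper's query proposal network; the box format `(x1, y1, x2, y2)`, the default threshold of 0.5, and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring boxes, dropping heavy overlaps.

    Returns the indices of the kept boxes in descending score order.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

For small objects, a shift of a few pixels changes IoU sharply, so the choice of `iou_threshold` tends to matter more than it does for large objects.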
Citations: 0
AdAPT: Advertisement detector adaptation under newspaper domain shift with null-based pseudo-labeling
IF 4.9 Pub Date : 2025-12-01 DOI: 10.1016/j.mlwa.2025.100806
Faeze Zakaryapour Sayyad , Tobias Pettersson , Seyed Jalaleddin Mousavirad , Irida Shallari , Mattias O’Nils
Detecting advertisements in digitized newspapers is a key step in large-scale media analytics and digital archiving. However, variations in layout, typography, and advertisement design across publishers and time periods cause significant domain shifts that reduce the generalization ability of supervised detectors. This paper presents AdAPT, a confidence-guided pseudo-labeling pipeline for unsupervised domain adaptation in advertisement detection. The proposed method leverages both advertisement-free (Null) and advertisement-containing pages from unlabeled target domains to generate reliable pseudo-labels. By retraining a YOLO-based detector using labeled source data combined with filtered pseudo-labeled target samples, AdAPT achieves robust adaptation without requiring manual annotation. Experiments conducted on two unseen newspapers (Adresseavisen and iTromsø) demonstrate that Null-based pseudo-labeling provides the most stable and accurate adaptation, yielding up to 38% error reduction compared to the baseline. The results highlight AdAPT as a simple, scalable, and annotation-efficient solution for maintaining high-performance advertisement detection across diverse newspaper collections.
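To make the idea of confidence-guided pseudo-labeling concrete, here is a rough sketch of one way such filtering could work. It is not the AdAPT pipeline: the two thresholds, the page-level rules, and the function name `select_pseudo_labels` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
def select_pseudo_labels(predictions, high=0.8, low=0.2):
    """Split unlabeled target pages into pseudo-labeled ad pages and Null pages.

    predictions: dict mapping page id -> list of (box, confidence) detections
    produced by a source-trained detector.
    high / low: hypothetical confidence thresholds (not values from the paper).
    """
    ad_pages, null_pages = {}, []
    for page, dets in predictions.items():
        confident = [box for box, conf in dets if conf >= high]
        if confident and all(conf >= high or conf < low for _, conf in dets):
            # Every detection is either clearly an ad or clearly noise:
            # keep the confident boxes as pseudo-labels.
            ad_pages[page] = confident
        elif all(conf < low for _, conf in dets):
            # No detection, or only very weak ones: treat as ad-free (Null).
            null_pages.append(page)
        # Anything else is ambiguous and left out of retraining.
    return ad_pages, null_pages


# Hypothetical detector output for four unlabeled pages.
preds = {
    "p1": [((0, 0, 1, 1), 0.95)],
    "p2": [],
    "p3": [((0, 0, 1, 1), 0.5)],
    "p4": [((0, 0, 1, 1), 0.1)],
}
ads, nulls = select_pseudo_labels(preds)
```

In this sketch, the selected pages would then be mixed with the labeled source data to retrain the detector, as the abstract describes.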
Citations: 0
Explainable DEA–ensemble approach with golden jackal optimization: efficiency evaluation and prediction for United States information technology firms
IF 4.9 Pub Date : 2025-11-29 DOI: 10.1016/j.mlwa.2025.100798
Temitope Olubanjo Kehinde , Azeez A. Oyedele , Morenikeji Kabirat Kareem , Joseph Akpan , Oludolapo A. Olanrewaju
This study presents an integrated Data Envelopment Analysis (DEA) and ensemble learning framework optimized with the Golden Jackal Optimization (GJO) algorithm to evaluate and predict the efficiency of United States information technology firms. Both Constant Returns to Scale (CCR) and Variable Returns to Scale (BCC) models were applied to measure firm efficiency and compute scale efficiency, providing a clearer distinction between managerial and scale-related effects. Using data from 3940 firms over the period 2013 to 2023, a robustness test introducing ±20% random noise to a 10% random sample confirmed that the CCR model achieved stronger stability, with a correlation coefficient of 0.795 compared to 0.773 for the BCC model. Consequently, the CCR results were adopted as the basis for predictive modeling. DEA efficiency scores were predicted using six ensemble learners, including XGBoost, Gradient Boosting Regressor, AdaBoost, Extra Trees Regressor, Random Forest, and LightGBM, with GJO employed for hyperparameter tuning. The Gradient Boosting Regressor optimized with GJO achieved the best predictive performance, accurately reproducing the observed efficiency scores. SHAP and feature importance analyses revealed that Total Equity, Operating Income, and Total Assets were the most influential determinants of efficiency. This research contributes a scalable and interpretable approach to efficiency prediction, offering actionable insights for managers, investors, and policymakers in volatile financial markets.
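In standard DEA terminology, the CCR model assumes constant returns to scale and the BCC model assumes variable returns to scale, and scale efficiency is the ratio of the two scores. The sketch below illustrates that relationship in the simplest possible setting, a single input and a single output, where the CCR score reduces to each firm's output/input ratio divided by the best ratio. The study itself uses multiple inputs and outputs, which requires solving a linear program per firm; the function names and numbers here are hypothetical.

```python
def ccr_efficiency_1x1(inputs, outputs):
    """CCR (constant returns to scale) efficiency for the special case of one
    input and one output: productivity ratio relative to the best firm."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


def scale_efficiency(crs_scores, vrs_scores):
    """Scale efficiency = CCR (CRS) score divided by BCC (VRS) score."""
    return [c / v for c, v in zip(crs_scores, vrs_scores)]


# Hypothetical firms: input (e.g. total assets) and output (e.g. revenue).
crs = ccr_efficiency_1x1(inputs=[2, 4, 5], outputs=[2, 3, 5])

# Hypothetical BCC scores, as if obtained from a separate VRS model.
se = scale_efficiency(crs, vrs_scores=[1.0, 1.0, 1.0])
```

A firm with a BCC score of 1 but a CCR score below 1 is technically efficient for its size yet operating at a suboptimal scale, which is the managerial versus scale-related distinction the abstract mentions.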
Citations: 0
Enhanced synthesis of passively heard speech from electrocorticography signals using image-to-image spectrogram translation
IF 4.9 Pub Date : 2025-11-28 DOI: 10.1016/j.mlwa.2025.100805
Hongsang Lee , Jihun Hwang , Kyungjun Kim , Gyuwon Lee , Chun Kee Chung , Chang-Hwan Im
Speech synthesis from neural signals offers a promising avenue for restoring communication in individuals with speech impairments. Recent deep learning advances have improved decoding of neural activity into intelligible speech, yet further enhancement is required to improve the quality of synthesized speech. Here, we investigate whether an image-to-image translation approach can further refine Mel spectrograms synthesized from electrocorticography (ECoG) signals recorded while participants passively listened to spoken sentences. ECoG data were collected from volunteers performing an auditory speech perception task. A three-layer bidirectional long short-term memory (Bi-LSTM) network was first trained to predict Mel-spectrogram features from neural signals. Comparison with the Conformer model indicated that Bi-LSTM was more effective as the initial synthesis model under our limited data conditions. To further enhance the quality of the Bi-LSTM-synthesized Mel spectrograms, we applied Pix2pixHD, a high-resolution conditional GAN, as a post-processing module. The impact of Pix2pixHD was evaluated using Log-Spectral Distance (LSD), Scale-Invariant Signal-to-Distortion Ratio (SI-SDR), and Short-Time Objective Intelligibility (STOI), comparing outputs against the original ground truth. Furthermore, subjective listening tests (2AFC similarity judgment) were conducted to assess perceptual improvements. Across objective metrics, Pix2pixHD post-processing yielded consistent improvements in spectral fidelity, waveform similarity, and estimated intelligibility (lower LSD, higher SI-SDR and STOI), and subjective tests confirmed significantly enhanced perceived similarity to the original speech. These gains were supported by non-parametric significance testing (Wilcoxon signed-rank test, p < 0.005). The results indicate that high-resolution image-to-image translation is an effective means of refining neural signal-based speech synthesis, complementing sequence models and improving the overall perceived quality of the synthesized speech.
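Of the objective metrics named in the abstract, SI-SDR is the easiest to show compactly. The sketch below follows the commonly used definition: project the estimate onto the reference to obtain a scaled target, then compare target energy with residual energy in decibels. It assumes equal-length, zero-mean waveforms given as plain Python lists; the authors' exact evaluation protocol is not described in the abstract, and the signals used here are made up.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB.

    Both signals are equal-length sequences of samples, assumed zero-mean.
    """
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    # Optimal scaling of the reference that best explains the estimate.
    scale = dot / ref_energy
    target = [scale * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    if noise_energy == 0:
        return math.inf  # estimate is an exact scaled copy of the reference
    return 10 * math.log10(target_energy / noise_energy)


# Hypothetical toy signals: distortion energy equals target energy.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 1.0, -1.0, 1.0]
score = si_sdr(estimate, reference)

# Rescaling the estimate leaves the score unchanged (scale invariance).
score_scaled = si_sdr([3.0 * e for e in estimate], reference)
```

Higher values indicate that the synthesized waveform is closer to the original, which matches the direction of improvement reported in the abstract.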
Citations: 0