
Latest publications in Proceedings of Machine Learning Research

ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning.
Sahil Sethi, David Chen, Thomas Statchen, Michael C Burkhart, Nipun Bhandari, Bashar Ramadan, Brett Beaulieu-Jones

Deep learning-based electrocardiogram (ECG) classification has shown impressive performance but clinical adoption has been slowed by the lack of transparent and faithful explanations. Post hoc methods such as saliency maps may fail to reflect a model's true decision process. Prototype-based reasoning offers a more transparent alternative by grounding decisions in similarity to learned representations of real ECG segments, enabling faithful, case-based explanations. We introduce ProtoECGNet, a prototype-based deep learning model for interpretable, multi-label ECG classification. ProtoECGNet employs a structured, multi-branch architecture that reflects clinical interpretation workflows: it integrates a 1D CNN with global prototypes for rhythm classification, a 2D CNN with time-localized prototypes for morphology-based reasoning, and a 2D CNN with global prototypes for diffuse abnormalities. Each branch is trained with a prototype loss designed for multi-label learning, combining clustering, separation, diversity, and a novel contrastive loss that encourages appropriate separation between prototypes of unrelated classes while allowing clustering for frequently co-occurring diagnoses. We evaluate ProtoECGNet on all 71 labels from the PTB-XL dataset, demonstrating competitive performance relative to state-of-the-art black-box models while providing structured, case-based explanations. To assess prototype quality, we conduct a structured clinician review of the final model's projected prototypes, finding that they are rated as representative and clear. ProtoECGNet shows that prototype learning can be effectively scaled to complex, multi-label time-series classification, offering a practical path toward transparent and trustworthy deep learning models for clinical decision support.
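A multi-label contrastive prototype loss of the kind described above could look roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation; the hinge-at-margin form, the `related` co-occurrence matrix, and the uniform pair weighting are all assumptions for exposition.

```python
import numpy as np

def prototype_contrastive_loss(prototypes, related, margin=0.0):
    """Push prototypes of unrelated classes apart while leaving
    prototypes of frequently co-occurring classes free to cluster.

    prototypes: (C, D) array, one prototype vector per class.
    related: (C, C) boolean matrix; True marks class pairs that
             frequently co-occur (those pairs are not penalized).
    """
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = P @ P.T  # cosine similarities between class prototypes
    loss, n_pairs = 0.0, 0
    for i in range(len(P)):
        for j in range(i + 1, len(P)):
            if not related[i, j]:
                loss += max(0.0, sim[i, j] - margin)  # hinge on similarity
                n_pairs += 1
    return loss / max(n_pairs, 1)

rng = np.random.default_rng(0)
protos = rng.normal(size=(4, 8))
related = np.zeros((4, 4), dtype=bool)
related[0, 1] = related[1, 0] = True  # e.g., two co-occurring diagnoses
print(prototype_contrastive_loss(protos, related))
```

Only unrelated pairs contribute, so prototypes of diagnoses that routinely appear together are never forced apart.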

Proceedings of Machine Learning Research, vol. 298, August 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12700622/pdf/
Citations: 0
Balancing Interpretability and Flexibility in Modeling Diagnostic Trajectories with an Embedded Neural Hawkes Process Model.
Yuankang Zhao, Matthew M Engelhard

The Hawkes process (HP) is commonly used to model event sequences with self-reinforcing dynamics, including electronic health records (EHRs). Traditional HPs capture self-reinforcement via parametric impact functions that can be inspected to understand how each event modulates the intensity of others. Neural network-based HPs offer greater flexibility, resulting in improved fit and prediction performance, but at the cost of interpretability, which is often critical in healthcare. In this work, we aim to understand and improve upon this tradeoff. We propose a novel HP formulation in which impact functions are modeled by defining a flexible impact kernel, instantiated as a neural network, in event embedding space, which allows us to model large-scale event sequences with many event types. This approach is more flexible than traditional HPs yet more interpretable than other neural network approaches, and allows us to explicitly trade flexibility for interpretability by adding transformer encoder layers to further contextualize the event embeddings. Results show that our method accurately recovers impact functions in simulations, achieves competitive performance on the MIMIC-IV procedures dataset, and yields clinically meaningful interpretations on a Duke EHR pediatric diagnosis dataset even without transformer layers. This suggests that our flexible impact kernel is often sufficient to capture self-reinforcing dynamics in EHRs and other data effectively, implying that interpretability can be maintained without loss of performance.
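As a rough illustration of an impact kernel defined in event embedding space, consider the toy conditional intensity below. The bilinear kernel form, the exponential time decay, and all parameter values are assumptions for exposition, not the paper's architecture.

```python
import numpy as np

def intensity(t, history, mu, emb, kernel_weights, decay=1.0):
    """Conditional intensity for each of K event types at time t.

    history: list of (event_time, event_type) pairs with event_time < t.
    mu: (K,) baseline rates; emb: (K, D) event-type embeddings.
    kernel_weights: (D, D) matrix playing the role of the impact
        kernel: the impact of a past event of type j on type k is
        emb[k] @ kernel_weights @ emb[j], decayed exponentially.
    """
    lam = mu.astype(float).copy()
    for event_time, j in history:
        impact = emb @ kernel_weights @ emb[j]       # (K,) impact on each type
        lam += impact * np.exp(-decay * (t - event_time))
    return np.maximum(lam, 1e-8)                     # intensities stay positive

rng = np.random.default_rng(1)
K, D = 3, 4
mu = np.full(K, 0.2)
emb = rng.normal(size=(K, D)) * 0.1
W = np.eye(D)                 # identity kernel: roughly self-exciting
hist = [(0.5, 0), (1.0, 2)]
print(intensity(1.5, hist, mu, emb, W))
```

Because the kernel acts on embeddings rather than on a K-by-K table of pairwise impacts, the parameter count does not grow quadratically in the number of event types.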

Proceedings of Machine Learning Research, vol. 298, August 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646569/pdf/
Citations: 0
Iterative Learning of Computable Phenotypes for Treatment Resistant Hypertension using Large Language Models.
Guilherme Seidyo Imai Aldeia, Daniel S Herman, William G La Cava

Large language models (LLMs) have demonstrated remarkable capabilities for medical question answering and programming, but their potential for generating interpretable computable phenotypes (CPs) is under-explored. In this work, we investigate whether LLMs can generate accurate and concise CPs for six clinical phenotypes of varying complexity, which could be leveraged for scalable clinical decision support to improve care for patients with hypertension. In addition to evaluating zero-shot performance, we propose and test a synthesize, execute, debug, instruct strategy that uses LLMs to generate and iteratively refine CPs using data-driven feedback. Our results show that LLMs, coupled with iterative learning, can generate interpretable and reasonably accurate programs that approach the performance of state-of-the-art ML methods while requiring significantly fewer training examples.
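The synthesize, execute, debug, instruct loop can be sketched as below. Here `llm` is a hypothetical prompt-to-code callable, and the accuracy-based scoring and the feedback strings are illustrative assumptions rather than the paper's exact prompts.

```python
import traceback

def refine_phenotype(llm, patients, labels, max_iters=5):
    prompt = "Write a Python function is_case(p) -> bool for the phenotype."
    best_code, best_acc = None, -1.0
    for _ in range(max_iters):
        code = llm(prompt)                         # synthesize
        env = {}
        try:
            exec(code, env)                        # execute
            preds = [bool(env["is_case"](p)) for p in patients]
        except Exception:
            # debug: feed the traceback back to the model
            prompt += "\nYour code raised:\n" + traceback.format_exc()
            continue
        acc = sum(pr == y for pr, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_code, best_acc = code, acc
        # instruct: data-driven feedback on misclassified examples
        wrong = [p for p, pr, y in zip(patients, preds, labels) if pr != y]
        prompt += f"\nAccuracy {acc:.2f}; revisit these cases: {wrong[:3]}"
    return best_code, best_acc

# Hypothetical stand-in for a real LLM call:
fake_llm = lambda _prompt: "def is_case(p):\n    return p['sbp'] >= 140"
pts = [{"sbp": 150}, {"sbp": 120}, {"sbp": 160}]
ys = [True, False, True]
code, acc = refine_phenotype(fake_llm, pts, ys, max_iters=2)
print(acc)  # 1.0
```

Each round either repairs a crash (debug) or supplies misclassified examples (instruct), so the candidate CP improves against labeled data rather than against the prompt alone.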

Proceedings of Machine Learning Research, vol. 298, August 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12755843/pdf/
Citations: 0
Stage-Aware Event-Based Modeling (SA-EBM) for Disease Progression.
Hongtao Hao, Vivek Prabhakaran, Veena A Nair, Nagesh Adluru, Joseph L Austerweil

As diseases progress, they increasingly impact more cognitive and biological factors. By formulating probabilistic models with this basic assumption, Event-Based Models (EBMs) enable researchers to discover the progression of a disease that makes earlier diagnosis and effective clinical interventions possible. We build on prior EBMs with two major improvements: (1) dynamic estimation of healthy and pathological biomarker distributions, and (2) explicit modeling of disease stage distribution. We tested existing approaches and our novel approach on 9,000 synthetic datasets as well as real-world ADNI data. We found that our stage-aware EBM (SA-EBM) significantly outperforms prior methods, such as Gaussian Mixture Model (GMM) EBM, Kernel Density Estimation EBM and Discriminative EBM, in accurately recovering the order of disease events and assigning individual disease stages. Our package can be installed with pip install pysaebm. Source code for the package, experiments, and visualizations is available in Appendix N, or at https://saebm.hongtaoh.com.
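For intuition, a minimal event-based model likelihood with an explicit stage distribution might look like the following. The Gaussian biomarker model, the two-biomarker setup, and the uniform stage prior are illustrative assumptions; pysaebm's actual estimation procedure is more involved.

```python
import numpy as np

def gauss(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def subject_likelihood(x, order, healthy, patho, stage_prior):
    """Marginal likelihood of one subject's biomarkers under an EBM.

    x: (N,) biomarker values; order: event sequence (a permutation of
    biomarker indices); healthy/patho: (N, 2) arrays of (mean, sd);
    stage_prior: (N+1,) explicit stage distribution.
    At stage s, the first s biomarkers in `order` are pathological.
    """
    N = len(x)
    total = 0.0
    for s in range(N + 1):
        lik = 1.0
        for pos, b in enumerate(order):
            mean, sd = patho[b] if pos < s else healthy[b]
            lik *= gauss(x[b], mean, sd)
        total += stage_prior[s] * lik
    return total

x = np.array([3.0, 0.0])                      # biomarker 0 looks abnormal
healthy = np.array([[0.0, 1.0], [0.0, 1.0]])
patho = np.array([[3.0, 1.0], [3.0, 1.0]])
prior = np.full(3, 1.0 / 3.0)                 # uniform over stages 0..2
print(subject_likelihood(x, [0, 1], healthy, patho, prior))
```

Comparing this quantity across candidate orderings (and, per subject, across stages) is what lets an EBM recover both the event order and individual stages.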

Proceedings of Machine Learning Research, vol. 298, August 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12888895/pdf/
Citations: 0
Borrowing From the Future: Enhancing Early Risk Assessment through Contrastive Learning.
Minghui Sun, Matthew M Engelhard, Benjamin A Goldstein

Risk assessments for a pediatric population are often conducted across multiple stages. For example, clinicians may evaluate risks prenatally, at birth, and during WellChild visits. While predictions at later stages typically achieve higher accuracy, it is clinically desirable to make reliable risk assessments as early as possible. Therefore, this study focuses on enhancing prediction performance in early-stage risk assessments. Our solution, Borrowing From the Future (BFF), is a contrastive multi-modal framework that treats each time window as a distinct modality. In BFF, a model is trained on all data available over time, while risk assessments are conducted using only the information available up to the assessment point. This contrastive framework allows the model to "borrow" informative signals from later stages (e.g., WellChild visits) to implicitly supervise the learning at earlier stages (e.g., prenatal/birth stages). We validate BFF on two real-world pediatric outcome prediction tasks, demonstrating consistent improvements in early risk assessment. The code is at https://github.com/scotsun/bff.
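A cross-window contrastive objective of this kind can be sketched with an InfoNCE-style loss over paired early-window and late-window embeddings. This toy NumPy version is an assumption-laden stand-in for BFF's actual objective, not its implementation.

```python
import numpy as np

def info_nce(z_early, z_late, temperature=0.1):
    """One-directional InfoNCE-style loss: row i of each matrix embeds
    the same patient's early and late time windows, so the matching
    pair sits on the diagonal of the similarity matrix."""
    a = z_early / np.linalg.norm(z_early, axis=1, keepdims=True)
    b = z_late / np.linalg.norm(z_late, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logp))                 # cross-entropy on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
print(info_nce(z, z), info_nce(z, z[::-1]))  # aligned pairs score far lower
```

Minimizing such a loss pulls a patient's early-window embedding toward their own late-window embedding, which is the sense in which later stages implicitly supervise earlier ones.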

Proceedings of Machine Learning Research, vol. 298, August 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646567/pdf/
Citations: 0
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach.
Jiancong Xiao, Bojian Hou, Zhanliang Wang, Ruochen Jin, Qi Long, Weijie J Su, Li Shen

One of the key technologies for the success of Large Language Models (LLMs) is preference alignment. However, a notable side effect of preference alignment is poor calibration: while the pre-trained models are typically well-calibrated, LLMs tend to become poorly calibrated after alignment with human preferences. In this paper, we investigate why preference alignment affects calibration and how to address this issue. For the first question, we observe that the preference collapse issue in alignment undesirably generalizes to the calibration scenario, causing LLMs to exhibit overconfidence and poor calibration. To address this, we demonstrate the importance of fine-tuning with domain-specific knowledge to alleviate the overconfidence issue. To further analyze whether this affects the model's performance, we categorize models into two regimes: calibratable and non-calibratable, defined by bounds of Expected Calibration Error (ECE). In the calibratable regime, we propose a calibration-aware fine-tuning approach to achieve proper calibration without compromising LLMs' performance. However, as models are further fine-tuned for better performance, they enter the non-calibratable regime. For this case, we develop an EM-algorithm-based ECE regularization for the fine-tuning loss to maintain low calibration error. Extensive experiments validate the effectiveness of the proposed methods.
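The Expected Calibration Error that defines the two regimes above is commonly computed with equal-width confidence binning, for example:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: average |confidence - accuracy| gap over equal-width
    confidence bins, weighted by the fraction of samples per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# An overconfident model: 95% confidence but only 50% accuracy.
conf = np.full(100, 0.95)
corr = np.array([1.0] * 50 + [0.0] * 50)
print(expected_calibration_error(conf, corr))  # ≈ 0.45
```

A well-calibrated model drives every per-bin gap, and hence the ECE, toward zero; the overconfidence after alignment shows up as large gaps in the high-confidence bins.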

Proceedings of Machine Learning Research, vol. 267, pp. 68364-68390, July 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13004626/pdf/
Citations: 0
"Who experiences large model decay and why?" A Hierarchical Framework for Diagnosing Heterogeneous Performance Drift.
Harvineet Singh, Fan Xia, Alexej Gossmann, Andrew Chuang, Julian C Hong, Jean Feng

Machine learning (ML) models frequently experience performance degradation when deployed in new contexts. Such degradation is rarely uniform: some subgroups may suffer large performance decay while others may not. Understanding where and how large differences in performance arise is critical for designing targeted corrective actions that mitigate decay for the most affected subgroups while minimizing any unintended effects. Current approaches do not provide such detailed insight, as they either (i) explain how average performance shifts arise or (ii) identify adversely affected subgroups without insight into how this occurred. To this end, we introduce a Subgroup-scanning Hierarchical Inference Framework for performance drifT (SHIFT). SHIFT first asks "Is there any subgroup with unacceptably large performance decay due to covariate/outcome shifts?" (Where?) and, if so, dives deeper to ask "Can we explain this using more detailed variable(subset)-specific shifts?" (How?). In real-world experiments, we find that SHIFT identifies interpretable subgroups affected by performance decay, and suggests targeted actions that effectively mitigate the decay.
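The "Where?" question (scanning for subgroups with unacceptable decay) can be illustrated minimally as follows. The flat, non-hierarchical scan, the fixed decay threshold, and evaluating both contexts on the same examples are simplifications for exposition, not SHIFT itself.

```python
import numpy as np

def scan_subgroups(groups, correct_src, correct_tgt, threshold=0.05):
    """Flag subgroups whose accuracy drops by more than `threshold`
    between the source context and the new deployment context.

    groups: (N,) subgroup label per example; correct_src / correct_tgt:
    (N,) 0/1 prediction correctness in each context.
    """
    flagged = {}
    for g in np.unique(groups):
        m = groups == g
        decay = correct_src[m].mean() - correct_tgt[m].mean()
        if decay > threshold:
            flagged[g] = decay
    return flagged

groups = np.array(["a"] * 4 + ["b"] * 4)
src = np.ones(8)                                          # both groups start perfect
tgt = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])  # group "a" decays
print(scan_subgroups(groups, src, tgt))
```

SHIFT's "How?" step would then decompose a flagged subgroup's decay into covariate- and outcome-shift contributions, which this sketch omits.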

Proceedings of Machine Learning Research, vol. 267, pp. 55757-55787, July 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12747154/pdf/
Citations: 0
Raptor: Scalable Train-Free Embeddings for 3D Medical Volumes Leveraging Pretrained 2D Foundation Models.
Ulzee An, Moonseong Jeong, Simon A Lee, Aditya Gorla, Yuzhe Yang, Sriram Sankararaman

Current challenges in developing foundational models for volumetric imaging data, such as magnetic resonance imaging (MRI), stem from the computational complexity of training state-of-the-art architectures in high dimensions and curating sufficiently large datasets of volumes. To address these challenges, we introduce Raptor (Random Planar Tensor Reduction), a train-free method for generating semantically rich embeddings for volumetric data. Raptor leverages a frozen 2D foundation model, pretrained on natural images, to extract visual tokens from individual cross-sections of medical volumes. These tokens are then spatially compressed using random projections, significantly reducing computational complexity while retaining semantic information. Extensive experiments on ten diverse medical volume tasks verify the superior performance of Raptor over state-of-the-art methods, including those pretrained exclusively on medical volumes (+3% SuPreM, +6% MISFM, +10% Merlin, +13% VoCo, and +14% SLIViT), while entirely bypassing the need for costly training. Our results highlight the effectiveness and versatility of Raptor as a foundation for advancing deep learning-based methods for medical volumes (code: github.com/sriramlab/raptor).
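The slice-then-project recipe can be sketched as follows. Here `token_fn` is a hypothetical stand-in for the frozen 2D foundation model, and the mean-pooling order and projection scaling are assumptions rather than Raptor's exact design.

```python
import numpy as np

def raptor_embed(volume, token_fn, out_dim=32, seed=0):
    """Train-free embedding of a 3D volume: run a frozen 2D encoder
    (`token_fn`, hypothetical) over every cross-section, then compress
    with a fixed random projection.

    volume: (S, H, W) array of S slices; token_fn: (H, W) -> (T, D).
    """
    tokens = np.concatenate([token_fn(sl) for sl in volume], axis=0)  # (S*T, D)
    rng = np.random.default_rng(seed)            # fixed seed: nothing is trained
    proj = rng.normal(size=(tokens.shape[1], out_dim)) / np.sqrt(out_dim)
    return tokens.mean(axis=0) @ proj            # pooled, projected embedding

vol = np.ones((3, 8, 8))                         # toy stand-in for an MRI volume
tok = lambda sl: sl.reshape(4, 16)               # fake 2D tokenizer
print(raptor_embed(vol, tok).shape)  # (32,)
```

Because both the 2D encoder and the projection are fixed, producing embeddings requires only forward passes, which is what makes the approach train-free and cheap to scale.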

Raptor: Scalable Train-Free Embeddings for 3D Medical Volumes Leveraging Pretrained 2D Foundation Models.
Ulzee An, Moonseong Jeong, Simon A Lee, Aditya Gorla, Yuzhe Yang, Sriram Sankararaman
Proceedings of Machine Learning Research, vol. 267, pp. 1462-1482, July 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12893380/pdf/
Citations: 0
Test-Time Training Provably Improves Transformers as In-context Learners.
Halil Alperen Gozeten, M Emrullah Ildiz, Xuechen Zhang, Mahdi Soltanolkotabi, Marco Mondelli, Samet Oymak

Test-time training (TTT) methods explicitly update the weights of a model to adapt to the specific test instance, and they have found success in a variety of settings, including most recently language modeling and reasoning. To demystify this success, we investigate a gradient-based TTT algorithm for in-context learning, where we train a transformer model on the in-context demonstrations provided in the test prompt. Specifically, we provide a comprehensive theoretical characterization of linear transformers when the update rule is a single gradient step. Our theory (i) delineates the role of alignment between pretraining distribution and target task, (ii) demystifies how TTT can alleviate distribution shift, and (iii) quantifies the sample complexity of TTT including how it can significantly reduce the eventual sample size required for in-context learning. As our empirical contribution, we study the benefits of TTT for TabPFN, a tabular foundation model. In line with our theory, we demonstrate that TTT significantly reduces the required sample size for tabular classification (3 to 5 times fewer) unlocking substantial inference efficiency with a negligible training cost.
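The single-gradient-step update rule the theory analyzes can be illustrated with a toy linear model: take a pretrained weight vector, apply one gradient step of squared loss on the in-context demonstrations from the test prompt, then predict the query. The setup, learning rate, and zero initialization below are assumptions for the sketch, not the paper's construction.

```python
import numpy as np

def ttt_one_step(W, X_demo, y_demo, x_query, lr=0.1):
    """One test-time gradient step on the in-context demonstrations,
    then predict the query (toy linear-model sketch).

    W: (d,) weights of the pretrained predictor.
    X_demo, y_demo: demonstration inputs/targets from the test prompt.
    """
    preds = X_demo @ W
    # Gradient of mean squared error over the demonstrations.
    grad = 2.0 * X_demo.T @ (preds - y_demo) / len(y_demo)
    W_adapted = W - lr * grad                   # single-step weight update
    return x_query @ W_adapted, W_adapted

rng = np.random.default_rng(0)
w_true = rng.standard_normal(5)
X = rng.standard_normal((8, 5))                 # 8 in-context demonstrations
y = X @ w_true
W0 = np.zeros(5)                                # stand-in "pretrained" weights
pred, W1 = ttt_one_step(W0, X, y, X[0])
mse0 = np.mean((X @ W0 - y) ** 2)               # error before adaptation
mse1 = np.mean((X @ W1 - y) ** 2)               # error after one TTT step
```

Even this single step reduces the demonstration error, which mirrors the paper's point that a one-step update can substantially cut the sample size needed for in-context learning.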

Proceedings of Machine Learning Research, vol. 267, pp. 20266-20295, July 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662752/pdf/
Citations: 0
Dynamical Modeling of Behaviorally Relevant Spatiotemporal Patterns in Neural Imaging Data.
Mohammad Hosseini, Maryam M Shanechi

High-dimensional imaging of neural activity, such as widefield calcium and functional ultrasound imaging, provides a rich source of information for understanding the relationship between brain activity and behavior. Accurately modeling neural dynamics in these modalities is crucial for understanding this relationship but is hindered by their high dimensionality, complex spatiotemporal dependencies, and prevalent behaviorally irrelevant dynamics. Existing dynamical models often employ preprocessing steps to obtain low-dimensional representations from neural image modalities. However, this process can discard behaviorally relevant information and miss spatiotemporal structure. We propose SBIND, a novel data-driven deep learning framework to model spatiotemporal dependencies in neural images and disentangle their behaviorally relevant dynamics from other neural dynamics. We validate SBIND on widefield imaging datasets, and show its extension to functional ultrasound imaging, a recent modality whose dynamical modeling has largely remained unexplored. We find that our model effectively identifies both local and long-range spatial dependencies across the brain while also dissociating behaviorally relevant neural dynamics. In doing so, SBIND outperforms existing models in neural-behavioral prediction. Overall, SBIND provides a versatile tool for investigating the neural mechanisms underlying behavior using imaging modalities.
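SBIND itself is a deep learning model, but the idea of dissociating behaviorally relevant dynamics from other neural dynamics can be illustrated with a minimal linear stand-in: reduce high-dimensional activity to a few latent time courses, then find the combination of latents that predicts behavior. All data, dimensions, and noise levels below are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 500 time points of 30-channel "neural" activity driven by
# one behaviorally relevant latent and four behaviorally irrelevant ones.
T, n_ch = 500, 30
z_rel = np.sin(np.linspace(0, 20, T))           # drives behavior
z_irr = rng.standard_normal((T, 4))             # does not
mixing = rng.standard_normal((5, n_ch))
neural = np.column_stack([z_rel, z_irr]) @ mixing
neural += 0.1 * rng.standard_normal((T, n_ch))
behavior = 2.0 * z_rel + 0.05 * rng.standard_normal(T)

# Step 1: low-dimensional latent time courses via SVD of centered activity.
U, S, Vt = np.linalg.svd(neural - neural.mean(0), full_matrices=False)
latents = U[:, :5] * S[:5]                      # (T, 5) latent trajectories

# Step 2: the behavior-predictive combination of latents defines the
# "behaviorally relevant" subspace; its orthogonal complement carries
# the remaining (behaviorally irrelevant) dynamics.
b_c = behavior - behavior.mean()
coef, *_ = np.linalg.lstsq(latents, b_c, rcond=None)
resid = b_c - latents @ coef
r2 = 1.0 - (resid @ resid) / (b_c @ b_c)        # close to 1 here
```

A linear decomposition like this cannot capture the nonlinear, long-range spatiotemporal dependencies the abstract describes; it only makes concrete what "disentangling behaviorally relevant dynamics" means as an objective.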

Proceedings of Machine Learning Research, vol. 267, pp. 23846-23872, July 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662753/pdf/
Citations: 0