
Latest articles in Proceedings of Machine Learning Research

ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning.
Sahil Sethi, David Chen, Thomas Statchen, Michael C Burkhart, Nipun Bhandari, Bashar Ramadan, Brett Beaulieu-Jones

Deep learning-based electrocardiogram (ECG) classification has shown impressive performance, but clinical adoption has been slowed by the lack of transparent and faithful explanations. Post hoc methods such as saliency maps may fail to reflect a model's true decision process. Prototype-based reasoning offers a more transparent alternative by grounding decisions in similarity to learned representations of real ECG segments, enabling faithful, case-based explanations. We introduce ProtoECGNet, a prototype-based deep learning model for interpretable, multi-label ECG classification. ProtoECGNet employs a structured, multi-branch architecture that reflects clinical interpretation workflows: it integrates a 1D CNN with global prototypes for rhythm classification, a 2D CNN with time-localized prototypes for morphology-based reasoning, and a 2D CNN with global prototypes for diffuse abnormalities. Each branch is trained with a prototype loss designed for multi-label learning, combining clustering, separation, diversity, and a novel contrastive loss that encourages appropriate separation between prototypes of unrelated classes while allowing clustering for frequently co-occurring diagnoses. We evaluate ProtoECGNet on all 71 labels from the PTB-XL dataset, demonstrating competitive performance relative to state-of-the-art black-box models while providing structured, case-based explanations. To assess prototype quality, we conduct a structured clinician review of the final model's projected prototypes, finding that they are rated as representative and clear. ProtoECGNet shows that prototype learning can be effectively scaled to complex, multi-label time-series classification, offering a practical path toward transparent and trustworthy deep learning models for clinical decision support.
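The key novelty in the loss is the contrastive term that separates prototypes of unrelated classes while exempting frequently co-occurring diagnoses. A minimal numpy sketch of that idea follows; the function name, the margin form, and the boolean co-occurrence matrix are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def contrastive_prototype_loss(prototypes, proto_class, cooccur, margin=0.5):
    """Hedged sketch: push prototypes of unrelated classes below a cosine-
    similarity margin, while pairs of frequently co-occurring classes
    (cooccur[ci, cj] is True) are exempt and may cluster freely."""
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = P @ P.T  # cosine similarity between all prototype pairs
    loss, n_pairs = 0.0, 0
    for i in range(len(P)):
        for j in range(i + 1, len(P)):
            ci, cj = proto_class[i], proto_class[j]
            if ci != cj and not cooccur[ci, cj]:
                # unrelated classes: penalize similarity above the margin
                loss += max(0.0, sim[i, j] - margin)
                n_pairs += 1
    return loss / max(n_pairs, 1)
```

In a full model this term would be combined with the clustering, separation, and diversity terms and minimized jointly with the classification loss.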

Proceedings of Machine Learning Research, vol. 298, August 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12700622/pdf/
Citations: 0
Balancing Interpretability and Flexibility in Modeling Diagnostic Trajectories with an Embedded Neural Hawkes Process Model.
Yuankang Zhao, Matthew M Engelhard

The Hawkes process (HP) is commonly used to model event sequences with self-reinforcing dynamics, including electronic health records (EHRs). Traditional HPs capture self-reinforcement via parametric impact functions that can be inspected to understand how each event modulates the intensity of others. Neural network-based HPs offer greater flexibility, resulting in improved fit and prediction performance, but at the cost of interpretability, which is often critical in healthcare. In this work, we aim to understand and improve upon this tradeoff. We propose a novel HP formulation in which impact functions are modeled by defining a flexible impact kernel, instantiated as a neural network, in event embedding space, which allows us to model large-scale event sequences with many event types. This approach is more flexible than traditional HPs yet more interpretable than other neural network approaches, and allows us to explicitly trade flexibility for interpretability by adding transformer encoder layers to further contextualize the event embeddings. Results show that our method accurately recovers impact functions in simulations, achieves competitive performance on the MIMIC-IV procedure dataset, and gains clinically meaningful interpretations on a Duke-EHR pediatric diagnosis dataset even without transformer layers. This suggests that our flexible impact kernel is often sufficient to capture self-reinforcing dynamics in EHRs and other data effectively, implying that interpretability can be maintained without loss of performance.
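The core object here is a conditional intensity where each past event's impact on each event type is computed in embedding space. The toy sketch below replaces the paper's neural impact kernel with a bilinear map over event embeddings and an assumed exponential time decay, purely to make the structure concrete; none of these names or forms come from the paper.

```python
import numpy as np

def intensity(t, history, mu, E, W, beta=1.0):
    """Hawkes-style intensity for all event types at time t.

    history: list of (event_time, event_type); mu: baseline rates per type;
    E: event-type embeddings (n_types x d); W: bilinear impact kernel (d x d),
    a stand-in for the paper's neural-network kernel."""
    lam = mu.astype(float).copy()
    for ti, j in history:
        if ti < t:
            # impact of a past event of type j on every type k, via embeddings
            impact = E @ (W @ E[j])
            lam += np.maximum(impact, 0.0) * np.exp(-beta * (t - ti))
    return lam
```

With no history the intensity reduces to the baseline; each past event adds a decaying, embedding-determined bump, which is the self-reinforcement an analyst would inspect.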

Proceedings of Machine Learning Research, vol. 298, August 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646569/pdf/
Citations: 0
Iterative Learning of Computable Phenotypes for Treatment Resistant Hypertension using Large Language Models.
Guilherme Seidyo Imai Aldeia, Daniel S Herman, William G La Cava

Large language models (LLMs) have demonstrated remarkable capabilities for medical question answering and programming, but their potential for generating interpretable computable phenotypes (CPs) is under-explored. In this work, we investigate whether LLMs can generate accurate and concise CPs for six clinical phenotypes of varying complexity, which could be leveraged to enable scalable clinical decision support to improve care for patients with hypertension. In addition to evaluating zero-shot performance, we propose and test a synthesize, execute, debug, instruct strategy that uses LLMs to generate and iteratively refine CPs using data-driven feedback. Our results show that LLMs, coupled with iterative learning, can generate interpretable and reasonably accurate programs that approach the performance of state-of-the-art ML methods while requiring significantly fewer training examples.
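The synthesize, execute, debug, instruct loop can be sketched as a simple refinement driver. Everything below is an assumed skeleton: `llm(prompt)` stands in for any model call returning program text, and `evaluate(program)` stands in for executing the candidate phenotype on labeled data and returning a score plus data-driven feedback.

```python
def refine_phenotype(llm, evaluate, rounds=3):
    """Hedged sketch of the synthesize-execute-debug-instruct strategy:
    each round synthesizes a candidate CP, executes/scores it, and folds
    the resulting feedback back into the prompt for the next round."""
    prompt = "Write a computable phenotype classifier."
    best, best_score = None, -1.0
    for _ in range(rounds):
        program = llm(prompt)                 # synthesize
        score, feedback = evaluate(program)   # execute + debug
        if score > best_score:
            best, best_score = program, score
        # instruct: append data-driven feedback for the next attempt
        prompt += f"\nFeedback: {feedback}\nRevise the program."
    return best, best_score
```

In practice the evaluate step would run the generated program against a labeled EHR cohort and summarize its misclassifications as feedback text.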

Proceedings of Machine Learning Research, vol. 298, August 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12755843/pdf/
Citations: 0
Borrowing From the Future: Enhancing Early Risk Assessment through Contrastive Learning.
Minghui Sun, Matthew M Engelhard, Benjamin A Goldstein

Risk assessments for a pediatric population are often conducted across multiple stages. For example, clinicians may evaluate risks prenatally, at birth, and during WellChild visits. While predictions at later stages typically achieve higher accuracy, it is clinically desirable to make reliable risk assessments as early as possible. Therefore, this study focuses on enhancing prediction performance in early-stage risk assessments. Our solution, Borrowing From the Future (BFF), is a contrastive multi-modal framework that treats each time window as a distinct modality. In BFF, a model is trained on all data available over time, while risk assessment is conducted using only the information available up to the assessment time. This contrastive framework allows the model to "borrow" informative signals from later stages (e.g., WellChild visits) to implicitly supervise the learning at earlier stages (e.g., prenatal/birth stages). We validate BFF on two real-world pediatric outcome prediction tasks, demonstrating consistent improvements in early risk assessment. The code is at https://github.com/scotsun/bff.
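One plausible way to "borrow" later-stage signal is an InfoNCE-style objective that pulls each patient's early-window embedding toward that same patient's later-window embedding and pushes it away from other patients'. The sketch below assumes that form; the actual BFF objective is in the linked repository.

```python
import numpy as np

def info_nce(early, late, tau=0.1):
    """Hedged sketch of a cross-window contrastive loss: rows of `early`
    and `late` are per-patient embeddings from two time windows; matching
    pairs sit on the diagonal of the similarity matrix."""
    A = early / np.linalg.norm(early, axis=1, keepdims=True)
    B = late / np.linalg.norm(late, axis=1, keepdims=True)
    logits = A @ B.T / tau                        # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(np.diag(p)).mean()             # softmax over candidates
```

The loss is near zero when each early embedding is closest to its own patient's late embedding, and grows when the pairing is scrambled.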

Proceedings of Machine Learning Research, vol. 298, August 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646567/pdf/
Citations: 0
"Who experiences large model decay and why?" A Hierarchical Framework for Diagnosing Heterogeneous Performance Drift.
Harvineet Singh, Fan Xia, Alexej Gossmann, Andrew Chuang, Julian C Hong, Jean Feng

Machine learning (ML) models frequently experience performance degradation when deployed in new contexts. Such degradation is rarely uniform: some subgroups may suffer large performance decay while others may not. Understanding where and how large differences in performance arise is critical for designing targeted corrective actions that mitigate decay for the most affected subgroups while minimizing any unintended effects. Current approaches do not provide such detailed insight, as they either (i) explain how average performance shifts arise or (ii) identify adversely affected subgroups without insight into how this occurred. To this end, we introduce a Subgroup-scanning Hierarchical Inference Framework for performance drifT (SHIFT). SHIFT first asks "Is there any subgroup with unacceptably large performance decay due to covariate/outcome shifts?" (Where?) and, if so, dives deeper to ask "Can we explain this using more detailed variable(subset)-specific shifts?" (How?). In real-world experiments, we find that SHIFT identifies interpretable subgroups affected by performance decay, and suggests targeted actions that effectively mitigate the decay.
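The "Where?" step amounts to scanning subgroups for unacceptable performance decay between the source and target contexts. The sketch below is a heavy simplification for illustration only: real SHIFT also attributes the decay to covariate/outcome shifts and controls for multiple testing, none of which is shown here.

```python
import numpy as np

def flag_decayed_subgroups(groups, correct_src, correct_tgt, threshold=0.1):
    """Hedged sketch of subgroup scanning: compare each subgroup's accuracy
    in the source vs. target context and flag subgroups whose accuracy
    decay exceeds a tolerance.

    groups: subgroup label per example; correct_src / correct_tgt: 0/1
    correctness arrays for the same subgroups in the two contexts."""
    groups = np.array(groups)
    flagged = {}
    for g in set(groups.tolist()):
        mask = groups == g
        decay = correct_src[mask].mean() - correct_tgt[mask].mean()
        if decay > threshold:
            flagged[g] = decay
    return flagged
```

Subgroups that survive this screen would then be handed to the "How?" step for variable-specific shift explanations.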

Proceedings of Machine Learning Research, vol. 267, pp. 55757-55787, July 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12747154/pdf/
Citations: 0
Dynamical Modeling of Behaviorally Relevant Spatiotemporal Patterns in Neural Imaging Data.
Mohammad Hosseini, Maryam M Shanechi

High-dimensional imaging of neural activity, such as widefield calcium and functional ultrasound imaging, provides a rich source of information for understanding the relationship between brain activity and behavior. Accurately modeling neural dynamics in these modalities is crucial for understanding this relationship but is hindered by the high-dimensionality, complex spatiotemporal dependencies, and prevalent behaviorally irrelevant dynamics in these modalities. Existing dynamical models often employ preprocessing steps to obtain low-dimensional representations from neural image modalities. However, this process can discard behaviorally relevant information and miss spatiotemporal structure. We propose SBIND, a novel data-driven deep learning framework to model spatiotemporal dependencies in neural images and disentangle their behaviorally relevant dynamics from other neural dynamics. We validate SBIND on widefield imaging datasets, and show its extension to functional ultrasound imaging, a recent modality whose dynamical modeling has largely remained unexplored. We find that our model effectively identifies both local and long-range spatial dependencies across the brain while also dissociating behaviorally relevant neural dynamics. Doing so, SBIND outperforms existing models in neural-behavioral prediction. Overall, SBIND provides a versatile tool for investigating the neural mechanisms underlying behavior using imaging modalities.

Proceedings of Machine Learning Research, vol. 267, pp. 23846-23872, July 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662753/pdf/
Citations: 0
Test-Time Training Provably Improves Transformers as In-context Learners.
Halil Alperen Gozeten, M Emrullah Ildiz, Xuechen Zhang, Mahdi Soltanolkotabi, Marco Mondelli, Samet Oymak

Test-time training (TTT) methods explicitly update the weights of a model to adapt to the specific test instance, and they have found success in a variety of settings, including most recently language modeling and reasoning. To demystify this success, we investigate a gradient-based TTT algorithm for in-context learning, where we train a transformer model on the in-context demonstrations provided in the test prompt. Specifically, we provide a comprehensive theoretical characterization of linear transformers when the update rule is a single gradient step. Our theory (i) delineates the role of alignment between pretraining distribution and target task, (ii) demystifies how TTT can alleviate distribution shift, and (iii) quantifies the sample complexity of TTT including how it can significantly reduce the eventual sample size required for in-context learning. As our empirical contribution, we study the benefits of TTT for TabPFN, a tabular foundation model. In line with our theory, we demonstrate that TTT significantly reduces the required sample size for tabular classification (3 to 5 times fewer) unlocking substantial inference efficiency with a negligible training cost.
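The analyzed update rule, a single gradient step on the in-context demonstrations, is easy to state concretely. The toy below applies it to a plain linear predictor rather than a transformer, which is an assumption made to keep the sketch self-contained; the theory concerns linear transformers performing this adaptation.

```python
import numpy as np

def ttt_single_step(W, X_ctx, y_ctx, x_query, lr=0.5):
    """Hedged sketch of single-gradient-step test-time training: adapt the
    weights on the test prompt's (X_ctx, y_ctx) demonstrations with one
    step on squared loss, then predict the query point."""
    resid = X_ctx @ W - y_ctx
    grad = X_ctx.T @ resid / len(y_ctx)   # gradient of (1/2) mean sq. error
    W_adapted = W - lr * grad             # one step, then discard
    return x_query @ W_adapted
```

Even one step moves the prediction toward the context labels, which is the mechanism the paper's theory quantifies (and, empirically, what cuts the sample size TabPFN needs).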

Proceedings of Machine Learning Research, vol. 267, pp. 20266-20295, July 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662752/pdf/
Citations: 0
Learning Survival Distributions with the Asymmetric Laplace Distribution.
Deming Sheng, Ricardo Henao

Probabilistic survival analysis models seek to estimate the distribution of the future occurrence (time) of an event given a set of covariates. In recent years, these models have preferred nonparametric specifications that avoid directly estimating survival distributions via discretization. Specifically, they estimate the probability of an individual event at fixed times or the time of an event at fixed probabilities (quantiles), using supervised learning. Borrowing ideas from the quantile regression literature, we propose a parametric survival analysis method based on the Asymmetric Laplace Distribution (ALD). This distribution allows for closed-form calculation of popular event summaries such as mean, median, mode, variation, and quantiles. The model is optimized by maximum likelihood to learn, at the individual level, the parameters (location, scale, and asymmetry) of the ALD distribution. Extensive results on synthetic and real-world data demonstrate that the proposed method outperforms parametric and nonparametric approaches in terms of accuracy, discrimination and calibration.
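The closed-form event summaries are the selling point of the ALD choice. In the standard quantile parameterization (location mu, scale sigma, asymmetry p), the CDF and quantile function are both elementary; the sketch below shows that round trip. The parameterization is the textbook one, assumed to match the paper's.

```python
import numpy as np

def ald_cdf(t, mu, sigma, p):
    """CDF of the asymmetric Laplace distribution (quantile parameterization)."""
    if t <= mu:
        return p * np.exp((1 - p) * (t - mu) / sigma)
    return 1 - (1 - p) * np.exp(-p * (t - mu) / sigma)

def ald_quantile(tau, mu, sigma, p):
    """Closed-form tau-quantile: invert the two CDF branches. The location
    mu is exactly the p-quantile, so e.g. p=0.5 makes mu the median."""
    if tau < p:
        return mu + sigma / (1 - p) * np.log(tau / p)
    return mu - sigma / p * np.log((1 - tau) / (1 - p))
```

A survival model emitting per-individual (mu, sigma, p) can therefore report medians, arbitrary quantiles, and calibrated event probabilities without any discretization.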

Proceedings of Machine Learning Research, vol. 267, pp. 54772-54809, July 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12669900/pdf/
Citations: 0
Sidechain conditioning and modeling for full-atom protein sequence design with FAMPNN.
Talal Widatalla, Richard W Shuai, Brian L Hie, Po-Ssu Huang

Leading deep learning-based methods for fixed-backbone protein sequence design do not model protein sidechain conformation during sequence generation, despite the large role the three-dimensional arrangement of sidechain atoms plays in protein conformation, stability, and overall protein function. Instead, these models implicitly reason about crucial sidechain interactions based on backbone geometry and known amino acid sequence labels. To address this, we present FAMPNN (Full-Atom MPNN), a sequence design method that explicitly models both sequence identity and sidechain conformation for each residue, where the per-token distribution of a residue's discrete amino acid identity and its continuous sidechain conformation are learned with a combined categorical cross-entropy and diffusion loss objective. We demonstrate that learning these distributions jointly is a highly synergistic task that improves sequence recovery while achieving state-of-the-art sidechain packing. Furthermore, benefits from full-atom modeling generalize from sequence recovery to practical protein design applications, such as zero-shot prediction of experimental binding and stability measurements.
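The combined objective pairs a discrete term over amino acid identities with a continuous term over sidechain geometry. A toy NumPy sketch of that pairing (illustrative only: the array shapes, the simplified epsilon-prediction MSE standing in for the diffusion loss, and the weight `w_struct` are assumptions, not the paper's implementation):

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean categorical cross-entropy over residues (discrete amino acid identity)."""
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilized log-softmax
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-logp[np.arange(len(labels)), labels].mean())

def diffusion_loss(eps_pred, eps_true):
    """Stand-in denoising term: MSE between predicted and true noise added to
    sidechain coordinates (the usual epsilon-prediction objective)."""
    return float(((eps_pred - eps_true) ** 2).mean())

def joint_loss(logits, labels, eps_pred, eps_true, w_struct=1.0):
    """Combined per-residue objective in the spirit of FAMPNN: discrete sequence
    term plus weighted continuous sidechain term (w_struct is hypothetical)."""
    return cross_entropy(logits, labels) + w_struct * diffusion_loss(eps_pred, eps_true)
```

The point of the sketch is only the structure of the objective: gradients from both terms flow into one shared model, which is what makes the joint task synergistic.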

Talal Widatalla, Richard W Shuai, Brian L Hie, Po-Ssu Huang. "Sidechain conditioning and modeling for full-atom protein sequence design with FAMPNN." Proceedings of Machine Learning Research 267 (2025): 66746-66771. Open access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646570/pdf/
Citations: 0
Tuning-Free Coreset Markov Chain Monte Carlo via Hot DoG.
Pub Date: 2025-07-01; Epub Date: 2025-07-21
Naitong Chen, Jonathan H Huggins, Trevor Campbell

A Bayesian coreset is a small, weighted subset of a data set that replaces the full data during inference to reduce computational cost. The state-of-the-art coreset construction algorithm, Coreset Markov chain Monte Carlo (Coreset MCMC), uses draws from an adaptive Markov chain targeting the coreset posterior to train the coreset weights via stochastic gradient optimization. However, the quality of the constructed coreset, and thus the quality of its posterior approximation, is sensitive to the stochastic optimization learning rate. In this work, we propose a learning-rate-free stochastic gradient optimization procedure, Hot-start Distance over Gradient (Hot DoG), for training coreset weights in Coreset MCMC without user tuning effort. We provide a theoretical analysis of the convergence of the coreset weights produced by Hot DoG. We also provide empirical results demonstrating that Hot DoG yields higher quality posterior approximations than other learning-rate-free stochastic gradient methods, and performs competitively with optimally-tuned ADAM.
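The underlying Distance-over-Gradient idea sets the step size from quantities the optimizer already tracks: the farthest the iterate has moved from its starting point, divided by the root of the accumulated squared gradient norms. A minimal sketch of that base DoG rule on a deterministic gradient (this is not the paper's Hot-start variant, whose initialization phase is omitted, and `r_eps` is an assumed small movement seed):

```python
import numpy as np

def dog_optimize(grad, x0, steps=500, r_eps=1e-4):
    """Base DoG update on a 1-D parameter vector:
    eta_t = rbar_t / sqrt(sum_{i<=t} ||g_i||^2), where rbar_t is the maximum
    distance travelled from x0 so far. No learning rate is supplied by the user."""
    x = np.asarray(x0, dtype=float).copy()
    x_init = x.copy()
    rbar = r_eps * (1.0 + float(np.linalg.norm(x)))  # small initial movement estimate
    gsum = 0.0
    for _ in range(steps):
        g = np.asarray(grad(x), dtype=float)
        gsum += float(np.dot(g, g))
        eta = rbar / np.sqrt(gsum + 1e-12)
        x = x - eta * g
        rbar = max(rbar, float(np.linalg.norm(x - x_init)))
    return x
```

On a simple quadratic such as f(x) = (x - 3)^2, the rule first grows `rbar` geometrically as the iterate moves away from `x0`, then contracts toward the minimizer as the gradient sum dominates; no step-size tuning is involved.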

Naitong Chen, Jonathan H Huggins, Trevor Campbell. "Tuning-Free Coreset Markov Chain Monte Carlo via Hot DoG." Proceedings of Machine Learning Research 286 (2025): 647-672. Open access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12704252/pdf/
Citations: 0