
Proceedings of Machine Learning Research: Latest Articles

Learn Singularly Perturbed Solutions via Homotopy Dynamics.
Chuqi Chen, Yahong Yang, Yang Xiang, Wenrui Hao

Solving partial differential equations (PDEs) using neural networks has become a central focus in scientific machine learning. Training neural networks for singularly perturbed problems is particularly challenging due to certain parameters in the PDEs that introduce near-singularities in the loss function. In this study, we overcome this challenge by introducing a novel method based on homotopy dynamics to effectively manipulate these parameters. From a theoretical perspective, we analyze the effects of these parameters on training difficulty in these singularly perturbed problems and establish the convergence of the proposed homotopy dynamics method. Experimentally, we demonstrate that our approach significantly accelerates convergence and improves the accuracy of these singularly perturbed problems. These findings present an efficient optimization strategy leveraging homotopy dynamics, offering a robust framework to extend the applicability of neural networks for solving singularly perturbed differential equations.

Proceedings of Machine Learning Research, vol. 267, pp. 9590-9613. Published 2025-07-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662737/pdf/
Citations: 0
Accurate Identification of Communication Between Multiple Interacting Neural Populations.
Belle Liu, Jacob Sacks, Matthew D Golub

Neural recording technologies now enable simultaneous recording of population activity across many brain regions, motivating the development of data-driven models of communication between brain regions. However, existing models can struggle to disentangle the sources that influence recorded neural populations, leading to inaccurate portraits of inter-regional communication. Here, we introduce Multi-Region Latent Factor Analysis via Dynamical Systems (MR-LFADS), a sequential variational autoencoder designed to disentangle inter-regional communication, inputs from unobserved regions, and local neural population dynamics. We show that MR-LFADS outperforms existing approaches at identifying communication across dozens of simulations of task-trained multi-region networks. When applied to large-scale electrophysiology, MR-LFADS predicts brain-wide effects of circuit perturbations that were held out during model fitting. These validations on synthetic and real neural data position MR-LFADS as a promising tool for discovering principles of brain-wide information processing.

Proceedings of Machine Learning Research, vol. 267, pp. 39381-39404. Published 2025-07-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12715561/pdf/
Citations: 0
Active Feature Acquisition Via Explainability-driven Ranking.
Osman Berke Guney, Ketan Suhaas Saichandran, Karim Elzokm, Ziming Zhang, Vijaya B Kolachalama

In many practical applications, including medicine, acquiring all relevant data for machine learning models is often infeasible due to constraints on time, cost, and resources. This makes it important to selectively acquire only the most informative features, yet traditional static feature selection methods fall short in scenarios where feature importance varies across instances. Here, we propose an active feature acquisition (AFA) framework, which dynamically selects features based on their importance to each individual case. Our method leverages local explanation techniques to generate instance-specific feature importance rankings. We then reframe the AFA problem as a feature prediction task, introducing a policy network grounded in a decision transformer architecture. This policy network is trained to select the next most informative feature by learning from the feature importance rankings. As a result, features are acquired sequentially, ordered by their predictive significance, leading to more efficient feature selection and acquisition. Extensive experiments on multiple datasets demonstrate that our approach outperforms current state-of-the-art AFA methods in predictive accuracy and feature acquisition efficiency. These findings highlight the promise of an explainability-driven AFA strategy in scenarios where feature acquisition is a concern.

Proceedings of Machine Learning Research, vol. 267, pp. 20748-20765. Published 2025-07-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12661659/pdf/
Citations: 0
Benchmarking Missing Data Imputation Methods for Time Series Using Real-World Test Cases.
Adedolapo Aishat Toye, Asuman Celik, Samantha Kleinberg

Missing data is pervasive in healthcare. Many imputation methods exist to fill in missing values, yet most were evaluated using randomly deleted values rather than the actual mechanisms they were designed to address. We aimed to determine real-world accuracy for missing data imputation with three missing data mechanisms (missing completely at random, MCAR; missing at random, MAR; and not missing at random, NMAR) for state of the art and commonly used imputation methods. Using two time series data targets (continuous glucose monitoring, Loop dataset; heart rate, All of Us dataset) we simulated missingness by masking values for each mechanism, at a range of missingness percentages (5-30%) and tested 12 imputation methods. We evaluated accuracy with multiple metrics including root mean square error (RMSE) and bias. We found that overall, accuracy was significantly better on MCAR than on MAR and NMAR, despite many methods being developed for those mechanisms. Linear interpolation had the lowest RMSE with all mechanisms and for all demographic groups, with low bias. This study shows that current evaluation practices do not provide an accurate picture of real world performance with realistic patterns of missingness. Future research is needed to develop evaluation practices that better capture real-world accuracy, and methods that better address real-world mechanisms.
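
The masking-and-scoring protocol described in this abstract can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' benchmark code: it covers only the MCAR mechanism, uses a synthetic sine-wave series in place of the Loop or All of Us data, and the function names (`mask_mcar`, `linear_interpolate`, `rmse`, `bias`) are hypothetical.

```python
import math
import random


def mask_mcar(n, fraction, rng):
    """Pick indices to hide completely at random (MCAR).

    The two endpoints stay observed so that every hidden point has an
    observed neighbour on each side (a simplification made for this sketch).
    """
    n_missing = round(fraction * n)
    return sorted(rng.sample(range(1, n - 1), n_missing))


def linear_interpolate(values, missing):
    """Fill each hidden index on the line between its nearest observed neighbours."""
    hidden = set(missing)
    filled = list(values)
    for i in missing:
        lo = i - 1
        while lo in hidden:  # walk left to the nearest observed point
            lo -= 1
        hi = i + 1
        while hi in hidden:  # walk right to the nearest observed point
            hi += 1
        weight = (i - lo) / (hi - lo)
        filled[i] = values[lo] + weight * (values[hi] - values[lo])
    return filled


def rmse(truth, estimate, missing):
    """Root mean square error over the hidden points only."""
    return math.sqrt(sum((estimate[i] - truth[i]) ** 2 for i in missing) / len(missing))


def bias(truth, estimate, missing):
    """Mean signed error over the hidden points only."""
    return sum(estimate[i] - truth[i] for i in missing) / len(missing)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for a sensor trace; not real patient data.
    series = [120 + 30 * math.sin(2 * math.pi * t / 288) for t in range(288)]
    for fraction in (0.05, 0.30):  # the two ends of the 5-30% range
        missing = mask_mcar(len(series), fraction, rng)
        filled = linear_interpolate(series, missing)
        print(fraction, rmse(series, filled, missing), bias(series, filled, missing))
```

MAR and NMAR masks would have to depend on observed or unobserved values respectively (for example, hiding readings with a probability tied to the reading itself), which this sketch leaves out.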

Proceedings of Machine Learning Research, vol. 287, pp. 480-501. Published 2025-06-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12392262/pdf/
Citations: 0
The Impact of Medication Non-adherence on Adverse Outcomes: Evidence from Schizophrenia Patients via Survival Analysis.
Shahriar Noroozizadeh, Pim Welle, Jeremy C Weiss, George H Chen

This study quantifies the association between non-adherence to antipsychotic medications and adverse outcomes in individuals with schizophrenia. We frame the problem using survival analysis, focusing on the time to the earliest of several adverse events (early death, involuntary hospitalization, jail booking). We extend standard causal inference methods (T-learner, S-learner, nearest neighbor matching) to utilize various survival models to estimate individual and average treatment effects, where treatment corresponds to medication non-adherence. Analyses are repeated using different amounts of longitudinal information (3, 6, 9, and 12 months). Using data from Allegheny County in western Pennsylvania, we find strong evidence that non-adherence advances adverse outcomes by approximately 1 to 4 months. Ablation studies confirm that county-provided risk scores adjust for key confounders, as their removal amplifies the estimated effects. Subgroup analyses by medication formulation (injectable vs. oral) and medication type consistently show that non-adherence is associated with earlier adverse events. These findings highlight the clinical importance of adherence in delaying psychiatric crises and show that integrating survival analysis with causal inference tools can yield policy-relevant insights. We caution that although we apply causal inference, we only make associative claims and discuss assumptions needed for causal interpretation.
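
To make the survival framing concrete, the sketch below builds the composite outcome (time to the earliest adverse event) and compares two groups by restricted mean survival time (RMST) under Kaplan-Meier curves. It is an illustrative simplification, not the authors' pipeline: it fits one covariate-free survival curve per arm, which mirrors the T-learner idea of one model per treatment arm but performs no confounder adjustment. The choice of RMST as the summary, the function names, and the toy numbers are all assumptions.

```python
def time_to_first_event(event_times, follow_up):
    """Collapse several event times into (time, observed) for the earliest one.

    `event_times` has one entry per event type, None if it never occurred.
    A subject with no event by `follow_up` is censored at `follow_up`.
    """
    occurred = [t for t in event_times if t is not None and t <= follow_up]
    if occurred:
        return min(occurred), 1
    return follow_up, 0


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            surv *= 1 - observed / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def rmst(curve, horizon):
    """Area under the step survival curve from 0 to `horizon`."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)


if __name__ == "__main__":
    # Made-up months to (death, hospitalization, jail booking); None = did not occur.
    adherent = [(None, 20, None), (None, None, None), (None, 30, 26), (None, None, None)]
    non_adherent = [(None, 8, None), (None, 15, 12), (None, None, None), (5, None, None)]
    horizon = 36
    result = {}
    for name, subjects in (("adherent", adherent), ("non-adherent", non_adherent)):
        pairs = [time_to_first_event(s, horizon) for s in subjects]
        curve = kaplan_meier([t for t, _ in pairs], [e for _, e in pairs])
        result[name] = rmst(curve, horizon)
    # Positive difference = events arrive earlier, on average, without adherence.
    print(result["adherent"] - result["non-adherent"])
```

Because the two arms are compared without covariates, the printed difference is purely associative; the individual-level estimates described in the abstract require survival models that condition on patient features.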

Proceedings of Machine Learning Research, vol. 287, pp. 573-609. Published 2025-06-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444782/pdf/
Citations: 0
CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports.
Xiao Yu Cindy Zhang, Carlos R Ferreira, Francis Rossignol, Raymond T Ng, Wyeth Wasserman, Jian Zhu

Rare diseases, including Inborn Errors of Metabolism (IEM), pose significant diagnostic challenges. Case reports serve as key but computationally underutilized resources to inform diagnosis. Clinical dense information extraction refers to organizing medical information into structured predefined categories. Large Language Models (LLMs) may enable scalable information extraction from case reports but are rarely evaluated for this task. We introduce CaseReportBench, an expert-annotated dataset for dense information extraction of case reports (focusing on IEMs). Using this dataset, we assess various models and promptings, introducing novel strategies of category-specific prompting and subheading-filtered data integration. Zero-shot chain-of-thought offers little advantage over zero-shot prompting. Category-specific prompting improves alignment to benchmark. Open-source Qwen2.5:7B outperforms GPT-4o for this task. Our clinician evaluations show that LLMs can extract clinically relevant details from case reports, supporting rare disease diagnosis and management. We also highlight areas for improvement, such as LLMs' limitations in recognizing negative findings for differential diagnosis. This work advances LLM-driven clinical NLP, paving the way for scalable medical AI applications.

Proceedings of Machine Learning Research, vol. 287, pp. 527-542. Published 2025-06-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12477612/pdf/
Citations: 0
Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?
Hye Sun Yun, Karen Y C Zhang, Ramez Kouzy, Iain J Marshall, Junyi Jessy Li, Byron C Wallace

Medical research faces well-documented challenges in translating novel treatments into clinical practice. Publishing incentives encourage researchers to present "positive" findings, even when empirical results are equivocal. Consequently, it is well-documented that authors often spin study results, especially in article abstracts. Such spin can influence clinician interpretation of evidence and may affect patient care decisions. In this study, we ask whether the interpretation of trial results offered by Large Language Models (LLMs) is similarly affected by spin. This is important since LLMs are increasingly being used to trawl through and synthesize published medical evidence. We evaluated 22 LLMs and found that they are across the board more susceptible to spin than humans. They might also propagate spin into their outputs: We find evidence, e.g., that LLMs implicitly incorporate spin into plain language summaries that they generate. We also find, however, that LLMs are generally capable of recognizing spin, and can be prompted in a way to mitigate spin's impact on LLM outputs.

Proceedings of Machine Learning Research, vol. 287, pp. 458-479. Published 2025-06-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12622377/pdf/
Citations: 0
Predicting Partially Observed Long-Term Outcomes with Adversarial Positive-Unlabeled Domain Adaptation.
Mengying Yan, Meng Xia, Wei A Huang, Chuan Hong, Benjamin A Goldstein, Matthew M Engelhard

Predicting long-term clinical outcomes often requires large-scale training data with sufficiently long follow-up. However, in electronic health records (EHR) data, long-term labels may not be available for contemporary patient cohorts. Given the dynamic nature of clinical practice, models that rely on historical training data may not perform optimally. In this work, we frame the problem as a positive-unlabeled domain adaptation task, where we seek to adapt from a fully labeled source domain (e.g., historical data) to a partially labeled target domain (e.g., contemporary data). We propose an adversarial framework that includes three core components: (1) Overall Alignment, to match feature distributions between source and target domains; (2) Partial Alignment, to map source negatives to unlabeled target samples; and (3) Conditional Alignment, to address conditional shift using available positive labels in the target domain. We evaluate our method on a benchmark digit classification task (SVHN-MNIST), and two real-world EHR applications: prediction of one-year mortality post COVID-19, and long-term prediction of neurodevelopmental conditions (NDC) in children. In all settings, our approach consistently outperforms baseline models and, in most cases, achieves performance close to an oracle model trained with fully observed labels.

Proceedings of Machine Learning Research, vol. 287, pp. 672-690. Published 2025-06-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12779109/pdf/
Citations: 0
Scalable Inference for Bayesian Multinomial Logistic-Normal Dynamic Linear Models.
Manan Saxena, Tinghua Chen, Justin D Silverman

Many scientific fields collect longitudinal count compositional data. Each observation is a multivariate count vector, where the total counts are arbitrary, and the information lies in the relative frequency of the counts. Multiple authors have proposed Bayesian Multinomial Logistic-Normal Dynamic Linear Models (MLN-DLMs) as a flexible approach to modeling these data. However, adoption of these methods has been limited by computational challenges. This article develops an efficient and accurate approach to posterior state estimation, called Fenrir. Our approach relies on a novel algorithm for MAP estimation and an accurate approximation to a key posterior marginal of the model. As there are no equivalent methods against which we can compare, we also develop an optimized Stan implementation of MLN-DLMs. Our experiments suggest that Fenrir can be three orders of magnitude more efficient than Stan and can even be incorporated into larger sampling schemes for joint inference of model hyperparameters. Our methods are made available to the community as a user-friendly software library written in C++ with an R interface.

Proceedings of Machine Learning Research, vol. 258, pp. 442-450. Published 2025-05-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774479/pdf/
Citations: 0
Uncertainty Quantification for Conditional Treatment Effect Estimation under Dynamic Treatment Regimes.
Leon Deng, Hong Xiong, Feng Wu, Sanyam Kapoor, Soumya Ghosh, Zach Shahn, Li-Wei H Lehman

In medical decision-making, clinicians must choose between different time-varying treatment strategies. Counterfactual prediction via g-computation enables comparison of alternative outcome distributions under such treatment strategies. While deep learning can better model high-dimensional data with complex temporal dependencies, incorporating model uncertainty into predicted conditional counterfactual distributions remains challenging. We propose a principled approach to model uncertainty in deep learning implementations of g-computations using approximate Bayesian posterior predictive distributions of counterfactual outcomes via variational dropout and deep ensembles. We evaluate these methods by comparing their counterfactual predictive calibration and performance in decision-making tasks, using two simulated datasets from mechanistic models and a real-world sepsis dataset. Our findings suggest that the proposed uncertainty quantification approach improves both calibration and decision-making performance, particularly in minimizing risks of worst-case adverse clinical outcomes under alternative dynamic treatment regimes. To our knowledge, this is the first work to propose and compare multiple uncertainty quantification methods in machine learning models of g-computation in estimating conditional treatment effects under dynamic treatment regimes.

Proceedings of Machine Learning Research, vol. 259, pp. 248-266. Published 2024-12-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12121963/pdf/
Citations: 0