
Latest publications in Proceedings of machine learning research

Understanding Transcriptional Regulatory Redundancy by Learnable Global Subset Perturbations.
Junhao Liu, Siwei Xu, Dylan Riffle, Ziheng Duan, Martin Renqiang Min, Jing Zhang

Transcriptional regulation through cis-regulatory elements (CREs) is crucial for numerous biological functions, with its disruption potentially leading to various diseases. It is well-known that these CREs often exhibit redundancy, allowing them to compensate for each other in response to external disturbances, highlighting the need for methods to identify CRE sets that collaboratively regulate gene expression effectively. To address this, we introduce GRIDS, an in silico computational method that approaches the task as a global feature explanation challenge to dissect combinatorial CRE effects in two phases. First, GRIDS constructs a differentiable surrogate function to mirror the complex gene regulatory process, facilitating cross-translations in single-cell modalities. It then employs learnable perturbations within a state transition framework to offer global explanations, efficiently navigating the combinatorial feature landscape. Through comprehensive benchmarks, GRIDS demonstrates superior explanatory capabilities compared to other leading methods. Moreover, GRIDS's global explanations reveal intricate regulatory redundancy across cell types and states, underscoring its potential to advance our understanding of cellular regulation in biological research.
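To make the two-phase idea concrete, the sketch below (an illustration with a toy frozen surrogate and made-up hyperparameters, not the authors' GRIDS code) learns a global gate vector over CRE features, pushing the gates toward masking the smallest CRE subset whose joint removal most changes the surrogate's predicted expression.

```python
# Minimal sketch, assuming a toy frozen surrogate and illustrative hyperparameters
# (not the GRIDS implementation): learnable global gates over CRE features.
import torch
import torch.nn as nn

n_cres, hidden = 64, 128
surrogate = nn.Sequential(                       # stand-in for the trained differentiable surrogate
    nn.Linear(n_cres, hidden), nn.ReLU(), nn.Linear(hidden, 1)
)
for p in surrogate.parameters():                 # the surrogate stays frozen during explanation
    p.requires_grad_(False)

gate_logits = nn.Parameter(torch.zeros(n_cres))  # one learnable gate per CRE (global, not per-cell)
opt = torch.optim.Adam([gate_logits], lr=0.05)
x = torch.rand(256, n_cres)                      # toy accessibility-like inputs

for step in range(200):
    gate = torch.sigmoid(gate_logits)            # soft "keep" probability per CRE
    perturbed = x * gate                         # mask the same CRE subset for every cell
    drop = (surrogate(x) - surrogate(perturbed)).mean()   # change in predicted expression
    n_masked = (1.0 - gate).sum()                # encourage perturbing only a small subset
    loss = -drop + 0.05 * n_masked
    opt.zero_grad(); loss.backward(); opt.step()

subset = torch.topk(1.0 - torch.sigmoid(gate_logits), k=5).indices
print("CREs whose joint removal most changes the prediction:", subset.tolist())
```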

{"title":"Understanding Transcriptional Regulatory Redundancy by Learnable Global Subset Perturbations.","authors":"Junhao Liu, Siwei Xu, Dylan Riffle, Ziheng Duan, Martin Renqiang Min, Jing Zhang","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Transcriptional regulation through cis-regulatory elements (CREs) is crucial for numerous biological functions, with its disruption potentially leading to various diseases. It is well-known that these CREs often exhibit redundancy, allowing them to compensate for each other in response to external disturbances, highlighting the need for methods to identify CRE sets that collaboratively regulate gene expression effectively. To address this, we introduce GRIDS, an in silico computational method that approaches the task as a global feature explanation challenge to dissect combinatorial CRE effects in two phases. First, GRIDS constructs a differentiable surrogate function to mirror the complex gene regulatory process, facilitating cross-translations in single-cell modalities. It then employs learnable perturbations within a state transition framework to offer global explanations, efficiently navigating the combinatorial feature landscape. Through comprehensive benchmarks, GRIDS demonstrates superior explanatory capabilities compared to other leading methods. Moreover, GRIDS's global explanations reveal intricate regulatory redundancy across cell types and states, underscoring its potential to advance our understanding of cellular regulation in biological research.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"260 ","pages":"383-398"},"PeriodicalIF":0.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12694376/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145745962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Interoperable Machine Learning Pipeline for Pediatric Obesity Risk Estimation.
Hamed Fayyaz, Mehak Gupta, Alejandra Perez Ramirez, Claudine Jurkovitz, H Timothy Bunnell, Thao-Ly T Phan, Rahmatollah Beheshti

Reliable prediction of pediatric obesity can offer a valuable resource to providers, helping them engage in timely preventive interventions before the disease is established. Many efforts have been made to develop ML-based predictive models of obesity, and some studies have reported high predictive performances. However, no commonly used clinical decision support tool based on existing ML models currently exists. This study presents a novel end-to-end pipeline specifically designed for pediatric obesity prediction, which supports the entire process of data extraction, inference, and communication via an API or a user interface. While focusing only on routinely recorded data in pediatric electronic health records (EHRs), our pipeline uses a diverse expert-curated list of medical concepts to predict the 1-3 years risk of developing obesity. Furthermore, by using the Fast Healthcare Interoperability Resources (FHIR) standard in our design procedure, we specifically target facilitating low-effort integration of our pipeline with different EHR systems. In our experiments, we report the effectiveness of the predictive model as well as its alignment with the feedback from various stakeholders, including ML scientists, providers, health IT personnel, health administration representatives, and patient group representatives.
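As a rough illustration of the serving side of such a pipeline, the hypothetical FastAPI sketch below accepts a FHIR-style list of observations and returns a risk score; the route, payload fields, LOINC code, and scoring function are placeholders, not the paper's actual interface or model.

```python
# Hypothetical sketch of an inference endpoint over FHIR-style observations.
# The route, payload fields, and scoring function are assumptions for
# illustration; the paper's actual pipeline and API may differ.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Observation(BaseModel):
    code: str        # e.g. a LOINC code such as one for BMI percentile
    value: float

class PatientRecord(BaseModel):
    patient_id: str
    age_months: int
    observations: list[Observation]

def score(record: PatientRecord) -> float:
    # Placeholder for the trained ML model; returns a pseudo-risk in [0, 1].
    bmi_pct = next((o.value for o in record.observations if o.code == "59576-9"), 0.0)
    return min(1.0, max(0.0, 0.01 * bmi_pct))

@app.post("/obesity-risk")
def obesity_risk(record: PatientRecord):
    return {"patient_id": record.patient_id, "risk_1_to_3_years": score(record)}
```

Such a module would be served with any ASGI runner (e.g. `uvicorn module:app`); the point is only that inference and communication happen behind a single API boundary.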

{"title":"An Interoperable Machine Learning Pipeline for Pediatric Obesity Risk Estimation.","authors":"Hamed Fayyaz, Mehak Gupta, Alejandra Perez Ramirez, Claudine Jurkovitz, H Timothy Bunnell, Thao-Ly T Phan, Rahmatollah Beheshti","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Reliable prediction of pediatric obesity can offer a valuable resource to providers, helping them engage in timely preventive interventions before the disease is established. Many efforts have been made to develop ML-based predictive models of obesity, and some studies have reported high predictive performances. However, no commonly used clinical decision support tool based on existing ML models currently exists. This study presents a novel end-to-end pipeline specifically designed for pediatric obesity prediction, which supports the entire process of data extraction, inference, and communication via an API or a user interface. While focusing only on routinely recorded data in pediatric electronic health records (EHRs), our pipeline uses a diverse expert-curated list of medical concepts to predict the 1-3 years risk of developing obesity. Furthermore, by using the Fast Healthcare Interoperability Resources (FHIR) standard in our design procedure, we specifically target facilitating low-effort integration of our pipeline with different EHR systems. In our experiments, we report the effectiveness of the predictive model as well as its alignment with the feedback from various stakeholders, including ML scientists, providers, health IT personnel, health administration representatives, and patient group representatives.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"259 ","pages":"308-324"},"PeriodicalIF":0.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11884402/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143574461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Pure Transformer Pretraining Framework on Text-attributed Graphs.
Yu Song, Haitao Mao, Jiachen Xiao, Jingzhe Liu, Zhikai Chen, Wei Jin, Carl Yang, Jiliang Tang, Hui Liu

Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges represented by feature heterogeneity and structural heterogeneity. Recent efforts have been made to address feature heterogeneity via Large Language Models (LLMs) on text-attributed graphs (TAGs) by generating fixed-length text representations as node features. These high-quality features reduce the previously critical role of graph structure, resulting in a modest performance gap between Graph Neural Networks (GNNs) and structure-agnostic Multi-Layer Perceptrons (MLPs). Motivated by this, we introduce a feature-centric pretraining perspective by treating graph structure as a prior and leveraging the rich, unified feature space to learn refined interaction patterns that generalize across graphs. Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walk and employs masked feature reconstruction to capture pairwise proximity in the LLM-unified feature space using a standard Transformer. By utilizing unified text representations rather than varying structures, GSPT alleviates structural heterogeneity and achieves significantly better transferability among graphs within the same domain. Our approach can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets. The source code is publicly available at https://github.com/SongYYYY/GSPT.
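A minimal sketch of the masked-feature-reconstruction objective on random-walk contexts is given below; the toy graph, dimensions, and training loop are assumptions for illustration and not the released GSPT code.

```python
# Minimal sketch (assumptions, not the released GSPT code): nodes carry
# LLM-derived text embeddings; we sample a random-walk context per node, mask
# some positions, and train a standard TransformerEncoder to reconstruct the
# masked features.
import torch
import torch.nn as nn

n_nodes, d, walk_len, mask_rate = 1000, 64, 16, 0.3
feats = torch.randn(n_nodes, d)                 # stand-in for LLM text embeddings
adj = [torch.randint(0, n_nodes, (8,)) for _ in range(n_nodes)]  # toy neighbor lists

def random_walk(start: int) -> torch.Tensor:
    path = [start]
    for _ in range(walk_len - 1):
        nbrs = adj[path[-1]]
        path.append(int(nbrs[torch.randint(0, len(nbrs), (1,))]))
    return torch.tensor(path)

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2
)
mask_token = nn.Parameter(torch.zeros(d))
opt = torch.optim.Adam(list(encoder.parameters()) + [mask_token], lr=1e-3)

for step in range(100):
    walks = torch.stack([random_walk(int(i)) for i in torch.randint(0, n_nodes, (32,))])
    x = feats[walks]                                     # (batch, walk_len, d)
    mask = torch.rand(x.shape[:2]) < mask_rate
    x_in = torch.where(mask.unsqueeze(-1), mask_token.expand_as(x), x)
    loss = ((encoder(x_in) - x)[mask] ** 2).mean()       # reconstruct masked node features
    opt.zero_grad(); loss.backward(); opt.step()
```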

{"title":"A Pure Transformer Pretraining Framework on Text-attributed Graphs.","authors":"Yu Song, Haitao Mao, Jiachen Xiao, Jingzhe Liu, Zhikai Chen, Wei Jin, Carl Yang, Jiliang Tang, Hui Liu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges represented by feature heterogeneity and structural heterogeneity. Recent efforts have been made to address feature heterogeneity via Large Language Models (LLMs) on text-attributed graphs (TAGs) by generating fixed-length text representations as node features. These high-quality features reduce the previously critical role of graph structure, resulting in a modest performance gap between Graph Neural Networks (GNNs) and structure-agnostic Multi-Layer Perceptrons (MLPs). Motivated by this, we introduce a feature-centric pretraining perspective by treating graph structure as a prior and leveraging the rich, unified feature space to learn refined interaction patterns that generalizes across graphs. Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walk and employs masked feature reconstruction to capture pairwise proximity in the LLM-unified feature space using a standard Transformer. By utilizing unified text representations rather than varying structures, GSPT alleviates structural heterogeneity and achieves significantly better transferability among graphs within the same domain. Our approach can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets. The source code is publicly available at https://github.com/SongYYYY/GSPT.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"269 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416796/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145031307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MedGraphNet: Leveraging Multi-Relational Graph Neural Networks and Text Knowledge for Biomedical Predictions.
Oladimeji Macaulay, Michael Servilla, David Arredondo, Kushal Virupakshappa, Yue Hu, Luis Tafoya, Yanfu Zhang, Avinash Sahu

Genetic, molecular, and environmental factors influence diseases through complex interactions with genes, phenotypes, and drugs. Current methods often fail to integrate diverse multi-relational biological data meaningfully, limiting the discovery of novel risk genes and drugs. To address this, we present MedGraphNet, a multi-relational Graph Neural Network (GNN) model designed to infer relationships among drugs, genes, diseases, and phenotypes. MedGraphNet initializes nodes using informative embeddings from existing text knowledge, allowing for robust integration of various data types and improved generalizability. Our results demonstrate that MedGraphNet matches and often outperforms traditional single-relation approaches, particularly in scenarios with isolated or sparsely connected nodes. The model shows generalizability to external datasets, achieving high accuracy in identifying disease-gene associations and drug-phenotype relationships. Notably, MedGraphNet accurately inferred drug side effects without direct training on such data. Using Alzheimer's disease as a case study, MedGraphNet successfully identified relevant phenotypes, genes, and drugs, corroborated by existing literature. These findings demonstrate the potential of integrating multi-relational data with text knowledge to enhance biomedical predictions and drug repurposing for diseases. MedGraphNet code is available at https://github.com/vinash85/MedGraphNet.
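The sketch below illustrates one layer of relation-specific message passing over nodes initialized with stand-in text embeddings; relation names, dimensions, and the aggregation rule are illustrative assumptions rather than the MedGraphNet implementation.

```python
# Illustrative sketch (not the released MedGraphNet code): one layer of
# relation-specific message passing over nodes initialized with text
# embeddings; relation names and dimensions are assumptions.
import torch
import torch.nn as nn

class MultiRelationalLayer(nn.Module):
    def __init__(self, dim: int, relations: list[str]):
        super().__init__()
        self.rel_linears = nn.ModuleDict({r: nn.Linear(dim, dim) for r in relations})
        self.self_linear = nn.Linear(dim, dim)

    def forward(self, h, edges):
        # h: (n_nodes, dim); edges: {relation: (src_idx, dst_idx) LongTensors}
        out = self.self_linear(h)
        for rel, (src, dst) in edges.items():
            msg = self.rel_linears[rel](h[src])          # one message per edge
            out = out.index_add(0, dst, msg)             # aggregate messages at target nodes
        return torch.relu(out)

n_nodes, dim = 200, 32
h = torch.randn(n_nodes, dim)          # stand-in for text-embedding initialization
edges = {
    "gene-disease": (torch.randint(0, n_nodes, (500,)), torch.randint(0, n_nodes, (500,))),
    "drug-phenotype": (torch.randint(0, n_nodes, (500,)), torch.randint(0, n_nodes, (500,))),
}
layer = MultiRelationalLayer(dim, list(edges.keys()))
print(layer(h, edges).shape)           # torch.Size([200, 32])
```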

{"title":"<i>MedGraphNet</i>: Leveraging Multi-Relational Graph Neural Networks and Text Knowledge for Biomedical Predictions.","authors":"Oladimeji Macaulay, Michael Servilla, David Arredondo, Kushal Virupakshappa, Yue Hu, Luis Tafoya, Yanfu Zhang, Avinash Sahu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Genetic, molecular, and environmental factors influence diseases through complex interactions with genes, phenotypes, and drugs. Current methods often fail to integrate diverse multi-relational biological data meaningfully, limiting the discovery of novel risk genes and drugs. To address this, we present <i>MedGraphNet</i>, a multi-relational Graph Neural Network (GNN) model designed to infer relationships among drugs, genes, diseases, and phenotypes. <i>MedGraphNet</i> initializes nodes using informative embeddings from existing text knowledge, allowing for robust integration of various data types and improved generalizability. Our results demonstrate that <i>MedGraphNet</i> matches and often outperforms traditional single-relation approaches, particularly in scenarios with isolated or sparsely connected nodes. The model shows generalizability to external datasets, achieving high accuracy in identifying disease-gene associations and drug-phenotype relationships. Notably, <i>MedGraphNet</i> accurately inferred drug side effects without direct training on such data. Using Alzheimer's disease as a case study, <i>MedGraphNet</i> successfully identified relevant phenotypes, genes, and drugs, corroborated by existing literature. These findings demonstrate the potential of integrating multi-relational data with text knowledge to enhance biomedical predictions and drug repurposing for diseases. <i>MedGraphNet</i> code is available at https://github.com/vinash85/MedGraphNet.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"261 ","pages":"162-182"},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12424194/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145066688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multimodal Sleep Apnea Detection with Missing or Noisy Modalities.
Hamed Fayyaz, Niharika S D'Souza, Rahmatollah Beheshti

Polysomnography (PSG) is a type of sleep study that records multimodal physiological signals and is widely used for purposes such as sleep staging and respiratory event detection. Conventional machine learning methods assume that each sleep study is associated with a fixed set of observed modalities and that all modalities are available for each sample. However, noisy and missing modalities are a common issue in real-world clinical settings. In this study, we propose a comprehensive pipeline aiming to compensate for the missing or noisy modalities when performing sleep apnea detection. Unlike other existing studies, our proposed model works with any combination of available modalities. Our experiments show that the proposed model outperforms other state-of-the-art approaches in sleep apnea detection using various subsets of available data and different levels of noise, and maintains its high performance (AUROC>0.9) even in the presence of high levels of noise or missingness. This is especially relevant in settings where the level of noise and missingness is high (such as pediatric or outside-of-clinic scenarios). Our code is publicly available at https://github.com/healthylaife/apnea-missing-modality.
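One simple way to support arbitrary subsets of modalities is to encode each available modality separately and fuse only the embeddings that are present; the sketch below shows that pattern with toy dimensions and is an assumption-level illustration, not the released pipeline.

```python
# Minimal sketch (assumptions, not the released pipeline): per-modality encoders
# with fusion over whichever modalities are actually present in a sample.
import torch
import torch.nn as nn

class MaskedFusionApneaModel(nn.Module):
    def __init__(self, modality_dims: dict, hidden: int = 64):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for m, d in modality_dims.items()}
        )
        self.head = nn.Linear(hidden, 1)

    def forward(self, inputs: dict):
        # inputs maps modality name -> tensor, containing only the available modalities
        embs = [self.encoders[m](x) for m, x in inputs.items()]
        fused = torch.stack(embs).mean(dim=0)     # average over present modalities only
        return torch.sigmoid(self.head(fused))    # apnea-event probability

model = MaskedFusionApneaModel({"eeg": 128, "spo2": 16, "airflow": 32})
batch = {"eeg": torch.randn(8, 128), "spo2": torch.randn(8, 16)}  # airflow missing
print(model(batch).shape)   # torch.Size([8, 1])
```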

{"title":"Multimodal Sleep Apnea Detection with Missing or Noisy Modalities.","authors":"Hamed Fayyaz, Niharika S D'Souza, Rahmatollah Beheshti","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Polysomnography (PSG) is a type of sleep study that records multimodal physiological signals and is widely used for purposes such as sleep staging and respiratory event detection. Conventional machine learning methods assume that each sleep study is associated with a fixed set of observed modalities and that all modalities are available for each sample. However, noisy and missing modalities are a common issue in real-world clinical settings. In this study, we propose a comprehensive pipeline aiming to compensate for the missing or noisy modalities when performing sleep apnea detection. Unlike other existing studies, our proposed model works with any combination of available modalities. Our experiments show that the proposed model outperforms other state-of-the-art approaches in sleep apnea detection using various subsets of available data and different levels of noise, and maintains its high performance (AUROC>0.9) even in the presence of high levels of noise or missingness. This is especially relevant in settings where the level of noise and missingness is high (such as pediatric or outside-of-clinic scenarios). Our code is publicly available at https://github.com/healthylaife/apnea-missing-modality.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"252 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11893010/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143598009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
G-Transformer: Counterfactual Outcome Prediction under Dynamic and Time-varying Treatment Regimes.
Hong Xiong, Feng Wu, Leon Deng, Megan Su, Zach Shahn, Li-Wei H Lehman

In the context of medical decision making, counterfactual prediction enables clinicians to predict treatment outcomes of interest under alternative courses of therapeutic actions given observed patient history. In this work, we present G-Transformer for counterfactual outcome prediction under dynamic and time-varying treatment strategies. Our approach leverages a Transformer architecture to capture complex, long-range dependencies in time-varying covariates while enabling g-computation, a causal inference method for estimating the effects of dynamic treatment regimes. Specifically, we use a Transformer-based encoder architecture to estimate the conditional distribution of relevant covariates given covariate and treatment history at each time point, and then produce Monte Carlo estimates of counterfactual outcomes by simulating forward patient trajectories under treatment strategies of interest. We evaluate G-Transformer extensively using two simulated longitudinal datasets from mechanistic models, and a real-world sepsis ICU dataset from MIMIC-IV. G-Transformer outperforms both classical and state-of-the-art counterfactual prediction models in these settings. To the best of our knowledge, this is the first Transformer-based architecture that supports g-computation for counterfactual outcome prediction under dynamic and time-varying treatment strategies.
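The g-computation recipe the abstract relies on can be summarized in a few lines: repeatedly simulate forward trajectories by sampling covariates from a fitted conditional model, applying the treatment rule of interest at each step, and averaging the simulated outcomes. The sketch below uses toy stand-in models and is not the paper's Transformer implementation.

```python
# Schematic sketch of g-computation by Monte Carlo forward simulation (the
# general recipe, not the G-Transformer architecture): all models here are
# toy stand-ins so the example runs end to end.
import torch

def g_computation(cov_model, outcome_model, policy, x0, horizon=10, n_mc=500):
    """cov_model(history) -> (mean, std) of next covariates;
    outcome_model(history) -> outcome estimate; policy(covariates) -> treatment."""
    outcomes = []
    for _ in range(n_mc):
        history = [torch.cat([x0, policy(x0)])]
        for _ in range(horizon):
            mean, std = cov_model(torch.stack(history))
            x = mean + std * torch.randn_like(std)       # sample next covariates
            history.append(torch.cat([x, policy(x)]))    # apply the treatment strategy
        outcomes.append(outcome_model(torch.stack(history)))
    return torch.stack(outcomes).mean()

d = 4
cov_model = lambda h: (h[-1, :d] * 0.9, torch.full((d,), 0.1))   # toy conditional model
outcome_model = lambda h: h[:, :d].mean()                        # toy outcome model
policy = lambda x: (x[:1] > 0).float()                           # treat when first covariate > 0
print(float(g_computation(cov_model, outcome_model, policy, torch.randn(d))))
```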

{"title":"G-Transformer: Counterfactual Outcome Prediction under Dynamic and Time-varying Treatment Regimes.","authors":"Hong Xiong, Feng Wu, Leon Deng, Megan Su, Zach Shahn, Li-Wei H Lehman","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>In the context of medical decision making, counterfactual prediction enables clinicians to predict treatment outcomes of interest under alternative courses of therapeutic actions given observed patient history. In this work, we present G-Transformer for counterfactual outcome prediction under dynamic and time-varying treatment strategies. Our approach leverages a Transformer architecture to capture complex, long-range dependencies in time-varying covariates while enabling g-computation, a causal inference method for estimating the effects of dynamic treatment regimes. Specifically, we use a Transformer-based encoder architecture to estimate the conditional distribution of relevant covariates given covariate and treatment history at each time point, then produces Monte Carlo estimates of counterfactual outcomes by simulating forward patient trajectories under treatment strategies of interest. We evaluate G-Transformer extensively using two simulated longitudinal datasets from mechanistic models, and a real-world sepsis ICU dataset from MIMIC-IV. G-Transformer outperforms both classical and state-of-the-art counterfactual prediction models in these settings. To the best of our knowledge, this is the first Transformer-based architecture that supports g-computation for counterfactual outcome prediction under dynamic and time-varying treatment strategies.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"252 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12113242/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144164074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models.
Hye Sun Yun, David Pogrebitskiy, Iain J Marshall, Byron C Wallace

Meta-analyses statistically aggregate the findings of different randomized controlled trials (RCTs) to assess treatment effectiveness. Because this yields robust estimates of treatment effectiveness, results from meta-analyses are considered the strongest form of evidence. However, rigorous evidence syntheses are time-consuming and labor-intensive, requiring manual extraction of data from individual trials to be synthesized. Ideally, language technologies would permit fully automatic meta-analysis, on demand. This requires accurately extracting numerical results from individual trials, which has been beyond the capabilities of natural language processing (NLP) models to date. In this work, we evaluate whether modern large language models (LLMs) can reliably perform this task. We annotate (and release) a modest but granular evaluation dataset of clinical trial reports with numerical findings attached to interventions, comparators, and outcomes. Using this dataset, we evaluate the performance of seven LLMs applied zero-shot for the task of conditionally extracting numerical findings from trial reports. We find that massive LLMs that can accommodate lengthy inputs are tantalizingly close to realizing fully automatic meta-analysis, especially for dichotomous (binary) outcomes (e.g., mortality). However, LLMs, including ones trained on biomedical texts, perform poorly when the outcome measures are complex and tallying the results requires inference. This work charts a path toward fully automatic meta-analysis of RCTs via LLMs, while also highlighting the limitations of existing models for this aim.
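A zero-shot extraction setup of the kind evaluated here typically amounts to a structured prompt plus strict parsing of the model's reply; the sketch below is a hypothetical version in which `call_llm` is a placeholder client and the prompt and JSON schema are illustrative, not the prompts used in the paper.

```python
# Hypothetical sketch of zero-shot numerical extraction. `call_llm` is a
# placeholder for whatever chat-completion client is available; the prompt and
# JSON schema below are illustrative, not the paper's actual prompts.
import json

PROMPT = """From the trial report below, extract the numerical result for the
given intervention, comparator, and outcome. Reply with JSON only, e.g.
{{"intervention_events": 12, "intervention_total": 100,
  "comparator_events": 20, "comparator_total": 98}}.
If a value is not reported, use null.

Intervention: {intervention}
Comparator: {comparator}
Outcome: {outcome}

Trial report:
{report}"""

def extract_numerical_result(call_llm, report, intervention, comparator, outcome):
    prompt = PROMPT.format(report=report, intervention=intervention,
                           comparator=comparator, outcome=outcome)
    raw = call_llm(prompt)           # e.g. a single zero-shot chat completion
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None                  # malformed output counts as a failed extraction
```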

{"title":"Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models.","authors":"Hye Sun Yun, David Pogrebitskiy, Iain J Marshall, Byron C Wallace","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Meta-analyses statistically aggregate the findings of different randomized controlled trials (RCTs) to assess treatment effectiveness. Because this yields robust estimates of treatment effectiveness, results from meta-analyses are considered the strongest form of evidence. However, rigorous evidence syntheses are time-consuming and labor-intensive, requiring manual extraction of data from individual trials to be synthesized. Ideally, language technologies would permit fully automatic meta-analysis, on demand. This requires accurately extracting numerical results from individual trials, which has been beyond the capabilities of natural language processing (NLP) models to date. In this work, we evaluate whether modern large language models (LLMs) can reliably perform this task. We annotate (and release) a modest but granular evaluation dataset of clinical trial reports with numerical findings attached to interventions, comparators, and outcomes. Using this dataset, we evaluate the performance of seven LLMs applied zero-shot for the task of conditionally extracting numerical findings from trial reports. We find that massive LLMs that can accommodate lengthy inputs are tantalizingly close to realizing fully automatic meta-analysis, especially for dichotomous (binary) outcomes (e.g., mortality). However, LLMs-including ones trained on biomedical texts-perform poorly when the outcome measures are complex and tallying the results requires inference. This work charts a path toward fully automatic meta-analysis of RCTs via LLMs, while also highlighting the limitations of existing models for this aim.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"252 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448672/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145115185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks.
Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai

An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network. This approach yields strong downstream performance in a variety of contexts, demonstrating that multitask pretraining leads to effective feature learning. Although several recent theoretical studies have shown that shallow NNs learn meaningful features when either (i) they are trained on a single task or (ii) they are linear, very little is known about the closer-to-practice case of nonlinear NNs trained on multiple tasks. In this work, we present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks. Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks. Using this observation, we show that when the tasks are binary classification tasks with labels depending on the projection of the data onto an r-dimensional subspace within the d ≫ r-dimensional input space, a simple gradient-based multitask learning algorithm on a two-layer ReLU NN recovers this projection, allowing for generalization to downstream tasks with sample and neuron complexity independent of d. In contrast, we show that with high probability over the draw of a single task, training on this single task cannot guarantee to learn all r ground-truth features.
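The data model described in the abstract is easy to instantiate: inputs live in d dimensions, but every task's binary label depends only on the projection onto a shared r-dimensional subspace. The toy generator below (with arbitrary linear rules inside the subspace, purely for illustration, not the paper's experiments) makes that setting concrete.

```python
# Toy illustration of the data model in the abstract: each binary task's label
# depends only on the projection of x onto a shared r-dimensional subspace of
# the d >> r dimensional input space. The per-task rules here are arbitrary
# linear rules within that subspace, chosen just to make the setting concrete.
import numpy as np

rng = np.random.default_rng(0)
d, r, n_tasks, n_samples = 100, 3, 5, 1000

B, _ = np.linalg.qr(rng.standard_normal((d, r)))   # shared ground-truth subspace (d x r)
task_w = rng.standard_normal((n_tasks, r))         # per-task rule inside the subspace

X = rng.standard_normal((n_samples, d))
proj = X @ B                                        # only this projection matters for labels
Y = (proj @ task_w.T > 0).astype(int)               # one binary label per task
print(X.shape, Y.shape)                              # (1000, 100) (1000, 5)
```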

{"title":"Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks.","authors":"Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network. This approach yields strong downstream performance in a variety of contexts, demonstrating that multitask pretraining leads to effective feature learning. Although several recent theoretical studies have shown that shallow NNs learn meaningful features when either (i) they are trained on a <i>single</i> task or (ii) they are <i>linear</i>, very little is known about the closer-to-practice case of <i>nonlinear</i> NNs trained on <i>multiple</i> tasks. In this work, we present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks. Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks. Using this observation, we show that when the tasks are binary classification tasks with labels depending on the projection of the data onto an <math><mi>r</mi></math> -dimensional subspace within the <math><mi>d</mi> <mo>≫</mo> <mi>r</mi></math> -dimensional input space, a simple gradient-based multitask learning algorithm on a two-layer ReLU NN recovers this projection, allowing for generalization to downstream tasks with sample and neuron complexity independent of <math><mi>d</mi></math> . In contrast, we show that with high probability over the draw of a single task, training on this single task cannot guarantee to learn all <math><mi>r</mi></math> ground-truth features.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"235 ","pages":"9292-9345"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11486479/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142482676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling.
Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, Volodymyr Kuleshov

Large-scale sequence modeling has sparked rapid advances that now extend into biology and genomics. However, modeling genomic sequences introduces challenges such as the need to model long-range token interactions, the effects of upstream and downstream regions of the genome, and the reverse complementarity (RC) of DNA. Here, we propose an architecture motivated by these challenges that builds off the long-range Mamba block, and extends it to a BiMamba component that supports bi-directionality, and to a MambaDNA block that additionally supports RC equivariance. We use MambaDNA as the basis of Caduceus, the first family of RC equivariant bi-directional long-range DNA language models, and we introduce pre-training and fine-tuning strategies that yield Caduceus DNA foundation models. Caduceus outperforms previous long-range models on downstream benchmarks; on a challenging long-range variant effect prediction task, Caduceus exceeds the performance of 10× larger models that do not leverage bi-directionality or equivariance. Code to reproduce our experiments is available here.
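Operationally, RC equivariance means that running the model on the reverse complement of a sequence yields the reverse complement (position-reversed, channel-permuted) of the original output. The toy sketch below checks that property on a trivially equivariant two-strand-averaged model; it illustrates the constraint only and is not the Caduceus/BiMamba architecture.

```python
# Toy sketch of the reverse-complement (RC) equivariance property: applying the
# model to the RC of a sequence should give the RC of the model's output. The
# tiny "model" below enforces this by averaging over both strands; it is only
# meant to illustrate the property, not the Caduceus/BiMamba architecture.
import torch
import torch.nn as nn

COMPLEMENT = torch.tensor([3, 2, 1, 0])   # A<->T, C<->G for tokens [A, C, G, T]

def reverse_complement(tokens: torch.Tensor) -> torch.Tensor:
    return COMPLEMENT[tokens].flip(-1)

embed = nn.Embedding(4, 16)
proj = nn.Linear(16, 4)

def rc_equivariant_logits(tokens: torch.Tensor) -> torch.Tensor:
    fwd = proj(embed(tokens))                                   # (L, 4) per-position logits
    rc = proj(embed(reverse_complement(tokens)))
    rc_mapped = rc.flip(0)[:, COMPLEMENT]                       # map RC outputs back to fwd frame
    return 0.5 * (fwd + rc_mapped)

seq = torch.randint(0, 4, (20,))
out_rc = rc_equivariant_logits(reverse_complement(seq))
expected = rc_equivariant_logits(seq).flip(0)[:, COMPLEMENT]    # RC of the forward output
print(torch.allclose(out_rc, expected, atol=1e-6))              # True: equivariance holds
```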

{"title":"Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling.","authors":"Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, Volodymyr Kuleshov","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Large-scale sequence modeling has sparked rapid advances that now extend into biology and genomics. However, modeling genomic sequences introduces challenges such as the need to model long-range token interactions, the effects of upstream and downstream regions of the genome, and the reverse complementarity (RC) of DNA Here, we propose an architecture motivated by these challenges that builds off the long-range Mamba block, and extends it to a BiMamba component that supports bi-directionality, and to a MambaDNA block that additionally supports RC equivariance. We use MambaDNA as the basis of Caduceus, the first family of RC equivariant bi-directional long-range DNA language models, and we introduce pre-training and fine-tuning strategies that yield Caduceus DNA foundation models. Caduceus outperforms previous long-range models on downstream benchmarks; on a challenging long-range variant effect prediction task, Caduceus exceeds the performance of <math><mrow><mn>10</mn> <mi>x</mi></mrow> </math> larger models that do not leverage bi-directionality or equivariance. Code to reproduce our experiments is available here.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"235 ","pages":"43632-43648"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12189541/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144499715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-Region Markovian Gaussian Process: An Efficient Method to Discover Directional Communications Across Multiple Brain Regions.
Weihan Li, Chengrui Li, Yule Wang, Anqi Wu

Studying the complex interactions between different brain regions is crucial in neuroscience. Various statistical methods have explored the latent communication across multiple brain regions. Two main categories are the Gaussian Process (GP) and Linear Dynamical System (LDS), each with unique strengths. The GP-based approach effectively discovers latent variables with frequency bands and communication directions. Conversely, the LDS-based approach is computationally efficient but lacks powerful expressiveness in latent representation. In this study, we merge both methodologies by creating an LDS mirroring a multi-output GP, termed Multi-Region Markovian Gaussian Process (MRM-GP). Our work establishes a connection between an LDS and a multi-output GP that explicitly models frequencies and phase delays within the latent space of neural recordings. Consequently, the model achieves a linear inference cost over time points and provides an interpretable low-dimensional representation, revealing communication directions across brain regions and separating oscillatory communications into different frequency bands.
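The key computational trick is that certain GPs admit an exact state-space (LDS) form, so inference scales linearly in the number of time points. The sketch below shows the textbook instance of that correspondence, a Matérn-1/2 (exponential) kernel realized as an AR(1) recursion; it is a generic illustration, not the MRM-GP model.

```python
# Sketch of the GP <-> LDS correspondence the abstract builds on, using the
# textbook case: a GP with exponential (Matern-1/2) kernel is equivalent to an
# AR(1) state-space model, so it can be simulated and filtered in O(T) rather
# than O(T^3). This is a generic illustration, not the MRM-GP model itself.
import numpy as np

rng = np.random.default_rng(0)
T, dt, lengthscale, variance = 500, 0.01, 0.3, 1.0

a = np.exp(-dt / lengthscale)                 # state-transition coefficient
q = variance * (1.0 - a**2)                   # process-noise variance that keeps stationarity

x = np.zeros(T)
x[0] = rng.normal(0.0, np.sqrt(variance))
for t in range(1, T):                         # linear-Gaussian recursion: O(T)
    x[t] = a * x[t - 1] + rng.normal(0.0, np.sqrt(q))

# Empirical autocovariance at lag k should match variance * exp(-k*dt/lengthscale).
lag = 10
emp = np.mean(x[:-lag] * x[lag:])
print(round(emp, 3), round(variance * np.exp(-lag * dt / lengthscale), 3))
```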

{"title":"Multi-Region Markovian Gaussian Process: An Efficient Method to Discover Directional Communications Across Multiple Brain Regions.","authors":"Weihan Li, Chengrui Li, Yule Wang, Anqi Wu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Studying the complex interactions between different brain regions is crucial in neuroscience. Various statistical methods have explored the latent communication across multiple brain regions. Two main categories are the Gaussian Process (GP) and Linear Dynamical System (LDS), each with unique strengths. The GP-based approach effectively discovers latent variables with frequency bands and communication directions. Conversely, the LDS-based approach is computationally efficient but lacks powerful expressiveness in latent representation. In this study, we merge both methodologies by creating an LDS mirroring a multi-output GP, termed Multi-Region Markovian Gaussian Process (MRM-GP). Our work establishes a connection between an LDS and a multi-output GP that explicitly models frequencies and phase delays within the latent space of neural recordings. Consequently, the model achieves a linear inference cost over time points and provides an interpretable low-dimensional representation, revealing communication directions across brain regions and separating oscillatory communications into different frequency bands.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"235 ","pages":"28112-28131"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526605/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142559682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0