Nature Machine Intelligence: Latest Articles

A soft skin with self-decoupled three-axis force-sensing taxels
IF 23.8 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-19 | DOI: 10.1038/s42256-024-00904-9
Youcan Yan, Ahmed Zermane, Jia Pan, Abderrahmane Kheddar

Electronic skins integrating both normal and shear force per taxel have a wide range of applications across diverse fields, including robotics, haptics and health monitoring. Current multi-axis tactile sensors often present complexities in structure and fabrication or require an extensive calibration process, limiting their widespread applications. Here we report an electronic soft magnetic skin capable of self-decoupling three-axis forces at each taxel. We use a simple sensor structure with customizable sensitivity and measurement range, reducing the calibration complexity from known quadratic (N²) or cubic (N³) scales down to a linear (3N) scale. The three-axis self-decoupling property of the sensor is achieved by overlaying two sinusoidally magnetized flexible magnetic films with orthogonal magnetization patterns. Leveraging the self-decoupling feature and its simple structure, we demonstrate that our sensor can facilitate a diverse range of applications, such as measuring the three-dimensional force distribution in artificial knee joints, teaching robots by touch demonstration and monitoring the interaction forces between knee braces and human skin during various activities.
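
The scaling claim in this abstract can be made concrete with a small count. The sketch below is only an illustration under one assumption that the abstract does not state: that N is the number of force levels sampled per axis during calibration. The function name `calibration_points` is hypothetical.

```python
def calibration_points(levels_per_axis, axes=3, decoupled=True):
    """Count calibration measurements for a multi-axis force sensor.

    Assumption: `levels_per_axis` plays the role of N in the abstract.
    Coupled axes need every combination of levels (N ** axes);
    decoupled axes can each be calibrated on their own (axes * N).
    """
    if decoupled:
        return axes * levels_per_axis
    return levels_per_axis ** axes


n = 10
coupled_three_axis = calibration_points(n, decoupled=False)       # N^3
coupled_two_axis = calibration_points(n, axes=2, decoupled=False)  # N^2
self_decoupled = calibration_points(n)                             # 3N
print(coupled_three_axis, coupled_two_axis, self_decoupled)
```

With ten levels per axis, the coupled three-axis case needs 1000 measurements while the decoupled case needs 30, which is the gap the abstract describes.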

Reshaping the discovery of self-assembling peptides with generative AI guided by hybrid deep learning
IF 23.8 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-19 | DOI: 10.1038/s42256-024-00928-1
Marko Njirjak, Lucija Žužić, Marko Babić, Patrizia Janković, Erik Otović, Daniela Kalafatovic, Goran Mauša

Supramolecular peptide-based materials have great potential for revolutionizing fields like nanotechnology and medicine. However, deciphering the intricate sequence-to-assembly pathway, essential for their real-life applications, remains a challenging endeavour. Their discovery relies primarily on empirical approaches that require substantial financial resources, impeding their disruptive potential. Consequently, despite the multitude of characterized self-assembling peptides and their demonstrated advantages, only a few peptide materials have found their way to the market. Machine learning trained on experimentally verified data presents a promising tool for quickly identifying sequences with a high propensity to self-assemble, thereby focusing resource expenditures on the most promising candidates. Here we introduce a framework that implements an accurate classifier in a metaheuristic-based generative model to navigate the search through the peptide sequence space of challenging size. For this purpose, we trained five recurrent neural networks among which the hybrid model that uses sequential information on aggregation propensity and specific physicochemical properties achieved a superior performance with 81.9% accuracy and 0.865 F1 score. Molecular dynamics simulations and experimental validation have confirmed the generative model to be 80–95% accurate in the discovery of self-assembling peptides, outperforming the current state-of-the-art models. The proposed modular framework efficiently complements human intuition in the exploration of self-assembling peptides and presents an important step in the development of intelligent laboratories for accelerated material discovery.
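
For readers unfamiliar with the two metrics quoted above, the following minimal sketch shows how accuracy and F1 score are derived from a binary confusion matrix. The counts are made up for illustration and are not taken from the paper; `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, f1) for a binary classifier.

    tp/fp/fn/tn are true-positive, false-positive, false-negative
    and true-negative counts.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Illustrative counts only (not the paper's data).
accuracy, f1 = classification_metrics(tp=45, fp=10, fn=5, tn=40)
print(f"accuracy={accuracy:.3f}, F1={f1:.3f}")
```

Because F1 ignores true negatives, it can sit above or below accuracy depending on the class balance, which is why the abstract reports both.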

Efficient rare event sampling with unsupervised normalizing flows
IF 23.8 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-19 | DOI: 10.1038/s42256-024-00918-3
Solomon Asghar, Qing-Xiang Pei, Giorgio Volpe, Ran Ni

From physics and biology to seismology and economics, the behaviour of countless systems is determined by impactful yet unlikely transitions between metastable states known as rare events, the study of which is essential for understanding and controlling the properties of these systems. Classical computational methods to sample rare events remain prohibitively inefficient and are bottlenecks for enhanced samplers that require prior data. Here we introduce a physics-informed machine learning framework, normalizing Flow enhanced Rare Event Sampler (FlowRES), which uses unsupervised normalizing flow neural networks to enhance Monte Carlo sampling of rare events by generating high-quality non-local Monte Carlo proposals. We validated FlowRES by sampling the transition path ensembles of equilibrium and non-equilibrium systems of Brownian particles, exploring increasingly complex potentials. Beyond eliminating the requirements for prior data, FlowRES features key advantages over established samplers: no collective variables need to be defined, efficiency remains constant even as events become increasingly rare and systems with multiple routes between states can be straightforwardly simulated.
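
The idea of non-local Monte Carlo proposals can be illustrated with a toy independence Metropolis-Hastings sampler. This is not FlowRES: the sketch samples single positions in a one-dimensional double-well potential rather than transition paths, and a fixed two-mode Gaussian mixture stands in for the trained normalizing flow. All names and parameter values (`KT`, `SIGMA`, `potential`) are assumptions chosen for the example.

```python
import math
import random

KT = 0.1     # thermal energy; the barrier at x = 0 has height 1.0 (10 kT)
SIGMA = 0.3  # width of each proposal mode (hypothetical value)


def potential(x):
    """Double-well potential with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def proposal_pdf(x):
    """Density of the stand-in proposal: equal mixture of two Gaussians."""
    norm = 1.0 / (SIGMA * math.sqrt(2.0 * math.pi))
    left = math.exp(-0.5 * ((x + 1.0) / SIGMA) ** 2)
    right = math.exp(-0.5 * ((x - 1.0) / SIGMA) ** 2)
    return 0.5 * norm * (left + right)


def log_weight(x):
    """log of (unnormalized Boltzmann density / proposal density)."""
    return -potential(x) / KT - math.log(proposal_pdf(x))


def sample(n_steps, seed=0):
    rng = random.Random(seed)
    x = -1.0  # start in the left well
    chain = []
    for _ in range(n_steps):
        # Non-local proposal: drawn independently of the current state.
        centre = rng.choice((-1.0, 1.0))
        candidate = rng.gauss(centre, SIGMA)
        # Independence Metropolis-Hastings acceptance test.
        log_ratio = log_weight(candidate) - log_weight(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = candidate
        chain.append(x)
    return chain


chain = sample(5000)
right_fraction = sum(1 for x in chain if x > 0) / len(chain)
print(f"fraction of samples in the right well: {right_fraction:.2f}")
```

A local random-walk sampler would rarely cross a 10 kT barrier in this many steps, whereas proposals that jump directly between wells visit both about equally, which is the symmetric equilibrium result.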

Clinical large language models with misplaced focus
IF 23.8 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-18 | DOI: 10.1038/s42256-024-00929-0
Zining Luo, Haowei Ma, Zhiwu Li, Yuquan Chen, Yixin Sun, Aimin Hu, Jiang Yu, Yang Qiao, Junxian Gu, Hongying Li, Xuxi Peng, Dunrui Wang, Ying Liu, Zhenglong Liu, Jiebin Xie, Zhen Jiang, Gang Tian

On 12 September 2024, OpenAI released two new large language models (LLMs) — o1-preview and o1-mini — marking an important shift in the competitive landscape of commercial LLMs, particularly concerning their reasoning capabilities. Since the introduction of GPT-3.5, OpenAI has launched 31 LLMs in two years. Researchers are rapidly applying these evolving commercial models in clinical medicine, achieving results that sometimes exceed human performance in specific tasks. Although such success is encouraging, the development of the models used for these tasks may not align with the characteristics and needs of clinical practice.

LLMs can be categorized as either open-source or closed-source. Open-source models, such as Meta’s Llama, allow developers to access source code, training data and documentation freely. By contrast, closed-source models are accessed only through official channels or application programming interfaces (APIs). Initially, open-source models dominated the LLM landscape, until the release of OpenAI’s GPT-3 in 2020 [1], which attracted considerable commercial interest and shifted focus towards closed-source approaches [2].

Efficient generation of protein pockets with PocketGen
IF 23.8 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-15 | DOI: 10.1038/s42256-024-00920-9
Zaixi Zhang, Wan Xiang Shen, Qi Liu, Marinka Zitnik

Designing protein-binding proteins is critical for drug discovery. However, artificial-intelligence-based design of such proteins is challenging due to the complexity of protein–ligand interactions, the flexibility of ligand molecules and amino acid side chains, and sequence–structure dependencies. We introduce PocketGen, a deep generative model that produces residue sequence and atomic structure of the protein regions in which ligand interactions occur. PocketGen promotes consistency between protein sequence and structure by using a graph transformer for structural encoding and a sequence refinement module based on a protein language model. The graph transformer captures interactions at multiple scales, including atom, residue and ligand levels. For sequence refinement, PocketGen integrates a structural adapter into the protein language model, ensuring that structure-based predictions align with sequence-based predictions. PocketGen can generate high-fidelity protein pockets with enhanced binding affinity and structural validity. It operates ten times faster than physics-based methods and achieves a 97% success rate, defined as the percentage of generated pockets with higher binding affinity than reference pockets. Additionally, it attains an amino acid recovery rate exceeding 63%.
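
The two evaluation metrics named in this abstract can be written down in a few lines. The sketch below follows the abstract's definition of success rate, and assumes that amino acid recovery means the fraction of pocket positions where the designed residue equals the native one. It also assumes that affinity is given as a docking-style score where lower means stronger binding. The scores and sequences are invented for illustration.

```python
def success_rate(generated_scores, reference_scores):
    """Fraction of generated pockets that bind better than their reference.

    Assumption: scores are docking-style energies, so lower = higher affinity.
    """
    wins = sum(1 for g, r in zip(generated_scores, reference_scores) if g < r)
    return wins / len(generated_scores)


def recovery_rate(designed, native):
    """Fraction of positions where the designed residue matches the native one."""
    matches = sum(1 for a, b in zip(designed, native) if a == b)
    return matches / len(native)


# Illustrative values only (not from the paper).
rate = success_rate([-8.1, -7.4, -6.0, -9.2], [-7.0, -7.9, -5.5, -8.8])
recovery = recovery_rate("ACDEFGHK", "ACDQFGHR")
print(rate, recovery)
```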

Pick your AI poison
IF 18.8 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-21 | DOI: 10.1038/s42256-024-00921-8
Distinguishing between real and fabricated facts has long been a societal challenge. As the Internet becomes increasingly littered with AI-generated content, the need for curation and safeguarding of high-quality data and information is more crucial than ever.
Estimation of causal effects of genes on complex traits using a Bayesian-network-based framework applied to GWAS data
IF 18.8 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-17 | DOI: 10.1038/s42256-024-00906-7
Liangying Yin, Yaning Feng, Yujia Shi, Alexandria Lau, Jinghong Qiu, Pak-Chung Sham, Hon-Cheong So
Deciphering the relationships between genes and complex traits can enhance our understanding of phenotypic variations and disease mechanisms. However, determining the specific roles of individual genes and quantifying their direct and indirect causal effects on complex traits remains a significant challenge. Here we present a framework (called Bayesian network genome-wide association studies (BN-GWAS)) to decipher the total and direct causal effects of individual genes. BN-GWAS leverages imputed expression profiles from GWAS and raw expression data from a reference dataset to construct a directed gene–gene–phenotype causal network. It allows gene expression and disease traits to be evaluated in different samples, significantly improving the flexibility and applicability of the approach. It can be extended to decipher the joint causal network of two or more traits, and exhibits high specificity and precision (positive predictive value), making it particularly useful for selecting genes for follow-up studies. We verified the feasibility and validity of BN-GWAS by extensive simulations and applications to 52 traits across 14 tissues in the UK Biobank, revealing insights into their genetic architectures, including the relative contributions of direct, indirect and mediating causal genes. The identified (direct) causal genes were significantly enriched for genes highlighted in the Open Targets database. Overall, BN-GWAS provides a flexible and powerful framework for elucidating the genetic basis of complex traits through a systems-level, causal inference approach. Genome-wide association studies generate extensive data, but interpreting these data remains challenging. A Bayesian-network-based method is presented that uses imputed and raw gene expression data to decipher the causal effects of individual genes.
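
The distinction between total and direct causal effects can be shown on a toy directed network. The sketch below assumes a linear model in which each edge carries a fixed effect size, so that the total effect is the sum over all directed paths of the product of edge weights. That assumption, the gene names, and the weights are illustrative only and do not reproduce the BN-GWAS estimation procedure.

```python
def total_effect(edges, source, target):
    """Sum of path products from source to target in a weighted DAG.

    `edges` maps each node to a dict {child: effect size}.
    """
    if source == target:
        return 1.0
    return sum(
        weight * total_effect(edges, child, target)
        for child, weight in edges.get(source, {}).items()
    )


# Hypothetical network: geneA -> geneB -> trait, plus geneA -> trait.
edges = {
    "geneA": {"geneB": 0.5, "trait": 0.2},
    "geneB": {"trait": 0.4},
}

direct = edges["geneA"].get("trait", 0.0)       # effect not mediated by geneB
total = total_effect(edges, "geneA", "trait")   # direct + indirect
indirect = total - direct
print(direct, indirect, total)
```

Here half of geneA's total effect on the trait is mediated through geneB, the kind of decomposition the abstract refers to as direct, indirect and mediating contributions.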
A multi-modal deep language model for contaminant removal from metagenome-assembled genomes
IF 18.8 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-07 | DOI: 10.1038/s42256-024-00908-5
Bohao Zou, Jingjing Wang, Yi Ding, Zhenmiao Zhang, Yufen Huang, Xiaodong Fang, Ka Chun Cheung, Simon See, Lu Zhang
Metagenome-assembled genomes (MAGs) offer valuable insights into the exploration of microbial dark matter using metagenomic sequencing data. However, there is growing concern that contamination in MAGs may substantially affect the results of downstream analysis. Current MAG decontamination tools primarily rely on marker genes and do not fully use the contextual information of genomic sequences. To overcome this limitation, we introduce Deepurify for MAG decontamination. Deepurify uses a multi-modal deep language model with contrastive learning to match microbial genomic sequences with their taxonomic lineages. It allocates contigs within a MAG to a MAG-separated tree and applies a tree traversal algorithm to partition MAGs into sub-MAGs, with the goal of maximizing the number of high- and medium-quality sub-MAGs. Here we show that Deepurify outperformed MDMcleaner and MAGpurify on simulated data, CAMI datasets and real-world datasets with varying complexities. Deepurify increased the number of high-quality MAGs by 20.0% in soil, 45.1% in ocean, 45.5% in plants, 33.8% in freshwater and 28.5% in human faecal metagenomic sequencing datasets. Metagenome-assembled genomes (MAGs) provide insights into microbial dark matter, but contamination remains a concern for downstream analysis. Zou et al. develop a multi-modal deep language model that leverages microbial sequences to remove ‘unexpected’ contigs from MAGs. This approach is compatible with any contig binning tools and increases the number of high-quality bins.
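
The partitioning step can be pictured with a much simpler stand-in. The sketch below groups the contigs of one MAG by their predicted lineage at a fixed taxonomic depth. It does not implement Deepurify's tree traversal, which per the abstract searches for the split that maximizes high- and medium-quality sub-MAGs. The contig names, lineages, and the `split_by_lineage` helper are hypothetical.

```python
from collections import defaultdict


def split_by_lineage(contig_lineages, depth):
    """Group contigs whose predicted lineages agree down to `depth` ranks."""
    groups = defaultdict(list)
    for contig, lineage in contig_lineages.items():
        groups[tuple(lineage[:depth])].append(contig)
    return dict(groups)


# Hypothetical MAG with one contig assigned to a different phylum.
mag = {
    "contig_1": ["Bacteria", "Proteobacteria"],
    "contig_2": ["Bacteria", "Proteobacteria"],
    "contig_3": ["Bacteria", "Firmicutes"],
}

sub_mags = split_by_lineage(mag, depth=2)
for lineage, contigs in sub_mags.items():
    print(lineage, contigs)
```

Splitting at depth 1 would keep all three contigs together, while depth 2 separates the outlier; choosing where to cut is the part a quality-driven traversal has to decide.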
元基因组组装基因组(MAGs)为利用元基因组测序数据探索微生物暗物质提供了宝贵的见解。然而,人们越来越担心,MAGs 中的污染可能会严重影响下游分析的结果。目前的 MAG 净化工具主要依赖标记基因,不能充分利用基因组序列的上下文信息。为了克服这一局限,我们推出了用于 MAG 去污的 Deepurify。Deepurify 使用具有对比学习功能的多模态深度语言模型来匹配微生物基因组序列及其分类学系谱。它将 MAG 中的等位基因分配到一棵 MAG 分离树上,并应用树遍历算法将 MAG 划分为子 MAG,目的是最大限度地增加高质量和中等质量子 MAG 的数量。在这里,我们展示了 Deepurify 在模拟数据、CAMI 数据集和具有不同复杂性的真实世界数据集上的表现优于 MDMclearer 和 MAGpurify。在土壤、海洋、植物、淡水和人类粪便元基因组测序数据集中,Deepurify 使高质量 MAG 的数量分别增加了 20.0%、45.1%、45.5%、33.8% 和 28.5%。
Citations: 0
A call for an industry-led initiative to critically assess machine learning for real-world drug discovery
IF 18.8 Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-04 DOI: 10.1038/s42256-024-00911-w
Cas Wognum, Jeremy R. Ash, Matteo Aldeghi, Raquel Rodríguez-Pérez, Cheng Fang, Alan C. Cheng, Daniel J. Price, Djork-Arné Clevert, Ola Engkvist, W. Patrick Walters
Citations: 0
Engineering flexible machine learning systems by traversing functionally invariant paths
IF 18.8 Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-03 DOI: 10.1038/s42256-024-00902-x
Guruprasad Raghavan, Bahey Tharwat, Surya Narayanan Hari, Dhruvil Satani, Rex Liu, Matt Thomson
Contemporary machine learning algorithms train artificial neural networks by setting network weights to a single optimized configuration through gradient descent on task-specific training data. The resulting networks can achieve human-level performance on natural language processing, image analysis and agent-based tasks, but lack the flexibility and robustness characteristic of human intelligence. Here we introduce a differential geometry framework—functionally invariant paths—that provides flexible and continuous adaptation of trained neural networks so that secondary tasks can be achieved beyond the main machine learning goal, including increased network sparsification and adversarial robustness. We formulate the weight space of a neural network as a curved Riemannian manifold equipped with a metric tensor whose spectrum defines low-rank subspaces in weight space that accommodate network adaptation without loss of prior knowledge. We formalize adaptation as movement along a geodesic path in weight space while searching for networks that accommodate secondary objectives. With modest computational resources, the functionally invariant path algorithm achieves performance comparable with or exceeding state-of-the-art methods including low-rank adaptation on continual learning, sparsification and adversarial robustness tasks for large language models (bidirectional encoder representations from transformers), vision transformers (ViT and DeIT) and convolutional neural networks. Machine learning often includes secondary objectives, such as sparsity or robustness. To reach these objectives efficiently, the training of a neural network has been interpreted as the exploration of functionally invariant paths in the parameter space.
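The idea in this abstract, adapting weights along directions that leave the network's function unchanged, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not spell out the algorithm, so the one-layer tanh model, the function names (`outputs`, `metric_tensor`, `invariant_step`) and the tolerance are all assumptions made for illustration. The sketch also replaces the geodesic search with a simpler stand-in: each step projects the gradient of a secondary objective (here, shrinking the weight norm) onto the near-zero eigenvectors of the metric g = JᵀJ, where J is the Jacobian of the outputs with respect to the weights.

```python
import numpy as np

def outputs(w, X):
    """Toy 'network' (assumption): one tanh layer mapping each input row to a scalar."""
    return np.tanh(X @ w)

def metric_tensor(w, X):
    """Metric g = J^T J, where J is the Jacobian of the outputs w.r.t. the weights."""
    J = (1.0 - np.tanh(X @ w) ** 2)[:, None] * X  # shape (n_samples, n_weights)
    return J.T @ J

def invariant_step(w, X, secondary_grad, lr=0.1, rel_tol=1e-8):
    """Step against the secondary gradient, restricted to directions where g is ~0."""
    eigvals, eigvecs = np.linalg.eigh(metric_tensor(w, X))
    flat = eigvecs[:, eigvals < rel_tol * eigvals.max()]  # low-eigenvalue subspace
    direction = flat @ (flat.T @ secondary_grad)          # projection onto that subspace
    return w - lr * direction

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 10))      # 4 training inputs, 10 weights: g has a null space
w0 = 0.3 * rng.normal(size=10)    # stands in for the "trained" weights

# Follow the path: the metric is recomputed at every step.
w = w0.copy()
for _ in range(50):
    w = invariant_step(w, X, secondary_grad=w)  # gradient of 0.5 * ||w||^2

# Baseline: shrink w0 uniformly to the same norm, ignoring the metric.
naive = w0 * (np.linalg.norm(w) / np.linalg.norm(w0))

drift_path = np.max(np.abs(outputs(w, X) - outputs(w0, X)))
drift_naive = np.max(np.abs(outputs(naive, X) - outputs(w0, X)))
print(f"weight norm: {np.linalg.norm(w0):.3f} -> {np.linalg.norm(w):.3f}")
print(f"output drift along path: {drift_path:.2e}, naive shrinkage: {drift_naive:.2e}")
```

In this toy setting the constrained path reduces the weight norm while leaving the outputs on the training inputs essentially unchanged, whereas uniform shrinkage to the same norm changes them. A real network would need the metric estimated over a data set and a low-rank approximation rather than a full eigendecomposition.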
Citations: 0