Nature Computational Science: latest articles


Enabling efficient analysis of biobank-scale data with genotype representation graphs

IF 12 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Nature computational science

Pub Date : 2024-12-05 DOI: 10.1038/s43588-024-00739-9

Drew DeHaas, Ziqing Pan, Xinzhu Wei

Computational analysis of a large number of genomes requires a data structure that can represent the dataset compactly while also enabling efficient operations on variants and samples. However, encoding genetic data in existing tabular data structures and file formats has become costly and unsustainable. Here we introduce the genotype representation graph (GRG), a fully connected hierarchical data structure that losslessly encodes phased whole-genome polymorphisms. Exploiting variant-sharing across samples enables GRG to compress 200,000 UK Biobank phased human genomes to 5–26 gigabytes per chromosome, also enabling graph-traversal algorithms to reuse computed values in random access memory. Constructing and processing GRG files scales to a million whole genomes. Using allele frequencies and association effects as examples, we show that computation on GRG via graph traversal runs the fastest among all tested alternatives. GRG-based algorithms have the potential to increase the scalability and reduce the cost of analyzing large genomic datasets.

Summary: The genotype representation graph (GRG) is a compact data structure that encodes 200,000 human genomes in just 5–26 gigabytes per chromosome. Computation on GRG via graph traversal greatly accelerates genome-wide analysis.
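The idea of reusing computed values during graph traversal can be illustrated with a toy example. The sketch below is not the GRG file format or the authors' implementation: the node names, the dictionary layout, and the assumption that each haplotype is reachable from a node by at most one path (so that counts simply add) are all hypothetical choices made for illustration.

```python
# Toy hierarchy (hypothetical): mutation nodes point to shared internal
# nodes, which point to sample haplotypes (leaves, absent from the dict).
children = {
    "mut_A": ["n1", "n2"],
    "mut_B": ["n1"],
    "mut_C": ["n2", "h4"],
    "n1": ["h0", "h1"],
    "n2": ["h2", "h3"],
}


def allele_counts(children, mutations):
    """Count the haplotypes reachable below each mutation node.

    Assumes each haplotype is reachable from a node by at most one path,
    so the count of a node is the sum of the counts of its children.
    """
    cache = {}  # per-node counts, computed once and reused

    def count(node):
        if node not in children:  # a leaf stands for one haplotype
            return 1
        if node not in cache:
            cache[node] = sum(count(child) for child in children[node])
        return cache[node]

    return {m: count(m) for m in mutations}


counts = allele_counts(children, ["mut_A", "mut_B", "mut_C"])
n_haplotypes = 5  # h0..h4 in this toy example
frequencies = {m: c / n_haplotypes for m, c in counts.items()}
```

Here the counts for `n1` and `n2` are each computed once and then shared by every mutation that sits above them, which is the kind of reuse the abstract describes.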

Volume 5, issue 2, pages 112-124.

Citations: 0

Teaching spin symmetry while learning neural network wave functions

IF 12 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Nature computational science

Pub Date : 2024-12-04 DOI: 10.1038/s43588-024-00727-z

Yongle Li, Yuhao Chen, Xiao He

By developing an efficient spin symmetry penalty, a recent study has substantially accelerated the calculation of accurate energies with correct spin states in variational Monte Carlo for both ground and excited states of quantum many-particle systems.

Citations: 0

Deep learning training dynamics analysis for single-cell data

IF 12 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Nature computational science

Pub Date : 2024-12-04 DOI: 10.1038/s43588-024-00728-y

Inspired by recent approaches for natural language processing and computer vision, we developed Annotatability, a framework that analyzes deep neural network training dynamics to interpret pre-annotated single-cell and spatial omics data. Annotatability identified erroneous annotations and ambiguous cell states, inferred trajectories from binary labels, and revealed underlying biological signals.

Citations: 0

Spin-symmetry-enforced solution of the many-body Schrödinger equation with a deep neural network

IF 12 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Nature computational science

Pub Date : 2024-12-04 DOI: 10.1038/s43588-024-00730-4

Zhe Li, Zixiang Lu, Ruichen Li, Xuelan Wen, Xiang Li, Liwei Wang, Ji Chen, Weiluo Ren

The integration of deep neural networks with the variational Monte Carlo (VMC) method has marked a substantial advancement in solving the Schrödinger equation. In this work we enforce spin symmetry in the neural-network-based VMC calculation using a modified optimization target. Our method is designed to solve for the ground state and multiple excited states with target spin symmetry at a low computational cost. It predicts accurate energies while maintaining the correct symmetry in strongly correlated systems, even in cases in which different spin states are nearly degenerate. Our approach also excels at spin–gap calculations, including the singlet–triplet gap in biradical systems, which is of high interest in photochemistry. Overall, this work establishes a robust framework for efficiently calculating various quantum states with specific spin symmetry in correlated systems.

Summary: An efficient approach is developed to enforce spin symmetry for neural network wavefunctions when solving the many-body Schrödinger equation. This enables accurate and spin-pure simulations of both ground and excited states.
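The abstract does not state the exact form of the modified optimization target. As a generic illustration only, and not necessarily the authors' formulation, a spin penalty can be added to the variational energy as follows, assuming the wavefunction is restricted to the sector $S_z = s$ and $\lambda$ is a hypothetical penalty weight:

```latex
\mathcal{L}(\theta) = E(\theta)
  + \lambda \left( \langle \hat{S}^2 \rangle_\theta - s(s+1) \right),
  \qquad \lambda > 0,
\qquad
\hat{S}^2 = \hat{S}_- \hat{S}_+ + \hat{S}_z \left( \hat{S}_z + 1 \right).
```

Under the $S_z = s$ assumption every spin component of the state has total spin at least $s$, so the penalty term is non-negative and vanishes only for a spin-pure state with total spin $s$. By the operator identity on the right, the penalty then reduces to $\lambda \langle \hat{S}_- \hat{S}_+ \rangle_\theta$.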

Volume 4, issue 12, pages 910-919.

Citations: 0

Interpreting single-cell and spatial omics data using deep neural network training dynamics

IF 12 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Nature computational science

Pub Date : 2024-12-04 DOI: 10.1038/s43588-024-00721-5

Jonathan Karin, Reshef Mintz, Barak Raveh, Mor Nitzan

Single-cell and spatial omics datasets can be organized and interpreted by annotating single cells to distinct types, states, locations or phenotypes. However, cell annotations are inherently ambiguous, as discrete labels with subjective interpretations are assigned to heterogeneous cell populations on the basis of noisy, sparse and high-dimensional data. Here we developed Annotatability, a framework for identifying annotation mismatches and characterizing biological data structure by monitoring the dynamics and difficulty of training a deep neural network over such annotated data. Following this, we developed a signal-aware graph embedding method that enables downstream analysis of biological signals. This embedding captures cellular communities associated with target signals. Using Annotatability, we address key challenges in the interpretation of genomic data, demonstrated over eight single-cell RNA sequencing and spatial omics datasets, including identifying erroneous annotations and intermediate cell states, delineating developmental or disease trajectories, and capturing cellular heterogeneity. These results underscore the broad applicability of annotation-trainability analysis via Annotatability for unraveling cellular diversity and interpreting collective cell behaviors in health and disease.

Summary: The Annotatability framework analyzes neural network training dynamics to interpret single-cell and spatial omics data. It identifies erroneous annotations and ambiguous cell states, infers trajectories from binary labels and enables signal-aware analysis.
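Monitoring training dynamics can be sketched in a few lines. The example below is a minimal illustration, not the Annotatability package's API: it assumes you have already recorded, for every epoch, the probability a classifier assigns to each cell's given annotation, and the function names, category names, and thresholds are hypothetical.

```python
import numpy as np


def training_dynamics_scores(probs):
    """Summarize per-cell training dynamics.

    probs: array of shape (n_epochs, n_cells) holding the probability the
    network assigned to each cell's given annotation at each epoch.
    Returns (confidence, variability): the mean and standard deviation
    of that probability across epochs.
    """
    probs = np.asarray(probs, dtype=float)
    return probs.mean(axis=0), probs.std(axis=0)


def categorize(confidence, variability, conf_hi=0.75, conf_lo=0.25, var_hi=0.2):
    """Assign each cell a coarse category (thresholds are illustrative)."""
    categories = []
    for c, v in zip(confidence, variability):
        if c >= conf_hi and v < var_hi:
            categories.append("easy-to-learn")   # annotation looks consistent
        elif c <= conf_lo:
            categories.append("hard-to-learn")   # candidate annotation error
        else:
            categories.append("ambiguous")       # candidate intermediate state
    return categories


# Three epochs, three cells (made-up numbers for illustration).
probs = [
    [0.90, 0.10, 0.2],
    [0.95, 0.05, 0.5],
    [0.99, 0.10, 0.9],
]
confidence, variability = training_dynamics_scores(probs)
categories = categorize(confidence, variability)
```

Cells whose annotation the network never learns are flagged as possible errors, while cells with intermediate or unstable confidence are flagged as ambiguous, mirroring the two uses named in the abstract.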

Volume 4, issue 12, pages 941-954. Open access PDF: https://www.nature.com/articles/s43588-024-00721-5.pdf

Citations: 0

Comprehensive prediction and analysis of human protein essentiality based on a pretrained large language model

IF 12 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Nature computational science

Pub Date : 2024-11-27 DOI: 10.1038/s43588-024-00733-1

Boming Kang, Rui Fan, Chunmei Cui, Qinghua Cui

Human essential proteins (HEPs) are indispensable for individual viability and development. However, experimental methods to identify HEPs are often costly, time consuming and labor intensive. In addition, existing computational methods predict HEPs only at the cell line level, but HEPs vary across living human, cell line and animal models. Here we develop a sequence-based deep learning model, Protein Importance Calculator (PIC), by fine-tuning a pretrained protein language model. PIC not only substantially outperforms existing methods for predicting HEPs but also provides comprehensive prediction results across three levels: human, cell line and mouse. Furthermore, we define the protein essential score, derived from PIC, to quantify human protein essentiality and validate its effectiveness by a series of biological analyses. We also demonstrate the biomedical value of the protein essential score by identifying potential prognostic biomarkers for breast cancer and quantifying the essentiality of 617,462 human microproteins.

Citations: 0

Harnessing the power of DNA for computing

IF 12 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Nature computational science

Pub Date : 2024-11-21 DOI: 10.1038/s43588-024-00742-0

We discuss the thirty-year anniversary of the seminal work on DNA computing and its implications for the field of biotechnology.

Citations: 0

Collective deliberation driven by AI

IF 12 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Nature computational science

Pub Date : 2024-11-18 DOI: 10.1038/s43588-024-00736-y

Fernando Chirigati

Citations: 0

Harnessing deep learning to build optimized ligands

IF 12 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Nature computational science

Pub Date : 2024-11-14 DOI: 10.1038/s43588-024-00725-1

Orestis A. Ntintas, Theodoros Daglis, Vassilis G. Gorgoulis

A recent study proposes DeepBlock, a deep learning-based approach for generating ligands with targeted properties, such as low toxicity and high affinity with the given target. This approach outperforms existing methods in the field while maintaining synthetic accessibility and drug-likeness.

Citations: 0

MassiveFold: unveiling AlphaFold’s hidden potential with optimized and parallelized massive sampling

IF 12 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Nature computational science

Pub Date : 2024-11-11 DOI: 10.1038/s43588-024-00714-4

Nessim Raouraoua, Claudio Mirabello, Thibaut Véry, Christophe Blanchet, Björn Wallner, Marc F. Lensink, Guillaume Brysbaert

Massive sampling in AlphaFold enables access to increased structural diversity. In combination with its efficient confidence ranking, this unlocks elevated modeling capabilities for monomeric structures and foremost for protein assemblies. However, the approach struggles with GPU cost and data storage. Here we introduce MassiveFold, an optimized and customizable version of AlphaFold that runs predictions in parallel, reducing the computing time from several months to hours. MassiveFold is scalable and able to run on anything from a single computer to a large GPU infrastructure, where it can fully benefit from all the computing nodes.

Summary: Although AlphaFold is very efficient for protein structure prediction, massive sampling is a very GPU demanding task. MassiveFold overcomes this limitation, being capable of parallelizing structure prediction computation.
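The general pattern of splitting a large sampling run into independent batches can be sketched as follows. This is not MassiveFold's interface: `split_into_batches`, `run_batch`, and `run_all` are hypothetical names, and `run_batch` is a stand-in that only returns the prediction indices a real inference job would produce.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(n_predictions, batch_size):
    """Return inclusive (start, end) index ranges covering 0..n_predictions-1."""
    return [
        (start, min(start + batch_size, n_predictions) - 1)
        for start in range(0, n_predictions, batch_size)
    ]


def run_batch(batch):
    """Hypothetical stand-in for one inference job on one worker."""
    start, end = batch
    return list(range(start, end + 1))


def run_all(n_predictions, batch_size, max_workers=4):
    """Run every batch on a worker pool and gather results in batch order."""
    batches = split_into_batches(n_predictions, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_batch, batches))
    return [idx for batch_result in results for idx in batch_result]


batches = split_into_batches(10, 4)
all_indices = run_all(10, 4)
```

Because the batches do not depend on each other, the same split could be handed to a cluster scheduler instead of a local thread pool, so wall-clock time shrinks roughly with the number of workers available.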

Citations: 0
