Nature Computational Science: latest articles

Accelerating scientific progress with preprints
Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-05-29 DOI: 10.1038/s43588-024-00641-4
We recognize the importance of preprint posting in communicating research findings and encourage our authors to make use of this service.
Nature Computational Science, vol. 4, no. 5, p. 311 (Journal Article). PDF: https://www.nature.com/articles/s43588-024-00641-4.pdf
Citations: 0
Outsourcing eureka moments to artificial intelligence
Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-05-24 DOI: 10.1038/s43588-024-00633-4
Martijn Meeter
A two-stage learning algorithm is proposed to directly uncover the symbolic representation of rules for skill acquisition from large-scale training log data.
Nature Computational Science, vol. 4, no. 5, pp. 314-315 (Journal Article).
Citations: 0
Discrete latent embeddings illuminate cellular diversity in single-cell epigenomics
Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-05-24 DOI: 10.1038/s43588-024-00634-3
Zhi Wei
CASTLE, a deep learning approach, extracts interpretable discrete representations from single-cell chromatin accessibility data, enabling accurate cell type identification, effective data integration, and quantitative insights into gene regulatory mechanisms.
Nature Computational Science, vol. 4, no. 5, pp. 316-317 (Journal Article).
Citations: 0
Automated discovery of symbolic laws governing skill acquisition from naturally occurring data
Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-05-24 DOI: 10.1038/s43588-024-00629-0
Sannyuya Liu, Qing Li, Xiaoxuan Shen, Jianwen Sun, Zongkai Yang
Skill acquisition is a key area of research in cognitive psychology as it encompasses multiple psychological processes. The laws discovered under experimental paradigms are controversial and lack generalizability. This paper aims to unearth the laws of skill learning from large-scale training log data. A two-stage algorithm was developed to tackle the issues of unobservable cognitive states and an algorithmic explosion in searching. A deep learning model is initially employed to determine the learner’s cognitive state and assess the feature importance. Symbolic regression algorithms are then used to parse the neural network model into algebraic equations. Experimental results show that the algorithm can accurately restore preset laws within a noise range in continuous feedback settings. When applied to Lumosity training data, the method outperforms traditional and recent models in fitness terms. The study reveals two new forms of skill acquisition laws and reaffirms some previous findings. This paper introduces an algorithm to uncover laws of skill acquisition from naturally occurring data. By combining deep learning and symbolic regression, it accurately identifies cognitive states and extracts algebraic equations.
Nature Computational Science, vol. 4, no. 5, pp. 334-345 (Journal Article).
Citations: 0
Designing semiconductor materials and devices in the post-Moore era by tackling computational challenges with data-driven strategies
Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-05-23 DOI: 10.1038/s43588-024-00632-5
Jiahao Xie, Yansong Zhou, Muhammad Faizan, Zewei Li, Tianshu Li, Yuhao Fu, Xinjiang Wang, Lijun Zhang
In the post-Moore’s law era, the progress of electronics relies on discovering superior semiconductor materials and optimizing device fabrication. Computational methods, augmented by emerging data-driven strategies, offer a promising alternative to the traditional trial-and-error approach. In this Perspective, we highlight data-driven computational frameworks for enhancing semiconductor discovery and device development by elaborating on their advances in exploring the materials design space, predicting semiconductor properties and optimizing device fabrication, with a concluding discussion on the challenges and opportunities in these areas. Discovering improved semiconductor materials is essential for optimal device fabrication. In this Perspective, data-driven computational frameworks for semiconductor discovery and device development are discussed, including the challenges and opportunities moving forward.
Nature Computational Science, vol. 4, no. 5, pp. 322-333 (Journal Article).
Citations: 0
Shuffling haplotypes to share reference panels for imputation
Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-05-22 DOI: 10.1038/s43588-024-00640-5
We present a method to alleviate the re-identification risks of sharing haplotype reference panels for imputation. In an anonymized reference panel, one might try to infer the genomes’ phenotypes to re-identify their owners. Our method protects against such attacks by shuffling the reference panel’s genomes while maintaining imputation accuracy.
Nature Computational Science, vol. 4, no. 5, pp. 320-321 (Journal Article).
Citations: 0
A resampling-based approach to share reference panels
Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-05-14 DOI: 10.1038/s43588-024-00630-7
Théo Cavinato, Simone Rubinacci, Anna-Sapfo Malaspinas, Olivier Delaneau
For many genome-wide association studies, imputing genotypes from a haplotype reference panel is a necessary step. Over the past 15 years, reference panels have become larger and more diverse, leading to improvements in imputation accuracy. However, the latest generation of reference panels is subject to restrictions on data sharing due to concerns about privacy, limiting their usefulness for genotype imputation. In this context, here we propose RESHAPE, a method that employs a recombination Poisson process on a reference panel to simulate the genomes of hypothetical descendants after multiple generations. This data transformation helps to protect against re-identification threats and preserves data attributes, such as linkage disequilibrium patterns and, to some degree, identity-by-descent sharing, allowing for genotype imputation. Our experiments on gold-standard datasets show that simulated descendants up to eight generations can serve as reference panels without substantially reducing genotype imputation accuracy. The authors develop the tool RESHAPE to share reference panels in a safer way. The genome–phenome links in reference panels can generate re-identification threats and RESHAPE breaks these links by shuffling haplotypes while preserving imputation accuracy.
Nature Computational Science, vol. 4, no. 5, pp. 360-366 (Journal Article). PDF: https://www.nature.com/articles/s43588-024-00630-7.pdf
Citations: 0
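As a rough illustration of the idea in the RESHAPE abstract above (a recombination Poisson process applied to a reference panel), the sketch below draws breakpoints along a chromosome and, at each breakpoint, swaps which original haplotype two output lineages copy from. It is a toy under stated assumptions, not the authors' implementation: the function names, the single panel-wide Poisson process with a rate of `generations` events per Morgan, and the pairwise-swap rule are all hypothetical choices made for illustration.

```python
import random


def poisson_breakpoints(length_morgans, generations, rng):
    """Draw breakpoint positions from a homogeneous Poisson process.

    Assumption: the rate is `generations` events per Morgan (generations >= 1).
    """
    points, pos = [], 0.0
    while True:
        pos += rng.expovariate(generations)  # exponential gaps between events
        if pos >= length_morgans:
            return points
        points.append(pos)


def shuffle_haplotypes(haplotypes, genetic_pos, generations, seed=0):
    """Return a shuffled panel of the same shape as `haplotypes`.

    haplotypes:  list of equal-length allele lists (at least two haplotypes)
    genetic_pos: increasing genetic position of each site, in Morgans
    """
    rng = random.Random(seed)
    n = len(haplotypes)
    breakpoints = poisson_breakpoints(genetic_pos[-1], generations, rng)
    source = list(range(n))  # source[i]: original haplotype copied by output i
    out = [[] for _ in range(n)]
    b = 0
    for site, pos in enumerate(genetic_pos):
        # Apply every breakpoint located at or before this site.
        while b < len(breakpoints) and breakpoints[b] <= pos:
            i, j = rng.sample(range(n), 2)
            source[i], source[j] = source[j], source[i]
            b += 1
        for i in range(n):
            out[i].append(haplotypes[source[i]][site])
    return out


panel = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
]
positions = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
shuffled = shuffle_haplotypes(panel, positions, generations=8, seed=1)
```

Because each site's column is only permuted across haplotypes, per-site allele counts are unchanged, while any one output haplotype becomes a mosaic of several original ones.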
A multidimensional dataset for structure-based machine learning
Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-05-14 DOI: 10.1038/s43588-024-00631-6
Matthew Holcomb, Stefano Forli
MISATO, a dataset for structure-based drug discovery, combines quantum mechanics property data and molecular dynamics simulations on ~20,000 protein–ligand structures, substantially extending the amount of data available to the community and holding potential for advancing work in drug discovery.
Nature Computational Science, vol. 4, no. 5, pp. 318-319 (Journal Article).
Citations: 0
MISATO: machine learning dataset of protein–ligand complexes for structure-based drug discovery
Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-05-10 DOI: 10.1038/s43588-024-00627-2
Till Siebenmorgen, Filipe Menezes, Sabrina Benassou, Erinc Merdivan, Kieran Didi, André Santos Dias Mourão, Radosław Kitel, Pietro Liò, Stefan Kesselheim, Marie Piraud, Fabian J. Theis, Michael Sattler, Grzegorz M. Popowicz
Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule–ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein–ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein–ligand complexes in explicit water is included, accumulating over 170 μs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models. MISATO is a database for structure-based drug discovery that combines quantum mechanics data with molecular dynamics simulations on ~20,000 protein–ligand structures. The artificial intelligence models included provide an easy entry point for the machine learning and drug discovery communities.
Nature Computational Science, vol. 4, no. 5, pp. 367-378 (Journal Article). PDF: https://www.nature.com/articles/s43588-024-00627-2.pdf
Citations: 0
Discrete latent embedding of single-cell chromatin accessibility sequencing data for uncovering cell heterogeneity
Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-05-10 DOI: 10.1038/s43588-024-00625-4
Xuejian Cui, Xiaoyang Chen, Zhen Li, Zijing Gao, Shengquan Chen, Rui Jiang
Single-cell epigenomic data have been growing continuously at an unprecedented pace, but their characteristics such as high dimensionality and sparsity pose substantial challenges to downstream analysis. Although deep learning models, especially variational autoencoders, have been widely used to capture low-dimensional feature embeddings, the prevalent Gaussian assumption somewhat disagrees with real data, and these models tend to struggle to incorporate reference information from abundant cell atlases. Here we propose CASTLE, a deep generative model based on the vector-quantized variational autoencoder framework to extract discrete latent embeddings that interpretably characterize single-cell chromatin accessibility sequencing data. We validate the performance and robustness of CASTLE for accurate cell-type identification and reasonable visualization compared with state-of-the-art methods. We demonstrate the advantages of CASTLE for effective incorporation of existing massive reference datasets in a weakly supervised or supervised manner. We further demonstrate CASTLE’s capacity for intuitively distilling cell-type-specific feature spectra that unveil cell heterogeneity and biological implications quantitatively. A method based on a vector-quantized variational autoencoder, called CASTLE, can interpretably extract discrete latent embeddings and quantitatively generate the cell-type-specific feature spectrum for single-cell chromatin accessibility sequencing data.
Nature Computational Science, vol. 4, no. 5, pp. 346-359 (Journal Article).
Citations: 0
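The vector-quantized variational autoencoder framework that the CASTLE abstract above refers to replaces each continuous encoder output with its nearest entry in a learned codebook, which is what makes the latent embedding discrete. The sketch below shows only that generic nearest-code lookup. It is not CASTLE's code, and the function name, the toy codebook and the use of squared Euclidean distance are assumptions made for illustration.

```python
def quantize(z, codebook):
    """Return (index, code vector) of the codebook entry closest to z.

    z:        continuous latent vector (list of floats)
    codebook: list of code vectors with the same length as z
    """
    def sqdist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    idx = min(range(len(codebook)), key=lambda k: sqdist(z, codebook[k]))
    return idx, codebook[idx]


# Toy codebook with three 2-D codes (hypothetical values).
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
index, code = quantize([0.9, 1.2], toy_codebook)
```

In a trained model the integer index, rather than the continuous vector, serves as the cell's discrete latent code.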