
Bioinformatics Advances: latest articles

Improving protein function prediction by learning and integrating representations of protein sequences and function labels.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-17 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae120
Frimpong Boadu, Jianlin Cheng

Motivation: As fewer than 1% of proteins have function information determined experimentally, computationally predicting the function of proteins is critical for obtaining functional information for most proteins and has been a major challenge in protein bioinformatics. Despite the significant progress made in protein function prediction by the community in the last decade, the general accuracy of protein function prediction is still not high, particularly for rare function terms associated with few proteins in protein function annotation databases such as UniProt.

Results: We introduce TransFew, a new transformer model, to learn the representations of both protein sequences and function labels [Gene Ontology (GO) terms] to predict the function of proteins. TransFew leverages a large pre-trained protein language model (ESM2-t48) to learn function-relevant representations of proteins from raw protein sequences. It uses a biological natural language model (BioBert) and a graph convolutional neural network-based autoencoder to generate semantic representations of GO terms from their textual definitions and hierarchical relationships. The two representations are then combined via cross-attention to predict protein function. Integrating the protein sequence and label representations not only enhances overall function prediction accuracy but also delivers robust performance in predicting rare function terms with limited annotations by facilitating annotation transfer between GO terms.

Availability and implementation: https://github.com/BioinfoMachineLearning/TransFew.
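
The cross-attention step mentioned in the Results can be illustrated with a minimal sketch. This is not the TransFew implementation: the function name `cross_attention`, the toy vectors, and the choice of letting GO-term embeddings act as queries over a protein's residue embeddings are assumptions made for illustration, and the learned projection matrices of a real transformer layer are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: one vector per GO term (hypothetical label embeddings)
    keys, values: one vector per residue (hypothetical sequence embeddings)
    Returns one attended vector per query, plus the attention weights.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this label to every residue, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        # Weighted average of the residue value vectors.
        attended = [
            sum(w_i * v[j] for w_i, v in zip(w, values))
            for j in range(len(values[0]))
        ]
        outputs.append(attended)
        weights.append(w)
    return outputs, weights


# Toy data: 2 GO-term embeddings attending over 3 residue embeddings.
go_terms = [[1.0, 0.0], [0.0, 1.0]]
residues = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
attended, attn = cross_attention(go_terms, residues, residues)
```

Each row of `attn` sums to one, so every GO term receives a convex combination of residue vectors. A downstream classifier could score each term from its attended vector, which is one plausible way label and sequence representations can be combined.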

Citations: 0
In the twilight zone of protein sequence homology: do protein language models learn protein structure?
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-17 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae119
Anowarul Kabir, Asher Moldwin, Yana Bromberg, Amarda Shehu

Motivation: Protein language models based on the transformer architecture are increasingly improving performance on protein prediction tasks, including secondary structure, subcellular localization, and more. Despite being trained only on protein sequences, protein language models appear to implicitly learn protein structure. This paper investigates whether sequence representations learned by protein language models encode structural information and to what extent.

Results: We address this by evaluating protein language models on remote homology prediction, where identifying remote homologs from sequence information alone requires structural knowledge, especially in the "twilight zone" of very low sequence identity. Through rigorous testing at progressively lower sequence identities, we profile the performance of protein language models ranging from millions to billions of parameters in a zero-shot setting. Our findings indicate that while transformer-based protein language models outperform traditional sequence alignment methods, they still struggle in the twilight zone. This suggests that current protein language models have not sufficiently learned protein structure to address remote homology prediction when sequence signals are weak.

Availability and implementation: We believe this opens the way for further research both on remote homology prediction and on the broader goal of learning sequence- and structure-rich representations of protein molecules. All code, data, and models are made publicly available.
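
A zero-shot evaluation of this kind can be pictured as nearest-neighbour retrieval over fixed sequence embeddings, with no task-specific training. The sketch below assumes the embeddings have already been computed by some protein language model. The function names, toy vectors, and fold labels are hypothetical, and cosine similarity is one common choice rather than necessarily the metric used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top1_accuracy(embeddings, labels):
    """Fraction of proteins whose nearest other protein shares their label."""
    hits = 0
    for i, query in enumerate(embeddings):
        # Exclude the query itself, then pick the most similar embedding.
        best_j = max(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: cosine(query, embeddings[j]),
        )
        hits += labels[best_j] == labels[i]
    return hits / len(embeddings)


# Toy embeddings for four proteins drawn from two hypothetical folds.
toy_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
toy_labels = ["foldA", "foldA", "foldB", "foldB"]
accuracy = top1_accuracy(toy_embeddings, toy_labels)
```

Repeating such a measurement on subsets filtered to progressively lower pairwise sequence identity would show how retrieval degrades as the sequence signal weakens.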

Citations: 0
splicekit: an integrative toolkit for splicing analysis from short-read RNA-seq.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-17 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae121
Gregor Rot, Arne Wehling, Roland Schmucki, Nikolaos Berntenis, Jitao David Zhang, Martin Ebeling

Motivation: Analysis of alternative splicing using short-read RNA-seq data is a complex process that involves several steps: alignment of reads to the reference genome, identification of alternatively spliced features, motif discovery, analysis of RNA-protein binding near donor and acceptor splice sites, and exploratory data visualization. To the best of our knowledge, there is currently no integrative open-source software dedicated to this task.

Results: Here, we introduce splicekit, a Python package that provides and integrates a set of existing and novel splicing analysis tools for conducting splicing analysis.

Availability and implementation: The software splicekit is open-source and available on GitHub (https://github.com/bedapub/splicekit) and via the Python Package Index.

Citations: 0
C2CDB: an advanced platform integrating comprehensive information and analysis tools of cancer-related circRNAs.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-16 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae112
Yuanli Zuo, Wenrong Liu, Yang Jin, Yitong Pan, Ting Fan, Xin Fu, Jiawei Guo, Shuangyan Tan, Juan He, Yang Yang, Zhang Li, Chenyu Yang, Yong Peng

Motivation: Circular RNAs (circRNAs) play important roles in gene expression, and their involvement in tumorigenesis is emerging. circRNA-related databases are powerful tools for researchers investigating circRNAs. However, existing databases lack an advanced platform that integrates comprehensive information and analysis tools for cancer-related circRNAs.

Results: We developed a comprehensive platform called CircRNA to Cancer Database (C2CDB), encompassing 318 158 cancer-related circRNAs expressed in tumors and adjacent tissues across 30 types of cancers. C2CDB provides basic details such as sequence and expression levels of circRNAs, as well as crucial insights into biological mechanisms, including miRNA binding, RNA-binding protein interaction, coding potential, base modification, mutation, and secondary structure. Moreover, C2CDB collects an extensive compilation of published literature on cancer circRNAs, extracting and presenting pivotal content encompassing biological functions, underlying mechanisms, and molecular tools in these studies. Additionally, C2CDB offers integrated tools to analyse three potential mechanisms: circRNA-miRNA ceRNA interaction, circRNA encoding, and circRNA biogenesis, giving investigators convenient access to highly reliable information. To enhance clarity and organization, C2CDB has meticulously curated and integrated the previously chaotic nomenclature of circRNAs, addressing the prevailing confusion and ambiguity surrounding their designations.

Availability and implementation: C2CDB is freely available at http://pengyonglab.com/c2cdb.

Citations: 0
Current and future directions in network biology.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-14 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae099
Marinka Zitnik, Michelle M Li, Aydin Wells, Kimberly Glass, Deisy Morselli Gysi, Arjun Krishnan, T M Murali, Predrag Radivojac, Sushmita Roy, Anaïs Baudot, Serdar Bozdag, Danny Z Chen, Lenore Cowen, Kapil Devkota, Anthony Gitter, Sara J C Gosline, Pengfei Gu, Pietro H Guzzi, Heng Huang, Meng Jiang, Ziynet Nesibe Kesimoglu, Mehmet Koyuturk, Jian Ma, Alexander R Pico, Nataša Pržulj, Teresa M Przytycka, Benjamin J Raphael, Anna Ritz, Roded Sharan, Yang Shen, Mona Singh, Donna K Slonim, Hanghang Tong, Xinan Holly Yang, Byung-Jun Yoon, Haiyuan Yu, Tijana Milenković

Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology.

Availability and implementation: Not applicable.

Citations: 0
Understanding the natural language of DNA using encoder-decoder foundation models with byte-level precision.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-12 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae117
Aditya Malusare, Harish Kothandaraman, Dipesh Tamboli, Nadia A Lanman, Vaneet Aggarwal

Summary: This article presents the Ensemble Nucleotide Byte-level Encoder-Decoder (ENBED) foundation model, analyzing DNA sequences at byte-level precision with an encoder-decoder Transformer architecture. ENBED uses a subquadratic implementation of attention to develop an efficient model capable of sequence-to-sequence transformations, generalizing previous genomic models with encoder-only or decoder-only architectures. We use Masked Language Modeling to pretrain the foundation model using reference genome sequences and apply it in the following downstream tasks: (i) identification of enhancers, promoters, and splice sites, (ii) recognition of sequences containing base call mismatches and insertion/deletion errors, an advantage over tokenization schemes involving multiple base pairs, which lose the ability to analyze with byte-level precision, (iii) identification of biological function annotations of genomic sequences, and (iv) generating mutations of the influenza virus using the encoder-decoder architecture and validating them against real-world observations. In each of these tasks, we demonstrate significant improvement as compared to the existing state-of-the-art results.

Availability and implementation: The source code used to develop and fine-tune the foundation model has been released on GitHub (https://github.itap.purdue.edu/Clan-labs/ENBED).
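
To make the byte-level idea concrete, the sketch below tokenizes a DNA string one base per token and prepares a masked-language-modeling example. It is illustrative only and is not taken from the ENBED code base: `MASK_ID`, the function names, the masking rate, and the non-overlapping k-mer comparison are all assumptions.

```python
import random

MASK_ID = 256  # hypothetical mask id, chosen outside the 0-255 byte range


def byte_tokenize(seq):
    """Map each nucleotide character to its byte value (one token per base)."""
    return list(seq.encode("ascii"))


def kmer_tokenize(seq, k):
    """Non-overlapping k-mers, for comparison with the byte-level scheme."""
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def mask_tokens(tokens, rate, seed):
    """Replace a random subset of tokens with MASK_ID.

    Returns (inputs, targets); targets hold the original token at masked
    positions and None elsewhere, as a masked-language-model objective needs.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(rate * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [MASK_ID if i in positions else t for i, t in enumerate(tokens)]
    targets = [t if i in positions else None for i, t in enumerate(tokens)]
    return inputs, targets


reference = "ACGTACGTAC"
variant = "ACGTTCGTAC"  # single-base mismatch at index 4

# Count how many byte-level tokens differ between the two sequences.
byte_diff = sum(
    a != b for a, b in zip(byte_tokenize(reference), byte_tokenize(variant))
)
inputs, targets = mask_tokens(byte_tokenize(reference), rate=0.2, seed=0)
```

With one token per base, a substitution changes exactly one token, whereas a k-mer scheme replaces a whole multi-base token and an insertion or deletion shifts every downstream k-mer. That is the loss of resolution the summary attributes to multi-base tokenization.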

Citations: 0
gwid: an R package and Shiny application for Genome-Wide analysis of IBD data.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-07-31 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae115
Soroush Mahmoudiandehkordi, Mehdi Maadooliat, Steven J Schrodi

Summary: Genome-wide identity by descent (gwid) is an R package developed for the analysis of identity-by-descent (IBD) data pertaining to dichotomous traits. This package offers a set of tools to assess differential IBD levels for the two states of a binary trait, yielding informative and meaningful results. Furthermore, it provides convenient functions to visualize the outcomes of these analyses, enhancing the interpretability and accessibility of the results. To assess the performance of the package, we conducted an evaluation using real SNP genotype data from the Marshfield Clinic Personalized Medicine Research Project to investigate rheumatoid arthritis susceptibility.

Availability and implementation: gwid is available as an open-source R package. Release versions can be accessed on CRAN (https://cran.r-project.org/package=gwid) for all major operating systems. The development version is maintained on GitHub (https://github.com/soroushmdg/gwid) and full documentation with examples and workflow templates is provided via the package website (http://tinyurl.com/gwid-tutorial). An interactive R Shiny dashboard has also been developed (https://tinyurl.com/gwid-shiny).
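
The notion of differential IBD levels between the two states of a binary trait can be sketched as follows. This is neither the gwid API nor its statistical method, and it is written in Python rather than R: the data layout, function names, and sample identifiers are hypothetical. For one genomic window, the sketch compares the fraction of case-case pairs sharing an IBD segment with the fraction of control-control pairs.

```python
from itertools import combinations


def ibd_rate(samples, ibd_pairs):
    """Fraction of unordered sample pairs that share an IBD segment."""
    pairs = list(combinations(sorted(samples), 2))
    if not pairs:
        return 0.0
    shared = sum(1 for pair in pairs if frozenset(pair) in ibd_pairs)
    return shared / len(pairs)


def differential_ibd(cases, controls, ibd_pairs):
    """Case-case IBD rate minus control-control IBD rate in one window."""
    return ibd_rate(cases, ibd_pairs) - ibd_rate(controls, ibd_pairs)


# Hypothetical window: pairs of samples detected as IBD at this locus.
cases = {"c1", "c2", "c3"}
controls = {"n1", "n2", "n3"}
ibd_pairs = {
    frozenset(p)
    for p in [("c1", "c2"), ("c2", "c3"), ("n1", "n2"), ("c1", "n3")]
}

case_rate = ibd_rate(cases, ibd_pairs)        # 2 of 3 case-case pairs
control_rate = ibd_rate(controls, ibd_pairs)  # 1 of 3 control-control pairs
difference = differential_ibd(cases, controls, ibd_pairs)
```

A real analysis would attach a significance measure to this difference, for example by permuting the trait labels, and would repeat the calculation across windows along the genome.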

引用次数: 0
Evolution and subfamilies of HERVL human endogenous retrovirus.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-07-30 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae110
Huan Zhang, Martin C Frith

Background: Endogenous retroviruses (ERVs), which blur the boundary between virus and transposable element, are genetic material derived from retroviruses and have important implications for evolution. This study examines the diversity and evolution of human endogenous retroviruses (HERVs) of the HERVL family, which has long terminal repeats (LTRs) named MLT2.

Results: By probability-based sequence comparison, we uncover systematic annotation errors that conceal the true complexity and diversity of transposable elements (TEs) in the human genome. Our analysis identifies new subfamilies within the MLT2 group, proposes a refined classification scheme, and constructs new consensus sequences. We present an evolutionary analysis including phylogenetic trees that elucidate the relationships between these subfamilies and their contributions to human evolution. The results underscore the significance of accurate TE annotation in understanding genome evolution, highlighting the potential for misclassified TEs to impact interpretations of genomic studies.

Availability and implementation: Not applicable.

Introducing field-programmable gate arrays in genotype phasing and imputation.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-07-30 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae114
Lars Wienbrandt, David Ellinghaus

Summary: We recently developed EagleImp, a free software tool that combines genotype phasing and imputation in a single tool. Through algorithmic and technical improvements, we accelerated the classical two-step approach that uses Eagle2 and PBWT. Here, we demonstrate how to use field-programmable gate arrays (FPGAs) to accelerate EagleImp even further, by up to 93%, without loss of phasing and imputation quality. Because of this speed advantage over a non-accelerated, processor-based implementation, the FPGA extension of EagleImp lets the user choose a more resource-intensive parameter setting, trading the saved computation time for a further improvement in phasing and imputation quality.

Availability and implementation: EagleImp and its FPGA extension are freely available at https://github.com/ikmb/eagleimp and https://github.com/ikmb/eagleimp-fpga.

Systematic analysis on the horse-shoe-like effect in PCA plots of scRNA-seq data.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-07-29 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae109
Najeebullah Shah, Qiuchen Meng, Ziheng Zou, Xuegong Zhang

Motivation: In single-cell studies, principal component analysis (PCA) is widely used to reduce the dimensionality of datasets and to visualize them in 2D or 3D PC plots. Scientists often focus on the different clusters within a PC plot and overlook specific phenomena, such as the horse-shoe-like effect, that may reveal hidden knowledge about the underlying biological dataset. This phenomenon remains largely unexplored in single-cell studies.

Results: In this study, we investigate the horse-shoe-like effect in PC plots using simulated and real scRNA-seq datasets. We systematically explain the horse-shoe-like phenomenon from various inter-related perspectives. First, we establish an intuitive understanding with the help of simulated datasets. Then, we generalize the acquired knowledge to real biological scRNA-seq data. Experimental results provide logical explanations for the appearance of the horse-shoe-like effect in PC plots. Furthermore, we identify a potential problem with the well-known 'distance saturation property' theory, which is held to induce the horse-shoe phenomenon. Finally, we analyse a mathematical model of the horse-shoe effect that suggests trigonometric solutions for the estimated eigenvectors. Comparing the results of the mathematical model with simulated and real scRNA-seq datasets, we observe a strong resemblance.

Availability and implementation: The code for reproducing the results of this study is available at: https://github.com/najeebullahshah/PCA-Horse-Shoe.
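
The link between trigonometric eigenvectors and a horse-shoe shape can be illustrated with a toy example. The sketch below is not the model analysed in the paper and is not taken from the authors' repository. It assumes, purely for illustration, that cells lie along a one-dimensional gradient represented by a path graph, whose Laplacian has exactly cosine eigenvectors. The names `path_laplacian`, `cosine_vector` and `eigen_residual` are hypothetical helpers. Because cos(2t) = 2cos(t)^2 - 1, the second cosine vector is a quadratic function of the first, so plotting one against the other traces an arch.

```python
import math


def path_laplacian(n):
    """Laplacian of a path graph with n nodes (toy stand-in for a 1D gradient of cells)."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i > 0:
            lap[i][i - 1] = -1.0
            lap[i][i] += 1.0
        if i < n - 1:
            lap[i][i + 1] = -1.0
            lap[i][i] += 1.0
    return lap


def cosine_vector(n, k):
    """k-th cosine vector: entry j is cos(pi * k * (j + 1/2) / n)."""
    return [math.cos(math.pi * k * (j + 0.5) / n) for j in range(n)]


def eigen_residual(lap, vec, eigenvalue):
    """Largest absolute entry of (lap @ vec - eigenvalue * vec)."""
    n = len(vec)
    return max(
        abs(sum(lap[i][j] * vec[j] for j in range(n)) - eigenvalue * vec[i])
        for i in range(n)
    )


n = 50
lap = path_laplacian(n)
v1 = cosine_vector(n, 1)  # plays the role of the first axis in this toy setting
v2 = cosine_vector(n, 2)  # plays the role of the second axis

# The cosine vectors are eigenvectors with eigenvalues 2 - 2*cos(pi*k/n).
res1 = eigen_residual(lap, v1, 2 - 2 * math.cos(math.pi / n))
res2 = eigen_residual(lap, v2, 2 - 2 * math.cos(2 * math.pi / n))

# Double-angle identity: v2 = 2*v1^2 - 1, i.e. the points (v1[j], v2[j]) lie on a parabola.
arch_gap = max(abs(b - (2 * a * a - 1)) for a, b in zip(v1, v2))

print(res1, res2, arch_gap)
```

All three printed values should be at the level of floating-point rounding error. Real scRNA-seq covariance matrices are not path-graph Laplacians, so this only shows why cosine-like eigenvectors, where they arise, produce a curved rather than straight trajectory in a two-axis plot.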
