Briefings in bioinformatics: Latest Articles

Correction to: Revolutionizing GPCR-ligand predictions: DeepGPCR with experimental validation for high-precision drug discovery.
IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-07-25 | DOI: 10.1093/bib/bbae427
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11326808/pdf/
Citations: 0
AntiFormer: graph enhanced large language model for binding affinity prediction.
IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-07-25 | DOI: 10.1093/bib/bbae403
Qing Wang, Yuzhou Feng, Yanfei Wang, Bo Li, Jianguo Wen, Xiaobo Zhou, Qianqian Song

Antibodies play a pivotal role in immune defense and serve as key therapeutic agents. Affinity maturation, in which antibodies evolve through somatic mutations to achieve greater specificity and affinity for target antigens, is crucial for an effective immune response. Despite its significance, assessing antibody-antigen binding affinity remains challenging because of the limitations of conventional wet-lab techniques. To address this, we introduce AntiFormer, a graph-based large language model designed to predict antibody binding affinity. AntiFormer incorporates sequence information into a graph-based framework, allowing precise prediction of binding affinity. In extensive evaluations, AntiFormer outperforms existing methods, offering accurate predictions with reduced computational time. Applying AntiFormer to samples from patients infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) reveals antibodies with strong neutralizing capabilities, providing insights for therapeutic development and vaccination strategies. Furthermore, analysis of individual samples following influenza vaccination elucidates differences in antibody response between young and older adults. AntiFormer identifies specific clonotypes with enhanced binding affinity after vaccination, particularly in young individuals, suggesting age-related variation in immune response dynamics. Moreover, our findings underscore the importance of the large-clonotype category in driving affinity maturation and immune modulation. Overall, AntiFormer is a promising approach to accelerating antibody-based diagnostics and therapeutics, bridging the gap between traditional methods and complex antibody maturation processes.

PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11333967/pdf/
Citations: 0
A single-cell multimodal view on gene regulatory network inference from transcriptomics and chromatin accessibility data.
IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-07-25 | DOI: 10.1093/bib/bbae382
Jens Uwe Loers, Vanessa Vermeirssen

Eukaryotic gene regulation is a combinatorial, dynamic, and quantitative process that plays a vital role in development and disease and can be modeled at a systems level in gene regulatory networks (GRNs). The wealth of multi-omics data measured on the same samples, and even on the same cells, has moved GRN inference to a new stage. Combining (single-cell) transcriptomics with chromatin accessibility allows the prediction of fine-grained regulatory programs that go beyond mere correlation of transcription factor and target gene expression, with enhancer GRNs (eGRNs) modeling molecular interactions between transcription factors, regulatory elements, and target genes. In this review, we highlight the key components for successful (e)GRN inference from (sc)RNA-seq and (sc)ATAC-seq data, exemplified by state-of-the-art methods, as well as open challenges and future developments. Moreover, we address preprocessing strategies; metacell generation and computational omics pairing; transcription factor binding site detection; linear and three-dimensional approaches to identifying chromatin interactions; and dynamic and causal eGRN inference. We believe that integrating transcriptomics with epigenomics data at the single-cell level is the new standard for mechanistic network inference, and that it can be advanced further by integrating additional omics layers and spatiotemporal data, and by shifting the focus towards more quantitative and causal modeling strategies.

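To make the contrast between correlation-only GRNs and enhancer GRNs concrete, here is a minimal Python sketch under stated assumptions. It is not any of the reviewed methods: expression is a toy dict of per-cell values, and the peak-to-gene links (`region_to_gene`) and motif hits (`region_motifs`) are hypothetical inputs standing in for the outputs of peak-gene linking and transcription factor binding site detection. A TF-to-gene edge survives only if the two are correlated and an accessible region linked to the gene carries the TF's motif.

```python
import numpy as np

def correlation_edges(expr, tfs, genes, min_abs_r=0.5):
    """Correlation-only baseline: TF-gene pairs whose expression co-varies across cells."""
    edges = {}
    for tf in tfs:
        for gene in genes:
            if tf == gene:
                continue
            r = float(np.corrcoef(expr[tf], expr[gene])[0, 1])
            if abs(r) >= min_abs_r:
                edges[(tf, gene)] = r
    return edges


def enhancer_edges(corr_edges, region_to_gene, region_motifs):
    """eGRN-style filter: keep (TF, region, gene) triples in which the region is
    linked to the gene AND contains a binding motif for the TF."""
    triples = []
    for (tf, gene), r in corr_edges.items():
        for region, target in region_to_gene.items():
            if target == gene and tf in region_motifs.get(region, set()):
                triples.append((tf, region, gene, r))
    return triples


# Hypothetical toy data: expression of 2 TFs and 2 genes across 5 cells.
expr = {
    "TF1": [1, 2, 3, 4, 5],
    "TF2": [5, 1, 4, 2, 3],
    "GeneA": [2, 4, 6, 8, 10],
    "GeneB": [1, 1, 2, 2, 3],
}
region_to_gene = {"chr1:100-600": "GeneA", "chr1:5000-5500": "GeneB"}
region_motifs = {"chr1:100-600": {"TF1"}, "chr1:5000-5500": {"TF2"}}

corr = correlation_edges(expr, ["TF1", "TF2"], ["GeneA", "GeneB"])
triples = enhancer_edges(corr, region_to_gene, region_motifs)
```

In the toy data, TF1 correlates with both genes, but only the GeneA link is supported by a TF1 motif in a linked region, so the eGRN keeps a single (TF, region, gene) triple where the correlation network had two edges.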
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11359808/pdf/
Citations: 0
Identification of molecular subtypes of dementia by using blood-proteins interaction-aware graph propagational network.
IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-07-25 | DOI: 10.1093/bib/bbae428
Sunghong Park, Chang Hyung Hong, Sang Joon Son, Hyun Woong Roh, Doyoon Kim, Hyunjung Shin, Hyun Goo Woo

Plasma protein biomarkers are considered promising tools for diagnosing dementia subtypes because of their low variability, cost-effectiveness, and minimal invasiveness. Machine learning (ML) methods have been applied to improve the accuracy of biomarker discovery. However, previous ML-based studies often overlook interactions between proteins, which are crucial in complex disorders such as dementia. Although protein-protein interactions (PPIs) have been used in network models, these models often fail to fully capture the diverse properties of PPIs because they are only locally aware of the network. This drawback increases the chance of neglecting critical components and of magnifying the impact of noisy interactions. In this study, we propose the graph propagational network (GPN), a novel graph-based ML model for dementia subtype diagnosis. By propagating the independent effect of each plasma protein over the PPI network, the GPN extracts the global interactive effects between proteins. Experimental results showed that these interactive effects further clarified the differences between dementia subtype groups and improved performance: the GPN outperformed existing methods by 10.4% on average.

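The propagation idea can be illustrated with a generic random-walk-with-restart (RWR) sketch. This is a swapped-in, standard network-propagation technique, not the authors' GPN: per-protein scores F0 (standing in for each protein's independent effect) are diffused over a column-normalized PPI adjacency matrix W by iterating F = αWF + (1 − α)F0 until convergence, so each protein's final score also reflects its network neighborhood. The restart weight `alpha=0.5` and the three-protein toy chain are assumptions.

```python
import numpy as np

def propagate(adj, scores, alpha=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart: iterate F = alpha * W @ F + (1 - alpha) * F0,
    where W is the column-normalized adjacency matrix of the PPI network."""
    adj = np.asarray(adj, dtype=float)
    f0 = np.asarray(scores, dtype=float)
    deg = adj.sum(axis=0)
    deg[deg == 0] = 1.0          # isolated proteins: avoid division by zero
    w = adj / deg                # divides column j by deg[j]
    f = f0.copy()
    for _ in range(max_iter):
        f_next = alpha * (w @ f) + (1 - alpha) * f0
        if np.abs(f_next - f).sum() < tol:
            return f_next
        f = f_next
    return f


# Hypothetical toy PPI chain 0 - 1 - 2, with all initial signal on protein 0.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
f = propagate(adj, [1.0, 0.0, 0.0])
```

For this chain the fixed point is F = (7/12, 1/3, 1/12): the signal spreads to protein 0's neighbors while the total stays 1, because W is column-stochastic when no protein is isolated.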
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11370639/pdf/
Citations: 0
Reinforcement learning-driven exploration of peptide space: accelerating generation of drug-like peptides.
IF 9.5 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-07-25 | DOI: 10.1093/bib/bbae444
Qian Wang, Xiaotong Hu, Zhiqiang Wei, Hao Lu, Hao Liu
Using amino acid residues in peptide generation has solved several key problems, including precise control of amino acid sequence order, customization of peptides for property modification, and large-scale peptide synthesis. Proteins contain unknown amino acid residues; extracting them for the synthesis of drug-like peptides can create novel structures with unique properties, driving drug development. Computer-aided design of novel peptide drug molecules can address the high cost and low efficiency of the traditional drug discovery process. Previous studies were limited in enhancing the bioactivity and drug-likeness of polypeptide drugs because they placed little emphasis on the connection relationships within amino acid structures. We therefore proposed a reinforcement learning-driven generation model based on graph attention mechanisms for peptide generation. By harnessing graph attention, this model effectively captures the connectivity between amino acid residues in peptides. At the same time, leveraging reinforcement learning's strength in guiding the search for optimal sequences provides a novel approach to peptide design and optimization. The model introduces an actor-critic framework with real-time feedback loops to achieve a dynamic balance between attributes, which makes it possible to generate multiple peptides customized for specific targets and to enhance the affinity between peptides and targets. Experimental results demonstrate that the generated drug-like peptides meet the specified absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties and bioactivity requirements with a success rate of over 90%, significantly accelerating drug-like peptide generation.
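The actor-critic idea can be illustrated with the two quantities such a feedback loop needs at each step: a scalar reward built from several property scores, and the critic's one-step advantage (temporal-difference error) A = r + γV(s′) − V(s), which tells the actor whether an action did better than the critic expected. This is a generic sketch, not the paper's model: the property names, the fixed weights, and γ are hypothetical, and the graph-attention policy and the paper's dynamic balancing between attributes are not reproduced.

```python
def scalarize(scores, weights):
    """Combine per-property scores (e.g. ADMET, bioactivity) into one reward
    as a weighted sum; property names and weights are hypothetical."""
    return sum(weights[name] * scores[name] for name in weights)


def td_advantage(reward, v_state, v_next, gamma=0.99, done=False):
    """One-step advantage A = r + gamma * V(s') - V(s).
    At the end of an episode (a finished peptide) there is no next state,
    so the bootstrap term gamma * V(s') is dropped."""
    target = reward if done else reward + gamma * v_next
    return target - v_state


# Hypothetical example: a finished peptide scored on two properties.
reward = scalarize({"admet": 0.8, "activity": 0.6}, {"admet": 0.5, "activity": 0.5})
adv_final = td_advantage(reward, v_state=0.5, v_next=0.0, done=True)
# Hypothetical intermediate step: no reward yet, but the critic values the next state highly.
adv_mid = td_advantage(0.0, v_state=0.5, v_next=1.0, gamma=0.9)
```

Both advantages here are positive (about 0.2 and 0.4), so a policy-gradient actor would make those choices more likely; changing the weights over training is one hypothetical way to shift the balance between properties.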
Citations: 0
Transcriptomics and epigenetic data integration learning module on Google Cloud.
IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-07-23 | DOI: 10.1093/bib/bbae352
Nathan A Ruprecht, Joshua D Kennedy, Benu Bansal, Sonalika Singhal, Donald Sens, Angela Maggio, Valena Doe, Dale Hawkins, Ross Campbel, Kyle O'Connell, Jappreet Singh Gill, Kalli Schaefer, Sandeep K Singhal
Multi-omics (genomics, transcriptomics, epigenomics, proteomics, metabolomics, etc.) research approaches are vital for understanding the hierarchical complexity of human biology and have proven extremely valuable in cancer research and precision medicine. Scientific advances in recent years have made high-throughput genome-wide sequencing a central focus of molecular research by allowing the collective analysis of various kinds of molecular biological data from different types of specimens within a single tissue, or even at the level of a single cell. Additionally, with improved computational resources and data mining, researchers can integrate data from different omics layers to identify new prognostic, diagnostic, or predictive biomarkers, uncover novel therapeutic targets, and develop more personalized treatment protocols for patients. For the research community to extract scientifically and clinically meaningful information from the biological data generated each day more efficiently and with fewer wasted resources, familiarity and comfort with advanced analytical tools, such as Google Cloud Platform, become imperative. This project is an interdisciplinary, cross-organizational effort to provide a guided learning module that integrates transcriptomics and epigenetics data analysis protocols into a comprehensive analysis pipeline, which users can implement in their own work using the cloud computing infrastructure on Google Cloud. The learning module consists of three submodules that guide the user through tutorial examples illustrating the analysis of RNA-sequencing and Reduced-Representation Bisulfite Sequencing data. The examples take the form of breast cancer case studies, and the datasets were obtained from the public repository Gene Expression Omnibus.
The first submodule is devoted to transcriptomics analysis of the RNA sequencing data, the second focuses on epigenetics analysis of the DNA methylation data, and the third integrates the two for a deeper biological understanding. The modules begin with data collection and preprocessing, and further downstream analysis is performed in a Vertex AI Jupyter notebook instance with an R kernel. Analysis results are returned to Google Cloud buckets for storage and visualization, removing the computational strain from local resources. The final product is a start-to-finish tutorial that helps researchers with limited multi-omics experience integrate transcriptomics and epigenetics data analysis into a comprehensive pipeline for their own biological research. This manuscript describes the development of a resource module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning' https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [16] at the beginning of this Supplement. This module delivers le
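A common first step when relating the two data types is to put methylation on an approximately unbounded scale and correlate it with expression. The sketch below is in Python rather than the module's R and is an assumption, not the module's actual pipeline: it converts beta values (fraction methylated, 0 to 1) to M-values with M = log2(β / (1 − β)) and computes the Pearson correlation across samples between one gene's promoter methylation and its expression. The gene, the per-sample values, and the promoter-level aggregation are hypothetical.

```python
import numpy as np

def beta_to_m(beta, eps=1e-6):
    """Convert methylation beta values (0-1) to M-values, M = log2(beta / (1 - beta)).
    Values are clipped away from 0 and 1 so the logit stays finite."""
    b = np.clip(np.asarray(beta, dtype=float), eps, 1 - eps)
    return np.log2(b / (1 - b))


def methylation_expression_corr(beta, expr):
    """Pearson correlation across samples between M-values and log-scale expression."""
    m = beta_to_m(beta)
    return float(np.corrcoef(m, np.asarray(expr, dtype=float))[0, 1])


# Hypothetical gene measured in five samples: higher promoter methylation,
# lower expression (expression values assumed to be on a log scale already).
beta = [0.1, 0.3, 0.5, 0.7, 0.9]
log_expr = [9.0, 7.0, 5.0, 3.0, 1.0]
r = methylation_expression_corr(beta, log_expr)
```

Here r comes out strongly negative, the pattern usually read as promoter methylation accompanying reduced expression; with real data the calculation would be repeated per gene, with correction for multiple testing.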
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11299028/pdf/
Citations: 0
Identifying and training deep learning neural networks on biomedical-related datasets.
IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-07-23 | DOI: 10.1093/bib/bbae232
Alan E Woessner, Usman Anjum, Hadi Salman, Jacob Lear, Jeffrey T Turner, Ross Campbell, Laura Beaudry, Justin Zhan, Lawrence E Cornett, Susan Gauch, Kyle P Quinn

This manuscript describes the development of a resource module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning' https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [1] at the beginning of this Supplement. This module delivers learning materials on implementing deep learning algorithms for biomedical image data in an interactive format that uses appropriate cloud resources for data access and analyses. Biomedical datasets are widely used in both research and clinical settings, but as their size and breadth increase, they become harder for even professionally trained clinicians and researchers to interpret. Artificial intelligence, and specifically deep learning neural networks, has recently become an important tool in novel biomedical research. However, its use is limited by computational requirements and by confusion regarding the different neural network architectures. The goal of this learning module is to introduce the types of deep learning neural networks and the practices commonly used in biomedical research. The module is subdivided into four submodules covering classification, augmentation, segmentation, and regression. Each complementary submodule was written on the Google Cloud Platform and contains detailed code and explanations, as well as quizzes and challenges to facilitate user training.
Overall, the goal of this learning module is to enable users to identify the correct type of neural network for their data and integrate it, while highlighting how easy cloud computing makes it to implement neural networks.

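The augmentation topic can be illustrated with a minimal NumPy sketch; it is not the module's code. For segmentation the key point is that an image and its mask must receive the identical random transform, here a random number of 90-degree rotations plus an optional horizontal flip. The function name and the choice of transforms are assumptions.

```python
import numpy as np

def augment(image, mask, rng):
    """Apply one random 90-degree rotation (0-3 turns) and an optional horizontal
    flip, identically, to an image and its segmentation mask."""
    k = int(rng.integers(0, 4))              # number of quarter turns
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    return image, mask


# Hypothetical 4x4 image whose "mask" marks odd-valued pixels.
rng = np.random.default_rng(0)
image = np.arange(16).reshape(4, 4)
mask = image % 2
aug_image, aug_mask = augment(image, mask, rng)
```

Because the pixel-to-label correspondence is preserved, the augmented pair remains a valid training example; drawing independent random transforms for the image and the mask would silently corrupt the labels.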
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11264291/pdf/
Citations: 0
Understanding proteome quantification in an interactive learning module on Google Cloud Platform.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-07-23 DOI: 10.1093/bib/bbae235
Kyle A O'Connell, Benjamin Kopchick, Thad Carlson, David Belardo, Stephanie D Byrum

This manuscript describes the development of a resource module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning' https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. This module delivers learning materials on protein quantification in an interactive format that uses appropriate cloud resources for data access and analyses. Quantitative proteomics is a rapidly growing discipline, driven by cutting-edge high-resolution mass spectrometry technologies. Proteome quantification involves many data types, including data-dependent acquisition, data-independent acquisition, multiplexing with Tandem Mass Tag (TMT) reporter ions, spectral counts, and more. As part of the NIH NIGMS Sandbox effort, we developed a learning module that introduces students to mass spectrometry terminology, normalization methods, statistical designs, and the basics of R programming. Because it runs in the Google Cloud environment, the learning module is easily accessible without complex installation procedures. The module demonstrates the analysis of a provided TMT10plex data set, using MS3 reporter ion intensities as quantitative values, in a Jupyter notebook with an R kernel. Starting from the raw intensities, the module performs normalization and then differential abundance analysis with limma models; it is designed for researchers with a basic understanding of mass spectrometry and the R programming language. Learners come away with a better understanding of how to navigate Google Cloud Platform for proteomic research and with the basics of command-line mass spectrometry data analysis.
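The normalization step described above can be illustrated with a short Python sketch. It is not the module's code (the module works in R with limma); it assumes one common TMT approach, sample-loading normalization, in which each channel is rescaled so that its total reporter-ion intensity equals the average channel total, followed by a plain per-protein log2 fold change between two groups of channels. The function names, the channel grouping and the toy intensities are hypothetical, and no moderated (limma-style) statistics are computed.

```python
import math
from statistics import mean


def sample_loading_normalize(intensities):
    """Rescale each TMT channel (column) so its summed intensity equals the
    mean of all channel sums. `intensities` is a list of proteins (rows),
    each a list of per-channel reporter-ion intensities."""
    n_channels = len(intensities[0])
    channel_totals = [sum(row[j] for row in intensities) for j in range(n_channels)]
    target = mean(channel_totals)
    factors = [target / total for total in channel_totals]
    return [[value * factors[j] for j, value in enumerate(row)] for row in intensities]


def log2_fold_changes(normalized, group_a, group_b):
    """Per-protein log2 ratio of mean normalized intensity in group_b over
    group_a; each group is a list of channel indices."""
    changes = []
    for row in normalized:
        mean_a = mean(row[j] for j in group_a)
        mean_b = mean(row[j] for j in group_b)
        changes.append(math.log2(mean_b / mean_a))
    return changes


# Hypothetical 3-protein x 4-channel example: channels 1 and 3 were loaded at
# twice the amount of channels 0 and 2; protein 3 rises threefold in group B
# relative to proteins 1 and 2.
raw = [
    [100.0, 200.0, 100.0, 200.0],
    [50.0, 100.0, 50.0, 100.0],
    [10.0, 20.0, 30.0, 60.0],
]
normalized = sample_loading_normalize(raw)
fold_changes = log2_fold_changes(normalized, group_a=[0, 1], group_b=[2, 3])
for protein, fc in enumerate(fold_changes, start=1):
    print(f"protein {protein}: log2FC = {fc:.3f}")
```

After normalization the twofold loading difference within each group disappears. Because sample-loading normalization assumes most proteins do not change, with only three proteins the unchanged ones pick up a small offset (here log2(8/9), about -0.17), while the difference between protein 3 and the others is still exactly log2(3).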

Citations: 0
A cloud-based learning module for biomarker discovery.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-07-23 DOI: 10.1093/bib/bbae126
Christopher L Hemme, Laura Beaudry, Zelaikha Yosufzai, Allen Kim, Daniel Pan, Ross Campbell, Marcia Price, Bongsup P Cho

This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. This module delivers learning materials on basic principles of biomarker discovery in an interactive format that uses appropriate cloud resources for data access and analyses. In collaboration with Google Cloud, Deloitte Consulting and NIGMS, the Rhode Island INBRE Molecular Informatics Core developed a cloud-based training module for biomarker discovery. The module consists of nine submodules covering various topics in biomarker discovery and assessment; it is deployed on the Google Cloud Platform and available for public use through the NIGMS Sandbox. The submodules are written as a series of Jupyter Notebooks that use R and Bioconductor for biomarker and omics data analysis. They cover the following topics: 1) introduction to biomarkers; 2) introduction to R data structures; 3) introduction to linear models; 4) introduction to exploratory analysis; 5) rat renal ischemia-reperfusion injury (IRI) case study; 6) linear and logistic regression for comparison of quantitative biomarkers; 7) exploratory analysis of proteomic IRI data; 8) identification of IRI biomarkers from proteomic data; and 9) machine learning methods for biomarker discovery. Each notebook includes an in-line quiz for self-assessment on the submodule topic, and an overview video is available on YouTube (https://www.youtube.com/watch?v=2-Q9Ax8EW84).
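Submodule 6 of the biomarker module covers linear and logistic regression for quantitative biomarkers. As a rough illustration only (a Python sketch, not the module's R code), the example below fits a one-predictor logistic regression, P(case | x) = 1 / (1 + exp(-(b0 + b1·x))), by batch gradient ascent on the log-likelihood. A positive b1 means higher biomarker values raise the odds of being a case, and exp(b1) is the odds ratio per unit increase. The function names, learning rate, iteration count and toy measurements are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, learning_rate=0.1, n_iter=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient ascent on the
    mean log-likelihood. Assumes the two classes overlap in x, so a finite
    maximum-likelihood estimate exists."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(n_iter):
        grad_b0 = grad_b1 = 0.0
        for xi, yi in zip(x, y):
            residual = yi - sigmoid(b0 + b1 * xi)  # observed minus predicted
            grad_b0 += residual
            grad_b1 += residual * xi
        b0 += learning_rate * grad_b0 / n
        b1 += learning_rate * grad_b1 / n
    return b0, b1


def log_likelihood(x, y, b0, b1):
    """Bernoulli log-likelihood of the labels under the fitted model."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


# Hypothetical biomarker levels: 1 = injured (case), 0 = control.
levels = [2.1, 3.4, 2.8, 1.4, 1.0, 1.5, 2.1, 2.5]
status = [1, 1, 1, 1, 0, 0, 0, 0]
b0, b1 = fit_logistic(levels, status)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, odds ratio per unit = {math.exp(b1):.2f}")
```

In R, the same model is usually fitted with `glm(status ~ level, family = binomial)`, which uses iteratively reweighted least squares rather than plain gradient ascent and also reports standard errors. If cases and controls are perfectly separated, the maximum-likelihood estimate does not exist and b1 grows without bound.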

Citations: 0
Cloud-based introduction to BASH programming for biologists.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-07-23 DOI: 10.1093/bib/bbae244
Owen M Wilkins, Ross Campbell, Zelaikha Yosufzai, Valena Doe, Shannon M Soucy

This manuscript describes the development of a resource module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning', https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial by the National Institute of General Medical Sciences, 'NIGMS Sandbox: A Learning Platform toward Democratizing Cloud Computing for Biomedical Research', at the beginning of this supplement. This module delivers learning materials introducing the utility of the BASH (Bourne Again Shell) programming language for genomic data analysis in an interactive format that uses appropriate cloud resources for data access and analyses. The next-generation sequencing revolution has generated massive amounts of novel biological data from a multitude of platforms that survey an ever-growing list of genomic modalities. These data require significant downstream computational and statistical analyses to yield meaningful biological insights. However, the skills required to generate these data are vastly different from those required to analyze them. Bench scientists who generate next-generation sequencing data often lack the training needed to analyze these datasets and rely on support from bioinformatics specialists. Dedicated computational training is needed to empower biologists in genomic data analysis; however, learning to use a command-line interface efficiently is a significant barrier to using common analytical tools. Cloud platforms have the potential to democratize access to the technical tools and computational resources needed to work with modern sequencing data, providing an effective framework for bioinformatics education. This module aims to provide an interactive platform that gradually builds the technical skills and knowledge needed to work with genomics data on the command line in the Cloud. The sandbox format lets users move through the material at their own pace and test their grasp of it with knowledge self-checks before building on it in the next submodule.

Citations: 0
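Book学术 · Contact: info@booksci.cn
Book学术 provides a free academic search service that helps scholars in China and abroad find Chinese- and English-language literature, and strives to offer the most convenient, high-quality service experience.
Copyright © 2023 Book学术 All rights reserved.
Beijing Public Security Network Filing No. 11010802042870 · Beijing ICP Filing No. 2023020795-1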