
Bioinformatics: latest articles

A machine learning-based quantitative model (LogBB_Pred) to predict the blood-brain barrier permeability (logBB value) of drug compounds.
IF 5.8 Zone 3 Biology Q1 Mathematics Pub Date: 2023-10-03 DOI: 10.1093/bioinformatics/btad577
Bilal Shaker, Jingyu Lee, Yunhyeok Lee, Myeong-Sang Yu, Hyang-Mi Lee, Eunee Lee, Hoon-Chul Kang, Kwang-Seok Oh, Hyung Wook Kim, Dokyun Na

Motivation: Efficient assessment of the blood-brain barrier (BBB) penetration ability of a drug compound is one of the major hurdles in central nervous system drug discovery since experimental methods are costly and time-consuming. To advance and elevate the success rate of neurotherapeutic drug discovery, it is essential to develop an accurate computational quantitative model to determine the absolute logBB value (a logarithmic ratio of the concentration of a drug in the brain to its concentration in the blood) of a drug candidate.

Results: Here, we developed a quantitative model (LogBB_Pred) capable of predicting the logBB value of a query compound. The model achieved an R2 of 0.61 on an independent test dataset and outperformed other publicly available quantitative models. When compared with the available qualitative (classification) models, which only classify whether a compound is BBB-permeable or not, our model achieved the same accuracy (0.85) as the best qualitative model and far outperformed the other qualitative models (accuracies between 0.64 and 0.70). For further evaluation, our model, the other quantitative models, and the qualitative models were evaluated on a real-world central nervous system drug screening library. Our model showed an accuracy of 0.97, while the other models showed accuracies in the range of 0.29-0.83. Consequently, our model can accurately classify BBB-permeable compounds as well as predict the absolute logBB values of drug candidates.
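
The logBB definition and the two kinds of evaluation reported here (R2 for the regression, accuracy after binarizing into permeable or not) can be illustrated with a minimal Python sketch. The function names, the example values, and the cutoff of -1.0 are illustrative assumptions; none of them is taken from the paper, and this is not the LogBB_Pred model itself.

```python
import math

def log_bb(c_brain: float, c_blood: float) -> float:
    """logBB = log10(concentration in brain / concentration in blood)."""
    return math.log10(c_brain / c_blood)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def is_permeable(logbb: float, cutoff: float = -1.0) -> bool:
    """Binarize a logBB value; the cutoff is an assumed, illustrative threshold."""
    return logbb >= cutoff

def accuracy(labels_true, labels_pred):
    """Fraction of compounds whose predicted class matches the measured class."""
    hits = sum(t == p for t, p in zip(labels_true, labels_pred))
    return hits / len(labels_true)

# Made-up measured and predicted logBB values for four hypothetical compounds.
measured = [0.5, -0.3, -1.4, 0.1]
predicted = [0.4, -0.5, -0.9, 0.2]

print(r_squared(measured, predicted))
print(accuracy([is_permeable(v) for v in measured],
               [is_permeable(v) for v in predicted]))
```

A quantitative model can be compared against classifiers in this way because any predicted logBB value can be thresholded into a class, whereas a class label cannot be turned back into a logBB value.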

Availability and implementation: The web server is freely available at http://ssbio.cau.ac.kr/software/logbb_pred/. The data used in this study are available for download at http://ssbio.cau.ac.kr/software/logbb_pred/dataset.zip.

Citations: 1
FrameD: framework for DNA-based data storage design, verification, and validation.
IF 5.8 Zone 3 Biology Q1 Mathematics Pub Date: 2023-10-03 DOI: 10.1093/bioinformatics/btad572
Kevin D Volkel, Kevin N Lin, Paul W Hook, Winston Timp, Albert J Keung, James M Tuck

Motivation: DNA-based data storage is a quickly growing field that hopes to harness the massive theoretical information density of DNA molecules to produce a competitive next-generation storage medium suitable for archival data. In recent years, many DNA-based storage system designs have been proposed. Given that no common infrastructure exists for simulating these storage systems, comparing many different designs along with many different error models is increasingly difficult. To address this challenge, we introduce FrameD, a simulation infrastructure for DNA storage systems that leverages the underlying modularity of DNA storage system designs to provide a framework to express different designs while being able to reuse common components.

Results: We demonstrate the utility of FrameD and the need for a common simulation platform using a case study. Our case study compares designs that utilize strand copies differently, some that align strand copies using multiple sequence alignment algorithms and others that do not. We found that the choice to include multiple sequence alignment in the pipeline is dependent on the error rate and the type of errors being injected and is not always beneficial. In addition to supporting a wide range of designs, FrameD provides the user with transparent parallelism to deal with a large number of reads from sequencing and the need for many fault injection iterations. We believe that FrameD fills a void in the tools publicly available to the DNA storage community by providing a modular and extensible framework with support for massive parallelism. As a result, it will help accelerate the design process of future DNA-based storage systems.
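
The kind of experiment described here, repeated fault injection into strand copies followed by a consensus step, can be sketched in a few lines of Python. This is a toy under stated assumptions, not FrameD's API: the function names are hypothetical, only substitution errors are injected, and the consensus is a plain position-wise majority vote with no alignment.

```python
import random
from collections import Counter

BASES = "ACGT"

def inject_substitutions(strand: str, error_rate: float, rng: random.Random) -> str:
    """Replace each base by a different base with probability error_rate."""
    out = []
    for base in strand:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def majority_consensus(copies):
    """Position-wise majority vote; assumes all copies have equal length (no indels)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))

def run_trials(strand, error_rate, n_copies, n_trials, seed=0):
    """Fraction of fault-injection trials in which the consensus equals the original."""
    rng = random.Random(seed)
    recovered = 0
    for _ in range(n_trials):
        copies = [inject_substitutions(strand, error_rate, rng) for _ in range(n_copies)]
        recovered += majority_consensus(copies) == strand
    return recovered / n_trials

# Recovery rate for a short made-up strand at two assumed error rates.
for rate in (0.01, 0.2):
    print(rate, run_trials("ACGTTGCAACGTTGCA", rate, n_copies=5, n_trials=200))
```

A column-wise vote like this is only meaningful for substitutions. Insertions and deletions shift positions between copies, which is the situation in which an alignment step becomes relevant and is consistent with the finding that the benefit of alignment depends on the type of errors injected.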

Availability and implementation: The source code for FrameD, along with the data generated during the demonstration of FrameD, is available in a public GitHub repository at https://github.com/dna-storage/framed (https://dx.doi.org/10.5281/zenodo.7757762).

Citations: 0
Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach.
IF 4.4 Zone 3 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2023-09-05 DOI: 10.1093/bioinformatics/btad542
Yan Hu, Vipina K Keloth, Kalpana Raja, Yong Chen, Hua Xu

Motivation: Automated extraction of participants, intervention, comparison/control, and outcome (PICO) from the randomized controlled trial (RCT) abstracts is important for evidence synthesis. Previous studies have demonstrated the feasibility of applying natural language processing (NLP) for PICO extraction. However, the performance is not optimal due to the complexity of PICO information in RCT abstracts and the challenges involved in their annotation.

Results: We propose a two-step NLP pipeline to extract PICO elements from RCT abstracts: (i) sentence classification using a prompt-based learning model and (ii) PICO extraction using a named entity recognition (NER) model. First, the sentences in abstracts were categorized into four sections, namely background, methods, results, and conclusions. Next, the NER model was applied to extract the PICO elements from the sentences within the title and methods sections, which include >96% of PICO information. We evaluated our proposed NLP pipeline on three datasets: the EBM-NLPmod dataset, a randomly selected and reannotated set of 500 RCT abstracts from the EBM-NLP corpus; a dataset of 150 COVID-19 RCT abstracts; and a dataset of 150 Alzheimer's disease (AD) RCT abstracts. The end-to-end evaluation reveals that our proposed approach achieved an overall micro F1 score of 0.833 on the EBM-NLPmod dataset, 0.928 on the COVID-19 dataset, and 0.899 on the AD dataset when measured at the token level, and an overall micro F1 score of 0.712 on the EBM-NLPmod dataset, 0.850 on the COVID-19 dataset, and 0.805 on the AD dataset when measured at the entity level.
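
For readers unfamiliar with the metric, a token-level micro F1 score can be computed as in the following minimal sketch. The label names ("P", "I", "OUT", and "O" for tokens outside any PICO element) and the example sequences are assumptions made for illustration; the authors' evaluation script may differ in its details.

```python
def micro_f1(gold, pred, ignore="O"):
    """Micro-averaged F1 over all labels except `ignore`, counted per token."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != ignore and p == g:
            tp += 1  # predicted a PICO label and it matches the gold label
        else:
            if p != ignore:
                fp += 1  # predicted a PICO label that is wrong
            if g != ignore:
                fn += 1  # missed (or mislabeled) a gold PICO token
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up gold and predicted labels for six tokens.
gold = ["P", "P", "O", "I", "O", "OUT"]
pred = ["P", "O", "O", "I", "I", "P"]
print(micro_f1(gold, pred))
```

An entity-level score applies the same formula but counts whole spans instead of tokens, so a span with one wrong boundary token counts as an error. That stricter matching is one reason entity-level scores are typically lower than token-level scores on the same predictions.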

Availability: Our code and datasets are publicly available at https://github.com/BIDS-Xu-Lab/section_specific_annotation_of_PICO.

Supplementary information: Supplementary data are available at Bioinformatics online.

Citations: 0
A general framework for powerful confounder adjustment in omics association studies.
IF 5.8 Zone 3 Biology Q1 Mathematics Pub Date: 2023-09-02 DOI: 10.1093/bioinformatics/btad563
Asmita Roy, Jun Chen, Xianyang Zhang

Motivation: Genomic data are subject to various sources of confounding, such as demographic variables, biological heterogeneity, and batch effects. To identify genomic features associated with a variable of interest in the presence of confounders, the traditional approach involves fitting a confounder-adjusted regression model to each genomic feature, followed by multiplicity correction.
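
The multiplicity-correction step of the traditional approach can be sketched as follows. The sketch assumes the Benjamini-Hochberg step-up procedure as the correction and takes per-feature P-values (for example, from confounder-adjusted regressions) as given; it illustrates the conventional baseline only and is not the 2DFDR+ procedure.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest P-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

# Made-up P-values for four hypothetical genomic features.
print(benjamini_hochberg([0.041, 0.001, 0.205, 0.008]))
```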

Results: This study shows that the traditional approach is suboptimal and proposes a new two-dimensional false discovery rate control framework (2DFDR+) that provides significant power improvement over the conventional method and applies to a wide range of settings. 2DFDR+ uses marginal independence test statistics as auxiliary information to filter out less promising features, and FDR control is performed based on conditional independence test statistics in the remaining features. 2DFDR+ provides (asymptotically) valid inference from samples in settings where the conditional distribution of the genomic variables given the covariate of interest and the confounders is arbitrary and completely unknown. Promising finite sample performance is demonstrated via extensive simulations and real data applications.

Availability and implementation: R code and vignettes are available at https://github.com/asmita112358/tdfdr.np.

Citations: 0
ActivePPI: quantifying protein-protein interaction network activity with Markov random fields.
IF 5.8 Zone 3 Biology Q1 Mathematics Pub Date: 2023-09-02 DOI: 10.1093/bioinformatics/btad567
Chuanyuan Wang, Shiyu Xu, Duanchen Sun, Zhi-Ping Liu

Motivation: Protein-protein interactions (PPI) are crucial components of the biomolecular networks that enable cells to function. Biological experiments have identified a large number of PPI, and these interactions are stored in knowledge bases. However, these interactions are often restricted to specific cellular environments and conditions. Network activity can be characterized as the extent of agreement between a PPI network (PPIN) and a distinct cellular environment measured by protein mass spectrometry, and it can also be quantified as a statistical significance score. Without knowing the activity of these PPI in the cellular environments or specific phenotypes, it is impossible to reveal how these PPI perform and affect cellular functioning.

Results: To calculate the activity of a PPIN in different cellular conditions, we propose a PPIN activity evaluation framework named ActivePPI that measures the consistency between network architecture and protein measurement data. ActivePPI estimates the probability density of protein mass spectrometry abundance and models the PPIN using a Markov-random-field-based method. Furthermore, an empirical P-value is derived from a nonparametric permutation test to quantify the significance of the match between PPIN structure and protein abundance data. Extensive numerical experiments demonstrate the superior performance of ActivePPI in network activity evaluation, pathway activity assessment, and optimal network architecture tuning tasks. In summary, ActivePPI is a versatile tool for evaluating PPI networks that can uncover the functional significance of protein interactions in crucial cellular biological processes and offer further insights into physiological phenomena.
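
The permutation-based empirical P-value can be illustrated with a small sketch. The network score used here (a sum of abundance products over edges) is a toy stand-in chosen for brevity, not ActivePPI's Markov-random-field likelihood, and the function names, the protein names, and the add-one form of the estimate are assumptions.

```python
import random

def edge_score(abundance, edges):
    """Toy network score (assumed for illustration): sum of abundance products over edges."""
    return sum(abundance[u] * abundance[v] for u, v in edges)

def empirical_p_value(observed, null_scores):
    """Add-one estimate: share of null scores at least as large as the observed score."""
    exceed = sum(s >= observed for s in null_scores)
    return (1 + exceed) / (1 + len(null_scores))

def permutation_test(abundance, edges, n_perm=1000, seed=0):
    """Shuffle abundances across proteins to build the null distribution of the score."""
    rng = random.Random(seed)
    proteins = list(abundance)
    values = [abundance[p] for p in proteins]
    observed = edge_score(abundance, edges)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(values)
        null_scores.append(edge_score(dict(zip(proteins, values)), edges))
    return observed, empirical_p_value(observed, null_scores)

# Made-up abundances for four hypothetical proteins and two interactions.
abundance = {"A": 5.0, "B": 4.0, "C": 0.5, "D": 0.2}
edges = [("A", "B"), ("C", "D")]
print(permutation_test(abundance, edges))
```

Shuffling abundances while keeping the edges fixed preserves the network structure and the abundance distribution and breaks only their pairing, which is the association under test.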

Availability and implementation: All source code and data are freely available at https://github.com/zpliulab/ActivePPI.

Citations: 0
aenmd: annotating escape from nonsense-mediated decay for transcripts with protein-truncating variants.
IF 5.8 Zone 3 Biology Q1 Mathematics Pub Date: 2023-09-02 DOI: 10.1093/bioinformatics/btad556
Jonathan Klonowski, Qianqian Liang, Zeynep Coban-Akdemir, Cecilia Lo, Dennis Kostka

Summary: DNA changes that cause premature termination codons (PTCs) represent a large fraction of clinically relevant pathogenic genomic variation. Typically, PTCs induce transcript degradation by nonsense-mediated mRNA decay (NMD) and render such changes loss-of-function alleles. However, certain PTC-containing transcripts escape NMD and can exert dominant-negative or gain-of-function (DN/GOF) effects. Therefore, systematic identification of human PTC-causing variants and their susceptibility to NMD contributes to the investigation of the role of DN/GOF alleles in human disease. Here we present aenmd, software for annotating PTC-containing transcript-variant pairs for predicted escape from NMD. aenmd is user-friendly and self-contained. It offers functionality not currently available in other methods and is based on established and experimentally validated rules for NMD escape; the software is designed to work at scale and to integrate seamlessly with existing analysis workflows. We applied aenmd to variants in the gnomAD, ClinVar, and GWAS Catalog databases and report the prevalence of human PTC-causing variants in these databases, and the subset of these variants that could exert DN/GOF effects via NMD escape.
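
Rule-based annotation of this kind can be sketched in Python. The abstract does not list the rules aenmd applies, so the sketch assumes two rules that are widely cited in the NMD literature: a PTC in the last exon, or within 50 nt upstream of the last exon-exon junction, is predicted to escape. The function name, the coordinate convention, and the treatment of single-exon transcripts are hypothetical, and nothing here reflects the actual aenmd interface.

```python
def predicts_nmd_escape(exon_lengths, ptc_pos, window=50):
    """Return True if a PTC at 0-based transcript position ptc_pos is predicted to escape NMD.

    Assumed rules (illustrative only): the PTC lies in the last exon, or within
    `window` nt upstream of the last exon-exon junction. A single-exon transcript
    has no junction and is treated as escaping.
    """
    total = sum(exon_lengths)
    if not 0 <= ptc_pos < total:
        raise ValueError("ptc_pos lies outside the transcript")
    if len(exon_lengths) == 1:
        return True
    last_junction = total - exon_lengths[-1]  # first position of the last exon
    if ptc_pos >= last_junction:
        return True  # PTC in the last exon
    return last_junction - ptc_pos <= window  # PTC close to the last junction

# Hypothetical three-exon transcript with exon lengths 200, 300, and 100 nt.
for pos in (100, 460, 550):
    print(pos, predicts_nmd_escape([200, 300, 100], pos))
```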

Availability and implementation: aenmd is implemented in the R programming language. Code is available on GitHub as an R-package (github.com/kostkalab/aenmd.git), and as a containerized command-line interface (github.com/kostkalab/aenmd_cli.git).

Citations: 0
Tracking and curating putative SARS-CoV-2 recombinants with RIVET.
IF 5.8 Zone 3 Biology Q1 Mathematics Pub Date: 2023-09-02 DOI: 10.1093/bioinformatics/btad538
Kyle Smith, Cheng Ye, Yatish Turakhia

Motivation: Identifying and tracking recombinant strains of SARS-CoV-2 is critical to understanding the evolution of the virus and controlling its spread. However, confidently identifying SARS-CoV-2 recombinants among the thousands of new genome sequences shared online every day is challenging, so many recombinants are missed or are formally identified only after weeks of delay while undergoing expert curation.

Results: We present RIVET, a software pipeline and visual platform that takes advantage of recent algorithmic advances in recombination inference to search comprehensively and sensitively for potential SARS-CoV-2 recombinants and to organize the relevant information in a web interface that could greatly accelerate the process of identifying and tracking recombinants.

Availability and implementation: A RIVET-based web interface displaying the most up-to-date analysis of potential SARS-CoV-2 recombinants is available at https://rivet.ucsd.edu/. RIVET's frontend and backend code is freely available under the MIT license at https://github.com/TurakhiaLab/rivet, and the documentation for RIVET is available at https://turakhialab.github.io/rivet/. The inputs necessary for running RIVET's backend workflow for SARS-CoV-2 are available through a public database maintained and updated daily by UCSC (https://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/).

Citations: 0
MULGA, a unified multi-view graph autoencoder-based approach for identifying drug-protein interaction and drug repositioning.
IF 5.8 Zone 3 Biology Q1 Mathematics Pub Date: 2023-09-02 DOI: 10.1093/bioinformatics/btad524
Jiani Ma, Chen Li, Yiwen Zhang, Zhikang Wang, Shanshan Li, Yuming Guo, Lin Zhang, Hui Liu, Xin Gao, Jiangning Song

Motivation: Identifying drug-protein interactions (DPIs) is a critical step in drug repositioning, which allows approved drugs to be reused where they may be effective against a different disease and thereby alleviates the challenges of new drug development. Although a wide variety of computational approaches for DPI prediction have been proposed, key challenges remain to be addressed, such as extendable and unbiased similarity calculation, heterogeneous information utilization, and reliable negative sample selection.

Results: To address these issues, we propose a novel, unified multi-view graph autoencoder framework, termed MULGA, for both DPI and drug repositioning predictions. MULGA features: (i) a multi-view learning technique to effectively learn authentic drug affinity and target affinity matrices; (ii) a graph autoencoder to infer missing DPIs; and (iii) a new "guilty-by-association"-based negative sampling approach for selecting highly reliable non-DPIs. Benchmark experiments demonstrate that MULGA outperforms state-of-the-art methods in DPI prediction, and the ablation studies verify the effectiveness of each proposed component. Importantly, we highlight the top drugs shortlisted by MULGA that target the spike glycoprotein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), offering additional insights into, and potentially useful treatment options for, COVID-19. Together with the availability of datasets and source code, we envision that MULGA can be explored as a useful tool for DPI prediction and drug repositioning.

Availability and implementation: MULGA is publicly available for academic purposes at https://github.com/jianiM/MULGA/.

Citations: 0
ePlatypus: an ecosystem for computational analysis of immunogenomics data.
IF 5.8 Zone 3 Biology Q1 Mathematics Pub Date: 2023-09-02 DOI: 10.1093/bioinformatics/btad553
Tudor-Stefan Cotet, Andreas Agrafiotis, Victor Kreiner, Raphael Kuhn, Danielle Shlesinger, Marcos Manero-Carranza, Keywan Khodaverdi, Evgenios Kladis, Aurora Desideri Perea, Dylan Maassen-Veeters, Wiona Glänzer, Solène Massery, Lorenzo Guerci, Kai-Lin Hong, Jiami Han, Kostas Stiklioraitis, Vittoria Martinolli D'Arcy, Raphael Dizerens, Samuel Kilchenmann, Lucas Stalder, Leon Nissen, Basil Vogelsanger, Stine Anzböck, Daria Laslo, Sophie Bakker, Melinda Kondorosy, Marco Venerito, Alejandro Sanz García, Isabelle Feller, Annette Oxenius, Sai T Reddy, Alexander Yermanos

Motivation: The maturation of systems immunology methodologies requires novel and transparent computational frameworks capable of integrating diverse data modalities in a reproducible manner.

Results: Here, we present the ePlatypus computational immunology ecosystem for immunogenomics data analysis, with a focus on adaptive immune repertoires and single-cell sequencing. ePlatypus is an open-source web-based platform that provides programming tutorials and an integrative database that helps elucidate signatures of B and T cell clonal selection. Furthermore, the ecosystem links novel and established bioinformatics pipelines relevant to single-cell immune repertoires and other aspects of computational immunology, such as predicting ligand-receptor interactions, structural modeling, simulations, machine learning, graph theory, pseudotime, spatial transcriptomics, and phylogenetics. The ePlatypus ecosystem helps extract deeper insight into computational immunology and immunogenomics and promotes open science.

Availability and implementation: Platypus code used in this manuscript can be found at github.com/alexyermanos/Platypus.

Citations: 0
MoleculeExperiment enables consistent infrastructure for molecule-resolved spatial omics data in bioconductor.
IF 4.4 Zone 3 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2023-09-02 DOI: 10.1093/bioinformatics/btad550
Bárbara Zita Peters Couto, Nicholas Robertson, Ellis Patrick, Shila Ghazanfar

Motivation: Imaging-based spatial transcriptomics (ST) technologies have achieved subcellular resolution, enabling detection of individual molecules in their native tissue context. Data associated with these technologies promise unprecedented opportunities for understanding cellular and subcellular biology. However, R/Bioconductor has little computational infrastructure to represent such data, and particularly to summarize and transform it for widely adopted computational tools in single-cell transcriptomics analysis, including the SingleCellExperiment and SpatialExperiment (SPE) classes. With the emergence of several commercial offerings of imaging-based ST, there is a pressing need to develop consistent data structure standards for these technologies at the individual-molecule level.

Results: To this end, we have developed MoleculeExperiment, an R/Bioconductor package, which (i) stores molecule and cell segmentation boundary information at the molecule-level, (ii) standardizes this molecule-level information across different imaging-based ST technologies, including 10× Genomics' Xenium, and (iii) streamlines transition from a MoleculeExperiment object to a SpatialExperiment object. Overall, MoleculeExperiment is generally applicable as a data infrastructure class for consistent analysis of molecule-resolved spatial omics data.

Availability and implementation: The MoleculeExperiment package is publicly available on Bioconductor at https://bioconductor.org/packages/release/bioc/html/MoleculeExperiment.html. Source code is available on GitHub at https://github.com/SydneyBioX/MoleculeExperiment. The vignette for MoleculeExperiment can be found at https://bioconductor.org/packages/release/bioc/html/MoleculeExperiment.html.

Citations: 0