Frontiers in Bioinformatics: latest articles

ursaPGx: a new R package to annotate pharmacogenetic star alleles using phased whole-genome sequencing data.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-03-12 eCollection Date: 2024-01-01 DOI: 10.3389/fbinf.2024.1351620
Gennaro Calendo, Dara Kusic, Jozef Madzo, Neda Gharani, Laura Scheinfeldt

Long-read sequencing technologies offer new opportunities to generate high-confidence phased whole-genome sequencing data for robust pharmacogenetic annotation. Here, we describe a new user-friendly R package, ursaPGx, designed to accept multi-sample VCF input files of phased whole-genome sequencing data and to output star allele annotations for pharmacogenes annotated in PharmVar.
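ursaPGx itself is an R package, and its internals are not described here. As a rough illustration of why phasing matters for star-allele calling, the Python sketch below splits phased VCF-style genotypes (`0|1`) into two haplotypes and assigns each haplotype the most specific allele definition it fully contains. The allele table, positions, and every function name are hypothetical placeholders, not PharmVar definitions or the ursaPGx API; real callers also have to handle multiallelic sites, structural variants, and partial matches.

```python
# Toy sketch of star-allele assignment from phased genotypes.
# ALLELE_DEFS holds HYPOTHETICAL positions/alleles, not PharmVar data,
# and none of these functions are part of ursaPGx.

ALLELE_DEFS = {
    "*2": {(1001, "T")},
    "*3": {(1001, "T"), (1050, "G")},
    "*4": {(1200, "A")},
}


def split_haplotypes(variants):
    """Split phased calls into the (pos, alt) sets carried by each haplotype.

    variants: iterable of (pos, ref, alt, gt) with a phased GT such as "0|1".
    Only biallelic calls (allele index 1) are handled in this sketch.
    """
    hap1, hap2 = set(), set()
    for pos, _ref, alt, gt in variants:
        if "|" not in gt:
            raise ValueError(f"unphased genotype at position {pos}: {gt}")
        a1, a2 = gt.split("|")
        if a1 == "1":
            hap1.add((pos, alt))
        if a2 == "1":
            hap2.add((pos, alt))
    return hap1, hap2


def call_star_allele(hap):
    """Return the most specific definition whose variants all occur on hap."""
    matches = [name for name, req in ALLELE_DEFS.items() if req <= hap]
    if not matches:
        return "*1"  # no defining variant present: treat as reference allele
    # Most defining variants wins; ties fall back to dict order (toy behaviour).
    return max(matches, key=lambda name: len(ALLELE_DEFS[name]))


def diplotype(variants):
    hap1, hap2 = split_haplotypes(variants)
    return f"{call_star_allele(hap1)}/{call_star_allele(hap2)}"


calls = [
    (1001, "C", "T", "1|0"),
    (1050, "A", "G", "1|0"),
    (1200, "G", "A", "0|1"),
]
print(diplotype(calls))  # prints "*3/*4"
```

With unphased calls (`1/0`) for the same three variants, there would be no way to tell whether 1001T and 1050G sit on the same chromosome (as in the `*3/*4` call above) or on opposite chromosomes, which would change the call.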

Citations: 0
A network-based method for associating genes with autism spectrum disorder.
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-03-08 eCollection Date: 2024-01-01 DOI: 10.3389/fbinf.2024.1295600
Neta Zadok, Gil Ast, Roded Sharan

Autism spectrum disorder (ASD) is a highly heritable complex disease that affects 1% of the population, yet its underlying molecular mechanisms are largely unknown. Here we study the problem of predicting causal genes for ASD by combining genome-scale data with a network propagation approach. We construct a predictor that integrates multiple omic data sets assessing genomic, transcriptomic, proteomic, and phosphoproteomic associations with ASD. In cross-validation, our predictor yields a mean area under the ROC curve of 0.87 and an area under the precision-recall curve of 0.89. We further show that it outperforms previous gene-level predictors of autism association. Finally, we show that the model can be used to predict genes associated with schizophrenia, which is known to share genetic components with ASD.
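The abstract does not give the propagation formula the authors used. A common formulation of network propagation is a random walk with restart: seed scores (for example, per-gene association evidence) repeatedly diffuse to network neighbours, while a fraction `1 - alpha` of the score restarts at the seeds. The Python sketch below assumes an unweighted, undirected gene network, a restart parameter of 0.8 chosen only for illustration, and hypothetical gene names.

```python
# Minimal network propagation (random walk with restart) on an undirected graph.
# Gene names, edges and alpha are illustrative assumptions only.


def propagate(edges, seeds, alpha=0.8, tol=1e-10, max_iter=1000):
    """Iterate p <- alpha * W p + (1 - alpha) * p0 until convergence.

    W is the column-normalised adjacency matrix (each node passes an equal
    share of its score to every neighbour); p0 is the seed vector scaled to
    sum to 1. Returns a dict mapping node -> propagated score.
    """
    nodes = sorted({n for edge in edges for n in edge} | set(seeds))
    nbrs = {n: set() for n in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    total = sum(seeds.values())
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        new = {
            i: alpha * sum(p[j] / len(nbrs[j]) for j in nbrs[i]) + (1 - alpha) * p0[i]
            for i in nodes
        }
        converged = max(abs(new[n] - p[n]) for n in nodes) < tol
        p = new
        if converged:
            break
    return p


# Toy path graph GENE_A - GENE_B - GENE_C - GENE_D with GENE_A as the only seed.
edges = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_C", "GENE_D")]
scores = propagate(edges, {"GENE_A": 1.0})
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{gene}\t{score:.3f}")
```

On this toy path graph the seed's only neighbour can outscore the seed itself, because a degree-1 node passes all of its score to that neighbour; how to account for node degree is one of the design choices such methods have to make.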

Citations: 0
Detecting subtle transcriptomic perturbations induced by lncRNAs knock-down in single-cell CRISPRi screening using a new sparse supervised autoencoder neural network.
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-03-04 eCollection Date: 2024-01-01 DOI: 10.3389/fbinf.2024.1340339
Marin Truchi, Caroline Lacoux, Cyprien Gille, Julien Fassy, Virginie Magnone, Rafael Lopes Goncalves, Cédric Girard-Riboulleau, Iris Manosalva-Pena, Marine Gautier-Isola, Kevin Lebrigand, Pascal Barbry, Salvatore Spicuglia, Georges Vassaux, Roger Rezzonico, Michel Barlaud, Bernard Mari

Single-cell CRISPR-based transcriptome screens are potent genetic tools for concomitantly assessing the expression profiles of cells targeted by a set of guide RNAs (gRNAs) and inferring target gene functions from the observed perturbations. However, due to various limitations, this approach lacks sensitivity in detecting weak perturbations and is essentially reliable only when studying master regulators such as transcription factors. To overcome the challenge of detecting subtle gRNA-induced transcriptomic perturbations and classifying the most responsive cells, we developed a new supervised autoencoder neural network method. Our sparse supervised autoencoder (SSAE) neural network selects both the relevant features (genes) and the actually perturbed cells. We applied this method to an in-house single-cell CRISPR interference (CRISPRi) transcriptome screen (CROP-seq) focusing on a subset of long non-coding RNAs (lncRNAs) regulated by hypoxia, a condition that promotes tumor aggressiveness and drug resistance, in the context of lung adenocarcinoma (LUAD). The CROP-seq library of validated gRNAs against a subset of lncRNAs and, as positive controls, HIF1A and HIF2A (the two main transcription factors of the hypoxic response) was transduced into A549 LUAD cells cultured in normoxia or exposed to hypoxic conditions for 3, 6, or 24 h. We first validated the SSAE approach on HIF1A and HIF2A by confirming the specific effect of their knock-down during the temporal switch of the hypoxic response. Next, the SSAE method detected stable short hypoxia-dependent transcriptomic signatures induced by the knock-down of some candidate lncRNAs, outperforming previously published machine learning approaches. This proof of concept demonstrates the relevance of the SSAE approach for deciphering weak perturbations in single-cell transcriptomic readouts of CRISPR-based screens.
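The abstract does not specify how SSAE enforces sparsity. One common way to make a network's weights sparse, and hence select features, is to project a weight vector onto an L1 ball after each gradient step; coordinates that land exactly on zero correspond to genes that are dropped. The Python sketch below shows only that projection step (the sort-based algorithm of Duchi et al., 2008) on a hypothetical vector of encoder weights. It is an assumption about the general technique, not the authors' implementation, which may use a different or structured constraint.

```python
import math


def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto the L1 ball {w : sum(|w_i|) <= radius}.

    Sort-based algorithm of Duchi et al. (2008): find a threshold theta and
    soft-threshold every coordinate by it, which zeroes out small entries.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    if sum(abs(x) for x in v) <= radius:
        return list(v)  # already feasible, nothing to do
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - radius) / k
        if uk > t:
            theta = t  # the condition holds for a prefix of the sorted list
        else:
            break
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


# Hypothetical encoder weights of one latent unit over five genes.
weights = [3.0, -1.0, 0.5, 0.2, -2.5]
sparse = project_l1_ball(weights, radius=2.0)
selected = [i for i, w in enumerate(sparse) if w != 0.0]
print(sparse, selected)
```

Shrinking `radius` zeroes out more genes; in a training loop the projection would be applied to the weights after every optimizer update.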

Citations: 0
Predicting cell population-specific gene expression from genomic sequence.
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-03-04 eCollection Date: 2024-01-01 DOI: 10.3389/fbinf.2024.1347276
Lieke Michielsen, Marcel J T Reinders, Ahmed Mahfouz

Most regulatory elements, especially enhancer sequences, are cell population-specific. One could even argue that a distinct set of regulatory elements is what defines a cell population. However, discovering which non-coding regions of the DNA are essential in which context, and as a result, which genes are expressed, is a difficult task. Some computational models tackle this problem by predicting gene expression directly from the genomic sequence. These models are currently limited to predicting bulk measurements and mainly make tissue-specific predictions. Here, we present a model that leverages single-cell RNA-sequencing data to predict gene expression. We show that cell population-specific models outperform tissue-specific models, especially when the expression profile of a cell population and the corresponding tissue are dissimilar. Further, we show that our model can prioritize GWAS variants and learn motifs of transcription factor binding sites. We envision that our model can be useful for delineating cell population-specific regulatory elements.
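Sequence-to-expression models of this kind typically take DNA as a one-hot encoded matrix. The abstract does not say how the single-cell data were turned into training targets; averaging expression over the cells of each annotated population (a pseudobulk profile) is one plausible option and is assumed here. Both functions are illustrative Python sketches with hypothetical names.

```python
def one_hot_dna(seq):
    """Encode a DNA string as a len(seq) x 4 list of rows in A, C, G, T order.

    Ambiguous bases such as N become all-zero rows.
    """
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def pseudobulk(expression, labels):
    """Average per-cell expression vectors within each cell population.

    expression: list of per-cell vectors (same gene order in every cell).
    labels: cell-population label for each cell, in the same order.
    """
    sums, counts = {}, {}
    for vec, label in zip(expression, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


print(one_hot_dna("ACGTN"))
cells = [[1, 2], [3, 4], [10, 0]]  # toy cells x genes matrix
print(pseudobulk(cells, ["T cell", "T cell", "B cell"]))
```

A model trained on one pseudobulk target per population yields population-specific predictions, in contrast to a single tissue-level (bulk) target.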

Citations: 0
Where are we in the implementation of tissue-specific epigenetic clocks?
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-03-04 eCollection Date: 2024-01-01 DOI: 10.3389/fbinf.2024.1306244
Claudia Sala, Pietro Di Lena, Danielle Fernandes Durso, Italo Faria do Valle, Maria Giulia Bacalini, Daniele Dall'Olio, Claudio Franceschi, Gastone Castellani, Paolo Garagnani, Christine Nardini

Introduction: DNA methylation clocks present advantageous characteristics for the ambitious goal of identifying very early markers of disease, based on the concept that accelerated ageing is a reliable predictor in this sense. Methods: Because these tools are epigenomics-based, they are expected to be conditioned by sex and tissue specificity; this work quantifies that dependency, as well as the dependency on the regression model and on the size of the training set. Results: Our quantitative results indicate that elastic-net penalization is the best-performing strategy, and that, unsurprisingly, it performs better when the data set is larger; sex does not appear to condition clock performance, and tissue-specific clocks appear to perform better than generic blood clocks. Finally, when considering all trained clocks, we identified a subset of genes that, to the best of our knowledge, have not yet been reported and might deserve further investigation: CPT1A, MMP15, SHROOM3, SLIT3, and SYNGR. Conclusion: These factual starting points can be useful for the future medical translation of clocks, in particular in the debate between multi-tissue clocks, generally trained on a large majority of blood samples, and tissue-specific clocks.
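Epigenetic clocks are commonly fit by penalized regression of age on CpG methylation levels. The sketch below minimizes the elastic-net objective in the glmnet convention, (1/(2n))·||y − b0 − Xb||² + lam·((1 − alpha)/2·||b||² + alpha·||b||₁), by cyclic coordinate descent with soft-thresholding. It is a minimal pure-Python illustration on noise-free toy data, not the authors' pipeline; in practice one would use glmnet or scikit-learn's `ElasticNet`, whose `alpha` and `l1_ratio` correspond to `lam` and `alpha` here.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the proximal operator of the L1 norm)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, n_iter=1000, tol=1e-8):
    """Coordinate descent for
    (1/(2n))||y - b0 - X b||^2 + lam * ((1 - alpha)/2 * ||b||^2 + alpha * ||b||_1).

    X: list of n rows with p features. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centering lets the unpenalized intercept be recovered at the end.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual for beta = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature carries no information
            # (1/n) * x_j . (residual with feature j's contribution added back)
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return intercept, beta


# Noise-free toy data: y = 1 + 2*x1 - x2 (columns stand in for CpG methylation levels).
X = [[1, 0], [0, 1], [1, 1], [2, 1], [0, 2]]
y = [1 + 2 * a - b for a, b in X]
b0, coef = elastic_net(X, y, lam=1e-6, alpha=0.5)
print(round(b0, 3), [round(c, 3) for c in coef])
```

A very small `lam` essentially recovers ordinary least squares, while a large `lam` drives every coefficient to exactly zero; real clocks choose `lam` (and often `alpha`) by cross-validation.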

Citations: 0
Computational identification of antibody-binding epitopes from mimotope datasets.
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-02-23 eCollection Date: 2024-01-01 DOI: 10.3389/fbinf.2024.1295972
Rang Li, Sabrina Wilderotter, Madison Stoddard, Debra Van Egeren, Arijit Chakravarty, Diane Joseph-McCarthy

Introduction: A fundamental challenge in computational vaccinology is that most B-cell epitopes are conformational and therefore hard to predict from sequence alone. Another significant challenge is that much of the amino acid sequence of a viral surface protein might not in fact be antigenic. Thus, identifying the regions of a protein that are most promising for vaccine design based on the degree of surface exposure may not lead to a clinically relevant immune response. Methods: Linear peptides selected by phage display experiments that have high affinity to the monoclonal antibody of interest ("mimotopes") usually have physicochemical properties similar to those of the antigen epitope corresponding to that antibody. In the absence of an antigen-antibody complex structure, the sequences of these linear peptides can be used to find possible epitopes on the surface of the antigen structure or of a homology model of the antigen. Results and Discussion: Herein we describe two novel methods for mapping mimotopes to epitopes. The first is a novel algorithm named MimoTree that allows for gaps in the mimotopes and in the epitopes on the antigen. More specifically, a mimotope may contain a gap that does not match the epitope, allowing it to adopt a conformation relevant for binding to an antibody, and residues may similarly be discontinuous in conformational epitopes. MimoTree is a fully automated epitope detection algorithm suitable for the identification of conformational as well as linear epitopes. The second is an ensemble approach, which combines the prediction results from MimoTree and two existing methods.
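The abstract does not detail MimoTree's search procedure. The hypothetical Python sketch below only illustrates the two ideas it names: epitope residues need not be contiguous in sequence (here, consecutive matches only have to be spatially close on the antigen surface), and the mimotope may contain unmatched positions (up to `max_gaps`). The residue coordinates, the 7 Å neighbour cutoff, and exact amino-acid identity as the matching rule are all assumptions; practical tools score residue similarity rather than identity.

```python
import math


def adjacency(residues, cutoff):
    """Residues closer than cutoff (same units as the coordinates) are neighbours."""
    return {
        a: [b for b in residues if b != a and math.dist(residues[a][1], residues[b][1]) <= cutoff]
        for a in residues
    }


def best_placement(mimotope, residues, cutoff=7.0, max_gaps=1):
    """Depth-first search for the longest spatially contiguous path of surface
    residues matching the mimotope in order, skipping at most max_gaps
    mimotope positions. Returns (n_matched, path_of_residue_ids)."""
    adj = adjacency(residues, cutoff)
    best = (0, [])

    def extend(pos, path, gaps):
        nonlocal best
        if len(path) > best[0]:
            best = (len(path), list(path))
        if pos == len(mimotope):
            return
        # Option 1: match mimotope[pos] to an unused residue next to the path end.
        candidates = adj[path[-1]] if path else list(residues)
        for r in candidates:
            if r not in path and residues[r][0] == mimotope[pos]:
                path.append(r)
                extend(pos + 1, path, gaps)
                path.pop()
        # Option 2: leave mimotope[pos] unmatched (a gap in the mimotope).
        if gaps < max_gaps:
            extend(pos + 1, path, gaps + 1)

    extend(0, [], 0)
    return best


# Hypothetical surface residues: id -> (one-letter amino acid, (x, y, z) in Å).
surface = {
    1: ("A", (0.0, 0.0, 0.0)),
    2: ("C", (3.0, 0.0, 0.0)),
    3: ("D", (6.0, 0.0, 0.0)),
    4: ("E", (30.0, 0.0, 0.0)),
}
print(best_placement("AXCD", surface))  # (3, [1, 2, 3])
```

The search is exhaustive and therefore exponential in the worst case; it is meant only to make the gap idea concrete on a handful of residues.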

Citations: 0
Limits of experimental evidence in RNA secondary structure prediction.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-02-22 eCollection Date: 2024-01-01 DOI: 10.3389/fbinf.2024.1346779
Sarah von Löhneysen, Mario Mörl, Peter F Stadler
Citations: 0
Posterior inference of Hi-C contact frequency through sampling.
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-02-22 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1285828
Yanlin Zhang, Christopher J F Cameron, Mathieu Blanchette

Hi-C is one of the most widely used approaches to study three-dimensional genome conformations. Contacts captured by a Hi-C experiment are represented in a contact frequency matrix. Due to limited sequencing depth and other factors, Hi-C contact frequency matrices are only approximations of the true interaction frequencies, and they are moreover reported without any quantification of uncertainty. Hence, downstream analyses based on Hi-C contact maps (e.g., TAD and loop annotation) are themselves point estimates. Here, we present the Hi-C interaction frequency sampler (HiCSampler), which reliably infers the posterior distribution of the interaction frequency for a given Hi-C contact map by exploiting dependencies between neighboring loci. Posterior predictive checks demonstrate that HiCSampler can infer highly predictive chromosomal interaction frequencies. Summary statistics calculated by HiCSampler provide a measurement of the uncertainty of Hi-C experiments, and samples inferred by HiCSampler can be used directly by most off-the-shelf downstream analysis tools, permitting uncertainty measurements in these analyses without modifying the tools.
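HiCSampler's actual model is not described in the abstract. To illustrate what a posterior over contact frequencies (rather than a point estimate) looks like, the Python sketch below uses a deliberately simple conjugate model: each bin's true frequency gets a Gamma prior centred near the mean of its neighbouring bins, the observed count is treated as Poisson, and the resulting Gamma posterior is sampled. The prior strength, the `eps` offset, and independent per-bin sampling (which ignores the symmetry of Hi-C maps) are simplifying assumptions, not features of HiCSampler.

```python
import random


def neighbour_mean(mat, i, j):
    """Mean observed count of the (up to 8) bins surrounding bin (i, j)."""
    vals = [
        mat[a][b]
        for a in range(i - 1, i + 2)
        for b in range(j - 1, j + 2)
        if (a, b) != (i, j) and 0 <= a < len(mat) and 0 <= b < len(mat[0])
    ]
    return sum(vals) / len(vals) if vals else 0.0


def posterior_params(mat, prior_strength=2.0, eps=1e-3):
    """Per-bin Gamma(shape, rate) posterior under a Poisson likelihood.

    Prior: Gamma(shape=prior_strength * m + eps, rate=prior_strength), whose
    mean is roughly m, the neighbours' mean count. Observing count k gives
    the conjugate posterior Gamma(shape + k, rate + 1).
    """
    return [
        [(prior_strength * neighbour_mean(mat, i, j) + eps + k, prior_strength + 1.0)
         for j, k in enumerate(row)]
        for i, row in enumerate(mat)
    ]


def sample_posterior(mat, n_samples=1000, seed=0, **prior):
    """Draw n_samples contact-frequency matrices from the per-bin posteriors."""
    rng = random.Random(seed)
    params = posterior_params(mat, **prior)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    return [
        [[rng.gammavariate(a, 1.0 / b) for a, b in row] for row in params]
        for _ in range(n_samples)
    ]


def summarise(samples, i, j):
    """Posterior mean and standard deviation of bin (i, j) from the samples."""
    vals = [s[i][j] for s in samples]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
    return mean, var ** 0.5


observed = [[10, 2], [2, 10]]  # toy 2 x 2 contact map
draws = sample_posterior(observed, n_samples=4000)
print(summarise(draws, 0, 0))
```

Each sampled matrix can be passed unchanged to a downstream tool, and the spread of that tool's outputs across samples gives an uncertainty estimate. On the toy map, the posterior mean of bin (0, 0) is pulled from the observed 10 toward its neighbours' mean.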

Citations: 0
Making bioinformatics training FAIR: the EMBL-EBI training portal.
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-01-31 eCollection Date: 2024-01-01 DOI: 10.3389/fbinf.2024.1347168
A L Swan, A Broadbent, P Singh Gaur, A Mishra, K Gurwitz, A Mithani, S L Morgan, G Malhotra, C Brooksbank

EMBL-EBI provides a broad range of training in data-driven life sciences. To improve awareness of and access to training course listings, and to make digital learning materials findable and simple to use, the EMBL-EBI Training website, www.ebi.ac.uk/training, was redesigned and restructured. As a framework for the redesign, the FAIR (findable, accessible, interoperable, reusable) principles were applied both to the listings of live training courses and to the presentation of on-demand training content. Each FAIR principle guided decisions on the technology used to develop the website, on the details provided about training, and on the way in which training was presented. Since its release, the openly accessible website has been accessed by an average of 58,492 users a month. In addition, over 12,000 unique users have created accounts since this functionality was added in March 2022, allowing them to track their learning and record completion of training. The website was developed using the Agile Scrum project management methodology, with a focus on user experience. Now that the website is live, this framework continues to be used to maintain and improve it, as feedback is collected and further ways to make training FAIR are identified. Here, we describe the process of making EMBL-EBI's training FAIR through the development of a new website and our experience of implementing Agile Scrum.

Citations: 0
Systematic computational hunting for small RNAs derived from ncRNAs during dengue virus infection in endothelial HMEC-1 cells.
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-01-31 eCollection Date: 2024-01-01 DOI: 10.3389/fbinf.2024.1293412
Aimer Gutierrez-Diaz, Steve Hoffmann, Juan Carlos Gallego-Gómez, Clara Isabel Bermudez-Santana

In recent years, a population of small RNA fragments derived from non-coding RNAs (sfd-RNAs) has attracted significant interest because of its functional and structural resemblance to miRNAs, adding another level of complexity to our understanding of small-RNA-mediated gene regulation. However, more tools are needed to test the differential expression of sfd-RNAs, because current methods for detecting miRNAs may not apply directly to them. The primary obstacles are the lack of accurate small RNA and ncRNA annotation, the placement of multi-mapping reads (MMRs), and the multicopy nature of ncRNAs in the human genome. To address these issues, we implemented a methodology that detects differentially expressed sfd-RNAs, including canonical miRNAs, using an integrated, copy-number-corrected ncRNA annotation. This approach was coupled with sixteen computational strategies, combining four aligners with four normalization methods, to provide a rank order of predictions for each differentially expressed sfd-RNA. By systematically addressing these three problems, we detected differentially expressed miRNAs and sfd-RNAs in dengue virus-infected human dermal microvascular endothelial cells. Although further biological evaluation is required, two molecular targets of hsa-mir-103a and hsa-mir-494 (CDK5 and PI3/AKT) appear relevant to dengue virus (DENV) infection. The comprehensive annotation and differential expression analysis performed here can be applied in other studies of the role that small RNA fragments derived from ncRNAs play in virus infection.
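The abstract does not say how the results of the sixteen aligner/normalization strategies are combined into a rank order. One simple possibility, offered here only as an illustrative sketch and not as the authors' method, is to count how many of the 4 × 4 combinations call each sfd-RNA differentially expressed and rank features by that support. The aligner and normalization labels and the feature IDs below are hypothetical placeholders, and the input is toy data rather than results from the study.

```python
from itertools import product

# Hypothetical placeholder labels: the abstract reports four aligners and four
# normalization methods but does not name them in this listing.
ALIGNERS = ["aligner_A", "aligner_B", "aligner_C", "aligner_D"]
NORMALIZATIONS = ["norm_1", "norm_2", "norm_3", "norm_4"]


def strategies():
    """Return all 4 x 4 = 16 (aligner, normalization) combinations."""
    return list(product(ALIGNERS, NORMALIZATIONS))


def consensus_rank(de_calls):
    """Rank features by how many strategies call them differentially expressed.

    de_calls maps (aligner, normalization) -> set of feature IDs called DE
    by that strategy. Strategies missing from the dict contribute nothing.
    Returns (feature, support) pairs sorted by descending support, ties by ID.
    """
    support = {}
    for strategy in strategies():
        for feature in de_calls.get(strategy, set()):
            support[feature] = support.get(feature, 0) + 1
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy input, not data from the study: sfd_1 is called by every strategy,
    # sfd_2 only by the first four.
    calls = {s: {"sfd_1"} for s in strategies()}
    for s in strategies()[:4]:
        calls[s].add("sfd_2")
    print(consensus_rank(calls))  # [('sfd_1', 16), ('sfd_2', 4)]
```

Counting agreement across strategies is only one aggregation choice; averaging per-strategy ranks or p-values would be equally plausible alternatives under the same assumptions.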

Citations: 0
Journal: Frontiers in bioinformatics