Pub Date: 2024-03-12 | eCollection Date: 2024-01-01 | DOI: 10.3389/fbinf.2024.1351620
Gennaro Calendo, Dara Kusic, Jozef Madzo, Neda Gharani, Laura Scheinfeldt
Long-read sequencing technologies offer new opportunities to generate high-confidence phased whole-genome sequencing data for robust pharmacogenetic annotation. Here, we describe ursaPGx, a new user-friendly R package designed to accept multi-sample VCF files of phased whole-genome sequencing data and to output star allele annotations for the pharmacogenes annotated in PharmVar.
Title: ursaPGx: a new R package to annotate pharmacogenetic star alleles using phased whole-genome sequencing data. | Frontiers in Bioinformatics, vol. 4, article 1351620 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10963438/pdf/
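Because the input to this kind of tool is phased, each haplotype can be matched against allele definitions on its own, and the two calls combined into a diplotype. The Python sketch below illustrates only that idea; it is not ursaPGx's R code, and the allele table `TOY_DEFINITIONS`, the variant IDs and the rule "the most specific fully matched definition wins, otherwise report *1" are hypothetical simplifications, not PharmVar definitions or PharmVar matching rules.

```python
# Toy star-allele caller for phased genotypes (illustrative sketch only).
# Hypothetical definitions: star allele -> set of defining variant IDs.
# These are NOT real PharmVar allele definitions.
TOY_DEFINITIONS = {
    "*2": {"var_A"},
    "*3": {"var_B"},
    "*4": {"var_A", "var_C"},
}


def call_haplotype(alt_variants, definitions=TOY_DEFINITIONS, default="*1"):
    """Return the star allele whose defining variants are all on this haplotype.

    If several definitions match, the most specific one (largest set) wins;
    ties are broken by allele name so the result is deterministic.
    """
    matches = [(len(vs), name) for name, vs in definitions.items() if vs <= alt_variants]
    if not matches:
        return default  # no defining variants observed: report the default allele
    matches.sort(key=lambda m: (-m[0], m[1]))
    return matches[0][1]


def call_diplotype(phased_gt):
    """phased_gt maps variant ID -> phased VCF genotype such as '0|1'."""
    haplotypes = [set(), set()]
    for var_id, gt in phased_gt.items():
        if "|" not in gt:
            raise ValueError(f"{var_id}: genotype {gt!r} is not phased")
        for h, allele in enumerate(gt.split("|")[:2]):
            if allele not in ("0", "."):  # any ALT allele index counts as carrying the variant
                haplotypes[h].add(var_id)
    # Report in haplotype order (first|second) rather than sorting, to keep phase visible.
    return "/".join(call_haplotype(h) for h in haplotypes)
```

For example, `call_diplotype({"var_A": "1|0", "var_C": "1|0", "var_B": "0|1"})` gives `"*4/*3"`: the first haplotype carries both *4-defining variants and the second only the *3-defining one. Unphased genotypes such as `0/1` are rejected, since without phase the two haplotypes cannot be separated.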
Pub Date: 2024-03-08 | eCollection Date: 2024-01-01 | DOI: 10.3389/fbinf.2024.1295600
Neta Zadok, Gil Ast, Roded Sharan
Autism spectrum disorder (ASD) is a highly heritable complex disease that affects 1% of the population, yet its underlying molecular mechanisms remain largely unknown. Here we study the problem of predicting causal genes for ASD by combining genome-scale data with a network propagation approach. We construct a predictor that integrates multiple omic data sets assessing genomic, transcriptomic, proteomic, and phosphoproteomic associations with ASD. In cross-validation, our predictor yields a mean area under the ROC curve of 0.87 and an area under the precision-recall curve of 0.89. We further show that it outperforms previous gene-level predictors of autism association. Finally, we show that the model can be used to predict genes associated with schizophrenia, which is known to share genetic components with ASD.
Title: A network-based method for associating genes with autism spectrum disorder. | Frontiers in Bioinformatics, vol. 4, article 1295600 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10960359/pdf/
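The abstract does not specify which propagation scheme is used. Random walk with restart is one widely used form of network propagation, so the Python sketch below assumes it: seed scores stand in for the integrated omic evidence, and the graph, node names and `restart` value are hypothetical.

```python
def propagate(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected graph.

    adjacency: dict node -> set of neighbour nodes (every node needs >= 1 neighbour).
    seeds: dict node -> non-negative prior score (e.g. an omic association strength).
    Iterates p = (1 - restart) * W p + restart * p0, where W spreads each node's
    score evenly over its edges (column-stochastic), so scores keep summing to 1.
    """
    nodes = sorted(adjacency)
    total = sum(seeds.values())
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        new = {}
        for n in nodes:
            # each neighbour m passes on score / degree(m) along every edge
            spread = sum(p[m] / len(adjacency[m]) for m in adjacency[n])
            new[n] = (1 - restart) * spread + restart * p0[n]
        delta = sum(abs(new[n] - p[n]) for n in nodes)
        p = new
        if delta < tol:
            break
    return p
```

On a path graph a-b-c-d-e seeded only at `a`, the propagated scores decay with graph distance (a > b > c > d > e); genes would then be ranked by score, and ranking quality could be measured with AUROC/AUPRC against known associations.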
Pub Date: 2024-03-04 | eCollection Date: 2024-01-01 | DOI: 10.3389/fbinf.2024.1340339
Marin Truchi, Caroline Lacoux, Cyprien Gille, Julien Fassy, Virginie Magnone, Rafael Lopes Goncalves, Cédric Girard-Riboulleau, Iris Manosalva-Pena, Marine Gautier-Isola, Kevin Lebrigand, Pascal Barbry, Salvatore Spicuglia, Georges Vassaux, Roger Rezzonico, Michel Barlaud, Bernard Mari
Single-cell CRISPR-based transcriptome screens are potent genetic tools for concomitantly assessing the expression profiles of cells targeted by a set of guide RNAs (gRNAs) and inferring target gene functions from the observed perturbations. However, due to various limitations, this approach lacks sensitivity in detecting weak perturbations and is reliable mainly when studying master regulators such as transcription factors. To overcome the challenge of detecting subtle gRNA-induced transcriptomic perturbations and classifying the most responsive cells, we developed a new supervised autoencoder neural network method. Our sparse supervised autoencoder (SSAE) neural network selects both relevant features (genes) and actually perturbed cells. We applied this method to an in-house single-cell CRISPR interference (CRISPRi)-based transcriptome screen (CROP-seq) focusing on a subset of long non-coding RNAs (lncRNAs) regulated by hypoxia, a condition that promotes tumor aggressiveness and drug resistance, in the context of lung adenocarcinoma (LUAD). The CROP-seq library of validated gRNAs targeting a subset of lncRNAs and, as positive controls, HIF1A and HIF2A, the two main transcription factors of the hypoxic response, was transduced into A549 LUAD cells cultured in normoxia or exposed to hypoxic conditions for 3, 6, or 24 h. We first validated the SSAE approach on HIF1A and HIF2A by confirming the specific effect of their knock-down during the temporal switch of the hypoxic response. Next, the SSAE method detected stable short hypoxia-dependent transcriptomic signatures induced by the knock-down of some lncRNA candidates, outperforming previously published machine learning approaches. This proof of concept demonstrates the relevance of the SSAE approach for deciphering weak perturbations in single-cell transcriptomic readouts of CRISPR-based screens.
Title: Detecting subtle transcriptomic perturbations induced by lncRNAs knock-down in single-cell CRISPRi screening using a new sparse supervised autoencoder neural network. | Frontiers in Bioinformatics, vol. 4, article 1340339 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10945021/pdf/
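Sparse networks of this kind typically obtain feature selection by constraining the weights, so that weights attached to uninformative genes are driven exactly to zero. One standard building block for such a constraint is Euclidean projection onto an ℓ1 ball, sketched below in Python with the sort-based algorithm of Duchi et al. (2008). This assumes an ℓ1 constraint for illustration; the abstract does not state which constraint or projection the SSAE actually uses, and the radius values are arbitrary.

```python
def project_l1_ball(v, radius):
    """Euclidean projection of vector v onto {w : sum(|w_i|) <= radius}."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    if sum(abs(x) for x in v) <= radius:
        return list(v)  # already feasible: the projection is the identity
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - radius) / j
        if uj - t > 0:
            theta = t  # keep the threshold from the largest index j that qualifies
    # Soft-threshold every coordinate by theta, preserving signs.
    return [(1 if x > 0 else -1) * max(abs(x) - theta, 0.0) for x in v]
```

In a projected-gradient training loop, a projection like this would be applied to a weight vector after each gradient step; for instance `project_l1_ball([3, 1, 0.5], 2)` keeps only the largest weight, shrunk to 2.0, and zeroes the rest.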
Pub Date: 2024-03-04 | eCollection Date: 2024-01-01 | DOI: 10.3389/fbinf.2024.1347276
Lieke Michielsen, Marcel J T Reinders, Ahmed Mahfouz
Most regulatory elements, especially enhancer sequences, are cell population-specific. One could even argue that a distinct set of regulatory elements is what defines a cell population. However, discovering which non-coding regions of the DNA are essential in which context, and as a result, which genes are expressed, is a difficult task. Some computational models tackle this problem by predicting gene expression directly from the genomic sequence. These models are currently limited to predicting bulk measurements and mainly make tissue-specific predictions. Here, we present a model that leverages single-cell RNA-sequencing data to predict gene expression. We show that cell population-specific models outperform tissue-specific models, especially when the expression profile of a cell population and the corresponding tissue are dissimilar. Further, we show that our model can prioritize GWAS variants and learn motifs of transcription factor binding sites. We envision that our model can be useful for delineating cell population-specific regulatory elements.
Title: Predicting cell population-specific gene expression from genomic sequence. | Frontiers in Bioinformatics, vol. 4, article 1347276 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10944912/pdf/
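Sequence-to-expression models generally take one-hot encoded DNA as input and learn convolutional filters that behave like position weight matrices, which is how transcription factor motifs can be read out of a trained model. The abstract does not describe the architecture, so the Python sketch below illustrates only that generic input encoding and a single filter scan; the filter `GAT_FILTER` and the example sequence are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of [A, C, G, T] indicator vectors; N -> all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def scan(encoded, filt):
    """Slide a (width x 4) filter along the sequence; return one score per position.

    This is an unpadded 1-D cross-correlation with a single filter, the basic
    operation of a convolutional layer on one-hot DNA.
    """
    width = len(filt)
    scores = []
    for start in range(len(encoded) - width + 1):
        s = sum(encoded[start + k][c] * filt[k][c] for k in range(width) for c in range(4))
        scores.append(s)
    return scores


# Hypothetical filter rewarding the motif "GAT": +1 per matching base, -0.5 otherwise.
GAT_FILTER = [[1.0 if b == m else -0.5 for b in BASES] for m in "GAT"]
```

Scanning `one_hot("CCGATCC")` with `GAT_FILTER` gives its maximum score of 3.0 at position 2, where the motif occurs; a learned filter with a similar peak would be interpreted as a candidate binding-site motif.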
Pub Date: 2024-03-04 | eCollection Date: 2024-01-01 | DOI: 10.3389/fbinf.2024.1306244
Claudia Sala, Pietro Di Lena, Danielle Fernandes Durso, Italo Faria do Valle, Maria Giulia Bacalini, Daniele Dall'Olio, Claudio Franceschi, Gastone Castellani, Paolo Garagnani, Christine Nardini
Introduction: DNA methylation clocks present advantageous characteristics for the ambitious goal of identifying very early markers of disease, based on the concept that accelerated ageing is a reliable predictor of disease. Methods: Because such tools are epigenomics-based, they are expected to depend on sex and tissue specificities; this work quantifies this dependency, as well as the dependency on the regression model and on the size of the training set. Results: Our quantitative results indicate that elastic-net penalization is the best-performing strategy, and that it performs even better, unsurprisingly, when the data set is larger; sex does not appear to affect clock performance, and tissue-specific clocks appear to perform better than generic blood clocks. Finally, when considering all trained clocks, we identified a subset of genes that, to the best of our knowledge, have not been reported before and might deserve further investigation: CPT1A, MMP15, SHROOM3, SLIT3, and SYNGR. Conclusion: These factual starting points can be useful for the future medical translation of clocks, and in particular for the debate between multi-tissue clocks, which are generally trained on a large majority of blood samples, and tissue-specific clocks.
Title: Where are we in the implementation of tissue-specific epigenetic clocks? | Frontiers in Bioinformatics, vol. 4, article 1306244 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10944965/pdf/
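Elastic-net penalization, which the study found to perform best, combines lasso (ℓ1) and ridge (ℓ2) penalties. The Python sketch below fits it by cyclic coordinate descent, assuming the glmnet-style objective (1/2n)·||y − Xβ||² + λ(α||β||₁ + (1 − α)/2·||β||²). For a clock, rows of `X` would be samples, columns CpG methylation values and `y` chronological age; the toy data and the λ and α values here are illustrative, and in practice λ would be chosen by cross-validation.

```python
def soft_threshold(z, gamma):
    """Lasso shrinkage operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Cyclic coordinate descent for the elastic net; returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre predictors and response so the intercept is not penalized.
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residuals for beta = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant column carries no information
            # Correlation of feature j with the partial residual (its own fit added back).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new_bj - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_bj
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta
```

With a small λ the informative feature keeps a coefficient close to its true value while the uninformative one stays at zero; with a large λ and α = 1 (pure lasso) every coefficient is shrunk exactly to zero, which is how penalized clocks end up using only a subset of CpG sites.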
Pub Date: 2024-02-23 | eCollection Date: 2024-01-01 | DOI: 10.3389/fbinf.2024.1295972
Rang Li, Sabrina Wilderotter, Madison Stoddard, Debra Van Egeren, Arijit Chakravarty, Diane Joseph-McCarthy
Introduction: A fundamental challenge in computational vaccinology is that most B-cell epitopes are conformational and therefore hard to predict from sequence alone. Another significant challenge is that much of the amino acid sequence of a viral surface protein may not, in fact, be antigenic. Thus, identifying the regions of a protein that are most promising for vaccine design based on the degree of surface exposure may not lead to a clinically relevant immune response. Methods: Linear peptides selected by phage display experiments for high affinity to the monoclonal antibody of interest ("mimotopes") usually have physicochemical properties similar to those of the antigen epitope corresponding to that antibody. In the absence of an antigen-antibody complex structure, the sequences of these linear peptides can be used to find possible epitopes on the surface of the antigen structure or of a homology model of the antigen. Results and Discussion: Herein we describe two novel methods for mapping mimotopes to epitopes. The first is a novel algorithm named MimoTree that allows for gaps in both the mimotopes and the epitopes on the antigen. More specifically, a mimotope may contain a gap that does not match the epitope, allowing it to adopt a conformation relevant for binding to an antibody, and residues may similarly be discontinuous in conformational epitopes. MimoTree is a fully automated epitope detection algorithm suitable for identifying conformational as well as linear epitopes. The second is an ensemble approach that combines the prediction results from MimoTree and two existing methods.
Title: Computational identification of antibody-binding epitopes from mimotope datasets. | Frontiers in Bioinformatics, vol. 4, article 1295972 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10920257/pdf/
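MimoTree's algorithm is not detailed in the abstract. To illustrate why allowing gaps matters when matching a mimotope to a candidate epitope, the Python sketch below swaps in a generic technique, Smith-Waterman local alignment with a linear gap penalty, scoring a mimotope against a sequence of antigen residues (for example, residues along a path of spatially adjacent surface residues, which is an assumption here). The scoring values are illustrative and no substitution matrix is used.

```python
def local_alignment_score(mimotope, residues, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score with a linear gap penalty.

    Not MimoTree: a generic gapped alignment used only to illustrate gap handling.
    """
    rows, cols = len(mimotope) + 1, len(residues) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = mimotope[i - 1] == residues[j - 1]
            diag = H[i - 1][j - 1] + (match if same else mismatch)
            up = H[i - 1][j] + gap    # extra mimotope residue with no epitope partner
            left = H[i][j - 1] + gap  # antigen residue skipped by the mimotope
            H[i][j] = max(0, diag, up, left)  # 0 lets a local alignment start anywhere
            best = max(best, H[i][j])
    return best
```

For instance, the mimotope "ACGD" scores 5 against the residue path "ACD" because the unmatched G can be bridged by a single gap, whereas an ungapped comparison would stop at "AC" (score 4).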
Pub Date: 2024-02-22 | eCollection Date: 2024-01-01 | DOI: 10.3389/fbinf.2024.1346779
Sarah von Löhneysen, Mario Mörl, Peter F Stadler
Title: Limits of experimental evidence in RNA secondary structure prediction. | Frontiers in Bioinformatics, vol. 4, article 1346779 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10918467/pdf/
Pub Date: 2024-02-22 | eCollection Date: 2023-01-01 | DOI: 10.3389/fbinf.2023.1285828
Yanlin Zhang, Christopher J F Cameron, Mathieu Blanchette
Hi-C is one of the most widely used approaches to study three-dimensional genome conformation. Contacts captured by a Hi-C experiment are represented in a contact frequency matrix. Due to limited sequencing depth and other factors, Hi-C contact frequency matrices are only approximations of the true interaction frequencies, and they are reported without any quantification of uncertainty. Hence, downstream analyses based on Hi-C contact maps (e.g., TAD and loop annotation) are themselves point estimates. Here, we present the Hi-C interaction frequency sampler (HiCSampler), which reliably infers the posterior distribution of the interaction frequencies for a given Hi-C contact map by exploiting dependencies between neighboring loci. Posterior predictive checks demonstrate that HiCSampler can infer highly predictive chromosomal interaction frequencies. Summary statistics calculated by HiCSampler provide a measure of the uncertainty of Hi-C experiments, and the samples inferred by HiCSampler can be used directly by most off-the-shelf downstream analysis tools, enabling uncertainty measurements in these analyses without modification.
Title: Posterior inference of Hi-C contact frequency through sampling. | Frontiers in Bioinformatics, vol. 3, article 1285828 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10919286/pdf/
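HiCSampler's model is not spelled out in the abstract. As a deliberately simplified stand-in, the Python sketch below treats each observed count as Poisson with an unknown interaction frequency and places a conjugate Gamma prior on that frequency whose mean comes from the neighboring bins, so neighboring loci inform each estimate; posterior draws then give per-bin uncertainty. The conjugate model, the 8-neighbour window, `prior_strength` and the pseudocount are all assumptions, not the published method.

```python
import random
import statistics


def posterior_samples(counts, n_samples=1000, prior_strength=1.0, pseudo=0.5, seed=0):
    """Draw posterior samples of interaction frequency for every bin of a contact matrix.

    Assumed model: count ~ Poisson(lam), lam ~ Gamma(shape=a, rate=b), with
    b = prior_strength and a = b * mean(neighbour counts) + pseudo.
    Conjugacy gives lam | count ~ Gamma(a + count, rate b + 1).
    Returns a nested list: samples[i][j] is a list of draws for bin (i, j).
    """
    rng = random.Random(seed)
    n, m = len(counts), len(counts[0])
    b = prior_strength
    samples = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            neigh = [counts[x][y]
                     for x in range(max(0, i - 1), min(n, i + 2))
                     for y in range(max(0, j - 1), min(m, j + 2))
                     if (x, y) != (i, j)]
            a = b * (statistics.fmean(neigh) if neigh else 0.0) + pseudo
            # random.gammavariate takes (shape, scale); scale = 1 / rate.
            samples[i][j] = [rng.gammavariate(a + counts[i][j], 1.0 / (b + 1.0))
                             for _ in range(n_samples)]
    return samples


def summarize(draws):
    """Posterior mean and an approximate central 95% interval for one bin."""
    qs = statistics.quantiles(draws, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return statistics.fmean(draws), qs[0], qs[-1]
```

Each full set of draws `[[samples[i][j][k] for j ...] for i ...]` is itself a plausible contact matrix, so a downstream tool can be run once per draw to propagate uncertainty. Real Hi-C maps are symmetric; a fuller version would sample only the upper triangle and mirror it.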
Pub Date: 2024-01-31 | eCollection Date: 2024-01-01 | DOI: 10.3389/fbinf.2024.1347168
A L Swan, A Broadbent, P Singh Gaur, A Mishra, K Gurwitz, A Mithani, S L Morgan, G Malhotra, C Brooksbank
EMBL-EBI provides a broad range of training in data-driven life sciences. To improve awareness of and access to training course listings, and to make digital learning materials findable and simple to use, the EMBL-EBI Training website, www.ebi.ac.uk/training, was redesigned and restructured. The FAIR (findable, accessible, interoperable, reusable) principles provided the framework for the redesign and were applied both to the listings of live training courses and to the presentation of on-demand training content. Each FAIR principle guided decisions on the choice of technology used to develop the website, including the details provided about training and the way in which training was presented. Since its release, the openly accessible website has been accessed by an average of 58,492 users a month. In addition, more than 12,000 unique users have created accounts since this functionality was added in March 2022, allowing them to track their learning and record completion of training. The website was developed using the Agile Scrum project management methodology, with a focus on user experience. Now that the website is live, this framework continues to be used to maintain and improve it, as feedback is collected and further ways to make training FAIR are identified. Here, we describe the process of making EMBL-EBI's training FAIR through the development of a new website, and our experience of implementing Agile Scrum.
Title: Making bioinformatics training FAIR: the EMBL-EBI training portal. | Frontiers in Bioinformatics, vol. 4, article 1347168 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10866141/pdf/
Pub Date: 2024-01-31 | eCollection Date: 2024-01-01 | DOI: 10.3389/fbinf.2024.1293412
Aimer Gutierrez-Diaz, Steve Hoffmann, Juan Carlos Gallego-Gómez, Clara Isabel Bermudez-Santana
In recent years, a population of small RNA fragments derived from non-coding RNAs (sfd-RNAs) has gained significant interest due to its functional and structural resemblance to miRNAs, adding another level of complexity to our understanding of small-RNA-mediated gene regulation. Despite this, more tools are needed to test the differential expression of sfd-RNAs, since current methods for detecting miRNAs may not be directly applicable to them. The primary reasons are the lack of accurate small RNA and ncRNA annotation, the placement of multi-mapping reads (MMRs), and the multicopy nature of ncRNAs in the human genome. To address these issues, we implemented a methodology that detects differentially expressed sfd-RNAs, including canonical miRNAs, using an integrated, copy-number-corrected ncRNA annotation. This approach was coupled with sixteen computational strategies, composed of combinations of four aligners and four normalization methods, to provide a rank order of prediction for each differentially expressed sfd-RNA. By systematically addressing the three main problems, we were able to detect differentially expressed miRNAs and sfd-RNAs in dengue virus-infected human dermal microvascular endothelial cells. Although further biological evaluation is required, two molecular targets of hsa-mir-103a and hsa-mir-494 (CDK5 and PI3/AKT) appear relevant for dengue virus (DENV) infection. Here, we performed a comprehensive annotation and differential expression analysis that can be applied in other studies addressing the role of small RNA fragment populations derived from ncRNAs in virus infection.
Title: Systematic computational hunting for small RNAs derived from ncRNAs during dengue virus infection in endothelial HMEC-1 cells. | Frontiers in Bioinformatics, vol. 4, article 1293412 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10864640/pdf/
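The abstract states that sixteen strategies (four aligners × four normalization methods) were combined to rank each differentially expressed sfd-RNA, but not how the ranking is computed. One simple possibility, assumed here, is to rank features by how many strategies call them differentially expressed; the Python sketch below does that. The aligner and normalization labels and the feature IDs are hypothetical placeholders.

```python
from collections import Counter
from itertools import product

ALIGNERS = ["aligner_1", "aligner_2", "aligner_3", "aligner_4"]  # hypothetical labels
NORMALIZATIONS = ["norm_1", "norm_2", "norm_3", "norm_4"]        # hypothetical labels
STRATEGIES = list(product(ALIGNERS, NORMALIZATIONS))  # 4 x 4 = 16 pipelines


def consensus_rank(de_calls):
    """Rank features by the number of strategies that call them differentially expressed.

    de_calls maps (aligner, normalization) -> set of feature IDs called DE.
    Returns a list of (feature, support) pairs sorted by support (descending), then ID.
    """
    support = Counter()
    for strategy in STRATEGIES:
        support.update(de_calls.get(strategy, set()))  # +1 per feature called by this strategy
    return sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
```

A feature called by all sixteen strategies (support 16) is robust to the choice of aligner and normalization, while one called by a single pipeline is more likely to be an artifact of multi-mapping read placement or normalization.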