NAR Genomics and Bioinformatics: Latest Articles

Improving prediction of bacterial sRNA regulatory targets with expression data.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2025-05-08 eCollection Date: 2025-06-01 DOI: 10.1093/nargab/lqaf055
Yildiz Derinkok, Haiqi Wang, Brian Tjaden

Small regulatory RNAs (sRNAs) are widespread in bacteria. However, characterizing the targets of sRNA regulation in a way that scales with the increasing number of identified sRNAs has proven challenging. Computational methods offer one means for efficient characterization of sRNA targets, but the sensitivity and precision of such computational methods are limited. Here, we investigate whether publicly available expression data from RNA-seq experiments can improve the accuracy of computational prediction of sRNA regulatory targets. Using compendia of 2143 Escherichia coli RNA-seq samples and 177 Salmonella RNA-seq samples, we identify groups of co-expressed genes in each organism and incorporate this expression information into computational prediction of sRNA targets based on machine learning methods. We find that integrating expression information significantly improves the accuracy of computational results. Further, we observe that computational methods perform better when trained on smaller, higher-quality sets of targets rather than on larger, noisier sets of targets identified by high-throughput methods.
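The abstract does not say how co-expressed gene groups are identified, so the following is only a minimal sketch of one common approach, assuming Pearson correlation between per-gene expression profiles and a fixed threshold. The function names, the threshold of 0.9, and the toy expression values are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpressed_partners(expression, gene, threshold=0.9):
    """Genes whose profile correlates with `gene` at or above `threshold`."""
    ref = expression[gene]
    return sorted(
        other
        for other, profile in expression.items()
        if other != gene and pearson(ref, profile) >= threshold
    )


# Toy, made-up expression values across five samples (assumption, not real data).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 4.0, 6.2, 8.1, 9.9],  # rises with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],  # no clear trend
}
partners = coexpressed_partners(expression, "geneA")
```

In a target-prediction setting, a group like `partners` could plausibly serve as an extra feature for a candidate sRNA-target pair, but how the authors actually encode expression information is not stated in the abstract.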

NAR Genomics and Bioinformatics, volume 7, issue 2, lqaf055. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12060007/pdf/

GINClus: RNA structural motif clustering using graph isomorphism network.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2025-04-26 eCollection Date: 2025-06-01 DOI: 10.1093/nargab/lqaf050
Nabila Shahnaz Khan, Md Mahfuzur Rahaman, Shaojie Zhang

Ribonucleic acid (RNA) structural motif identification is a crucial step for understanding RNA structure and functionality. Due to the complexity and variations of RNA 3D structures, identifying RNA structural motifs is challenging and time-consuming. In particular, discovering new RNA structural motif families is a hard problem and still largely depends on manual analysis. In this paper, we propose an RNA structural motif clustering tool, named GINClus, which uses a semi-supervised deep learning model to cluster RNA motif candidates (RNA loop regions) based on both base interaction and 3D structure similarities. GINClus converts the base interactions and 3D structures of RNA motif candidates into graph representations and then, using a graph isomorphism network (GIN) model in combination with K-means and hierarchical agglomerative clustering, clusters the candidates based on their structural similarities. GINClus has a clustering accuracy of 87.88% for known internal loop motifs and 97.69% for known hairpin loop motifs. Using GINClus, we successfully clustered the motifs of the same families together and were able to find 927 new instances of Sarcin-ricin, Kink-turn, Tandem-shear, Hook-turn, E-loop, C-loop, T-loop, and GNRA loop motif families. We also identified 12 new RNA structural motif families with unique structure and base-pair interactions.

NAR Genomics and Bioinformatics, volume 7, issue 2, lqaf050. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12034103/pdf/

Ocelli: an open-source tool for the analysis and visualization of developmental multimodal single-cell data.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2025-04-10 eCollection Date: 2025-06-01 DOI: 10.1093/nargab/lqaf040
Piotr Rutkowski, Marcin Tabaka

The recent expansion of single-cell technologies has enabled simultaneous genome-wide measurements of multiple modalities in the same single cell. The potential to jointly profile such modalities as gene expression, chromatin accessibility, protein epitopes, or multiple histone modifications at single-cell resolution represents a compelling opportunity to study developmental processes at multiple layers of gene regulation. Here, we present Ocelli, a lightweight Python package implemented in Ray for scalable visualization and analysis of developmental multimodal single-cell data. The core functionality of Ocelli focuses on diffusion-based modeling of biological processes involving cell state transitions. Ocelli addresses common tasks in single-cell data analysis, such as visualization of cells on a low-dimensional embedding that preserves the continuity of the developmental progression of cells, identification of rare and transient cell states, integration with trajectory inference algorithms, and imputation of undetected feature counts. Extensive benchmarking shows that Ocelli outperforms existing methods regarding computational time and quality of the reconstructed low-dimensional representation of developmental data.

NAR Genomics and Bioinformatics, volume 7, issue 2, lqaf040. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12086682/pdf/

Meta-analysis of genomic characteristics for antiviral influenza defective interfering particle prioritization.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2025-04-04 eCollection Date: 2025-06-01 DOI: 10.1093/nargab/lqaf031
Jens J G Lohmann, Mia Le, Fadi G Alnaji, Olga Zolotareva, Jan Baumbach, Tanja Laske

Defective interfering particles (DIPs) are viral deletion mutants that hamper virus replication and are, thus, potent novel antiviral agents. To evaluate possible antiviral treatments, we first need to get a deeper understanding of DIP characteristics. Thus, we performed a meta-analysis of 20 already published sequencing datasets of influenza A and B viruses (IAV and IBV) from in vivo and in vitro experiments. We analyzed each dataset for characteristics, such as deletion-containing viral genome (DelVG) length distributions, direct repeats, and nucleotide enrichment at the deletion site. Our analysis suggests differences in the length of the 3'- and 5'-end retained in IAV and IBV viral sequences upon deletion. Moreover, in vitro DelVGs tend to be shorter than those in vivo, which is a novel finding with potential implications for future DIP treatment design. Additionally, our analysis demonstrates the presence of DelVGs with longer than expected sequences, possibly related to an alternative mechanism of DelVG formation. Finally, a joint ranking of DelVGs originating from 7 A/Puerto Rico/8/1934 datasets revealed 11 highly abundant, yet unnoticed, candidates. Together, our study highlights the importance of meta-analyses to uncover yet unknown DelVG characteristics and to pre-select candidates for antiviral treatment design.

NAR Genomics and Bioinformatics, volume 7, issue 2, lqaf031. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11970370/pdf/

Digenic variant interpretation with hypothesis-driven explainable AI.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2025-03-29 eCollection Date: 2025-06-01 DOI: 10.1093/nargab/lqaf029
Federica De Paoli, Giovanna Nicora, Silvia Berardelli, Andrea Gazzo, Riccardo Bellazzi, Paolo Magni, Ettore Rizzo, Ivan Limongelli, Susanna Zucca

The digenic inheritance hypothesis holds the potential to enhance diagnostic yield in rare diseases. Computational approaches capable of accurately interpreting and prioritizing digenic combinations of variants based on the proband's phenotypes and family information can provide valuable assistance during the diagnostic process. We developed diVas, a hypothesis-driven machine learning approach that interprets genomic variants across different gene pairs. DiVas demonstrates strong performance in both classifying and prioritizing causative digenic combinations of rare variants within the top positions across 11 cases with the complete list of variants available (73% sensitivity and a median ranking of 3). Furthermore, it achieves a sensitivity of 0.81 when applied to 645 published causative digenic combinations. Additionally, diVas leverages explainable artificial intelligence to elucidate the digenic disease mechanism for predicted positive pairs.

NAR Genomics and Bioinformatics, volume 7, issue 2, lqaf029. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11954523/pdf/

IPANEMAP Suite: a pipeline for probing-informed RNA structure modeling.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2025-03-25 eCollection Date: 2025-03-01 DOI: 10.1093/nargab/lqaf028
Pierre Hardouin, Nan Pan, Francois-Xavier Lyonnet du Moutier, Nathalie Chamond, Yann Ponty, Sebastian Will, Bruno Sargueil

In addition to their sequence, multiple functions of RNAs are encoded within their structure, which is often difficult to solve using physico-chemical methods. Incorporating low-resolution experimental data such as chemical probing into computational prediction significantly enhances RNA structure modeling accuracy. While medium- and high-throughput RNA structure probing techniques are widely accessible, the subsequent analysis process can be cumbersome, involving multiple software tools and manual data manipulation. In addition, meaningful interpretation of the data requires proper parameterization of the software and strict consistency in the analysis pipeline. To streamline such workflows, we introduce IPANEMAP Suite, a comprehensive platform that guides users from raw chemical probing data to visually informative secondary structure models. IPANEMAP Suite seamlessly integrates various experimental datasets and facilitates comparative analysis of RNA structures under different conditions (footprinting), aiding in the study of protein or small molecule interactions with RNA. Here, we show that the unique ability of IPANEMAP Suite to perform integrative modeling using several chemical probing datasets with phylogenetic data can be instrumental in obtaining accurate secondary structure models. The platform's project-based approach ensures full traceability and generates publication-quality outputs, simplifying the entire RNA structure analysis process. IPANEMAP Suite is freely available at https://github.com/Sargueil-CiTCoM/ipasuite under a GPL-3.0 license.

NAR Genomics and Bioinformatics, volume 7, issue 1, lqaf028. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934922/pdf/

The BioGenome Portal: a web-based platform for biodiversity genomics data management.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2025-03-22 eCollection Date: 2025-03-01 DOI: 10.1093/nargab/lqaf020
Emilio Righi, Roderic Guigó

Biodiversity genomics projects are underway with the aim of sequencing the genomes of all eukaryotic species on Earth. Here we describe the BioGenome Portal, a web-based application to facilitate organization and access to the data produced by biodiversity genomics projects. The portal integrates user-generated data with data deposited in public repositories. The portal generates sequence status reports that can eventually be ingested by designated metadata tracking systems, facilitating the coordination task of these systems. The portal is open-source and fully customizable. It can be deployed at any site with minimum effort, contributing to the democratization of biodiversity genomics projects. We illustrate the features of the BioGenome Portal through a number of specific instances. One such instance is being used as the reference portal for the Catalan Initiative for the Earth Biogenome Project, a regional project aiming to sequence the genomes of the species of the Catalan linguistic area.

NAR Genomics and Bioinformatics, volume 7, issue 1, lqaf020. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928930/pdf/

Reconstructing 3D chromosome structures from single-cell Hi-C data with SO(3)-equivariant graph neural networks.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2025-03-22 eCollection Date: 2025-03-01 DOI: 10.1093/nargab/lqaf027
Yanli Wang, Jianlin Cheng

The spatial conformation of chromosomes and genomes of single cells is relevant to cellular function and useful for elucidating the mechanism underlying gene expression and genome methylation. The chromosomal contacts (i.e. chromosomal regions in spatial proximity) entailing the three-dimensional (3D) structure of the genome of a single cell can be obtained by single-cell chromosome conformation capture techniques, such as single-cell Hi-C (ScHi-C). However, due to the sparsity of chromosomal contacts in ScHi-C data, it is still challenging for traditional 3D conformation optimization methods to reconstruct the 3D chromosome structures from ScHi-C data. Here, we present a machine learning method based on a novel SO(3)-equivariant graph neural network (HiCEGNN) to reconstruct 3D structures of chromosomes of single cells from ScHi-C data. HiCEGNN consistently outperforms both the traditional optimization methods and the only other deep learning method across diverse cells, different structural resolutions, and different noise levels of the data. Moreover, HiCEGNN is robust against the noise in the ScHi-C data.

{"title":"Reconstructing 3D chromosome structures from single-cell Hi-C data with SO(3)-equivariant graph neural networks.","authors":"Yanli Wang, Jianlin Cheng","doi":"10.1093/nargab/lqaf027","DOIUrl":"10.1093/nargab/lqaf027","url":null,"abstract":"<p><p>The spatial conformation of chromosomes and genomes of single cells is relevant to cellular function and useful for elucidating the mechanism underlying gene expression and genome methylation. The chromosomal contacts (i.e. chromosomal regions in spatial proximity) entailing the three-dimensional (3D) structure of the genome of a single cell can be obtained by single-cell chromosome conformation capture techniques, such as single-cell Hi-C (ScHi-C). However, due to the sparsity of chromosomal contacts in ScHi-C data, it is still challenging for traditional 3D conformation optimization methods to reconstruct the 3D chromosome structures from ScHi-C data. Here, we present a machine learning-based method based on a novel SO(3)-equivariant graph neural network (HiCEGNN) to reconstruct 3D structures of chromosomes of single cells from ScHi-C data. HiCEGNN consistently outperforms both the traditional optimization methods and the only other deep learning method across diverse cells, different structural resolutions, and different noise levels of the data. 
Moreover, HiCEGNN is robust against the noise in the ScHi-C data.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 1","pages":"lqaf027"},"PeriodicalIF":4.0,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928942/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143693442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhancing the annotation of small ORF-altering variants using MORFEE: introducing MORFEEdb, a comprehensive catalog of SNVs affecting upstream ORFs in human 5'UTRs.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2025-03-19 eCollection Date: 2025-03-01 DOI: 10.1093/nargab/lqaf017
Caroline Meguerditchian, David Baux, Thomas E Ludwig, Emmanuelle Genin, David-Alexandre Trégouët, Omar Soukarieh

Non-canonical small open reading frames (sORFs) are among the main regulators of gene expression. The most studied are upstream ORFs (upORFs), located in the 5'-untranslated region (UTR) of coding genes. Internal ORFs (intORFs) in the coding sequence and downstream ORFs (dORFs) in the 3'UTR have received less attention. Several bioinformatics tools predict single nucleotide variants (SNVs) that alter upORFs, mainly those creating AUGs or deleting stop codons, but no tool predicts variants that alter non-canonical translation initiation sites or that alter intORFs or dORFs. We propose an upgrade of our MORFEE bioinformatics tool that identifies, from a VCF file, SNVs that may alter any type of sORF in coding transcripts. Moreover, we generate an exhaustive catalog, named MORFEEdb, reporting all possible SNVs that alter existing upORFs or create new ones in human transcripts, and provide an R script for visualizing the results. MORFEEdb has been implemented in the public platform Mobidetails. Finally, annotating ClinVar variants with MORFEE reveals that >45% of UTR-SNVs can alter upORFs or dORFs. In conclusion, MORFEE and MORFEEdb have the potential to improve the molecular diagnosis of rare human diseases and to facilitate the identification of functional variants from genome-wide association studies of complex traits.
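
To make one of the variant classes named in the abstract concrete, here is a minimal Python sketch of how a single-base substitution in a 5'UTR can create a new upstream AUG. It is an illustration only and is not MORFEE's implementation: the function name `created_upstream_starts`, its parameters, and the toy sequence are all assumptions, and it ignores strand, splicing, non-canonical start codons, and stop-codon changes.

```python
from typing import List, Tuple

START = "ATG"  # DNA form of the AUG start codon


def created_upstream_starts(utr5: str, pos: int, alt: str) -> List[Tuple[int, int]]:
    """Return (start_index, frame) for each ATG that exists only after the SNV.

    utr5  : 5'UTR sequence (5'->3', DNA alphabet), ending right before the main ATG
    pos   : 0-based index of the substituted base within utr5
    alt   : alternate base
    frame : distance from the new ATG to the main start codon, modulo 3
            (0 means the new ATG is in frame with the coding sequence)
    """
    utr5 = utr5.upper()
    alt = alt.upper()
    if not 0 <= pos < len(utr5):
        raise ValueError("pos is outside the 5'UTR")
    if len(alt) != 1 or alt not in "ACGT" or alt == utr5[pos]:
        raise ValueError("alt must be a single base different from the reference")

    mutated = utr5[:pos] + alt + utr5[pos + 1:]
    hits = []
    # Only the (at most three) codon windows that overlap the changed base can differ.
    for start in range(max(0, pos - 2), min(pos, len(utr5) - 3) + 1):
        if mutated[start:start + 3] == START and utr5[start:start + 3] != START:
            hits.append((start, (len(utr5) - start) % 3))
    return hits


# Hypothetical 10-nt UTR: C>A at index 3 turns "CTG" into "ATG".
print(created_upstream_starts("GGCCTGAAAC", 3, "A"))  # [(3, 1)]
```

A frame of 0 would indicate a new start codon in frame with the main coding sequence; other values indicate an out-of-frame upstream start. A real annotation tool would also need transcript coordinates and strand handling, which this sketch leaves out.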

Citations: 0
Crafted experiments to evaluate feature selection methods for single-cell RNA-seq data.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2025-03-19 eCollection Date: 2025-03-01 DOI: 10.1093/nargab/lqaf023
Siyao Liu, David L Corcoran, Susana Garcia-Recio, James S Marron, Charles M Perou

While numerous methods have been developed for analyzing scRNA-seq data, benchmarking them remains challenging because ground truth datasets for evaluating novel gene selection and/or clustering methods are lacking. We propose the use of crafted experiments, a new approach to comparing analysis methods that is based on perturbing signals in a real dataset. We demonstrate the effectiveness of crafted experiments by evaluating a new univariate, distribution-oriented suite of feature selection methods called GOF. We show that GOF selects features that robustly identify crafted features and that it performs well on real, non-crafted data sets. Using varying ways of crafting, we also show the context in which each GOF method performs best. GOF is implemented as an open-source R package and is freely available under the GPL-2 license at https://github.com/siyao-liu/GOF. Source code, including all functions for constructing crafted experiments and benchmarking feature selection methods, is publicly available at https://github.com/siyao-liu/CraftedExperiment.
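
The idea of perturbing a real dataset so that the "true" features are known can be sketched in a few lines of Python. This is one possible reading of the approach, not the authors' procedure or the GOF methods: the crafting rule (multiplying chosen genes by a fold change in a subset of cells), the variance-based selector, and every name below are assumptions, and the random matrix merely stands in for a real genes-by-cells count matrix.

```python
import math
import random
from statistics import pvariance


def craft_signal(counts, n_genes, n_cells, fold, rng):
    """Plant a known signal: scale n_genes random genes by `fold` in n_cells random cells.

    counts is a list of gene rows, each a list of per-cell values.
    Returns the crafted copy and the sorted indices of the crafted genes.
    """
    genes = sorted(rng.sample(range(len(counts)), n_genes))
    cells = set(rng.sample(range(len(counts[0])), n_cells))
    crafted = [row[:] for row in counts]
    for g in genes:
        crafted[g] = [v * fold if c in cells else v for c, v in enumerate(crafted[g])]
    return crafted, genes


def top_variance_genes(matrix, k):
    """Stand-in feature selector: the k genes with the highest variance of log1p values."""
    scores = [pvariance([math.log1p(v) for v in row]) for row in matrix]
    return sorted(range(len(matrix)), key=lambda g: scores[g], reverse=True)[:k]


def recall(selected, truth):
    """Fraction of crafted genes that the selector recovered."""
    return len(set(selected) & set(truth)) / len(truth)


rng = random.Random(0)
# Toy stand-in for a real count matrix: 100 genes x 60 cells.
counts = [[rng.randint(0, 10) for _ in range(60)] for _ in range(100)]
crafted, truth = craft_signal(counts, n_genes=5, n_cells=30, fold=8, rng=rng)
selected = top_variance_genes(crafted, k=5)
print("recall of crafted genes:", recall(selected, truth))
```

Because the crafted genes are known by construction, any feature selection method can be scored the same way by swapping it in for `top_variance_genes`, and the crafting rule can be varied to probe where each method does well.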

Citations: 0
Journal: NAR Genomics and Bioinformatics