
Briefings in Bioinformatics: latest articles

tcrBLOSUM: an amino acid substitution matrix for sensitive alignment of distant epitope-specific TCRs.
IF 6.8; Zone 2 (Biology); Q1, Biochemical Research Methods; Pub Date: 2024-11-22; DOI: 10.1093/bib/bbae602
Anna Postovskaya, Koen Vercauteren, Pieter Meysman, Kris Laukens

Deciphering the specificity of T-cell receptor (TCR) repertoires is crucial for monitoring adaptive immune responses and developing targeted immunotherapies and vaccines. To elucidate the specificity of previously unseen TCRs, many methods employ the BLOSUM62 matrix to find TCRs with similar amino acid (AA) sequences. However, while BLOSUM62 reflects the AA substitutions within conserved regions of proteins with similar functions, the remarkable diversity of TCRs means that TCRs with either similar or dissimilar sequences can bind the same epitope. Therefore, reliance on BLOSUM62 may bias detection towards epitope-specific TCRs with similar biochemical properties, overlooking those with more diverse AA compositions. In this study, we introduce tcrBLOSUMa and tcrBLOSUMb, specialized AA substitution matrices for CDR3 alpha and CDR3 beta TCR chains, respectively. The matrices reflect AA frequencies and variations occurring within TCRs that bind the same epitope, revealing that both CDR3 alpha and CDR3 beta display tolerance to a wide range of AA substitutions and differ noticeably from the standard BLOSUM62. By accurately aligning distant TCRs with tcrBLOSUMb, we were able to improve clustering performance and capture a large number of epitope-specific TCRs with diverse AA compositions and physicochemical profiles overlooked by BLOSUM62. Utilizing both the general BLOSUM62 and specialized tcrBLOSUM matrices in existing computational tools will broaden the range of TCRs that can be associated with their cognate epitopes, thereby enhancing TCR repertoire analysis.
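
To make the idea of a substitution matrix derived from co-binding TCRs concrete, the following is a minimal sketch of a generic BLOSUM-style log-odds calculation. It is not necessarily the authors' exact procedure: the function name `log_odds_matrix`, the assumption that sequences in a group are already aligned and of equal length, and the toy CDR3 sequences are all hypothetical, and the sequence clustering/weighting used for the original BLOSUM matrices is omitted.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(aligned_groups):
    """Half-bit log-odds scores from groups of aligned, equal-length sequences.

    Each group is assumed (hypothetically) to hold sequences that share a
    property of interest, e.g. CDR3s binding the same epitope.
    """
    pair_counts = Counter()
    for group in aligned_groups:
        for seq_a, seq_b in combinations(group, 2):
            for x, y in zip(seq_a, seq_b):
                pair_counts[tuple(sorted((x, y)))] += 1

    total = sum(pair_counts.values())
    # Observed frequency q_ij of each unordered residue pair.
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Marginal residue frequency p_i = q_ii + sum over j != i of q_ij / 2.
    p = Counter()
    for (x, y), freq in q.items():
        if x == y:
            p[x] += freq
        else:
            p[x] += freq / 2
            p[y] += freq / 2

    scores = {}
    for (x, y), freq in q.items():
        # Expected pair frequency under independence.
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        score = round(2 * math.log2(freq / expected))
        scores[(x, y)] = scores[(y, x)] = score
    return scores


# Toy, made-up group of two CDR3-like sequences (illustration only).
toy_scores = log_odds_matrix([["CASSL", "CASSF"]])
print(toy_scores[("F", "L")], toy_scores[("S", "S")])
```

Pairs that are never observed are simply absent from the returned dictionary; a real matrix would need pseudocounts or a floor score for them.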

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11583439/pdf/
Citations: 0
Filtering out the noise: metagenomic classifiers optimize ancient DNA mapping.
IF 6.8; Zone 2 (Biology); Q1, Biochemical Research Methods; Pub Date: 2024-11-22; DOI: 10.1093/bib/bbae646
Shyamsundar Ravishankar, Vilma Perez, Roberta Davidson, Xavier Roca-Rada, Divon Lan, Yassine Souilmi, Bastien Llamas

Contamination with exogenous DNA presents a significant challenge in ancient DNA (aDNA) studies of single organisms. Failure to address contamination from microbes, reagents, and present-day sources can impact the interpretation of results. Although field and laboratory protocols exist to limit contamination, there is still a need to accurately distinguish between endogenous and exogenous data computationally. Here, we propose a workflow to reduce exogenous contamination based on a metagenomic classifier. Unlike previous methods that relied exclusively on the mapping specificity of DNA sequencing reads to a single reference genome to remove contaminating reads, our approach uses Kraken2-based filtering before mapping to the reference genome. Using both simulated and empirical shotgun aDNA data, we show that this workflow presents a simple and efficient method that can be used in a wide range of computational environments, including personal machines. We propose strategies to build specific databases used to profile sequencing data that take into consideration available computational resources and prior knowledge about the target taxa and likely contaminants. Our workflow significantly reduces the overall computational resources required during the mapping process and reduces the total runtime by up to ~94%. The most significant impacts are observed in samples with low endogenous DNA content. Importantly, contaminants that would map to the reference are filtered out using our strategy, reducing false positive alignments. We also show that our method results in a negligible loss of endogenous data with no measurable impact on downstream population genetics analyses.
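
As a rough illustration of classifier-based filtering before mapping, the sketch below selects read IDs from Kraken2's default per-read output (tab-separated: classified flag `C`/`U`, read ID, taxonomy ID, read length, k-mer mappings). The function name `select_read_ids`, the choice to retain unclassified reads, and the toy records are assumptions made for illustration; the abstract does not specify the authors' exact retention rules.

```python
def select_read_ids(kraken_lines, keep_taxids, keep_unclassified=True):
    """Return IDs of reads to pass on to the mapping step.

    kraken_lines: iterable of lines in Kraken2's default per-read output.
    keep_taxids: set of taxonomy IDs (as strings) treated as the target.
    keep_unclassified: hypothetical policy switch for reads flagged 'U'.
    """
    kept = []
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        status, read_id, taxid = fields[0], fields[1], fields[2]
        if status == "U":
            if keep_unclassified:
                kept.append(read_id)
        elif taxid in keep_taxids:
            kept.append(read_id)
    return kept


# Toy records (made up): one human read, one E. coli read, one unclassified.
toy_lines = [
    "C\tread1\t9606\t60\t9606:26",
    "C\tread2\t562\t55\t562:21",
    "U\tread3\t0\t48\t0:14",
]
print(select_read_ids(toy_lines, {"9606"}))
```

The returned IDs would then be used to subset the FASTQ file before alignment, so that reads confidently assigned to contaminant taxa never reach the mapper.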

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11646131/pdf/
Citations: 0
Inferring single-cell resolution spatial gene expression via fusing spot-based spatial transcriptomics, location, and histology using GCN.
IF 6.8; Zone 2 (Biology); Q1, Biochemical Research Methods; Pub Date: 2024-11-22; DOI: 10.1093/bib/bbae630
Shuailin Xue, Fangfang Zhu, Jinyu Chen, Wenwen Min

Spatial transcriptomics (ST) technology allows for the detection of cellular transcriptome information while preserving the spatial location of cells. This capability enables researchers to better understand the cellular heterogeneity, spatial organization, and functional interactions in complex biological systems. However, current technological methods are limited by low resolution, which reduces the accuracy of gene expression levels. Here, we propose scstGCN, a multimodal information fusion method based on Vision Transformer and Graph Convolutional Network that integrates histological images, spot-based ST data and spatial location information to infer super-resolution gene expression profiles at single-cell level. We evaluated the accuracy of the super-resolution gene expression profiles generated by scstGCN on ST datasets from diverse diseased and healthy tissues, along with their performance in identifying spatial patterns, conducting functional enrichment analysis, and annotating tissue. The results show that scstGCN can predict super-resolution gene expression accurately and aid researchers in discovering biologically meaningful differentially expressed genes and pathways. Additionally, scstGCN can segment and annotate tissues at a finer granularity, with results demonstrating strong consistency with coarse manual annotations. Our source code and all used datasets are available at https://github.com/wenwenmin/scstGCN and https://zenodo.org/records/12800375.
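
For readers unfamiliar with graph convolution, here is a minimal pure-Python sketch of a single generic GCN layer using the widely used symmetric normalization, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). It is not the scstGCN architecture: the function name `gcn_layer` and the two-node toy graph are hypothetical, and the actual model is available in the authors' repository.

```python
import math


def gcn_layer(adj, features, weights):
    """One graph-convolution layer on dense Python lists.

    adj: n x n adjacency matrix (0/1 entries, no self-loops).
    features: n x f node feature matrix.
    weights: f x o weight matrix.
    """
    n = len(adj)
    n_in = len(weights)
    n_out = len(weights[0])

    # Add self-loops so each node keeps its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]

    # Symmetric normalization: a_hat[i][j] / sqrt(deg_i * deg_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(n_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]


# Toy graph: two connected spots with one feature each (made-up values).
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))
```

In a spatial setting, nodes could stand for spots or image patches and edges for spatial neighbourhood, so each layer smooths features across adjacent locations.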

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11645551/pdf/
Citations: 0
Deciphering the genetic interplay between depression and dysmenorrhea: a Mendelian randomization study.
IF 6.8; Zone 2 (Biology); Q1, Biochemical Research Methods; Pub Date: 2024-11-22; DOI: 10.1093/bib/bbae589
Shuhe Liu, Zhen Wei, Daniel F Carr, John Moraros

Background: This study aims to explore the link between depression and dysmenorrhea by using an integrated and innovative approach that combines genomic, transcriptomic, and protein interaction data/information from various resources.

Methods: A two-sample, bidirectional, and multivariate Mendelian randomization (MR) approach was applied to determine causality between dysmenorrhea and depression. Genome-wide association study (GWAS) data were used to identify genetic variants associated with both dysmenorrhea and depression, followed by colocalization analysis of shared genetic influences. Expression quantitative trait locus (eQTL) data were analyzed from public databases to pinpoint target genes in relevant tissues. Additionally, a protein-protein interaction (PPI) network was constructed using the STRING database to analyze interactions among identified proteins.

Results: MR analysis confirmed a significant causal effect of depression on dysmenorrhea [odds ratio (95% confidence interval) = 1.51 (1.19, 1.91), P = 7.26 × 10⁻⁴]. Conversely, no evidence was found to support a causal effect of dysmenorrhea on depression (P = .74). Genetic analysis, using GWAS and eQTL data, identified single-nucleotide polymorphisms in several genes, including GRK4, TRAIP, and RNF123, indicating that depression may impact reproductive function through these genetic pathways, with a detailed picture presented by way of analysis in the PPI network. Colocalization analysis highlighted rs34341246 (RBMS3) as a potential shared causal variant.

Conclusions: This study suggests that depression significantly affects dysmenorrhea and identifies key genes and proteins involved in this interaction. The findings underline the need for integrated clinical and public health approaches that screen for depression among women presenting with dysmenorrhea and suggest new targeted preventive strategies.
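
The reported odds ratio, confidence interval, and P-value can be sanity-checked against each other. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale (an assumption, not something the abstract states) and recovers the standard error, z-score, and two-sided P-value from the rounded figures; the function name `wald_summary` is hypothetical.

```python
import math


def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log(OR), its standard error, z and two-sided P from OR and 95% CI.

    Assumes the interval is exp(log(OR) +/- z_crit * SE).
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability: P = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return log_or, se, z, p_value


log_or, se, z, p_value = wald_summary(1.51, 1.19, 1.91)
print(f"log(OR)={log_or:.3f}, SE={se:.3f}, z={z:.2f}, P={p_value:.2e}")
```

Because the published estimate and interval are rounded to two decimals, the recovered P-value should land in the same order of magnitude as the reported 7.26 × 10⁻⁴ rather than match it exactly.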

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11596086/pdf/
Citations: 0
microT-CNN: an avant-garde deep convolutional neural network unravels functional miRNA targets beyond canonical sites.
IF 6.8; Zone 2 (Biology); Q1, Biochemical Research Methods; Pub Date: 2024-11-22; DOI: 10.1093/bib/bbae678
Elissavet Zacharopoulou, Maria D Paraskevopoulou, Spyros Tastsoglou, Athanasios Alexiou, Anna Karavangeli, Vasilis Pierros, Stefanos Digenis, Galatea Mavromati, Artemis G Hatzigeorgiou, Dimitra Karagkouni

microRNAs (miRNAs) are central post-transcriptional gene expression regulators in healthy and diseased states. Despite decades of effort, deciphering miRNA targets remains challenging, leading to an incomplete miRNA interactome and partially elucidated miRNA functions. Here, we introduce microT-CNN, an avant-garde deep convolutional neural network model that moves the needle by integrating hundreds of tissue-matched (in-)direct experiments from 26 distinct cell types, corresponding to a unique training and evaluation set of >60 000 miRNA binding events and ~30 000 unique miRNA-gene target pairs. The multilayer sequence-based design enables the prediction of both host and virus-encoded miRNA interactions, providing for the first time up to 67% of direct genuine Epstein-Barr virus- and Kaposi's sarcoma-associated herpesvirus-derived miRNA-target pairs corresponding to one out of four binding events of virus-encoded miRNAs. microT-CNN fills the existing gap of the miRNA-target prediction by providing functional targets beyond the canonical sites, including 3' compensatory miRNA pairings, prompting 1.4-fold more validated miRNA binding events compared to other implementations and shedding light on previously unexplored facets of the miRNA interactome.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11685103/pdf/
Citations: 0
Annotating publicly-available samples and studies using interpretable modeling of unstructured metadata.
IF 6.8; Zone 2 (Biology); Q1, Biochemical Research Methods; Pub Date: 2024-11-22; DOI: 10.1093/bib/bbae652
Hao Yuan, Parker Hicks, Mansooreh Ahmadian, Kayla A Johnson, Lydia Valtadoros, Arjun Krishnan

Reusing massive collections of publicly available biomedical data can significantly impact knowledge discovery. However, these public samples and studies are typically described using unstructured plain text, hindering the findability and further reuse of the data. To combat this problem, we propose txt2onto 2.0, a general-purpose method based on natural language processing and machine learning for annotating biomedical unstructured metadata to controlled vocabularies of diseases and tissues. Compared to the previous version (txt2onto 1.0), which uses numerical embeddings as features, this new version uses words as features, resulting in improved interpretability and performance, especially when few positive training instances are available. Txt2onto 2.0 uses embeddings from a large language model during prediction to deal with unseen-yet-relevant words related to each disease and tissue term being predicted from the input text, thereby explaining the basis of every annotation. We demonstrate the generalizability of txt2onto 2.0 by accurately predicting disease annotations for studies from independent datasets, using proteomics and clinical trials as examples. Overall, our approach can annotate biomedical text regardless of experimental types or sources. Code, data, and trained models are available at https://github.com/krishnanlab/txt2onto2.0.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11663484/pdf/
Citations: 0
R3Design: deep tertiary structure-based RNA sequence design and beyond.
IF 6.8; Zone 2 (Biology); Q1, Biochemical Research Methods; Pub Date: 2024-11-22; DOI: 10.1093/bib/bbae682
Cheng Tan, Yijie Zhang, Zhangyang Gao, Hanqun Cao, Siyuan Li, Siqi Ma, Mathieu Blanchette, Stan Z Li

The rational design of ribonucleic acid (RNA) molecules is crucial for advancing therapeutic applications, synthetic biology, and understanding the fundamental principles of life. Traditional RNA design methods have predominantly focused on secondary structure-based sequence design, often neglecting the intricate and essential tertiary interactions. We introduce R3Design, a tertiary structure-based RNA sequence design method that shifts the paradigm to prioritize tertiary structure in RNA sequence design. R3Design significantly enhances sequence design on native RNA backbones, achieving high sequence recovery and Macro-F1 score, and outperforming traditional secondary structure-based approaches by substantial margins. We demonstrate that R3Design can design RNA sequences that fold into the desired tertiary structures by validating these predictions using advanced structure prediction models. This method, which is available through standalone software, provides a comprehensive toolkit for designing, folding, and evaluating RNA at the tertiary level. Our findings demonstrate R3Design's superior capability in designing RNA sequences, achieving around 44% for both recovery score and Macro-F1 score across multiple datasets. This not only denotes the accuracy and fairness of the model but also underscores its potential to drive forward the development of innovative RNA-based therapeutics and to deepen our understanding of RNA biology.
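
The two metrics named above can be illustrated with their standard definitions: sequence recovery as the fraction of positions where the designed nucleotide equals the native one, and Macro-F1 as the unweighted mean of per-nucleotide F1 scores. This is a minimal sketch under those assumptions; the function names and toy sequences are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def sequence_recovery(native, designed):
    """Fraction of positions where the designed base matches the native base."""
    if not native or len(native) != len(designed):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for n, d in zip(native, designed) if n == d)
    return matches / len(native)


def macro_f1(native, designed, classes="ACGU"):
    """Unweighted mean of per-class F1, treating each nucleotide as a class."""
    if not native or len(native) != len(designed):
        raise ValueError("sequences must be non-empty and of equal length")
    f1_scores = []
    for c in classes:
        tp = sum(1 for n, d in zip(native, designed) if n == c and d == c)
        fp = sum(1 for n, d in zip(native, designed) if n != c and d == c)
        fn = sum(1 for n, d in zip(native, designed) if n == c and d != c)
        denom = 2 * tp + fp + fn
        # A class absent from both sequences contributes an F1 of 0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy sequences (made up): the design misses only the final U.
print(sequence_recovery("ACGU", "ACGA"), macro_f1("ACGU", "ACGA"))
```

Macro-F1 weights rare and common nucleotides equally, which is why it is often reported alongside recovery: a design that over-predicts the most frequent base can score well on recovery but poorly on Macro-F1.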

核糖核酸(RNA)分子的合理设计对于推进治疗应用、合成生物学和理解生命的基本原理至关重要。传统的RNA设计方法主要集中在基于二级结构的序列设计上,往往忽略了复杂而重要的三级相互作用。我们介绍了R3Design,这是一种基于三级结构的RNA序列设计方法,它将范式转移到优先考虑RNA序列设计中的三级结构。R3Design显著增强了在天然RNA主干上的序列设计,实现了较高的序列恢复和Macro-F1评分,大大优于传统的基于二级结构的方法。通过使用先进的结构预测模型验证这些预测,我们证明R3Design可以设计出折叠成所需三级结构的RNA序列。该方法可通过独立软件获得,为三级RNA的设计、折叠和评估提供了一个全面的工具包。我们的研究结果证明了R3Design在设计RNA序列方面的卓越能力,在多个数据集的恢复评分和Macro-F1评分方面都达到了约44%。这不仅表明了该模型的准确性和公平性,而且强调了其推动基于RNA的创新疗法发展和加深我们对RNA生物学理解的潜力。
Citations: 0
Integrating scRNA-seq and scATAC-seq with inter-type attention heterogeneous graph neural networks.
IF 6.8 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbae711
Lingsheng Cai, Xiuli Ma, Jianzhu Ma

Single-cell multi-omics techniques, which enable the simultaneous measurement of multiple modalities such as RNA gene expression and Assay for Transposase-Accessible Chromatin (ATAC) within individual cells, have become a powerful tool for deciphering the intricate complexity of cellular systems. Most current methods rely on motif databases to establish cross-modality relationships between genes from RNA-seq data and peaks from ATAC-seq data. However, these approaches are constrained by incomplete database coverage, particularly for novel or poorly characterized relationships. To address these limitations, we introduce single-cell Multi-omics Integration (scMI), a heterogeneous graph embedding method that encodes both cells and modality features from single-cell RNA-seq and ATAC-seq data into a shared latent space by learning cross-modality relationships. By modeling cells and modality features as distinct node types, we design an inter-type attention mechanism to effectively capture long-range cross-modality interactions between genes and peaks. Benchmark results demonstrate that embeddings learned by scMI preserve more biological information and achieve comparable or superior performance in downstream tasks including modality prediction, cell clustering, and gene regulatory network inference compared to methods that rely on databases. Furthermore, scMI significantly improves the alignment and integration of unmatched multi-omics data, enabling more accurate embedding and improved outcomes in downstream tasks.
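To make the idea of attention between node types concrete, here is a minimal sketch in which gene nodes attend over peak nodes using plain scaled dot-product attention. This is an assumption for illustration only: the abstract does not specify scMI's architecture, the function names are hypothetical, and the learned type-specific projections that a real heterogeneous graph network would use are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_type_attention(gene_vecs, peak_vecs):
    """Each gene embedding (query) attends over all peak embeddings (keys and values).

    Returns the updated gene embeddings and the gene-by-peak attention weights.
    All vectors are assumed to share the same dimension d.
    """
    d = len(peak_vecs[0])
    updated, weights = [], []
    for q in gene_vecs:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in peak_vecs]
        w = softmax(scores)
        out = [sum(wj * k[i] for wj, k in zip(w, peak_vecs)) for i in range(d)]
        weights.append(w)
        updated.append(out)
    return updated, weights


# Hypothetical 2-dimensional embeddings: one gene, two peaks.
genes = [[2.0, 0.0]]
peaks = [[2.0, 0.0], [0.0, 2.0]]
new_genes, attn = cross_type_attention(genes, peaks)
print(attn[0][0] > attn[0][1])  # → True: the aligned peak gets more weight
```

Because every gene scores every peak directly, the weights are not limited to genomic neighbors, which is one way to read the abstract's "long-range cross-modality interactions".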

Citations: 0
Identifying cancer prognosis genes through causal learning.
IF 6.8 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbae721
Siwei Wu, Chaoyi Yin, Yuezhu Wang, Huiyan Sun

Accurate identification of causal genes for cancer prognosis is critical for estimating disease progression and guiding treatment interventions. In this study, we propose CPCG (Cancer Prognosis's Causal Gene), a two-stage framework that identifies gene sets causally associated with patient prognosis across diverse cancer types using transcriptomic data. Initially, an ensemble approach models gene expression's impact on survival with parametric and semiparametric hazard models. Subsequently, an iterative conditional independence test combined with graph pruning is used to infer the causal skeleton, thereby pinpointing prognosis-related genes. Experiments on transcriptomic data from 18 cancer types sourced from The Cancer Genome Atlas Project demonstrate CPCG's effectiveness in predicting prognosis under four evaluation metrics. Validation on 24 additional datasets covering 12 cancer types from the Gene Expression Omnibus and the Chinese Glioma Genome Atlas Project further demonstrates CPCG's robustness and generalizability. CPCG identifies a concise but reliable set of genes, obviating the need for gene combination enumeration for survival time estimation. These genes are also shown to be closely linked to crucial biological processes in cancer. Moreover, CPCG constructs a stable causal skeleton and is insensitive to the order of data shuffling. Overall, CPCG is a powerful tool for extracting cancer prognostic biomarkers, offering interpretability, generalizability, and robustness. CPCG holds promise for facilitating targeted interventions in clinical treatment strategies.
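The second stage described in this abstract (conditional independence testing plus pruning to obtain a causal skeleton) can be illustrated with a small sketch. It assumes a partial-correlation test with a Fisher z statistic and conditioning sets of at most one variable, in the style of the PC algorithm. The abstract does not state which test CPCG uses, survival is treated here as a plain continuous variable with censoring ignored, and all names and correlation values are hypothetical.

```python
import math
from itertools import combinations


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_stat(r, n, k):
    """Fisher z statistic for a (partial) correlation r, n samples, k conditioning variables."""
    return abs(math.atanh(r)) * math.sqrt(n - k - 3)


def skeleton(nodes, corr, n, crit=1.96):
    """Undirected skeleton: keep an edge only if no test finds (conditional) independence.

    corr maps frozenset({a, b}) to the sample correlation of a and b.
    Simplification: every other node is tried as a conditioning variable,
    not just current neighbors as in the full PC algorithm.
    """
    def r(a, b):
        return corr[frozenset((a, b))]

    # Order 0: drop pairs that are marginally uncorrelated.
    edges = set()
    for a, b in combinations(nodes, 2):
        if fisher_z_stat(r(a, b), n, 0) >= crit:
            edges.add(frozenset((a, b)))

    # Order 1: prune edges explained away by a single third variable.
    for a, b in combinations(nodes, 2):
        if frozenset((a, b)) not in edges:
            continue
        for c in nodes:
            if c in (a, b):
                continue
            pc = partial_corr(r(a, b), r(a, c), r(b, c))
            if fisher_z_stat(pc, n, 1) < crit:
                edges.discard(frozenset((a, b)))
                break
    return edges


# Hypothetical chain geneA -> geneB -> survival (correlations chosen by hand).
nodes = ["geneA", "geneB", "survival"]
corr = {
    frozenset(("geneA", "geneB")): 0.6,
    frozenset(("geneB", "survival")): 0.5,
    frozenset(("geneA", "survival")): 0.3,
}
edges = skeleton(nodes, corr, n=500)
print(sorted(sorted(e) for e in edges))
# → [['geneA', 'geneB'], ['geneB', 'survival']]
```

In this toy case geneA is correlated with survival only through geneB, so its edge to survival is pruned and only geneB remains adjacent to the outcome. That mirrors the abstract's claim of a concise gene set, under the stated assumptions.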

Citations: 0
Higher order interaction analysis quantifies coordination in the epigenome revealing novel biological relationships in Kabuki syndrome.
IF 6.8 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbae667
Sara Cuvertino, Terence Garner, Evgenii Martirosian, Bridgious Walusimbi, Susan J Kimber, Siddharth Banka, Adam Stevens

Complex direct and indirect relationships between multiple variables, termed higher order interactions (HOIs), are characteristic of all natural systems. Traditional differential and network analyses fail to account for the richness of omic datasets and miss HOIs. We investigated peripheral blood DNA methylation data from Kabuki syndrome type 1 (KS1) and control individuals, identified 2,002 differentially methylated points (DMPs), and inferred 17 differentially methylated regions, which represent only 189 DMPs. We applied hypergraph models to measure HOIs on all the CpGs and revealed differences in the coordination of DMPs, with lower entropy and higher coordination of the peripheral epigenome in KS1, implying reduced network complexity. Hypergraphs also capture epigenomic trans-relationships and identify biologically relevant pathways that escape the standard analyses. These findings form the basis of a suitable model for analyzing organization in the epigenome in rare diseases, which can be applied to investigate mechanisms in big data.

Citations: 0