Yifan Guo, Haoyu Wen, Zongwei Chen, Mengxia Jiao, Yuchen Zhang, Di Ge, Ronghua Liu, Jie Gu
Cancerous genetic mutations result in complex and comprehensive post-translational modification (PTM) dynamics, in which protein succinylation is well known for its ability to reprogram cell metabolism and is involved in malignant evolution. Little is known about the regulatory interactions between succinylation and other PTMs in the PTM network. Here, we developed a conjoint analysis and systematic clustering method to explore the intermodification communication between the succinylome and phosphorylome of eight lung cancer patients. We found that intermodification cooperation occurs both in parallel and in series. Besides directly participating in metabolic pathways, some phosphosites outside mitochondria were identified as upstream regulatory modifications directing succinylome dynamics in cancer metabolism reprogramming. Phosphorylated activation of histone deacetylase (HDAC) in lung cancer resulted in the removal of acetylation and favored the succinylation of mitochondrial proteins. These results suggest a tandem regulation between succinylation and phosphorylation in the PTM network and provide HDAC-related targets for intervening in mitochondrial succinylation and cancer metabolism reprogramming.
{"title":"Conjoint analysis of succinylome and phosphorylome reveals imbalanced HDAC phosphorylation-driven succinylayion dynamic contibutes to lung cancer.","authors":"Yifan Guo, Haoyu Wen, Zongwei Chen, Mengxia Jiao, Yuchen Zhang, Di Ge, Ronghua Liu, Jie Gu","doi":"10.1093/bib/bbae415","DOIUrl":"10.1093/bib/bbae415","url":null,"abstract":"<p><p>Cancerous genetic mutations result in a complex and comprehensive post-translational modification (PTM) dynamics, in which protein succinylation is well known for its ability to reprogram cell metabolism and is involved in the malignant evolution. Little is known about the regulatory interactions between succinylation and other PTMs in the PTM network. Here, we developed a conjoint analysis and systematic clustering method to explore the intermodification communications between succinylome and phosphorylome from eight lung cancer patients. We found that the intermodification coorperation in both parallel and series. Besides directly participating in metabolism pathways, some phosphosites out of mitochondria were identified as an upstream regulatory modification directing succinylome dynamics in cancer metabolism reprogramming. Phosphorylated activation of histone deacetylase (HDAC) in lung cancer resulted in the removal of acetylation and favored the occurrence of succinylation modification of mitochondrial proteins. 
These results suggest a tandem regulation between succinylation and phosphorylation in the PTM network and provide HDAC-related targets for intervening mitochondrial succinylation and cancer metabolism reprogramming.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11343571/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142046353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zichang Xu, Hendra S Ismanto, Dianita S Saputri, Soichiro Haruna, Guanqun Sun, Jan Wilamowski, Shunsuke Teraguchi, Ayan Sengupta, Songling Li, Daron M Standley
Liquid biopsies based on peripheral blood offer a minimally invasive alternative to solid tissue biopsies for the detection of diseases, primarily cancers. However, such tests currently consider only the serum component of blood, overlooking a potentially rich source of biomarkers: adaptive immune receptors (AIRs) expressed on circulating B and T cells. Machine learning-based classifiers trained on AIRs have been reported to accurately identify not only cancers but also autoimmune and infectious diseases. However, when using the conventional "clonotype cluster" representation of AIRs, individuals within a disease or healthy cohort exhibit vastly different features, limiting the generalizability of these classifiers. This study aimed to address the challenge of classifying specific diseases from circulating B or T cells by developing a novel representation of AIRs based on similarity networks constructed from their antigen-binding regions (paratopes). Features based on this novel representation, paratope cluster occupancies (PCOs), significantly improved disease classification performance for infectious disease, autoimmune disease, and cancer. Under identical methodological conditions, classifiers trained on PCOs achieved a mean AUC of 0.893 when applied to new individuals, outperforming clonotype cluster-based classifiers (AUC 0.714) and the best-performing published classifier (AUC 0.777). Surprisingly, for cancer patients, we observed that "healthy-biased" AIRs were predicted to target known cancer-associated antigens at dramatically higher rates than healthy AIRs as a whole (Z scores >75), suggesting an overlooked reservoir of cancer-targeting immune cells that could be identified by PCOs.
{"title":"Robust detection of infectious disease, autoimmunity, and cancer from the paratope networks of adaptive immune receptors.","authors":"Zichang Xu, Hendra S Ismanto, Dianita S Saputri, Soichiro Haruna, Guanqun Sun, Jan Wilamowski, Shunsuke Teraguchi, Ayan Sengupta, Songling Li, Daron M Standley","doi":"10.1093/bib/bbae431","DOIUrl":"10.1093/bib/bbae431","url":null,"abstract":"<p><p>Liquid biopsies based on peripheral blood offer a minimally invasive alternative to solid tissue biopsies for the detection of diseases, primarily cancers. However, such tests currently consider only the serum component of blood, overlooking a potentially rich source of biomarkers: adaptive immune receptors (AIRs) expressed on circulating B and T cells. Machine learning-based classifiers trained on AIRs have been reported to accurately identify not only cancers but also autoimmune and infectious diseases as well. However, when using the conventional \"clonotype cluster\" representation of AIRs, individuals within a disease or healthy cohort exhibit vastly different features, limiting the generalizability of these classifiers. This study aimed to address the challenge of classifying specific diseases from circulating B or T cells by developing a novel representation of AIRs based on similarity networks constructed from their antigen-binding regions (paratopes). Features based on this novel representation, paratope cluster occupancies (PCOs), significantly improved disease classification performance for infectious disease, autoimmune disease, and cancer. Under identical methodological conditions, classifiers trained on PCOs achieved a mean AUC of 0.893 when applied to new individuals, outperforming clonotype cluster-based classifiers (AUC 0.714) and the best-performing published classifier (AUC 0.777). 
Surprisingly, for cancer patients, we observed that \"healthy-biased\" AIRs were predicted to target known cancer-associated antigens at dramatically higher rates than healthy AIRs as a whole (Z scores >75), suggesting an overlooked reservoir of cancer-targeting immune cells that could be identified by PCOs.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11370640/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142124846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deoxyribonucleic acid (DNA) methylation plays a key role in gene regulation and is critical for development and human disease. Techniques such as whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) allow DNA methylation analysis at the genome scale, with Illumina NovaSeq 6000 and MGI Tech DNBSEQ-T7 being popular due to their efficiency and affordability. However, detailed comparative studies of their performance are not available. In this study, we constructed 60 WGBS and RRBS libraries for two platforms using different types of clinical samples and generated approximately 2.8 terabases of sequencing data. We systematically compared quality control metrics, genomic coverage, CpG methylation levels, intra- and interplatform correlations, and performance in detecting differentially methylated positions. Our results revealed that the DNBSEQ platform exhibited better raw read quality, although base quality recalibration indicated potential overestimation of base quality. The DNBSEQ platform also showed lower sequencing depth and less coverage uniformity in GC-rich regions than did the NovaSeq platform and tended to enrich methylated regions. Overall, both platforms demonstrated robust intra- and interplatform reproducibility for RRBS and WGBS, with NovaSeq performing better for WGBS, highlighting the importance of considering these factors when selecting a platform for bisulfite sequencing.
{"title":"Beyond the base pairs: comparative genome-wide DNA methylation profiling across sequencing technologies.","authors":"Xin Liu,Yu Pang,Junqi Shan,Yunfei Wang,Yanhua Zheng,Yuhang Xue,Xuerong Zhou,Wenjun Wang,Yanlai Sun,Xiaojing Yan,Jiantao Shi,Xiaoxue Wang,Hongcang Gu,Fan Zhang","doi":"10.1093/bib/bbae440","DOIUrl":"https://doi.org/10.1093/bib/bbae440","url":null,"abstract":"Deoxyribonucleic acid (DNA) methylation plays a key role in gene regulation and is critical for development and human disease. Techniques such as whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) allow DNA methylation analysis at the genome scale, with Illumina NovaSeq 6000 and MGI Tech DNBSEQ-T7 being popular due to their efficiency and affordability. However, detailed comparative studies of their performance are not available. In this study, we constructed 60 WGBS and RRBS libraries for two platforms using different types of clinical samples and generated approximately 2.8 terabases of sequencing data. We systematically compared quality control metrics, genomic coverage, CpG methylation levels, intra- and interplatform correlations, and performance in detecting differentially methylated positions. Our results revealed that the DNBSEQ platform exhibited better raw read quality, although base quality recalibration indicated potential overestimation of base quality. The DNBSEQ platform also showed lower sequencing depth and less coverage uniformity in GC-rich regions than did the NovaSeq platform and tended to enrich methylated regions. 
Overall, both platforms demonstrated robust intra- and interplatform reproducibility for RRBS and WGBS, with NovaSeq performing better for WGBS, highlighting the importance of considering these factors when selecting a platform for bisulfite sequencing.","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":9.5,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Linlin Hou, Hongxin Xiang, Xiangxiang Zeng, Dongsheng Cao, Li Zeng, Bosheng Song
Molecular property prediction (MPP) plays a crucial role in the drug discovery process, providing valuable insights for molecule evaluation and screening. Although deep learning has achieved numerous advances in this area, its success often depends on the availability of substantial labeled data. Few-shot MPP is a more challenging scenario, which aims to predict unseen properties from only a few available molecules. In this paper, we propose an attribute-guided prototype network (APN) to address this challenge. APN first introduces a molecular attribute extractor, which not only extracts three different types of fingerprint attributes (single, dual, and triplet fingerprint attributes) by considering seven circular-based, five path-based, and two substructure-based fingerprints, but also automatically extracts deep attributes from self-supervised learning methods. Furthermore, APN includes an Attribute-Guided Dual-channel Attention module that learns the relationship between the molecular graphs and attributes and refines the local and global representations of the molecules. Compared with existing works, APN leverages high-level human-defined attributes and helps the model to explicitly generalize knowledge in molecular graphs. Experiments on benchmark datasets show that APN achieves state-of-the-art performance in most cases and demonstrate that the attributes are effective for improving few-shot MPP performance. In addition, the strong generalization ability of APN is verified by experiments on data from different domains.
{"title":"Attribute-guided prototype network for few-shot molecular property prediction.","authors":"Linlin Hou, Hongxin Xiang, Xiangxiang Zeng, Dongsheng Cao, Li Zeng, Bosheng Song","doi":"10.1093/bib/bbae394","DOIUrl":"10.1093/bib/bbae394","url":null,"abstract":"<p><p>The molecular property prediction (MPP) plays a crucial role in the drug discovery process, providing valuable insights for molecule evaluation and screening. Although deep learning has achieved numerous advances in this area, its success often depends on the availability of substantial labeled data. The few-shot MPP is a more challenging scenario, which aims to identify unseen property with only few available molecules. In this paper, we propose an attribute-guided prototype network (APN) to address the challenge. APN first introduces an molecular attribute extractor, which can not only extract three different types of fingerprint attributes (single fingerprint attributes, dual fingerprint attributes, triplet fingerprint attributes) by considering seven circular-based, five path-based, and two substructure-based fingerprints, but also automatically extract deep attributes from self-supervised learning methods. Furthermore, APN designs the Attribute-Guided Dual-channel Attention module to learn the relationship between the molecular graphs and attributes and refine the local and global representation of the molecules. Compared with existing works, APN leverages high-level human-defined attributes and helps the model to explicitly generalize knowledge in molecular graphs. Experiments on benchmark datasets show that APN can achieve state-of-the-art performance in most cases and demonstrate that the attributes are effective for improving few-shot MPP performance. 
In addition, the strong generalization ability of APN is verified by conducting experiments on data from different domains.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11318080/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141916158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent advances in spatial transcriptomics (ST) enable measurement of the transcriptome within intact biological tissues while preserving spatial information, offering biologists unprecedented opportunities to comprehensively understand the tissue micro-environment, where spatial domains are the basic units of tissues. Although great effort has been devoted to spatial domain identification, existing methods still have many shortcomings, such as ignoring local information and the relations among spatial domains, so alternatives are needed. Here, we propose a novel algorithm for spatial domain identification in Spatial Transcriptomics data with Structure Correlation and Self-Representation (ST-SCSR), which integrates local information, global information, and the similarity of spatial domains. Specifically, ST-SCSR utilizes matrix tri-factorization to simultaneously decompose the expression profiles and the spatial network of spots, where the expressional and spatial features of spots are fused via a shared factor matrix that is interpreted as the similarity of spatial domains. Furthermore, ST-SCSR learns an affinity graph of spots by manipulating expressional and spatial features, where local preservation and sparse constraints are employed, thereby enhancing the quality of the graph. The experimental results demonstrate that ST-SCSR not only outperforms state-of-the-art algorithms in terms of accuracy, but also identifies many potentially interesting patterns.
{"title":"ST-SCSR: identifying spatial domains in spatial transcriptomics data via structure correlation and self-representation.","authors":"Min Zhang, Wensheng Zhang, Xiaoke Ma","doi":"10.1093/bib/bbae437","DOIUrl":"10.1093/bib/bbae437","url":null,"abstract":"<p><p>Recent advances in spatial transcriptomics (ST) enable measurements of transcriptome within intact biological tissues by preserving spatial information, offering biologists unprecedented opportunities to comprehensively understand tissue micro-environment, where spatial domains are basic units of tissues. Although great efforts are devoted to this issue, they still have many shortcomings, such as ignoring local information and relations of spatial domains, requiring alternatives to solve these problems. Here, a novel algorithm for spatial domain identification in Spatial Transcriptomics data with Structure Correlation and Self-Representation (ST-SCSR), which integrates local information, global information, and similarity of spatial domains. Specifically, ST-SCSR utilzes matrix tri-factorization to simultaneously decompose expression profiles and spatial network of spots, where expressional and spatial features of spots are fused via the shared factor matrix that interpreted as similarity of spatial domains. Furthermore, ST-SCSR learns affinity graph of spots by manipulating expressional and spatial features, where local preservation and sparse constraints are employed, thereby enhancing the quality of graph. 
The experimental results demonstrate that ST-SCSR not only outperforms state-of-the-art algorithms in terms of accuracy, but also identifies many potential interesting patterns.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11372132/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142124847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Namhee Kim, Jonghoon Lee, Jongwan Kim, Yunseong Kim, Kwang-Hyun Cho
The tendency for cell fate to be robust to most perturbations, yet sensitive to certain perturbations, raises intriguing questions about the existence of a key path within the underlying molecular network that critically determines distinct cell fates. Reprogramming and trans-differentiation provide clear examples of cell fate change achieved by regulating only a few or even a single molecular switch. However, it is still unknown how to identify such a switch, called a master regulator, and how cell fate is determined by its regulation. Here, we present CAESAR, a computational framework that can systematically identify master regulators and unravel the resulting canalizing kernel, a key substructure of interconnected feedbacks that is critical for cell fate determination. We demonstrate that CAESAR can successfully predict reprogramming factors for de-differentiation into mouse embryonic stem cells and trans-differentiation of hematopoietic stem cells, while unveiling the underlying essential mechanism through the canalizing kernel. CAESAR provides a system-level understanding of how complex molecular networks determine cell fates.
{"title":"Canalizing kernel for cell fate determination.","authors":"Namhee Kim, Jonghoon Lee, Jongwan Kim, Yunseong Kim, Kwang-Hyun Cho","doi":"10.1093/bib/bbae406","DOIUrl":"10.1093/bib/bbae406","url":null,"abstract":"<p><p>The tendency for cell fate to be robust to most perturbations, yet sensitive to certain perturbations raises intriguing questions about the existence of a key path within the underlying molecular network that critically determines distinct cell fates. Reprogramming and trans-differentiation clearly show examples of cell fate change by regulating only a few or even a single molecular switch. However, it is still unknown how to identify such a switch, called a master regulator, and how cell fate is determined by its regulation. Here, we present CAESAR, a computational framework that can systematically identify master regulators and unravel the resulting canalizing kernel, a key substructure of interconnected feedbacks that is critical for cell fate determination. We demonstrate that CAESAR can successfully predict reprogramming factors for de-differentiation into mouse embryonic stem cells and trans-differentiation of hematopoietic stem cells, while unveiling the underlying essential mechanism through the canalizing kernel. 
CAESAR provides a system-level understanding of how complex molecular networks determine cell fates.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11339868/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142016403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Copy number variations (CNVs) play pivotal roles in disease susceptibility and have been intensively investigated in human disease studies. Long-read sequencing technologies offer opportunities for comprehensive structural variation (SV) detection, and numerous methodologies have been developed recently. Consequently, there is a pressing need to assess these methods and aid researchers in selecting appropriate techniques for CNV detection using long-read sequencing. Hence, we conducted an evaluation of eight CNV calling methods across 22 datasets from nine publicly available samples and 15 simulated datasets, covering multiple sequencing platforms. The overall performance of CNV callers varied substantially and was influenced by factors including the input dataset type, sequencing depth, and CNV type. Specifically, the PacBio CCS sequencing platform outperformed the PacBio CLR and Nanopore platforms in CNV detection recall. A sequencing depth of 10x was sufficient to identify 85% of the CNVs detected in a 50x dataset. Moreover, deletions were generally more detectable than duplications. Among the eight benchmarked methods, cuteSV, Delly, pbsv, and Sniffles2 demonstrated superior accuracy, while SVIM exhibited high recall rates.
{"title":"Comprehensive assessment of long-read sequencing platforms and calling algorithms for detection of copy number variation.","authors":"Na Yuan,Peilin Jia","doi":"10.1093/bib/bbae441","DOIUrl":"https://doi.org/10.1093/bib/bbae441","url":null,"abstract":"Copy number variations (CNVs) play pivotal roles in disease susceptibility and have been intensively investigated in human disease studies. Long-read sequencing technologies offer opportunities for comprehensive structural variation (SV) detection, and numerous methodologies have been developed recently. Consequently, there is a pressing need to assess these methods and aid researchers in selecting appropriate techniques for CNV detection using long-read sequencing. Hence, we conducted an evaluation of eight CNV calling methods across 22 datasets from nine publicly available samples and 15 simulated datasets, covering multiple sequencing platforms. The overall performance of CNV callers varied substantially and was influenced by the input dataset type, sequencing depth, and CNV type, among others. Specifically, the PacBio CCS sequencing platform outperformed PacBio CLR and Nanopore platforms regarding CNV detection recall rates. A sequencing depth of 10x demonstrated the capability to identify 85% of the CNVs detected in a 50x dataset. Moreover, deletions were more generally detectable than duplications. 
Among the eight benchmarked methods, cuteSV, Delly, pbsv, and Sniffles2 demonstrated superior accuracy, while SVIM exhibited high recall rates.","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":9.5,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Renhua Song, Gavin J Sutton, Fuyi Li, Qian Liu, Justin J-L Wong
N6-methyladenosine (m$^{6}$A) is a widely studied methylation of messenger RNAs, which has been linked to diverse cellular processes and human diseases. Numerous databases that collate m$^{6}$A profiles of distinct cell types have been created to facilitate quick and easy mining of m$^{6}$A signatures associated with cell-specific phenotypes. However, these databases contain inherent complexities that have not been explicitly reported, which may lead to inaccurate identification and interpretation of m$^{6}$A-associated biology by end-users who are unaware of them. Here, we review various m$^{6}$A-related databases and highlight several critical matters. In particular, differences in peak-calling pipelines across databases drive substantial variability in both peak number and coordinates with only moderate reproducibility, and the inclusion of peak calls from early m$^{6}$A sequencing protocols may lead to the reporting of false positives or negatives. Awareness of these matters will help end-users avoid the inclusion of potentially unreliable data in their studies and better utilize m$^{6}$A databases to derive biologically meaningful results.
{"title":"Variable calling of m6A and associated features in databases: a guide for end-users.","authors":"Renhua Song,Gavin J Sutton,Fuyi Li,Qian Liu,Justin J-L Wong","doi":"10.1093/bib/bbae434","DOIUrl":"https://doi.org/10.1093/bib/bbae434","url":null,"abstract":"N6-methyladenosine (m$^{6}$A) is a widely-studied methylation to messenger RNAs, which has been linked to diverse cellular processes and human diseases. Numerous databases that collate m$^{6}$A profiles of distinct cell types have been created to facilitate quick and easy mining of m$^{6}$A signatures associated with cell-specific phenotypes. However, these databases contain inherent complexities that have not been explicitly reported, which may lead to inaccurate identification and interpretation of m$^{6}$A-associated biology by end-users who are unaware of them. Here, we review various m$^{6}$A-related databases, and highlight several critical matters. In particular, differences in peak-calling pipelines across databases drive substantial variability in both peak number and coordinates with only moderate reproducibility, and the inclusion of peak calls from early m$^{6}$A sequencing protocols may lead to the reporting of false positives or negatives. 
The awareness of these matters will help end-users avoid the inclusion of potentially unreliable data in their studies and better utilize m$^{6}$A databases to derive biologically meaningful results.","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":9.5,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zeyu Lu, Xue Xiao, Qiang Zheng, Xinlei Wang, Lin Xu
This article provides an in-depth review of computational methods for predicting transcriptional regulators (TRs) with query gene sets. Identification of TRs is of utmost importance in many biological applications, including but not limited to elucidating biological development mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa have relatively better performance. Besides, we point out the limitations of NGS-based methods and explore potential directions for further improvement.
{"title":"Assessing next-generation sequencing-based computational methods for predicting transcriptional regulators with query gene sets.","authors":"Zeyu Lu, Xue Xiao, Qiang Zheng, Xinlei Wang, Lin Xu","doi":"10.1093/bib/bbae366","DOIUrl":"10.1093/bib/bbae366","url":null,"abstract":"<p><p>This article provides an in-depth review of computational methods for predicting transcriptional regulators (TRs) with query gene sets. Identification of TRs is of utmost importance in many biological applications, including but not limited to elucidating biological development mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa have relatively better performance. Besides, we point out the limitations of NGS-based methods and explore potential directions for further improvement.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11289684/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141854881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yan Miao, Zhenyuan Sun, Chen Lin, Haoran Gu, Chenjing Ma, Yingjian Liang, Guohua Wang
Bacteriophages are viruses that infect bacterial cells. They are the most diverse biological entities on earth and play important roles in the microbiome. According to their lifestyle, phages can be divided into virulent phages and temperate phages. Classifying virulent and temperate phages is crucial for further understanding of phage-host interactions. Although several methods have been designed for phage lifestyle classification, they consider either sequence features or gene features alone, leading to low accuracy. A new computational method, DeePhafier, is proposed to improve classification performance on phage lifestyle. Built from several multilayer self-attention neural networks and a global self-attention neural network, and combined with protein features from the Position Specific Scoring Matrix, DeePhafier improves classification accuracy and outperforms two benchmark methods. The accuracy of DeePhafier on five-fold cross-validation is as high as 87.54% for sequences longer than 2000 bp.
{"title":"DeePhafier: a phage lifestyle classifier using a multilayer self-attention neural network combining protein information.","authors":"Yan Miao, Zhenyuan Sun, Chen Lin, Haoran Gu, Chenjing Ma, Yingjian Liang, Guohua Wang","doi":"10.1093/bib/bbae377","DOIUrl":"10.1093/bib/bbae377","url":null,"abstract":"<p><p>Bacteriophages are the viruses that infect bacterial cells. They are the most diverse biological entities on earth and play important roles in microbiome. According to the phage lifestyle, phages can be divided into the virulent phages and the temperate phages. Classifying virulent and temperate phages is crucial for further understanding of the phage-host interactions. Although there are several methods designed for phage lifestyle classification, they merely either consider sequence features or gene features, leading to low accuracy. A new computational method, DeePhafier, is proposed to improve classification performance on phage lifestyle. Built by several multilayer self-attention neural networks, a global self-attention neural network, and being combined by protein features of the Position Specific Scoring Matrix matrix, DeePhafier improves the classification accuracy and outperforms two benchmark methods. 
The accuracy of DeePhafier on five-fold cross-validation is as high as 87.54% for sequences with length >2000bp.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11304974/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141896844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}