{"title":"Why Did Treg and Immune Tolerance Win Nobel Prize This Year?","authors":"Song Guo Zheng 郑颂国","doi":"10.1093/gpbjnl/qzaf124","DOIUrl":"10.1093/gpbjnl/qzaf124","url":null,"abstract":"","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12974991/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145746173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jun Wang 王军, Xin Hou 侯鑫, Xiaowen Chen 陈晓雯, Roland Nathan Mandal, Nusrat Hasan Kanika, Chunhong Yuan 袁春红, Yongju Luo 罗永巨, Chenghui Wang 王成辉
Hybridization is a prominent and influential phenomenon with significant implications for adaptive evolution, species distribution, and biodiversity. However, how hybridization shapes genomic structure and facilitates adaptive evolution remains poorly understood. By analyzing whole-genome data from seven populations of the genus Eriocheir across diverse geographic regions, we validated a complex history of hybridization between Chinese and Japanese mitten crabs. This hybridization gave rise to two distinct ecological species, the Hepu and Russian mitten crabs, each with a unique genomic architecture and distinct adaptations. Genes related to reproduction, development, and temperature adaptation exhibited divergent selection signals, potentially contributing to the species' phenotypic diversity and ecological niches. In addition, three reproduction-associated genes, Birc6, Bap31, and Poxn, displayed robust evidence of selective sweeps in the Hepu mitten crab. Notably, the favored alleles at these genes originated from the parental lineages during hybridization. Furthermore, the Hepu mitten crab is a homoploid hybrid species that originated from an ancient hybridization event, a finding that resolves its longstanding taxonomic controversy. Our study sheds light on the evolutionary history of mitten crabs and highlights the crucial role of hybridization in driving adaptation, range expansion, and diversification within the genus Eriocheir.
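The abstract does not name the test used to validate hybridization. One widely used genome-wide statistic for detecting gene flow is Patterson's D (the ABBA-BABA test). The sketch below is an illustration of that statistic only, not the authors' pipeline. It assumes the four-taxon tree (((P1, P2), P3), O) and per-site derived-allele frequencies. The population roles are hypothetical: for example, a putative hybrid lineage as P2 and one parental lineage as P3.

```python
from typing import Sequence


def patterson_d(p1: Sequence[float], p2: Sequence[float],
                p3: Sequence[float], po: Sequence[float]) -> float:
    """Patterson's D for the four-taxon tree (((P1, P2), P3), O).

    Each argument lists per-site derived-allele frequencies (0 to 1) for one
    population. D > 0 indicates excess allele sharing between P2 and P3,
    D < 0 excess sharing between P1 and P3; D near 0 fits no gene flow.
    """
    if not (len(p1) == len(p2) == len(p3) == len(po)):
        raise ValueError("all populations must cover the same sites")
    abba = baba = 0.0
    for a, b, c, o in zip(p1, p2, p3, po):
        # Expected frequency of the ABBA and BABA site patterns at this site
        abba += (1 - a) * b * c * (1 - o)
        baba += a * (1 - b) * c * (1 - o)
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)


# Toy usage: P2 shares derived alleles with P3 at two sites
print(patterson_d([0, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0]))  # prints 1.0
```

In practice, the significance of D is usually assessed with a block jackknife over genomic windows. This sketch omits that step.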
{"title":"Genomic Insights into Hybridization and Speciation of Mitten Crabs in the Eriocheir Genus.","authors":"Jun Wang 王军, Xin Hou 侯鑫, Xiaowen Chen 陈晓雯, Roland Nathan Mandal, Nusrat Hasan Kanika, Chunhong Yuan 袁春红, Yongju Luo 罗永巨, Chenghui Wang 王成辉","doi":"10.1093/gpbjnl/qzaf079","DOIUrl":"10.1093/gpbjnl/qzaf079","url":null,"abstract":"<p><p>Hybridization is a prominent and influential phenomenon with significant implications for adaptive evolution, species distribution, and biodiversity. However, the intricacies of how hybridization influences genomic structure and facilitates adaptive evolution remain poorly understood. By analyzing whole-genome data from seven populations within the Eriocheir genus across diverse geographic regions, we validated a complex hybridization history between Chinese and Japanese mitten crabs. This hybridization gave rise to two distinct ecological species: Hepu and Russian mitten crabs with unique genomic architectures and adaptations. Genes related to reproduction, development, and temperature adaptation exhibited divergent selection signals, potentially contributing to their phenotypic diversity and ecological niches. Meanwhile, genes associated with reproduction, namely Birc6, Bap31, and Poxn, displayed robust evidence of selective sweeps in Hepu mitten crab. Notably, the favored alleles for these genes originated from the parental lineages during the hybridization process. Furthermore, Hepu mitten crab is a homoploid hybrid species that originated from an ancient hybridization event, resolving its longstanding taxonomic controversy. 
Our study sheds light on the evolutionary history of mitten crabs and highlights the crucial role of hybridization in driving adaptation, range expansion, and diversification within the Eriocheir genus.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12996911/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145093193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The success of the deep learning method AlphaFold 2 in protein structure prediction naturally raises the question of whether similar success can be achieved for RNA structure prediction. One reason for the success in protein structure prediction is that the structural space of proteins, from the fragment to the domain level, has been nearly complete for many years. Here, we examined the completeness of RNA fragment structural space at the di-, tri-, tetra-, and penta-nucleotide levels. We show that the number of non-redundant structural fragments at the tetra- and penta-nucleotide levels is still increasing exponentially, suggesting that the structural space currently observed for RNA is far from complete. Thus, more concerted efforts are needed to improve the speed and methods of experimental RNA structure determination, so that the field can move beyond the limited structural space observed so far. Moreover, the reference frame formed by the three sugar-ring atoms near the base (O4', C1', and C2') exhibits the least structural diversity among existing RNA structures, suggesting that it is the most stable platform for building the other parts of RNA structures.
{"title":"On the Completeness of Existing RNA Fragment Structures.","authors":"Xu Hong, Jian Zhan, Yaoqi Zhou","doi":"10.1093/gpbjnl/qzaf127","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf127","url":null,"abstract":"<p><p>The success of protein structure prediction by the deep learning method AlphaFold 2 naturally raises the question whether similar success can be achieved for RNA structure prediction. One reason for the success in protein structure prediction is that the structural space of proteins, at the fragment to domain levels, has been nearly complete for many years. Here, we examined the completeness of RNA fragment structural space at the di-, tri-, tetra-, and penta-nucleotide levels. We show that the number of non-redundant structural fragments at the tetra- and penta-nucleotide levels is in the midst of exponential increase, suggesting that the structural space currently observed in RNA is far from complete. Thus, more concerted efforts are clearly needed to improve the speed and methods of experimental determination of RNA structures to go beyond the limited structural space observed in RNAs. Moreover, the reference frame made of three sugar-ring atoms near the base side (O4', C1', and C2') exhibits the least structural diversity among existing RNA structures, suggesting it as the most stable platform for building other parts of RNA structures.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145784010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Schistosoma japonicum (S. japonicum) is the causative agent of human schistosomiasis in Asia. Identifying differentially expressed proteins (DEPs) between males and females could elucidate critical signaling pathways underlying sexual maturation and egg production. In this study, we obtained quantitative proteome and phosphoproteome profiles for adult male and female S. japonicum. In total, we identified 2710 unique proteins, including 2055 proteins and 924 phosphorylated proteins. Between males and females, we found 252 (∼12.5%) non-phosphorylated and 209 (11.7%) phosphorylated DEPs. Integration with RNA sequencing data showed that 22 non-phosphorylated DEPs exhibited corresponding changes at the mRNA level. Moreover, RNA interference experiments showed that several non-phosphorylated DEPs function in sex-biased biological processes, including vitellocyte development, oviposition, and parasite motility. Furthermore, we annotated 96 S. japonicum kinases. Among them, CMGC/MAPK and Atypical/RIO kinases were significantly activated in males, whereas CAMK/CAMKL, AGC/DMPK, and STE/STE7 kinases were activated in females. Finally, potential drugs targeting these kinases were predicted in silico, identifying 28 kinases as potentially targetable by 30 FDA-approved drugs. Overall, our study provides a collection of evidence-based proteomic and phosphoproteomic resources for S. japonicum. It also identifies sex-biased proteins, phosphopeptides, and kinases that could serve as potential targets for developing novel interventions against schistosomiasis.
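Calling sex-biased DEPs typically combines an effect-size cutoff with a significance cutoff. The sketch below shows that filtering step only. Three things are assumptions, not the study's settings:

- the `ProteinStat` record is hypothetical;
- p-values are taken as precomputed by some upstream test;
- the cutoffs (|log2 fold change| ≥ 1, p < 0.05) are illustrative defaults.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ProteinStat:
    protein_id: str
    male_mean: float    # mean normalized abundance in males (must be > 0)
    female_mean: float  # mean normalized abundance in females (must be > 0)
    p_value: float      # from an upstream test (assumed precomputed)


def classify_sex_bias(stats: Iterable[ProteinStat], lfc_cutoff: float = 1.0,
                      p_cutoff: float = 0.05) -> Dict[str, Optional[str]]:
    """Label each protein 'male', 'female', or None (not differentially expressed).

    Cutoffs are illustrative defaults, not the study's thresholds.
    """
    labels: Dict[str, Optional[str]] = {}
    for s in stats:
        lfc = math.log2(s.male_mean / s.female_mean)  # > 0: higher in males
        if s.p_value < p_cutoff and abs(lfc) >= lfc_cutoff:
            labels[s.protein_id] = "male" if lfc > 0 else "female"
        else:
            labels[s.protein_id] = None
    return labels


toy = [ProteinStat("A", 8.0, 2.0, 0.01), ProteinStat("B", 1.0, 4.0, 0.001),
       ProteinStat("C", 3.0, 2.9, 0.001), ProteinStat("D", 10.0, 1.0, 0.2)]
print(classify_sex_bias(toy))
```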
{"title":"Quantitative Proteomics and Phosphoproteomics Analyses Identify Sex-biased Protein Ontologies of Schistosoma Japonicum.","authors":"Chuantao Fang, Bikash R Giri, Guofeng Cheng","doi":"10.1093/gpbjnl/qzaf126","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf126","url":null,"abstract":"<p><p>Schistosoma japonicum (S. japonicum) is the causative agent of human schistosomiasis in Asia. Identification of differentially expressed proteins (DEPs) between males and females could elucidate critical signaling pathways underlying sexual maturation and egg production. In the study, quantitative proteome and phosphoproteome profiles were obtained for adult males and females of S. japonicum. In total, we identified 2710 unique proteins, including 2055 proteins and 924 phosphorylated proteins, and 252 (∼ 12.5%) non-phosphorylated and 209 (11.7%) phosphorylated DEPs between males and females. Combined with RNA sequencing, 22 non-phosphorylated DEPs exhibited corresponding mRNA-level changes. Meanwhile, several non-phosphorylated DEPs were shown to function in sex-biased biological processes, including vitellocyte development, oviposition, and parasite mobility by RNA interference. Furthermore, we annotated 96 kinases for S. japonicum, of which CMGC/MAPK and Atypical/RIO kinases are significantly activated in males, while CAMK/CAMKL, AGC/DMPK, and STE/STE7 kinases are activated in females. Finally, the potential drugs targeting these kinases were determined in silico, resulting in 28 kinases as potentially targetable by 30 FDA-approved drugs. Overall, our study provided a collection of evidence-based proteomic and phosphoproteomic resources of S. 
japonicum and identified sex-biased proteins, phosphopeptides, and kinases, which could serve as potentially effective targets for developing novel interventions against schistosomiasis.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145759068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ting Li, Fan Mo, Jianhuan Qi, Chunqiong Li, Xiangshang Li, Jie Zhang, Yingfei Lu, Chao Yao, Li Zhang, Baoyang Hu, Chuan-Yun Li, Ni A An
Although new genes and regulatory events have been linked to the uniqueness of human brain development, it is unknown whether alternative polyadenylation (APA) also contributes to this key feature that distinguishes humans from other species. Here, we present an atlas of APA events in the human brain. We identify 161 development-related APA events that disrupt open reading frames and are associated with the dynamic translation of protein products. Among the affected genes, we identified ZNF271P, which encodes a human-specific protein when its distal polyadenylation site is used. Use of this distal site occurs preferentially during early brain development. Cortical organoids grown from ZNF271P-knockout human embryonic stem cells appeared to exhibit accelerated development and maturation, resulting in a significant decrease in organoid size. This implicates ZNF271P in features unique to human brain development. We thus highlight APA as a new regulatory layer shaping the unique aspects of human brain development.
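The sketch below illustrates the idea of an ORF-disrupting APA event. It rests on two assumptions that the abstract does not specify: cleavage at a poly(A) site keeps transcript bases [0, pas_pos), and the full ORF survives only if cleavage occurs at or after the end of the stop codon. It also computes a distal-site usage fraction, similar in spirit to the PDUI metric reported by tools such as DaPars. Under this rule, a gene whose proximal site lies inside the CDS and whose distal site lies beyond it encodes the full protein only when the distal site is used, which matches the pattern described for ZNF271P.

```python
def pas_disrupts_orf(pas_pos: int, cds_start: int, cds_end: int) -> bool:
    """True if cleavage at pas_pos drops part of the ORF.

    Coordinates are 0-based on the transcript; bases [0, pas_pos) are kept and
    cds_end is exclusive (one past the last base of the stop codon). Any site
    upstream of cds_end loses the stop codon (or the whole CDS).
    """
    if not 0 <= cds_start < cds_end:
        raise ValueError("invalid CDS coordinates")
    return pas_pos < cds_end


def distal_usage_fraction(proximal_reads: int, distal_reads: int) -> float:
    """Share of reads supporting the distal site (similar in spirit to PDUI)."""
    total = proximal_reads + distal_reads
    if total == 0:
        raise ValueError("no reads at either site")
    return distal_reads / total


# Hypothetical gene: CDS spans transcript positions 100-399
print(pas_disrupts_orf(250, 100, 400), pas_disrupts_orf(900, 100, 400))
print(distal_usage_fraction(30, 90))
```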
{"title":"A Human-specific Protein Regulated by Alternative Polyadenylation Shapes Uniqueness of Human Brain Development.","authors":"Ting Li, Fan Mo, Jianhuan Qi, Chunqiong Li, Xiangshang Li, Jie Zhang, Yingfei Lu, Chao Yao, Li Zhang, Baoyang Hu, Chuan-Yun Li, Ni A An","doi":"10.1093/gpbjnl/qzaf125","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf125","url":null,"abstract":"<p><p>Although new genes and regulatory events have been linked to the uniqueness of human brain development, it is unknown whether alternative polyadenylation (APA) could also be involved in shaping this key feature that differentiates humans from other species. Here, we present an atlas of APAs of the human brain and identified 161 development-related, open-reading-frame-disrupting APAs associated with the dynamic translation of protein products. Among the genes impacted by these events we identified ZNF271P, encoding a human-specific protein when using the distal polyadenylation site, which preferentially occurs during early brain development. The cortical organoids grown from ZNF271P-knockout human embryonic stem cells seemed to exhibit accelerated development and maturation, resulting in a significant decrease in organoid size, implicating ZNF271P in features unique to human brain development. We thus highlight APAs as new regulators in shaping the unique aspects of human brain development.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145752472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Although extensive evidence has underscored the critical role of alternative splicing in generating mature circular RNA (circRNA) isoforms and increasing their functional diversity, specialized databases of circRNA alternative splicing events are still lacking. To bridge this gap, we developed circASbase, a comprehensive database that catalogues 452,129 alternative splicing events in 884,047 full-length circRNAs from 581 samples across 13 species. circASbase also provides rich annotations to help users understand circRNA splicing regulation. Our analyses reveal substantial differences between circRNAs and linear transcripts in the distribution and occurrence of alternative splicing events, highlighting the distinct regulatory landscape of circRNAs. These circRNA-specific splicing events produce functional differences among circRNAs by affecting IRES sites, m6A sites, ORFs, protein features, miRNA targets, and more. In summary, circASbase meets the research community's urgent need for a dedicated data repository and advances our understanding of circRNA biology. With its user-friendly interfaces and web-based visualization tools, circASbase is poised to become an indispensable resource for researchers exploring the regulatory mechanisms and functional roles of alternative splicing events in circRNAs, and it is expected to drive new insights and discoveries in circRNA research. circASbase is freely available at http://reprod.njmu.edu.cn/cgi-bin/circASbase/.
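One simple way to surface alternative splicing within circRNAs is to group full-length isoforms that share the same back-splice junction and compare their exon sets. The sketch below is an assumption-labeled illustration, not circASbase's pipeline. The tuple layout and coordinates are hypothetical. The function flags any exon not shared by all isoforms of a junction; classifying each flag as exon skipping, an alternative 5'/3' splice site, or intron retention would require further comparison of exon boundaries.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

Exon = Tuple[int, int]       # (start, end) genomic coordinates
BSJ = Tuple[str, int, int]   # (chromosome, back-splice start, back-splice end)


def find_variable_exons(
    isoforms: Iterable[Tuple[str, str, int, int, List[Exon]]]
) -> Dict[BSJ, List[Exon]]:
    """Return, per back-splice junction, exons present in some isoforms but not all.

    Each isoform is (isoform_id, chrom, bsj_start, bsj_end, exons). Only
    junctions with at least two full-length isoforms are compared.
    """
    groups: Dict[BSJ, List[FrozenSet[Exon]]] = defaultdict(list)
    for _iso_id, chrom, bsj_start, bsj_end, exons in isoforms:
        groups[(chrom, bsj_start, bsj_end)].append(frozenset(exons))
    events: Dict[BSJ, List[Exon]] = {}
    for bsj, exon_sets in groups.items():
        if len(exon_sets) < 2:
            continue
        shared = frozenset.intersection(*exon_sets)      # exons in every isoform
        variable = frozenset.union(*exon_sets) - shared  # exons in only some
        if variable:
            events[bsj] = sorted(variable)
    return events


toy = [
    ("circA_1", "chr1", 100, 500, [(100, 150), (200, 260), (400, 500)]),
    ("circA_2", "chr1", 100, 500, [(100, 150), (400, 500)]),
    ("circB_1", "chr2", 10, 90, [(10, 90)]),
]
print(find_variable_exons(toy))
```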
{"title":"circASbase:A Comprehensive Database of Alternative Splicing Events in circRNAs.","authors":"Lingxiao Zou, Jian Zhao, Haojie Li, Chen Xu, Yulan Wang, Xuejiang Guo, Xiaofeng Song","doi":"10.1093/gpbjnl/qzaf121","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf121","url":null,"abstract":"<p><p>Despite extensive evidence has underscored the critical role of alternative splicing in generating mature circular RNA (circRNA) isoforms and augmenting their function diversity, a significant gap remains in the availability of specialized databases housing circRNA alternative splicing events. To bridge this gap, we develop circASbase, a pioneering and comprehensive database that catalogues 452,129 alternative splicing events in 884,047 full-length circRNAs from 581 samples across 13 species, and provides rich annotations to facilitate understanding the splicing regulation of circRNA. Our findings reveal substantial differences between circRNAs and linear transcripts regarding the distribution and occurrence of alternative splicing events, highlighting the unique regulatory landscape of circRNAs. These special splicing events result in functional differences of circRNAs by affecting IRES sites, m6A sites, ORFs, protein features, miRNA targets, and more. In summary, circASbase not only covers the urgent need of the research community for data repositories, but also represents a significant advancement in our understanding of circRNA biology. With its user-friendly interfaces and web-based visualization tools, circASbase is poised to become an indispensable resource for researchers exploring the regulatory mechanisms and functional roles of alternative splicing events in circRNAs. This database will continuously drive new insights and discoveries in the field, setting the stage for further advancements in circRNA research. 
circASbase is freely available at http://reprod.njmu.edu.cn/cgi-bin/circASbase/.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145656625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Protein-nucleic acid interactions (PNIs) are essential for biological processes including gene regulation, DNA repair, and viral infection. Characterizing changes in the binding regions and binding modes of PNIs is vital for understanding their mechanisms of action and for detecting abnormalities. Computational methods, particularly those based on machine learning (ML) and deep learning (DL), have become powerful tools for predicting PNI binding sites and structural features. However, these bioinformatics resources need systematic evaluation to ensure their reliability and to promote innovation. Here, we present a comprehensive toolbox for PNIs. It includes curated databases detailing interaction types, sources, cross-domain interactions, and potential applications. We then examined tools that use ML- and DL-based algorithms to predict binding sites and conformational dynamics, with the aim of uncovering the molecular mechanisms underlying PNIs. Additionally, we discuss potential applications of PNIs in drug research. This study introduces a suite of advanced predictive tools that use computational modeling to advance the design of nucleic acid therapeutics. We have integrated these tools into a user-friendly online platform, accessible for academic use at http://rv.agroda.cn/pni_portal.
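Many sequence-based binding-site predictors score each residue using features from a window of neighboring residues. The sketch below shows that encoding step under stated assumptions. The abstract does not say which features the portal's tools use. Here, each residue gets one-hot vectors for the residues within a window centered on it, with zero padding past the sequence ends and for non-standard residues. The window size is an illustrative default. The resulting per-residue vectors could then be passed to any classifier that predicts binding versus non-binding.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def window_features(sequence: str, half_window: int = 3) -> List[List[int]]:
    """One feature vector per residue from a (2*half_window + 1)-residue window.

    Each window position contributes a 20-dimensional one-hot block; positions
    outside the sequence and non-standard residues are all zeros.
    """
    n = len(sequence)
    width = len(AMINO_ACIDS)
    features = []
    for i in range(n):
        vec: List[int] = []
        for j in range(i - half_window, i + half_window + 1):
            one_hot = [0] * width
            if 0 <= j < n and sequence[j] in INDEX:
                one_hot[INDEX[sequence[j]]] = 1
            vec.extend(one_hot)
        features.append(vec)
    return features


feats = window_features("ACD", half_window=1)
print(len(feats), len(feats[0]))  # prints 3 60
```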
{"title":"Bioinformatics Portal for Predicting Binding Regions and Modes in Protein-Nucleic Acid Interactions.","authors":"Xiao Zhang, Wenbo Guo, Jiaxin Liu, Juan Huang, Yangyang Gao, Vinit Kumar, Gefei Hao","doi":"10.1093/gpbjnl/qzaf114","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf114","url":null,"abstract":"<p><p>Protein-nucleic acid interactions (PNIs) are essential for biological processes, including gene regulation, DNA repair, and viral infection. The changes in binding regions and modes in PNIs are vital for understanding the action mechanism and detecting abnormalities. Computational methods, particularly those leveraging machine learning (ML) and deep learning (DL), have become powerful tools for predicting PNI binding sites and structural features. However, systematic evaluation is needed to ensure the reliability and promote innovation in these bioinformatics resources. Here, we present a comprehensive toolbox for PNIs. It includes curated databases detailing interaction types, sources, cross-domain interactions, and potential applications. Then, we investigated a toolbox that leveraged ML and DL-based algorithms to predict binding sites and conformational dynamics, with the aim of uncovering the molecular mechanisms underlying PNIs. Additionally, we discussed the potential applications of drug research related to PNIs. This study introduces a suite of advanced predictive tools that utilize computational modeling to enhance the design of nucleic acid therapeutics forward. 
We've streamlined these tools into a user-friendly online platform accessible for academic use at http://rv.agroda.cn/pni_portal.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145656664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zebin Wen, Yulong Zhang, Guanchuan Lin, Xu Li, Changtai Xiao, Siwen Xu, Jiahong Wang, Shuyi Cao, Yuting Chen, Hui Liu, Xingguang Luo, Yan Chen, Paul K H Tam, Xinghua Pan
Cell states within cancer have garnered significant attention, yet the mechanisms common across cancer types by which malignant cells establish dominance remain elusive. In this study, we employed label-free multiplexed single-cell RNA sequencing (scRNA-seq) to analyze cell states in 159,372 cells from 245 cell lines spanning 14 tissue types, integrating both public and proprietary datasets. We identified 21 meta-programs (MPs) representing characteristics shared across cancer types and encompassing 16 biological processes. We then developed StateNet, a deep learning model that generates cell-state fingerprints describing the individuality of each cell line based on these MPs. Using StateNet, we pinpointed ACAT2 as a potential mediator linking hypoxia and the lipid metabolism pathway. Through perturbation experiments, we also showed that epithelial-mesenchymal transition programs are vital for classifying cell lines. StateNet not only captures the overall manifold structure of scRNA-seq data but also provides cell-state fingerprints for cell clusters. These fingerprints reveal prognosis-related programs and distinguish patients with different survival outcomes. Applying these prognosis-related programs to 3210 cancer samples, we constructed Cox models and identified risk-associated programs and genes for different cancer types. StateNet thus emerges as a novel and efficient tool for cancer profiling, revealing both the shared commonalities and the distinct individualities of pan-cancer cells across large datasets.
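StateNet itself is a deep learning model, and the sketch below does not reproduce it. It only illustrates how a fixed set of meta-program gene lists can turn single-cell expression into a per-cell-line vector of program activities. Everything here is an assumption for illustration: the gene and program names are hypothetical, and the scoring rule is simplified. Each cell's program score is the mean expression of the program's genes minus that cell's mean over all genes. Widely used schemes instead compare against expression-matched control gene sets.

```python
from statistics import fmean
from typing import Dict, List, Mapping, Sequence


def program_scores(cell: Mapping[str, float],
                   programs: Mapping[str, Sequence[str]]) -> Dict[str, float]:
    """Score each meta-program in one cell (simplified background correction).

    Score = mean expression of the program's genes present in the cell minus
    the cell's mean expression over all genes; 0.0 if no program gene is present.
    """
    background = fmean(cell.values())
    scores: Dict[str, float] = {}
    for name, genes in programs.items():
        present = [cell[g] for g in genes if g in cell]
        scores[name] = fmean(present) - background if present else 0.0
    return scores


def cell_line_fingerprint(cells: List[Mapping[str, float]],
                          programs: Mapping[str, Sequence[str]]) -> Dict[str, float]:
    """Average per-cell program scores into one vector for a cell line."""
    per_cell = [program_scores(c, programs) for c in cells]
    return {name: fmean(s[name] for s in per_cell) for name in programs}


# Hypothetical genes G1..G4 and programs MP_A, MP_B (G9 is not measured)
programs = {"MP_A": ["G1", "G2"], "MP_B": ["G4", "G9"]}
cells = [{"G1": 4.0, "G2": 2.0, "G3": 0.0, "G4": 2.0},
         {"G1": 2.0, "G2": 2.0, "G3": 2.0, "G4": 2.0}]
print(cell_line_fingerprint(cells, programs))
```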
{"title":"Profiling Cell-state Fingerprints Based on Deep Learning Model with Meta-programs of Pan-cancer.","authors":"Zebin Wen, Yulong Zhang, Guanchuan Lin, Xu Li, Changtai Xiao, Siwen Xu, Jiahong Wang, Shuyi Cao, Yuting Chen, Hui Liu, Xingguang Luo, Yan Chen, Paul K H Tam, Xinghua Pan","doi":"10.1093/gpbjnl/qzaf123","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf123","url":null,"abstract":"<p><p>Cell states within cancer have garnered significant attention, yet the mechanisms through which malignant cells assert dominance in pan-cancer commonalities remain elusive. In this study, we employed label-free multiplexed single-cell RNA sequencing (scRNA-seq) to analyze cell states in 159,372 cells across 245 cell lines spanning 14 tissue types, integrating both public and proprietary datasets. We identified 21 meta-programs (MPs) representing shared characteristics across pan-cancer landscapes, encompassing 16 biological processes. Subsequently, we developed a deep learning model StateNet to generate cell-state fingerprints for delineating the individuality of each cell line based on these MPs. Leveraging StateNet, we pinpointed ACAT2 as a potential mediator bridging hypoxia and the lipid metabolism pathway, and we also showcased that epithelial-mesenchymal transition programs are vital for classifying cell lines through perturbation experiments. StateNet not only elucidates the overarching manifold structure of scRNA-seq data but also furnishes cell-state fingerprints of cell clusters, unveiling prognosis-related programs and distinguishing between patients with varying survival outcomes. Utilizing these prognosis-related programs on 3210 cancer samples, we constructed Cox models and identified risk-associated programs and genes responsible for different cancer types. 
StateNet thus emerges as a novel and efficient tool for cancer profiling, unraveling the shared commonalities and distinct individualities of pan-cancer cells across expansive datasets.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145656598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Small peptides encoded by non-coding RNAs (ncRNAs), known as non-coding peptides (ncPEPs), are emerging as critical regulators and biomarkers in cancer and hold great promise for immunotherapy. However, the systematic identification of ncPEPs is hampered by computational methods that typically analyze peptides based on sequence alone. This approach overlooks the fundamental biological principle that multiple distinct peptides can be translated from a single ncRNA transcript and thus share a common transcriptional origin. Here, we address this limitation with HGCPep, a deep learning framework that uses hypergraphs to model these intrinsic relationships. In our model, each ncRNA is represented as a hyperedge connecting all of the peptides it encodes, thereby enriching peptide feature representations with transcriptional context. We demonstrate that HGCPep, which integrates a hypergraph neural network with a convolutional neural network, outperforms state-of-the-art methods in identifying cancer-associated ncPEPs. Furthermore, dimensionality reduction of the learned embeddings reveals distinct clustering of ncPEPs by cancer type, illustrating how the model captures complex biological associations. Our work introduces a new method for ncPEP analytics and provides a powerful tool for discovering novel therapeutic targets in oncology. The dataset and source code are available at https://github.com/Longwt123/HGCPep_Github.
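The hypergraph described above can be written as an incidence matrix H, with one row per peptide and one column per ncRNA. The sketch below builds H and applies one propagation step in the style of the standard hypergraph neural network (HGNN) formulation, X' = Dv^-1/2 H De^-1 H^T Dv^-1/2 X. It is a sketch under stated assumptions, not HGCPep's code: hyperedge weights are 1, and the learnable weight matrix and nonlinearity are omitted. The effect is that peptides encoded by the same ncRNA pool their features.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def incidence_matrix(peptides: Sequence[str],
                     ncrna_to_peptides: Dict[str, Sequence[str]]) -> Matrix:
    """H[v][e] = 1.0 if peptide v is encoded by ncRNA e (hyperedge e)."""
    row = {p: i for i, p in enumerate(peptides)}
    edges = list(ncrna_to_peptides)
    H = [[0.0] * len(edges) for _ in peptides]
    for e, rna in enumerate(edges):
        for p in ncrna_to_peptides[rna]:
            H[row[p]][e] = 1.0
    return H


def hypergraph_propagate(X: Matrix, H: Matrix) -> Matrix:
    """One HGNN-style step, X' = Dv^-1/2 H De^-1 H^T Dv^-1/2 X (unit edge weights)."""
    n, m, d = len(H), len(H[0]), len(X[0])
    dv = [sum(H[v]) for v in range(n)]                       # peptide degrees
    de = [sum(H[v][e] for v in range(n)) for e in range(m)]  # hyperedge sizes
    if 0 in dv or 0 in de:
        raise ValueError("isolated peptide or empty hyperedge")
    # Peptide -> ncRNA: average the degree-scaled features of each ncRNA's peptides
    E = [[sum(H[v][e] * X[v][k] / math.sqrt(dv[v]) for v in range(n)) / de[e]
          for k in range(d)] for e in range(m)]
    # ncRNA -> peptide: combine messages from every ncRNA encoding the peptide
    return [[sum(H[v][e] * E[e][k] for e in range(m)) / math.sqrt(dv[v])
             for k in range(d)] for v in range(n)]


H = incidence_matrix(["p1", "p2", "p3"], {"rnaA": ["p1", "p2"], "rnaB": ["p3"]})
print(hypergraph_propagate([[1.0], [3.0], [5.0]], H))  # prints [[2.0], [2.0], [5.0]]
```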
{"title":"HGCPep: Hypergraph Deep Learning Identifies Cancer-associated Non-coding Peptides.","authors":"Wentao Long, Zhongshen Li, Junru Jin, Jianbo Qiao, Yu Wang, Leyi Wei","doi":"10.1093/gpbjnl/qzaf093","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf093","url":null,"abstract":"<p><p>A small peptide encoded by a non-coding RNA (ncRNA), known as a non-coding peptide (ncPEP), is emerging as a critical regulator and biomarker in cancer, holding immense promise for immunotherapy. However, systematic identification of ncPEPs is hampered by computational methods that typically analyze peptides based on sequence alone. This approach overlooks the fundamental biological principle that multiple distinct peptides can be translated from a single ncRNA transcript, thus sharing a common transcriptional origin. Here, we address this limitation by developing HGCPep, a deep learning framework that leverages hypergraphs to model these intrinsic relationships. In our model, each ncRNA is represented as a hyperedge connecting the cohort of peptides it encodes, thereby enriching peptide feature representations with transcriptional context. We demonstrate that HGCPep, which integrates a hypergraph neural network with a convolutional neural network, outperforms state-of-the-art methods in identifying cancer-associated ncPEPs. Furthermore, dimensionality reduction of the learned embeddings reveals distinct clustering of ncPEPs by cancer type, illustrating how the model effectively deciphers complex biological associations. Our work introduces a new method for ncPEPs analytics and provides a powerful tool for discovering novel therapeutic targets in oncology. 
The dataset and source code of our proposed method can be found via https://github.com/Longwt123/HGCPep_Github.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145656602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel K Putnam, Alexander M Gout, Delaram Rahbarinia, Meiling Jin, David Finkelstein, Xiaotu Ma, Jinghui Zhang, David A Wheeler, Larissa V Furtado, Xiang Chen
Cancer subtype classification is critical for precision therapy, and histopathology testing is increasingly augmented with omics-based machine learning classifiers. However, for pediatric cancers, analytical challenges remain regarding the scope and precision of current classifiers and the evolving standardization of subtypes. To address these challenges, we built Cancer Identification (CanID), a stacked ensemble machine learning classification scheme. CanID uses transcriptomic features derived from gene-level RNA sequencing (RNA-seq) count data as its sole input. It was developed primarily from 3203 pediatric cancer samples covering 13 solid tumor subtypes and 38 hematologic malignancy subtypes, with subtype labels curated without the use of RNA-seq data. In testing on three independent or external datasets, CanID achieved accuracies of 99% for solid tumors and 92%-93% for hematologic malignancies. Notably, CanID correctly classified subtypes that are challenging for clinical histology evaluation. It was also robust to both biological and technical challenges, including differences in data collection protocols, class imbalance, potentially mislabeled training samples, and classes unobserved in training. The high accuracy, robustness, and biological interpretability of this transcriptome-based classification scheme make it a valuable approach for advancing tumor diagnosis and clinically meaningful stratification of tumor types. CanID is available on GitHub at https://github.com/chenlab-sj/CanID.
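In a stacked ensemble, base classifiers are trained first. A meta-learner is then trained on their class-probability outputs. Using out-of-fold predictions keeps the meta-learner from seeing overfit, in-sample base outputs. The abstract does not name CanID's base learners, meta-learner, or features, so the sketch below is only an illustration of the stacking technique under stated assumptions:

- the base learners are nearest-centroid models on two hypothetical feature subsets;
- the meta-learner is a nearest-centroid model on the concatenated out-of-fold probabilities;
- the data are toy values.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Row = Sequence[float]


def fit_centroids(X: List[Row], y: List[str], cols: Sequence[int]) -> Dict[str, List[float]]:
    """Per-class mean of the selected feature columns."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for j, c in enumerate(cols):
            acc[j] += row[c]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in tot] for lab, tot in sums.items()}


def centroid_proba(centroids, row: Row, cols: Sequence[int], classes: List[str]) -> List[float]:
    """Softmax over negative squared distances to each class centroid."""
    scores = []
    for k in classes:
        if k not in centroids:  # class absent from this training fold
            scores.append(float("-inf"))
        else:
            scores.append(-sum((row[c] - m) ** 2 for c, m in zip(cols, centroids[k])))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stack_fit(X, y, feature_subsets, n_folds=2):
    """Train the meta-learner on out-of-fold base-model probabilities."""
    classes = sorted(set(y))
    n = len(X)
    meta_X: List[List[float]] = [[] for _ in range(n)]
    for f in range(n_folds):
        held = set(range(f, n, n_folds))
        train = [i for i in range(n) if i not in held]
        bases = [fit_centroids([X[i] for i in train], [y[i] for i in train], cols)
                 for cols in feature_subsets]
        for i in held:
            meta_X[i] = [p for b, cols in zip(bases, feature_subsets)
                         for p in centroid_proba(b, X[i], cols, classes)]
    meta = fit_centroids(meta_X, y, range(len(meta_X[0])))
    bases = [fit_centroids(X, y, cols) for cols in feature_subsets]  # refit on all data
    return classes, feature_subsets, bases, meta


def stack_predict(model, row: Row) -> str:
    classes, subsets, bases, meta = model
    z = [p for b, cols in zip(bases, subsets) for p in centroid_proba(b, row, cols, classes)]
    probs = centroid_proba(meta, z, range(len(z)), classes)
    return classes[probs.index(max(probs))]


X = [[0.0, 0.1, 0.0, 0.2], [0.2, 0.0, 0.1, 0.0], [0.1, 0.2, 0.2, 0.1], [0.0, 0.0, 0.1, 0.1],
     [5.0, 5.1, 4.9, 5.0], [5.2, 4.9, 5.0, 5.1], [4.8, 5.0, 5.1, 4.9], [5.1, 5.2, 5.0, 4.8]]
y = ["A"] * 4 + ["B"] * 4
model = stack_fit(X, y, [[0, 1], [2, 3]])
print(stack_predict(model, [0.1, 0.1, 0.1, 0.1]), stack_predict(model, [5.0, 5.0, 5.0, 5.0]))
```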
{"title":"CanID: A Robust and Accurate RNA-seq Expression-based Diagnostic Classification Scheme for Pediatric Malignancies.","authors":"Daniel K Putnam, Alexander M Gout, Delaram Rahbarinia, Meiling Jin, David Finkelstein, Xiaotu Ma, Jinghui Zhang, David A Wheeler, Larissa V Furtado, Xiang Chen","doi":"10.1093/gpbjnl/qzaf122","DOIUrl":"10.1093/gpbjnl/qzaf122","url":null,"abstract":"<p><p>Cancer subtype classification is critical for precision therapy and there is a growing trend of augmenting histopathology testing procedures with omics-based machine learning classifiers. However, analytical challenges remain for pediatric cancer on the scope and precision of the current classifiers as well as the evolving subtype standardization. To address these challenges, we built Cancer Identification or CanID, a stacked ensemble machine learning classification scheme, using the transcriptomic features derived from gene-level RNA sequencing count data as the sole input. CanID was developed primarily from 3203 pediatric cancer samples of 13 solid tumor subtypes and 38 hematologic malignancy subtypes with subtype labels curated without the use of RNA-seq data. The accuracies of independent testing in three independent or external data sets for Solid Tumor and Hematologic Malignancy are 99% and 92%-93%, respectively. Notably, CanID was able to classify subtypes challenging for clinical histology evaluation and was robust to both biological and technical challenges, including differences in data collection protocols, class imbalance, potential mislabeled training samples and classes unobserved in training. The high accuracy, robustness, biological interpretability of this transcriptome-based classification scheme represents a valuable approach to advance tumor diagnosis and clinically meaningful stratification of tumor types. 
CanID can be accessed on GitHub at https://github.com/chenlab-sj/CanID.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145643909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}