The success of protein structure prediction by the deep learning method AlphaFold 2 naturally raises the question of whether similar success can be achieved for RNA structure prediction. One reason for the success in protein structure prediction is that the structural space of proteins, at the fragment to domain levels, has been nearly complete for many years. Here, we examined the completeness of RNA fragment structural space at the di-, tri-, tetra-, and penta-nucleotide levels. We show that the number of non-redundant structural fragments at the tetra- and penta-nucleotide levels is in the midst of an exponential increase, suggesting that the structural space currently observed in RNA is far from complete. Thus, more concerted efforts are clearly needed to improve the speed and methods of experimental determination of RNA structures to go beyond the limited structural space observed so far. Moreover, the reference frame made of three sugar-ring atoms near the base side (O4', C1', and C2') exhibits the least structural diversity among existing RNA structures, suggesting that it is the most stable platform for building other parts of RNA structures.
{"title":"On the Completeness of Existing RNA Fragment Structures.","authors":"Xu Hong, Jian Zhan, Yaoqi Zhou","doi":"10.1093/gpbjnl/qzaf127","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf127","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics"},"PeriodicalIF":7.9,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145784010"}
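The RNA fragment abstract above mentions a reference frame made of three sugar-ring atoms (O4', C1', and C2'). As a minimal sketch of how three atoms can define a local coordinate frame, the Python below builds an orthonormal frame by Gram-Schmidt orthogonalization. The choice of C1' as origin, the axis ordering, the function names, and the coordinates are all assumptions made for illustration; the abstract does not say how the authors construct their frame.

```python
import math


def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return [a[i] - b[i] for i in range(3)]


def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product a x b of two 3D vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def unit(a):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(a, a))
    return [x / norm for x in a]


def local_frame(o4, c1, c2):
    """Return (origin, axes) for a right-handed orthonormal frame.

    Assumed convention (not from the source): origin at C1',
    x along C1'->O4', y in the O4'-C1'-C2' plane, z = x cross y.
    """
    x = unit(sub(o4, c1))
    v = sub(c2, c1)
    # Gram-Schmidt step: remove the part of v that lies along x.
    y = unit([v[i] - dot(v, x) * x[i] for i in range(3)])
    z = cross(x, y)
    return c1, (x, y, z)


def to_local(point, origin, axes):
    """Express a point in the local frame's coordinates."""
    d = sub(point, origin)
    return [dot(d, axis) for axis in axes]


# Hypothetical coordinates chosen so the result is easy to check by hand;
# they are not taken from any real RNA structure.
origin, axes = local_frame(o4=[2.0, 1.0, 1.0], c1=[1.0, 1.0, 1.0], c2=[1.0, 3.0, 1.0])
local = to_local([2.0, 3.0, 4.0], origin, axes)
```

Expressing the remaining atoms of each fragment in such a frame would make fragments comparable regardless of their position and orientation in the original structure file.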
Schistosoma japonicum (S. japonicum) is the causative agent of human schistosomiasis in Asia. Identification of differentially expressed proteins (DEPs) between males and females could elucidate critical signaling pathways underlying sexual maturation and egg production. In this study, quantitative proteome and phosphoproteome profiles were obtained for adult males and females of S. japonicum. In total, we identified 2710 unique proteins, including 2055 proteins and 924 phosphorylated proteins, and 252 (∼12.5%) non-phosphorylated and 209 (11.7%) phosphorylated DEPs between males and females. Combined with RNA sequencing, 22 non-phosphorylated DEPs exhibited corresponding mRNA-level changes. Meanwhile, several non-phosphorylated DEPs were shown by RNA interference to function in sex-biased biological processes, including vitellocyte development, oviposition, and parasite mobility. Furthermore, we annotated 96 kinases for S. japonicum, of which CMGC/MAPK and Atypical/RIO kinases are significantly activated in males, while CAMK/CAMKL, AGC/DMPK, and STE/STE7 kinases are activated in females. Finally, potential drugs targeting these kinases were identified in silico, with 28 kinases potentially targetable by 30 FDA-approved drugs. Overall, our study provides a collection of evidence-based proteomic and phosphoproteomic resources for S. japonicum and identifies sex-biased proteins, phosphopeptides, and kinases that could serve as potentially effective targets for developing novel interventions against schistosomiasis.
{"title":"Quantitative Proteomics and Phosphoproteomics Analyses Identify Sex-biased Protein Ontologies of Schistosoma Japonicum.","authors":"Chuantao Fang, Bikash R Giri, Guofeng Cheng","doi":"10.1093/gpbjnl/qzaf126","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf126","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics"},"PeriodicalIF":7.9,"publicationDate":"2025-12-14","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145759068"}
Ting Li, Fan Mo, Jianhuan Qi, Chunqiong Li, Xiangshang Li, Jie Zhang, Yingfei Lu, Chao Yao, Li Zhang, Baoyang Hu, Chuan-Yun Li, Ni A An
Although new genes and regulatory events have been linked to the uniqueness of human brain development, it is unknown whether alternative polyadenylation (APA) could also be involved in shaping this key feature that differentiates humans from other species. Here, we present an atlas of APAs of the human brain and identify 161 development-related, open-reading-frame-disrupting APAs associated with the dynamic translation of protein products. Among the genes impacted by these events, we identified ZNF271P, which encodes a human-specific protein when the distal polyadenylation site is used, as preferentially occurs during early brain development. Cortical organoids grown from ZNF271P-knockout human embryonic stem cells appeared to exhibit accelerated development and maturation, resulting in a significant decrease in organoid size and implicating ZNF271P in features unique to human brain development. We thus highlight APAs as new regulators in shaping the unique aspects of human brain development.
{"title":"A Human-specific Protein Regulated by Alternative Polyadenylation Shapes Uniqueness of Human Brain Development.","authors":"Ting Li, Fan Mo, Jianhuan Qi, Chunqiong Li, Xiangshang Li, Jie Zhang, Yingfei Lu, Chao Yao, Li Zhang, Baoyang Hu, Chuan-Yun Li, Ni A An","doi":"10.1093/gpbjnl/qzaf125","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf125","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics"},"PeriodicalIF":7.9,"publicationDate":"2025-12-13","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145752472"}
{"title":"Why Did Treg and Immune Tolerance Win Nobel Prize This Year?","authors":"Song Guo Zheng","doi":"10.1093/gpbjnl/qzaf124","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf124","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics"},"PeriodicalIF":7.9,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145746173"}
Although extensive evidence has underscored the critical role of alternative splicing in generating mature circular RNA (circRNA) isoforms and augmenting their functional diversity, a significant gap remains in the availability of specialized databases housing circRNA alternative splicing events. To bridge this gap, we developed circASbase, a comprehensive database that catalogues 452,129 alternative splicing events in 884,047 full-length circRNAs from 581 samples across 13 species and provides rich annotations to facilitate understanding of the splicing regulation of circRNAs. Our findings reveal substantial differences between circRNAs and linear transcripts in the distribution and occurrence of alternative splicing events, highlighting the unique regulatory landscape of circRNAs. These splicing events result in functional differences among circRNAs by affecting IRES sites, m6A sites, ORFs, protein features, miRNA targets, and more. In summary, circASbase not only meets the research community's urgent need for data repositories but also represents a significant advance in our understanding of circRNA biology. With its user-friendly interfaces and web-based visualization tools, circASbase is poised to become an indispensable resource for researchers exploring the regulatory mechanisms and functional roles of alternative splicing events in circRNAs. This database will continue to drive new insights and discoveries in the field, setting the stage for further advances in circRNA research. circASbase is freely available at http://reprod.njmu.edu.cn/cgi-bin/circASbase/.
{"title":"circASbase: A Comprehensive Database of Alternative Splicing Events in circRNAs.","authors":"Lingxiao Zou, Jian Zhao, Haojie Li, Chen Xu, Yulan Wang, Xuejiang Guo, Xiaofeng Song","doi":"10.1093/gpbjnl/qzaf121","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf121","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics"},"PeriodicalIF":7.9,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145656625"}
Protein-nucleic acid interactions (PNIs) are essential for biological processes, including gene regulation, DNA repair, and viral infection. Changes in the binding regions and modes of PNIs are vital for understanding their mechanisms of action and for detecting abnormalities. Computational methods, particularly those leveraging machine learning (ML) and deep learning (DL), have become powerful tools for predicting PNI binding sites and structural features. However, systematic evaluation is needed to ensure the reliability of these bioinformatics resources and to promote innovation in them. Here, we present a comprehensive toolbox for PNIs. It includes curated databases detailing interaction types, sources, cross-domain interactions, and potential applications. We then investigated a toolbox that leverages ML- and DL-based algorithms to predict binding sites and conformational dynamics, with the aim of uncovering the molecular mechanisms underlying PNIs. Additionally, we discuss potential applications in drug research related to PNIs. This study introduces a suite of advanced predictive tools that use computational modeling to advance the design of nucleic acid therapeutics. We have streamlined these tools into a user-friendly online platform accessible for academic use at http://rv.agroda.cn/pni_portal.
{"title":"Bioinformatics Portal for Predicting Binding Regions and Modes in Protein-Nucleic Acid Interactions.","authors":"Xiao Zhang, Wenbo Guo, Jiaxin Liu, Juan Huang, Yangyang Gao, Vinit Kumar, Gefei Hao","doi":"10.1093/gpbjnl/qzaf114","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf114","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics"},"PeriodicalIF":7.9,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145656664"}
Zebin Wen, Yulong Zhang, Guanchuan Lin, Xu Li, Changtai Xiao, Siwen Xu, Jiahong Wang, Shuyi Cao, Yuting Chen, Hui Liu, Xingguang Luo, Yan Chen, Paul K H Tam, Xinghua Pan
Cell states within cancer have garnered significant attention, yet the mechanisms through which malignant cells assert dominance in pan-cancer commonalities remain elusive. In this study, we employed label-free multiplexed single-cell RNA sequencing (scRNA-seq) to analyze cell states in 159,372 cells across 245 cell lines spanning 14 tissue types, integrating both public and proprietary datasets. We identified 21 meta-programs (MPs) representing shared characteristics across pan-cancer landscapes, encompassing 16 biological processes. Subsequently, we developed a deep learning model, StateNet, to generate cell-state fingerprints that delineate the individuality of each cell line based on these MPs. Leveraging StateNet, we pinpointed ACAT2 as a potential mediator bridging hypoxia and the lipid metabolism pathway, and we showed through perturbation experiments that epithelial-mesenchymal transition programs are vital for classifying cell lines. StateNet not only elucidates the overarching manifold structure of scRNA-seq data but also furnishes cell-state fingerprints of cell clusters, unveiling prognosis-related programs and distinguishing between patients with varying survival outcomes. Applying these prognosis-related programs to 3210 cancer samples, we constructed Cox models and identified risk-associated programs and genes for different cancer types. StateNet thus emerges as a novel and efficient tool for cancer profiling, unraveling the commonalities and individualities of pan-cancer cells across expansive datasets.
{"title":"Profiling Cell-state Fingerprints Based on Deep Learning Model with Meta-programs of Pan-cancer.","authors":"Zebin Wen, Yulong Zhang, Guanchuan Lin, Xu Li, Changtai Xiao, Siwen Xu, Jiahong Wang, Shuyi Cao, Yuting Chen, Hui Liu, Xingguang Luo, Yan Chen, Paul K H Tam, Xinghua Pan","doi":"10.1093/gpbjnl/qzaf123","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf123","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics"},"PeriodicalIF":7.9,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145656598"}
Small peptides encoded by non-coding RNAs (ncRNAs), known as non-coding peptides (ncPEPs), are emerging as critical regulators and biomarkers in cancer, holding immense promise for immunotherapy. However, systematic identification of ncPEPs is hampered by computational methods that typically analyze peptides based on sequence alone. This approach overlooks the fundamental biological principle that multiple distinct peptides can be translated from a single ncRNA transcript, thus sharing a common transcriptional origin. Here, we address this limitation by developing HGCPep, a deep learning framework that leverages hypergraphs to model these intrinsic relationships. In our model, each ncRNA is represented as a hyperedge connecting the cohort of peptides it encodes, thereby enriching peptide feature representations with transcriptional context. We demonstrate that HGCPep, which integrates a hypergraph neural network with a convolutional neural network, outperforms state-of-the-art methods in identifying cancer-associated ncPEPs. Furthermore, dimensionality reduction of the learned embeddings reveals distinct clustering of ncPEPs by cancer type, illustrating how the model effectively deciphers complex biological associations. Our work introduces a new method for ncPEP analysis and provides a powerful tool for discovering novel therapeutic targets in oncology. The dataset and source code of our proposed method can be found at https://github.com/Longwt123/HGCPep_Github.
{"title":"HGCPep: Hypergraph Deep Learning Identifies Cancer-associated Non-coding Peptides.","authors":"Wentao Long, Zhongshen Li, Junru Jin, Jianbo Qiao, Yu Wang, Leyi Wei","doi":"10.1093/gpbjnl/qzaf093","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf093","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics"},"PeriodicalIF":7.9,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145656602"}
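The HGCPep abstract above describes each ncRNA as a hyperedge connecting the peptides it encodes. A minimal sketch of that data structure, with one round of simple mean aggregation from peptides to hyperedges and back, might look like the following. The function names, the toy identifiers (`pepA`, `lnc1`, and so on), the feature values, and the mean-aggregation rule are all hypothetical; this is not the HGCPep implementation, whose actual layers are defined in the authors' repository.

```python
from collections import defaultdict


def build_hyperedges(peptide_to_ncrna):
    """Group peptide ids by source ncRNA: one hyperedge per ncRNA."""
    edges = defaultdict(list)
    for peptide, ncrna in peptide_to_ncrna.items():
        edges[ncrna].append(peptide)
    return dict(edges)


def hypergraph_smooth(features, hyperedges):
    """One round of peptide -> hyperedge -> peptide mean aggregation.

    This is a simplified stand-in for a hypergraph convolution:
    no learned weights, no degree normalization beyond plain means.
    """
    # Step 1: each hyperedge takes the mean feature vector of its peptides.
    edge_feat = {}
    for ncrna, members in hyperedges.items():
        dim = len(features[members[0]])
        edge_feat[ncrna] = [
            sum(features[p][i] for p in members) / len(members)
            for i in range(dim)
        ]

    # Step 2: record which hyperedges each peptide belongs to.
    membership = defaultdict(list)
    for ncrna, members in hyperedges.items():
        for p in members:
            membership[p].append(ncrna)

    # Step 3: each peptide takes the mean over its hyperedges' features.
    smoothed = {}
    for p, edges in membership.items():
        dim = len(features[p])
        smoothed[p] = [
            sum(edge_feat[e][i] for e in edges) / len(edges)
            for i in range(dim)
        ]
    return smoothed


# Toy data (hypothetical): two peptides share the transcript lnc1.
peptide_to_ncrna = {"pepA": "lnc1", "pepB": "lnc1", "pepC": "lnc2"}
features = {"pepA": [1.0, 0.0], "pepB": [3.0, 2.0], "pepC": [5.0, 5.0]}

hyperedges = build_hyperedges(peptide_to_ncrna)
smoothed = hypergraph_smooth(features, hyperedges)
# pepA and pepB both become [2.0, 1.0]; pepC keeps [5.0, 5.0].
```

In this toy case the two peptides from `lnc1` end up with identical features, which shows how a shared transcriptional origin can pull related peptides together before any sequence-based model is applied.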
Daniel K Putnam, Alexander M Gout, Delaram Rahbarinia, Meiling Jin, David Finkelstein, Xiaotu Ma, Jinghui Zhang, David A Wheeler, Larissa V Furtado, Xiang Chen
Cancer subtype classification is critical for precision therapy, and there is a growing trend of augmenting histopathology testing procedures with omics-based machine learning classifiers. However, for pediatric cancer, analytical challenges remain regarding the scope and precision of current classifiers as well as evolving subtype standardization. To address these challenges, we built Cancer Identification (CanID), a stacked ensemble machine learning classification scheme that uses transcriptomic features derived from gene-level RNA sequencing count data as its sole input. CanID was developed primarily from 3203 pediatric cancer samples of 13 solid tumor subtypes and 38 hematologic malignancy subtypes, with subtype labels curated without the use of RNA-seq data. In testing on three independent or external data sets, the accuracies for solid tumors and hematologic malignancies were 99% and 92%-93%, respectively. Notably, CanID was able to classify subtypes that are challenging for clinical histology evaluation and was robust to both biological and technical challenges, including differences in data collection protocols, class imbalance, potentially mislabeled training samples, and classes unobserved in training. The high accuracy, robustness, and biological interpretability of this transcriptome-based classification scheme make it a valuable approach for advancing tumor diagnosis and clinically meaningful stratification of tumor types. CanID can be accessed on GitHub at https://github.com/chenlab-sj/CanID.
{"title":"CanID: A Robust and Accurate RNA-seq Expression-based Diagnostic Classification Scheme for Pediatric Malignancies.","authors":"Daniel K Putnam, Alexander M Gout, Delaram Rahbarinia, Meiling Jin, David Finkelstein, Xiaotu Ma, Jinghui Zhang, David A Wheeler, Larissa V Furtado, Xiang Chen","doi":"10.1093/gpbjnl/qzaf122","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf122","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics"},"PeriodicalIF":7.9,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145643909"}
Justin L Couetil, Ziyu Liu, Chao Chen, Ahmed K Alomari, Kun Huang, Jie Zhang, Travis S Johnson
The field effect describes the phenomenon in which environmental exposures, infection, and genetic predisposition result in molecular changes in cells that predispose them to developing cancer. Though this is a well-established concept in pathology, it remains underexplored in the context of high-resolution omics. We used the Diagnostic Evidence Gauge of Single Cells (DEGAS) deep transfer learning framework to analyze prostate cancer spatial transcriptomics and identify cells and tissues that are highly associated with cancer progression. DEGAS highlighted morphologically benign glands that had reduced expression of MSMB, a differentiation marker that is decreased in aggressive tumors. These glands show upregulation of genes associated with antigen presentation and aggressive neoplasms. Integration of single-cell transcriptomics and, separately, deep learning image analysis revealed altered immune-cell infiltration, suggesting a complex interplay in the tumor environment that facilitates aggressiveness. We used immunohistochemistry to quantify expression of the MSMB protein (PSP-94) in morphologically normal and tumor tissues from patients with and without 5-year distant metastasis. Samples from patients who developed metastasis consistently showed lower fractions of positively stained cells, indicating a subtle yet significant "field effect" in seemingly benign regions. These protein-level results validate the transcriptomic findings and further underscore that inflammatory or immune-related changes in ostensibly normal tissue may contribute to aggressive disease progression.
{"title":"Deep Transfer Learning Links Benign Glands to Prostate Cancer Progression via Transcriptomics.","authors":"Justin L Couetil, Ziyu Liu, Chao Chen, Ahmed K Alomari, Kun Huang, Jie Zhang, Travis S Johnson","doi":"10.1093/gpbjnl/qzaf119","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf119","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics"},"PeriodicalIF":7.9,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145643904"}