Ying Yi, Yongfei Hu, Juanjuan Kang, Qifa Liu, Yan Huang, Dong Wang
Hematology research has greatly benefited from the integration of diverse biological data resources and advanced machine learning frameworks. This integration has not only deepened our understanding of blood diseases such as leukemia and lymphoma, but also enhanced diagnostic accuracy and personalized treatment strategies. By applying machine learning algorithms to analyze large-scale biological data, researchers are able to more effectively identify disease patterns, predict treatment responses, and provide new perspectives for the diagnosis and treatment of hematologic disorders. Here, we provide an overview of the current landscape of biological data resources and the application of machine learning frameworks pertinent to hematology research.
{"title":"Biological Data Resources and Machine Learning Frameworks for Hematology Research.","authors":"Ying Yi, Yongfei Hu, Juanjuan Kang, Qifa Liu, Yan Huang, Dong Wang","doi":"10.1093/gpbjnl/qzaf021","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf021","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2025-03-04","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"143560398"}
Wenbin Huang, Zhenwei Qian, Jieni Zhang, Yi Ding, Bin Wang, Jiuxiang Lin, Xiannian Zhang, Huaxiang Zhao, Feng Chen
Cleft palate is one of the most common congenital craniofacial disorders, affecting children's appearance and oral functions. Investigating the transcriptomics during palatogenesis is crucial for comprehending the etiology of this disorder and facilitating prenatal molecular diagnosis. However, there is limited knowledge about the single-cell differentiation dynamics during mid-palatogenesis and late-palatogenesis, specifically regarding the subpopulations and developmental trajectories of periderm, a rare but critical cell population. Here we explored the single-cell landscape of developing mouse palates from embryonic day (E) 10.5 to E16.5. We systematically depicted the single-cell transcriptomics of mesenchymal and epithelial cells during palatogenesis, including subpopulations and differentiation dynamics. Additionally, we identified four subclusters of palatal periderm and constructed two distinct trajectories of cell fates for periderm cells. Our findings reveal that claudin-family coding genes and Arhgap29 play a role in the non-stick function of the periderm before the palatal shelves contact, and Pitx2 mediates the adhesion of periderm during the contact of opposing palatal shelves. Furthermore, we demonstrated that epithelial-mesenchymal transition (EMT), apoptosis, and migration collectively contribute to the degeneration of periderm cells in the medial epithelial seam. Taken together, our study suggests a novel model of periderm development during palatogenesis and delineates the cellular and molecular transitions in periderm cell determination.
{"title":"Single-cell Atlas of Developing Mouse Palates Reveals Cellular and Molecular Transitions in Periderm Cell Fate.","authors":"Wenbin Huang, Zhenwei Qian, Jieni Zhang, Yi Ding, Bin Wang, Jiuxiang Lin, Xiannian Zhang, Huaxiang Zhao, Feng Chen","doi":"10.1093/gpbjnl/qzaf013","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf013","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2025-03-04","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"143560327"}
Taoyu Chen, Guoguo Tang, Tianhao Li, Zhining Yanghong, Chao Hou, Zezhou Du, Kaiqiang You, Liwei Ma, Tingting Li
Biomacromolecules form membraneless organelles through liquid-liquid phase separation in order to regulate the efficiency of particular biochemical reactions. Dysregulation of phase separation might result in pathological condensation or sequestration of biomolecules, leading to diseases. Thus, phase separation and phase separating factors may serve as drug targets for disease treatment. Nevertheless, such associations have not yet been integrated into phase separation-related databases. Therefore, based on MloDisDB, a database for membraneless organelle factor-disease association previously developed by our lab, we constructed PhaSeDis, the phase separation-disease association database. We increased the number of phase separation entries from 52 to 185, and supplemented the evidence provided by the original article verifying the phase separation nature of the factors. Moreover, we included information on interacting small molecules, supported by low-throughput or high-throughput evidence, that might serve as potential drugs for phase separation entries. PhaSeDis strives to offer comprehensive descriptions of each entry, elucidating how phase separating factors induce pathological conditions via phase separation and the mechanisms by which small molecules intervene. We believe that PhaSeDis will be valuable for applying phase separation regulation to the treatment of related diseases. PhaSeDis is available at http://mlodis.phasep.pro.
{"title":"PhaSeDis: A Manually Curated Database of Phase Separation-Disease Associations and Corresponding Small Molecules.","authors":"Taoyu Chen, Guoguo Tang, Tianhao Li, Zhining Yanghong, Chao Hou, Zezhou Du, Kaiqiang You, Liwei Ma, Tingting Li","doi":"10.1093/gpbjnl/qzaf014","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf014","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2025-03-04","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"143560320"}
Bingru Zhao, Hanpeng Luo, Xuefeng Fu, Guoming Zhang, Emily L Clark, Feng Wang, Brian Paul Dalrymple, V Hutton Oddy, Philip E Vercoe, Cuiling Wu, George E Liu, Cong-Jun Li, Ruidong Xiang, Kechuan Tian, Yanli Zhang, Lingzhao Fang
Sheep (Ovis aries) represents one of the most important livestock species for animal protein and wool production worldwide. However, little is known about the genetic and biological basis of ovine phenotypes, particularly for those of high economic value and environmental impact. Here, by integrating 1413 RNA-seq samples from 51 distinct tissues across 14 developmental time points, representing early prenatal, late prenatal, neonate, lamb, juvenile, adult, and elderly stages, we built a high-resolution developmental Gene Expression Atlas (dGEA) in sheep. We observed dynamic patterns of gene expression and regulatory networks across tissues and developmental stages. When harnessing this resource for interpreting genetic associations of 48 monogenic and 12 complex traits in sheep, we found that genes upregulated at prenatal developmental stages played more important roles in shaping these phenotypes than those upregulated at postnatal stages. For instance, genetic associations of crimp number, mean staple length (MSL), and individual birth weight were significantly enriched in the prenatal rather than postnatal skin and immune tissues. By comprehensively integrating GWAS fine-mapping results and the sheep dGEA, we proposed several candidate genes for complex traits in sheep, such as SOX9 for MSL, GNRHR for litter size at birth, and PRKDC for live weight. These results provide novel insights into the developmental and molecular architecture underlying ovine phenotypes. The dGEA (https://sheepdgea.njau.edu.cn/) will serve as an invaluable resource for sheep developmental biology, genetics, genomics, and selective breeding.
{"title":"A Developmental Gene Expression Atlas Reveals Novel Biological Basis of Complex Phenotypes in Sheep.","authors":"Bingru Zhao, Hanpeng Luo, Xuefeng Fu, Guoming Zhang, Emily L Clark, Feng Wang, Brian Paul Dalrymple, V Hutton Oddy, Philip E Vercoe, Cuiling Wu, George E Liu, Cong-Jun Li, Ruidong Xiang, Kechuan Tian, Yanli Zhang, Lingzhao Fang","doi":"10.1093/gpbjnl/qzaf020","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf020","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2025-03-04","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"143560395"}
Guohao Han, Peng Yang, Yongjin Zhang, Qiaowei Li, Xinhao Fan, Ruipu Chen, Chao Yan, Mu Zeng, Yalan Yang, Zhonglin Tang
In addition to being a major source of animal protein, pigs are an important model for the study of development and diseases in humans. During the past two decades, thousands of high-throughput sequencing studies in pigs have been performed using a variety of tissues from different breeds and developmental stages. However, multi-omics databases dedicated to pig functional genomics research remain limited. Here, we present a user-friendly database of pig multi-omics named PIGOME. PIGOME currently contains seven types of pig omics datasets, including whole-genome sequencing (WGS), RNA sequencing (RNA-seq), microRNA sequencing (miRNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), assay for transposase-accessible chromatin sequencing (ATAC-seq), bisulfite sequencing (BS-seq), and methylated RNA immunoprecipitation sequencing (MeRIP-seq), from 6901 samples and 392 projects with manually curated metadata, integrated gene annotation, and quantitative trait locus information. Furthermore, various "Explore" and "Browse" functions have been established for user-friendly access to omics information. PIGOME implements several tools to visualize genomic variants, gene expression, and epigenetic signals of a given gene in the pig genome, enabling efficient exploration of spatiotemporal gene expression and epigenetic patterns, functions, regulatory mechanisms, and associated economic traits. Collectively, PIGOME provides valuable resources for pig breeding and is helpful for human biomedical research. PIGOME is available at https://pigome.com.
{"title":"PIGOME: An Integrated and Comprehensive Multi-omics Database for Pig Functional Genomics Studies.","authors":"Guohao Han, Peng Yang, Yongjin Zhang, Qiaowei Li, Xinhao Fan, Ruipu Chen, Chao Yan, Mu Zeng, Yalan Yang, Zhonglin Tang","doi":"10.1093/gpbjnl/qzaf016","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf016","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2025-02-28","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"143560323"}
Natália Aniceto, Nuno Martinho, Ismael Rufino, Rita C Guedes
The Protein Data Bank is an ever-growing database of 3D macromolecular structures that has become a crucial resource for the drug discovery process. Exploring complexed proteins and accessing the ligands in these proteins is paramount to help researchers understand biological processes and design new compounds of pharmaceutical interest. However, currently available tools to perform large-scale ligand identification do not address many of the more complex ways in which ligands are stored and represented in PDB structures. Therefore, a new tool called LigExtract was specifically developed for the large-scale processing of PDB structures and the identification of their ligands. This fully open-source tool, available to the scientific community, is designed to provide end-to-end processing: the user simply provides a list of UniProt IDs, and LigExtract returns a list of ligands, their individual PDB files, a PDB file of the protein chains engaged with the ligand, and a series of log files. The log files inform the user of the decisions made during the ligand extraction process and flag additional scenarios that might have to be considered during any follow-up use of the processed files (e.g., ligands covalently bound to the protein). LigExtract is available as open source on GitHub (https://github.com/comp-medchem/LigExtract).
{"title":"LigExtract: Large-scale Automated Identification of Ligands from Protein Structures in the Protein Data Bank.","authors":"Natália Aniceto, Nuno Martinho, Ismael Rufino, Rita C Guedes","doi":"10.1093/gpbjnl/qzaf018","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf018","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2025-02-28","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"143560407"}
The rapid development of biological and medical examination methods has vastly expanded personal biomedical information, including molecular, cellular, image, and electronic health record datasets. Integrating this wealth of information enables precise disease diagnosis, biomarker identification, and treatment design in clinical settings. Artificial intelligence (AI) techniques, particularly deep learning models, have been extensively employed in biomedical applications, demonstrating increased precision, efficiency, and generalization. The success of large language and vision models has further extended their biomedical applications. However, challenges remain in learning from these multimodal biomedical datasets, such as data privacy, fusion, and model interpretation. In this review, we provide a comprehensive overview of various biomedical data modalities, multimodal representation learning methods, and the applications of AI in biomedical data integrative analysis. Additionally, we discuss the challenges in applying these deep learning methods and how to better integrate them into biomedical scenarios. We then propose future directions for adapting deep learning methods with model pre-training and knowledge integration to advance biomedical research and benefit their clinical applications.
{"title":"Challenges in AI-driven Biomedical Multimodal Data Fusion and Analysis.","authors":"Junwei Liu, Xiaoping Cen, Chenxin Yi, Feng-Ao Wang, Junxiang Ding, Jinyu Cheng, Qinhua Wu, Baowen Gai, Yiwen Zhou, Ruikun He, Feng Gao, Yixue Li","doi":"10.1093/gpbjnl/qzaf011","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf011","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2025-02-27","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"143560401"}
Homologous recombination deficiency (HRD) has emerged as a critical prognostic and predictive biomarker in oncology. However, current testing methods, especially those reliant on targeted panels, are plagued by inconsistent results from the same samples. This highlights the urgent need for standardized benchmarks to evaluate HRD assay performance. In phases IIa and IIb of the Chinese HRD Harmonization Project, we developed ten pairs of well-characterized DNA reference materials derived from lung, breast, and melanoma cancer cell lines and their matched normal cell lines, each paired with seven cancer-to-normal mass ratios. Reference datasets for allele-specific copy number variations (ASCNVs) and HRD scores were established and validated based on three sequencing methods and nine analytical pipelines. The Genomic Instability Scores (GIS) of the reference materials ranged from 11 to 96, enabling validation across various thresholds. The ASCNV reference datasets covered a genomic span of 2340 to 2749 Mb, equivalent to 81.2% to 95.4% of the autosomes in the 37d5 reference genome. These benchmarks were subsequently utilized to assess the accuracy and reproducibility of four HRD panel assays, revealing significant variability in both ASCNV detection and HRD scores. The concordance between panel-detected GIS and reference GIS ranged from 0.81 to 0.94, and only two assays exhibited high overall agreement with Myriad MyChoice CDx for HRD classification. This study also identified specific challenges in ASCNV detection in HRD-related regions and the profound impact of high ploidy on consistency. The established HRD reference materials and datasets provide a robust toolkit for objective evaluation of HRD testing.
{"title":"Evaluative Methodology for HRD Testing: Development of Standard Tools for Consistency Assessment.","authors":"Zheng Jia, Yaqing Liu, Shoufang Qu, Wenbin Li, Lin Gao, Lin Dong, Yun Xing, Yadi Cheng, Huan Fang, Yuting Yi, Yuxing Chu, Chao Zhang, Yanming Xie, Chunli Wang, Zhe Li, Zhihong Zhang, Zhipeng Xu, Yang Wang, Wenxin Zhang, Xiaoping Gu, Shuang Yang, Jinghua Li, Liangshen Wei, Yuanting Zheng, Guohui Ding, Leming Shi, Xin Yi, Jianming Ying, Jie Huang","doi":"10.1093/gpbjnl/qzaf017","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf017","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2025-02-27","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"143560404"}
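As a quick arithmetic check on the ASCNV coverage figures quoted in the HRD abstract (2340 to 2749 Mb, given as 81.2% to 95.4% of the autosomes), the span and percentage pairs can be used to back out the autosome length they imply. This is only a sketch: it assumes that the lower span pairs with the lower percentage and the upper span with the upper percentage, which the abstract does not state explicitly, and the variable names are illustrative.

```python
# Coverage figures quoted in the HRD abstract: (span in Mb, fraction of autosomes).
# Assumption: 2340 Mb pairs with 81.2% and 2749 Mb pairs with 95.4%.
coverage = [(2340.0, 0.812), (2749.0, 0.954)]

# Implied total autosome length for each pair: span / fraction.
implied_totals = [span / fraction for span, fraction in coverage]

for (span, fraction), total in zip(coverage, implied_totals):
    print(f"{span:.0f} Mb at {fraction:.1%} -> implied autosome length {total:.0f} Mb")

# If the quoted numbers are internally consistent, the two estimates should agree closely.
spread = max(implied_totals) - min(implied_totals)
print(f"difference between the two estimates: {spread:.1f} Mb")
```

Under that pairing, the two estimates fall within 1 Mb of each other, so the quoted spans and percentages are mutually consistent.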
Mass spectrometry-based single cell proteomics (MS-SCP) is attracting tremendous attention because it is now technically feasible to quantify thousands of proteins in minute samples. Since protein amplification is still not possible, technological improvements in MS-SCP focus on minimizing sample loss and increasing throughput, resolution, and sensitivity, as well as achieving the same measurement depth, accuracy, and stability as bulk samples. Major advances in MS-SCP have facilitated its use in biological and even medical applications. Here, we review the key advancements in MS-SCP technology and discuss the strategies of the classic proteomics workflow to improve MS-SCP analysis, from single cell isolation, sample preparation, and liquid chromatography separation to MS data acquisition and analysis. The review will provide an overall understanding of the development and application of MS-SCP and inspire more novel ideas regarding the innovation of MS-SCP technology.
{"title":"MS-based Solutions for Single Cell Proteomics.","authors":"Siqi Li, Shuwei Li, Siqi Liu, Yan Ren","doi":"10.1093/gpbjnl/qzaf012","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf012","url":null,"abstract":"<p><p>Mass spectrometry-based single cell proteomics (MS-SCP) is attracting tremendous attention because it is now technically feasible to quantify thousands of proteins in minute samples. Since protein amplification is still not possible, technological improvements in MS-SCP focus on minimizing sample loss and increasing throughput, resolution, and sensitivity, as well as achieving measurement depth, accuracy, and stability comparable to those of bulk samples. Major advances in MS-SCP have facilitated its use in biological and even medical applications. Here, we review the key advancements in MS-SCP technology and discuss strategies for improving each step of the classic proteomics workflow for MS-SCP analysis, from single cell isolation, sample preparation, and liquid chromatography separation to MS data acquisition and analysis. The review provides an overall understanding of the development and application of MS-SCP and aims to inspire novel ideas for innovation in MS-SCP technology.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143477009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Genome Warehouse (GWH), accessible at https://ngdc.cncb.ac.cn/gwh, is an extensively utilized public repository dedicated to the deposition, management and sharing of genome assembly sequences, annotations, and metadata. This paper highlights noteworthy enhancements to the GWH since the 2021 version, emphasizing substantial advancements in web interfaces for data submission, database functionality updates, and resource integration. Key updates include the reannotation of released prokaryotic genomes, mirroring of genome resources from National Center for Biotechnology Information (NCBI) GenBank and Reference Sequence Database (RefSeq), integration of Poxviridae sequences, implementation of an online batch submission system, enhancements to the quality control system, advanced search capabilities, and the introduction of a controlled-access mechanism for human genome data. These improvements collectively augment the ease and security of data submission and access as well as genome data value, thereby fostering heightened convenience and utility for researchers in the genomics field.
{"title":"The Updated Genome Warehouse: Enhancing Data Value, Security, and Usability to Address Data Expansion.","authors":"Yingke Ma, Xuetong Zhao, Yaokai Jia, Zhenxian Han, Caixia Yu, Zhuojing Fan, Zhang Zhang, Jingfa Xiao, Wenming Zhao, Yiming Bao, Meili Chen","doi":"10.1093/gpbjnl/qzaf010","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzaf010","url":null,"abstract":"<p><p>The Genome Warehouse (GWH), accessible at https://ngdc.cncb.ac.cn/gwh, is an extensively utilized public repository dedicated to the deposition, management and sharing of genome assembly sequences, annotations, and metadata. This paper highlights noteworthy enhancements to the GWH since the 2021 version, emphasizing substantial advancements in web interfaces for data submission, database functionality updates, and resource integration. Key updates include the reannotation of released prokaryotic genomes, mirroring of genome resources from National Center for Biotechnology Information (NCBI) GenBank and Reference Sequence Database (RefSeq), integration of Poxviridae sequences, implementation of an online batch submission system, enhancements to the quality control system, advanced search capabilities, and the introduction of a controlled-access mechanism for human genome data. These improvements collectively augment the ease and security of data submission and access as well as genome data value, thereby fostering heightened convenience and utility for researchers in the genomics field.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143470397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}