Genomics, proteomics & bioinformatics: latest articles, page 2

CancerSEA-X: A Single-cell Resource for Tumor Microenvironment Cell States Across over 30 Cancer Types.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2026-01-16 DOI: 10.1093/gpbjnl/qzaf134

Lantian Deng, Wei Liu, Jinyuan Xu, Bo Zhang, Shengyuan He, Kun Liu, Xinxin Zhang, Huating Yuan, Fei Quan, Yun Xiao

Single-cell studies have significantly advanced our understanding of the transcriptional and functional heterogeneity in cancers. Recent studies have identified distinct states of cancer, immune, and stromal cells in the tumor microenvironment (TME), with growing evidence highlighting their clinical significance and therapeutic potential. Here, we present CancerSEA-X, an expanded version of CancerSEA that offers a comprehensive atlas of TME cell states. CancerSEA-X integrates 25 cancer cell states, 105 immune cell states, and 26 stromal cell states from systematically curated publications. Combining 239 single-cell datasets across 32 cancer types, encompassing over 9 million cells from 2120 patients, CancerSEA-X provides functional activity spectra and cancer-specific gene associations for these 156 cell states. These cell state-gene relationships were mapped onto networks, providing a systematic view of the TME. To improve usability, we redesigned the user interface to feature cell state characterization, state-gene correlation analysis, and interactive visualization of cell state-gene networks, enabling researchers to comprehensively explore these states and their functional relevance. Overall, CancerSEA-X serves as a valuable platform for investigating TME cell states, deepening our understanding of cancer heterogeneity, and potentially advancing the design of more effective clinical therapies. CancerSEA-X is freely available at http://biocc.hrbmu.edu.cn/CancerState.
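The state-gene correlation analysis mentioned in the abstract can be pictured with a small sketch. This is not CancerSEA-X's actual pipeline: it assumes, purely for illustration, that a cell state's activity in each cell is the mean expression of a signature gene set and that the association with a gene is a Pearson correlation across cells. The gene names and expression values are hypothetical.

```python
from math import sqrt

# Cell-state counts reported in the abstract: cancer, immune, stromal.
STATE_COUNTS = {"cancer": 25, "immune": 105, "stromal": 26}
TOTAL_STATES = sum(STATE_COUNTS.values())  # 156, matching the abstract


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


def state_activity(expr, signature):
    """Per-cell activity: mean expression of the signature genes (assumed scoring)."""
    n_cells = len(next(iter(expr.values())))
    return [
        sum(expr[gene][i] for gene in signature) / len(signature)
        for i in range(n_cells)
    ]


# Hypothetical toy matrix: gene -> expression across four cells.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}

activity = state_activity(expr, ["GENE_A", "GENE_B"])
correlation_with_c = pearson(activity, expr["GENE_C"])
```

In this toy case GENE_C falls as the state's activity rises, so the correlation is strongly negative. A real analysis would also need normalization, per-dataset handling, and multiple-testing control, none of which is shown here.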



Citations: 0

4D Chromatin Dynamics Resolved During Early Random X Chromosome Inactivation.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2026-01-13 DOI: 10.1093/gpbjnl/qzag002

Xiaowen Liu, Hao Xie, Zhiyuan Liu, Yujie Chen, Qimin Xia, Heming Xu, Yi Chi, Shuai Gao, Dong Xing

X chromosome inactivation is a process that compensates X-linked gene dosage in mammalian female cells. The silencing of a randomly selected chromosome is accompanied by dramatic three-dimensional reorganization across the entire chromosome. To investigate the four-dimensional chromatin dynamics during early inactivation stages, we applied the multi-omics sequencing technique HiRES (Hi-C and RNA-seq employed simultaneously), which simultaneously detects the three-dimensional genome and transcriptome in single cells, in a mouse embryonic stem cell line with induced random inactivation. This three-dimensional genome and transcriptome dual-omics data allowed us to identify random inactivation trajectories at single-cell resolution. We characterized multiple layers of X-chromosome reorganization and discovered a transient structural state shared by both X chromosomes, associated with biallelic X-inactive specific transcript (Xist) expression. By constructing single-cell inactivation trajectories, we found that most chromatin remodeling either accompanied or followed gene silencing. Further analysis of interaction decay kinetics revealed that topologically associating domain (TAD) attenuation began from loss of interactions on TAD anchors. This study thus provides a detailed depiction of fine-scale chromatin reorganization during the initiation of random X chromosome inactivation.



Citations: 0

Spatial Chromatin Accessibility Analysis of Intratumor Heterogeneity in Breast Cancer.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2026-01-12 DOI: 10.1093/gpbjnl/qzag001

Yingying Qian, Miao Zhu, Chongyang Ren, Yeyong Zhou, Jian Xu, Liang Dong, Guangyu Zhang, Cheukfai Li, Jiaoyi Lv, Qiaorui Xing, Guochun Zhang, Guangdun Peng, Ning Liao

Intratumoral heterogeneity (ITH) is a major driver of mortality in breast cancer (BC) patients and a critical factor in the variable therapeutic outcomes observed in BC treatment. Understanding the mechanisms underlying ITH is essential for advancing both clinical and basic BC research. Chromatin accessibility is critical for the regulation of gene expression and cellular identity and plays a central role in shaping ITH and tumor evolution. However, studying chromatin accessibility in situ has been challenging due to the limited availability of technical platforms. Here, we leveraged the spatial ATAC-seq platform to profile the chromatin accessibility landscape of tumors from six BC patients. Our analyses revealed prominent heterogeneity within tumor regulatory modules and spatial variations in immune cell composition and stromal structures, offering a framework for investigating the molecular architecture underlying ITH. Moreover, we identified two tumor subclones with a potential common origin but distinct immune infiltration conferred by regulatory cascades, suggesting that epigenetic regulation may further contribute to the divergent tumor microenvironments and phenotypic diversity of these subclones. Our study provides novel insights into the molecular mechanisms driving ITH and opens up potential avenues for therapeutic intervention.



Citations: 0

Layering Methylome and Transcriptome in the Same Tissue Slice.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2026-01-12 DOI: 10.1093/gpbjnl/qzaf136

Yang Xiao, Sai Ma

Citations: 0

Generalizable Single-cell Multimodal Data Integration with Self-supervised Learning.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/gpbjnl/qzaf129

Jinhui Shi, Shuofeng Hu, Runyan Liu, Jiahao Zhou, Jing Wang, Xiaomin Ying, Zhen He

Recent breakthroughs in single-cell multi-omics technologies have enabled simultaneous measurement of diverse cellular modalities, offering unprecedented biological insights. However, integrating such multimodal data faces dual challenges: small-scale paired-modality studies (hundreds of cells) risk overfitting, while large-scale reference atlases often struggle to generalize effectively to new datasets. To overcome these challenges, we present multimodal integration with self-supervised learning (MINERVA), a unified deep learning framework employing self-supervised strategies for single-cell multimodal integration. MINERVA outperforms six state-of-the-art methods in dimensionality reduction, missing feature imputation, and batch effect correction, even with limited training cells. For large-scale applications, MINERVA constructs scalable multi-tissue references that support zero-shot knowledge transfer to unseen datasets, instant cell type annotation, novel cell state identification, and comprehensive downstream analyses, all without model retraining. Uniquely bridging small-scale precision with atlas-level generalization, MINERVA serves as a versatile tool for both de novo integration and cost-effective atlas reuse in single-cell research.



Citations: 0

A Single-cell Transcriptome-wide Association Study Reveals Susceptibility Genes for Age-related Hearing Loss.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2026-01-06 DOI: 10.1093/gpbjnl/qzaf137

Yuanfeng Li, Tao Zeng, Wenyu Song, Yahui Wang, Lili Ren, Chenning Yang, Gangqiao Zhou, Yuguang Niu

Age-related hearing loss (ARHL) is the most common type of hearing loss. Genetic factors are considered to play important roles in the development of ARHL. To identify novel susceptibility genes and cell types relevant to ARHL, we performed a two-stage single-cell transcriptome-wide association study (scTWAS) on ARHL in 96,372 cases and 141,590 controls of European descent. In the discovery stage, we identified 1034 gene-cell pairs that showed suggestive associations with ARHL (P < 1.0 × 10⁻⁵), representing 450 genes across various cell types. These genes were enriched in multiple pathways, including the immune-related, estrogen signaling, and oxidative damage response pathways. Besides, we provided prominent genetic evidence for putative drug repurposing, highlighting several genes as potential targets, including NR3C2, CHRM4 and SHBG. Further, we validated the significant association of 41 genes with ARHL in the replication stage of scTWAS, including previously reported genes such as HLA-DRA, as well as novel candidates such as TNF, ZC3HAV1, and SLC44A4. Among these novel candidates, several are highly biologically plausible in the development of ARHL. In conclusion, this scTWAS broadens our understanding of the genetic susceptibility to ARHL, which might be helpful in developing new strategies for the treatment and prevention of ARHL.
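The discovery-stage step of the abstract, reducing gene-cell pairs below the suggestive threshold to a set of distinct genes, can be sketched as follows. The record layout, the gene and cell-type labels, and the p-values are hypothetical; only the threshold of 1.0 × 10⁻⁵ comes from the abstract, and the study's actual association testing is not reproduced here.

```python
# Suggestive threshold quoted in the abstract (strict inequality: P < 1.0e-5).
SUGGESTIVE_P = 1.0e-5


def suggestive_pairs(results, threshold=SUGGESTIVE_P):
    """Keep (gene, cell_type) pairs whose p-value is strictly below the threshold."""
    return [(gene, cell_type) for gene, cell_type, p in results if p < threshold]


def unique_genes(pairs):
    """Collapse gene-cell pairs to the sorted list of distinct genes."""
    return sorted({gene for gene, _ in pairs})


# Hypothetical association results: (gene, cell type, p-value).
toy_results = [
    ("GENE_A", "cell_type_1", 3.0e-7),
    ("GENE_A", "cell_type_2", 8.0e-6),
    ("GENE_B", "cell_type_1", 2.0e-4),  # not suggestive
    ("GENE_C", "cell_type_3", 1.0e-5),  # equal to the threshold, so excluded
]

pairs = suggestive_pairs(toy_results)
genes = unique_genes(pairs)
```

Because one gene can be associated in several cell types, the number of pairs is at least the number of genes, which is consistent with the abstract reporting 1034 pairs for 450 genes.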



Citations: 0

MaizeGEP: A Maize Hybrids Dataset with Genotype, Phenotype, and Envirotype to Develop Genomic Selection Models.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2026-01-02 DOI: 10.1093/gpbjnl/qzaf140

Dongfeng Zhang, Yanyun Han, Shouhui Pan, Zhongqiang Liu, Xiangyu Zhao, Qiusi Zhang, Qi Zhang, Xiaofeng Wang, Jiahao Sun, Kaiyi Wang

The integration of genotype with envirotype is essential for achieving precise phenotypic prediction. However, there is currently a scarcity of datasets derived from variety approval testing trials that encompass a broad spectrum of trial locations, conform to standardized management protocols, and incorporate comprehensive records of environmental variables. This study introduces MaizeGEP, a dataset consisting of 260 hybrid maize varieties from national maize variety regional trials. This dataset includes 12,233 selected tag single nucleotide polymorphisms for each variety, phenotypic survey data on 11 traits across 2382 year-county locations, and daily meteorological records. We utilized this dataset to conduct analyses on the clustering of 2382 year-county locations, the population structure of the 260 varieties, and genome-wide association studies. Furthermore, a novel mixture of experts (MoE) framework incorporating genotype-envirotype to phenotype (GE2P) algorithms was employed for best linear unbiased estimator (BLUE) value prediction and phenotypic prediction. Additionally, several machine learning and deep learning algorithms, including Bayesian methods, support vector machines (SVM), LightGBM, multilayer perceptron (MLP), DeepGS, DEM, and Cropformer, were utilized to validate the effectiveness of the dataset and improve phenotypic prediction accuracy. The findings suggest that the MaizeGEP dataset serves as a valuable resource for investigating the relationship among genotype, envirotype, and phenotype, as well as predicting cross-environmental performance. This underscores the importance of encouraging researchers to utilize this dataset in developing sophisticated GE2P models. Such models can aid plant breeders in selecting new varieties and facilitating their deployment across diverse regions. MaizeGEP is publicly accessible at http://user.ebreed.cn:9992/scb/.



Citations: 0

Benchmark and Evaluation for Somatic Structural Variants Detection with Long-read Sequencing Data.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2025-12-31 DOI: 10.1093/gpbjnl/qzaf139

Ziting Feng, Xuyan Liu, Yahui Liu, Kailing Tu, Lin Xia, Dan Xie

Somatic structural variations (somatic SVs) are hallmarks of tumors, but their comprehensive detection remains technically challenging. Long-read sequencing (LRS) technology, which generates reads spanning large-scale SVs and their flanking sequences, enables a wide range of prospects for somatic SV detection. However, existing LRS-based somatic SV detection algorithms and pipelines exhibit variable performance that has not been systematically characterized. In this study, we conducted a rigorous evaluation of 51 LRS-based somatic SV detection strategies, integrating 3 reference genomes, 2 aligners, 5 SV callers, and 5 processing methods tailored for SV callers. We used both simulated datasets and empirical data from HCC1395/HCC1395BL cell lines sequenced on Oxford Nanopore (ONT) and Pacific Biosciences (PacBio) platforms for technical assessment. Our findings highlight the need for further refinement of specialized somatic SV detection tools, as no single strategy consistently outperforms the others across all scenarios. Workflows based on germline SV callers exhibit a high false-positive rate, which cannot be mitigated by increasing sequencing depth or tumor purity. Furthermore, challenges persist in detecting insertions, genomic tandem repeat regions, and ultra-long SVs. We delineate technical bottlenecks in current somatic SV detection approaches and provide recommendations for their further advancement. Additionally, we offer suggestions for selecting specific tools in different application scenarios. This work offers a comprehensive benchmark for somatic SV detection and valuable insights for future LRS-based tool development and methodological improvements.
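Benchmarks of this kind usually score a call set against a truth set, and the sketch below shows one common way to do that. It is an illustration, not the study's evaluation procedure: the matching rule (same chromosome, same SV type, breakpoint within a fixed tolerance), the 500 bp tolerance, and the toy records are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    pos: int      # breakpoint position
    svtype: str   # e.g. "DEL", "INS", "DUP"


def matches(call, truth, tol=500):
    """Assumed matching rule: same chromosome and type, breakpoints within tol bp."""
    return (
        call.chrom == truth.chrom
        and call.svtype == truth.svtype
        and abs(call.pos - truth.pos) <= tol
    )


def evaluate(calls, truths, tol=500):
    """Return (precision, recall, F1); each truth SV can be matched at most once."""
    unmatched = list(truths)
    true_pos = 0
    for call in calls:
        hit = next((t for t in unmatched if matches(call, t, tol)), None)
        if hit is not None:
            unmatched.remove(hit)
            true_pos += 1
    false_pos = len(calls) - true_pos
    false_neg = len(unmatched)
    precision = true_pos / (true_pos + false_pos) if calls else 0.0
    recall = true_pos / (true_pos + false_neg) if truths else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical truth set and call set.
truths = [SV("chr1", 1000, "DEL"), SV("chr2", 5000, "INS"), SV("chr3", 9000, "DUP")]
calls = [
    SV("chr1", 1100, "DEL"),  # matches the chr1 deletion
    SV("chr2", 5000, "DEL"),  # right position, wrong type: false positive
    SV("chr4", 100, "INS"),   # no counterpart: false positive
    SV("chr3", 9400, "DUP"),  # within tolerance: match
]

precision, recall, f1 = evaluate(calls, truths)
```

Here two of four calls are true positives and one truth SV is missed. A high false-positive rate, as the abstract reports for germline-caller workflows, would show up as low precision under this scoring.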



Citations: 0

REC8-Cohesin Preferentially Localizes to Promoters of Genes that are Regulated by Transcription Suppressor BEND2 During Early Meiosis.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2025-12-31 DOI: 10.1093/gpbjnl/qzaf138

Dan Xie, Longfei Ma, Jing Sun, Hengyu Nie, Lin Yan, Yalin Xue, Jian Chen, Shuguang Duo, Chunsheng Han

Cohesin plays critical roles in chromatin organization and transcription regulation. REC8 is a meiosis-specific cohesin subunit and is essential for homologous chromosome synapsis, recombination, and segregation. However, little is known about the relationship between the dynamic genome-wide distribution of cohesin and transcription regulation during meiotic initiation. In this study, we report that REC8-cohesin is preferentially localized to open promoter regions of genes involved in spermatogonial differentiation and meiosis at early meiosis from preleptonema to zygonema. Genomic localization of REC8-cohesin is changed by the gene knockout of the transcriptional suppressor BEND2. We also find that REC8 is able to interact with mitotic cyclin CCNA2, that the CCNA2 expression is extended to leptonema in Bend2 knockout mice, and that the meiotic cells of Bend2 knockout mice do not exit the mitotic cell cycle completely. We further found that a large number of genes are commonly bound by BEND2, STRA8, MEIOSIN, and REC8-cohesin. Our study has therefore revealed that genes with open promoters are bound by meiotic cohesin and transcription factors coordinately to facilitate chromatin reorganization and transcription regulation leading to the switch from a mitotic cell cycle to a meiotic one at the initiation stage of meiosis.



Citations: 0

iRUNNER: A Baseline Mutation Burden Regression for Identifying Gene Interaction Between Rare Variants for Diseases.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2025-12-30 DOI: 10.1093/gpbjnl/qzaf135

Hui Jiang, Bin Tang, Kun Li, Liubin Zhang, Junhao Liang, Clara Sze-Man Tang, Paul Kwong-Hang Tam, Binbin Wang, Youqiang Song, Qiang Wang, Mulin Jun Li, Hailiang Huang, Miaoxin Li

Genetic interactions play a crucial role in elucidating the susceptibility and etiology of complex multifactorial diseases. Despite significant efforts to identify disease-associated nonlinear effects in genome-wide association studies, efficient methods for detecting the epistatic impact of rare variants remain lacking. In this study, we proposed iRUNNER, a novel and powerful mutation burden test focused on analyzing the interaction effects of rare variants on a binary trait. Different from conventional association tests comparing cases with controls, iRUNNER evaluates the relative enrichment of rare variant interaction burden of pairwise genes in patients against its baseline, estimated by a recursive truncated negative-binomial regression model that leverages multiple genomic features from public databases. Extensive simulations demonstrated that iRUNNER outperforms existing epistasis tests in statistical power and maintains reasonable type I error rates even when population stratification exists in control samples. Applied to real datasets of five complex diseases, iRUNNER yielded substantial gains in gene-gene interaction detections. Notably, the majority of these signals were missed by alternative methods, especially in small to medium-sized samples. Furthermore, we found that these identified gene pairs of each trait can form interconnected networks, which may provide valuable insights into the underlying molecular mechanisms. We have implemented iRUNNER as a module in our integrative platform KGGSeq (http://pmglab.top/kggseq/) that enables rapid testing of pairwise interactions among all possible non-synonymous rare coding variants within hours.
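The core idea of testing an observed burden against a baseline expectation can be illustrated with a short sketch. It assumes the baseline mean and dispersion for a gene pair are already known, and it omits the truncation, the recursion, and the genomic-feature regression that the abstract attributes to iRUNNER. The function names and parameterization are hypothetical, not the KGGSeq implementation.

```python
import math


def nb_logpmf(k, mean, dispersion):
    """Log PMF of a negative binomial with the given mean and size parameter.

    Uses the (r, p) parameterization with r = dispersion and p = r / (r + mean).
    Requires mean > 0 and dispersion > 0.
    """
    r = dispersion
    p = r / (r + mean)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_upper_tail(observed, mean, dispersion):
    """P(X >= observed): chance of a burden at least this large under baseline."""
    if observed <= 0:
        return 1.0
    lower = sum(math.exp(nb_logpmf(k, mean, dispersion)) for k in range(observed))
    return max(0.0, 1.0 - lower)


# Hypothetical gene pair: 3 patients carry rare variants in both genes,
# while the assumed baseline expects 1 on average (dispersion 1).
p_value = nb_upper_tail(3, mean=1.0, dispersion=1.0)
```

With dispersion 1 the distribution reduces to a geometric one, so the tail probability here is 0.5 cubed, or 0.125. A small p-value would indicate more co-occurring rare variants in patients than the baseline predicts.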



Citations: 0