Genomics, proteomics & bioinformatics: latest articles (page 5)

Noise2read: Accurately Rectify Millions of Erroneous Short Reads Through Graph Learning on Edit Distances.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2025-11-29 DOI: 10.1093/gpbjnl/qzaf120

Pengyao Ping, Shuquan Su, Xinhui Cai, Tian Lan, Xuan Zhang, Hui Peng, Yi Pan, Wei Liu, Jinyan Li

Although the per-base error rate of short-read sequencing data is very low (0.1%-0.5%), the proportion of erroneous reads in a dataset can reach 10%-15%, which can amount to millions of reads. Because current methods correct only some errors while introducing many new ones, we address the problem by restoring erroneous reads to their original states without creating any read that does not already exist in the dataset, thereby preserving data integrity. The novelty lies in a computable rule derived from the error mechanism of the polymerase chain reaction (PCR): a rare read is erroneous if it has a neighbouring read of high abundance. Following this principle, we construct a graph that links each pair of reads separated by a tiny edit distance and use it to detect a substantial share of the erroneous reads; we then treat these read pairs as training data to learn the error mechanisms and identify the remaining hard cases, namely possible errors between pairs of high-abundance reads. The proposed approach, noise2read, can rectify erroneous reads in short-read sequencing data whenever PCR is involved. Compared with state-of-the-art methods on tens of evaluation datasets with unique molecular identifier (UMI)-based ground truth, noise2read performs significantly better on 19 metrics. Case studies show that noise2read can greatly improve short-read quality and substantially affect genome abundance quantification, isoform identification, single nucleotide polymorphism (SNP) profiling, and genome editing efficiency estimation. Noise2read is publicly available at https://github.com/JappyPing/noise2read and https://ngdc.cncb.ac.cn/biocode/tool/7951.
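
The rule stated in the abstract (a rare read is erroneous if it has a high-abundance neighbour) can be illustrated with a small sketch. This is not the noise2read implementation: the function names, the thresholds `max_dist`, `rare_max` and `abundant_min`, and the choice to leave ambiguous reads untouched are all assumptions made for illustration, and the learning step for hard cases is not shown.

```python
from collections import Counter
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def rectify(reads, max_dist=1, rare_max=1, abundant_min=5):
    """Replace rare reads by their unique high-abundance neighbour.

    All thresholds are hypothetical defaults, not values from the paper.
    """
    counts = Counter(reads)

    # Graph over unique reads: an edge links two reads within max_dist edits.
    neighbours = {read: [] for read in counts}
    for u, v in combinations(counts, 2):
        if edit_distance(u, v) <= max_dist:
            neighbours[u].append(v)
            neighbours[v].append(u)

    correction = {}
    for read, n in counts.items():
        if n > rare_max:
            continue  # only rare reads are candidates for correction
        abundant = [v for v in neighbours[read] if counts[v] >= abundant_min]
        if len(abundant) == 1:  # assumption: skip ambiguous cases
            correction[read] = abundant[0]

    # Every output read already existed in the input, so no new read is created.
    return [correction.get(read, read) for read in reads]


reads = ["ACGT"] * 6 + ["ACGA"] + ["TTTT"] * 5 + ["GGGG"]
rectified = rectify(reads)
```

The all-pairs comparison is quadratic in the number of unique reads, so a sketch like this only suits toy inputs; a real dataset would need an indexed neighbour search.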

Citations: 0

Harnessing Large Cohorts and AI to Bridge Genomic Discovery and Clinical Practice.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2025-11-27 DOI: 10.1093/gpbjnl/qzaf104

Bitao Zhong, Shaoqi Wang, Xiaoxi Jing, Aniruddh P Patel, Yajie Zhao, Minxian Wang

Citations: 0

A Telomere-to-Telomere Diploid Reference Genome and Centromere Structure of the Chinese Quartet.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2025-11-26 DOI: 10.1093/gpbjnl/qzaf118

Bo Wang, Peng Jia, Stephen J Bush, Xia Wang, Yi Yang, Yu Zhang, Shijie Wan, Xiaofei Yang, Pengyu Zhang, Yuanting Zheng, Leming Shi, Lianhua Dong, Kai Ye

Recent advances in sequencing technologies have enabled the complete assembly of human genomes from telomere to telomere (T2T), resolving previously inaccessible regions such as centromeres and segmental duplications. Here, we present an updated, higher-quality, haplotype-phased T2T assembly of the Chinese Quartet (T2T-CQ), a family cohort comprising monozygotic twins and their parents, generated using high-coverage ONT ultralong and PacBio HiFi sequencing. The T2T-CQ assembly serves as a crucial reference genome for integrating publicly available multi-omics data and advances the utility of the Quartet reference materials. The T2T-CQ assembly scores highly on multiple metrics of continuity and completeness, with Genome Continuity Inspector (GCI) scores of 77.76 (maternal) and 76.41 (paternal), 21-mer quality values (QV) > 66, and Clipping Reveals Assembly Quality (CRAQ) scores > 99.6 for both haplotypes, enabling complete annotation of centromeric regions. Within these regions, we identified novel 13-mer higher-order repeat patterns on chromosome 17 which exhibited a monophyletic origin and emerged approximately 230 thousand years ago. Overall, this work establishes an essential genomic resource for the Han Chinese population and advances the development of a T2T pan-Chinese reference genome, which will significantly enable future investigations both into population-specific structural variants and the evolutionary dynamics of centromeres.
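
For readers less familiar with the quality values quoted above, the sketch below converts between a QV and a per-base error rate. It assumes the QV is Phred-scaled (QV = -10 log10 of the error rate), which is the usual convention for k-mer-based assembly QVs; the function names are illustrative. Under that assumption, a QV of 66 corresponds to roughly 2.5e-07 errors per base.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error rate implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Phred-scaled quality value implied by a per-base error rate."""
    return -10 * math.log10(error_rate)


rate_at_qv66 = qv_to_error_rate(66)
```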

Citations: 0

DBP: Adaptive and Interpretable Factor Analysis for Single-cell RNA-seq Data with Deep Beta Processes.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2025-11-26 DOI: 10.1093/gpbjnl/qzaf117

Runyan Liu, Shuofeng Hu, Guohua Dong, Tongtong Kan, Jinhui Shi, Jing Wang, Jiahao Zhou, Zhen He, Xiaomin Ying

Factor analysis is a method that condenses multiple variables into a few latent factors. It can be used to extract the underlying sources of biological variation in high-dimensional data and distill them into interpretable gene programs. However, existing factorization methods lack adaptability in selecting the optimal number of factors and interpretability in capturing biological variation. To address these concerns, we propose Deep Beta Process (DBP), a deep probabilistic framework for adaptive and interpretable factor analysis of single-cell transcriptomic data. DBP achieves adaptive selection of factors through a stick-breaking Beta process and performs batch correction using an adversarial learning strategy. We validate the flexible factor extraction and robust batch correction capabilities of DBP on simulated datasets. We also demonstrate its superior performance in dimensionality reduction and biological interpretability while explaining biological variation from both cell and gene perspectives using factor and loading matrices. The application of DBP to a gastric adenocarcinoma dataset reveals malignant epithelial cell heterogeneity, providing valuable insights for investigating the molecular mechanisms of disease onset and progression. DBP is available at https://github.com/labomics/DBP and https://ngdc.cncb.ac.cn/biocode/tool/BT007954.
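
The abstract does not spell out the stick-breaking construction, so the following is a generic sketch of one common form for Beta process (Indian buffet process) priors, not DBP's actual model. Each factor's activation probability is a running product of Beta(alpha, 1) draws, so the probabilities decay and only a data-dependent number of factors remains effectively active. The names `alpha`, `k_max` and `threshold` are assumptions for illustration.

```python
import random


def stick_breaking_probs(alpha: float, k_max: int, rng: random.Random):
    """Activation probabilities pi_k = prod_{j<=k} nu_j with nu_j ~ Beta(alpha, 1)."""
    probs, remaining = [], 1.0
    for _ in range(k_max):
        remaining *= rng.betavariate(alpha, 1.0)  # break off part of the stick
        probs.append(remaining)
    return probs


def effective_factors(probs, threshold: float = 0.05) -> int:
    """Count factors whose activation probability is at least the threshold."""
    return sum(p >= threshold for p in probs)


rng = random.Random(0)
probs = stick_breaking_probs(alpha=3.0, k_max=20, rng=rng)
n_active = effective_factors(probs)
```

A larger `alpha` makes the probabilities decay more slowly, so more factors stay above the threshold; this is how such a prior can adapt the number of factors instead of fixing it in advance.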

Citations: 0

Function and Development of Deep-sea Mussel Bacteriocytes Revealed by snRNA-seq and Spatial Transcriptomics.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2025-11-25 DOI: 10.1093/gpbjnl/qzaf109

Hao Chen, Mengna Li, Zhaoshan Zhong, Inge Seim, Minxiao Wang, Chao Lian, Lianhong Zhuo, Xinjiang Wan, Hao Wang, Guanghui Han, Li Zhou, Huan Zhang, Lei Cao, Chaolun Li

Deep-sea chemosynthetic ecosystems are among the most unusual ecosystems on Earth, where most megafauna form close symbiotic associations with chemosynthetic microbes to obtain nutrition and shelter from the toxic environment. Despite the diverse forms of symbiotic organs in these deep-sea holobionts, the function and development of bacteriocytes, the host cells harboring symbionts, are still largely uncharacterized. Here, we conducted an in situ decolonization assay together with state-of-the-art single-nucleus and spatial transcriptomics to reveal the function and development of deep-sea mussel bacteriocytes. The bacteriocytes appear to optimize immune processes to facilitate recognition, engulfment, and elimination of endosymbionts. They also interact directly with the endosymbionts in carbohydrate and ammonia metabolism by exchanging metabolic intermediates via transporters such as SLC37A2 and RHBG-A. Bacteriocytes arise from three different proliferating cell types, and their successive developmental trajectory was delineated by multi-omics data and 3D reconstruction analyses. The molecular functions and developmental processes of bacteriocytes were found to be guided by the same set of molluscan-conserved transcription factors and may be influenced by endosymbionts through sterol metabolism. The coordination in the functions and development of bacteriocytes, and between the host and symbionts, highlights the phenotypic plasticity of symbiotic cells and underpins host-symbiont interdependence in adaptation to the deep sea.

Citations: 0

Macrophages in Hematopoiesis and Related Blood Diseases.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2025-11-25 DOI: 10.1093/gpbjnl/qzaf112

Hong Huang, Mengya Gao, Francesca Vinchi, Xiuli An, Wei Li, Yaomei Wang

Emerging evidence indicates that macrophages play important roles in hematopoiesis in addition to their immune functions. The well-known immune-unrelated functions of macrophages include their roles in hematopoiesis, especially quality control of hematopoietic stem/progenitor cells (HSCs/HSPCs) and support of erythropoiesis and megakaryopoiesis. Several studies, most using mouse models, have explored the roles of macrophages in hematopoiesis in different organs such as the yolk sac (YS), fetal liver (FL), bone marrow (BM), and spleen (SP). We have recently documented the potential roles and underlying mechanisms of macrophages in myeloproliferative neoplasm (MPN), aplastic anemia (AA), and idiopathic thrombocytopenic purpura (ITP). In this article, we review the origin of macrophages, introduce the roles of macrophages in HSCs/HSPCs, erythropoiesis, and megakaryopoiesis in four hematopoietic organs, and summarize recent advances concerning macrophages in MPN, AA, and ITP. Finally, we outline the unresolved questions that future studies should address to explore in greater depth the roles of macrophages in both normal and disordered hematopoiesis.

Citations: 0

Regulation of Alternative Polyadenylation Events by PABPC1 Affects Erythroid Progenitor Cell Expansion.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2025-11-25 DOI: 10.1093/gpbjnl/qzaf116

Yanan Li, Yanbo Yang, Bin Hu, Zi Wang, Wei Wang, Xiaofeng He, Xusheng Wu, Sheng Lin, Narla Mohandas, Hong Liu, Jing Gong, Long Liang, Jing Liu

Erythropoiesis is precisely regulated by multilayered networks. It is crucial for maintaining steady-state hemoglobin levels and ensuring effective oxygen transport. Alternative polyadenylation (APA) is a post-transcriptional regulatory mechanism generating multiple mRNA isoforms from a single gene based on specific 3'-untranslated region sequences. While APA plays a vital role in various cellular processes, the underlying mechanism in erythropoiesis remains largely unexplored. In this study, we employed an integrative approach, combining bioinformatics analyses and experimental validations, to systematically investigate the role of APA in erythropoiesis. We mapped the APA landscape during erythroid differentiation and identified significant APA shifts essential for the differentiation of erythroid cells from burst-forming unit erythroid (BFU-E) to colony-forming unit erythroid (CFU-E). Notably, our findings highlighted polyadenylate-binding protein cytoplasmic 1 (PABPC1) as the primary regulator of APA during these stages. Functional analyses have revealed that knockdown of PABPC1 disrupts erythroid progenitor cell proliferation and differentiation. These results implicate an essential role of PABPC1 in modulating cell fate through APA regulation. Furthermore, we found that decreased PABPC1 levels increased the usage of the proximal polyadenylation sites in the TSC22D1 gene. This shift led to elevated expression of TSC22D1, uncovering a novel mechanism by which APA influences erythroid progenitor expansion and differentiation. Our findings provide novel insights into APA regulation in early erythropoiesis and suggest potential therapeutic strategies for diseases associated with erythropoietic disorders.

Citations: 0

scATAnno: Automated Cell Type Annotation for Single-cell ATAC Sequencing Data.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2025-11-24 DOI: 10.1093/gpbjnl/qzaf108

Yijia Jiang, Zhirui Hu, Feng Lu, Allen W Lynch, Junchen Jiang, Alexander Zhu, Ziqi Zeng, Yi Zhang, Gongwei Wu, Yingtian Xie, Rong Li, Ningxuan Zhou, Cliff Meyer, Paloma Cejas, Myles Brown, Henry W Long, Xintao Qiu

Recent advances in single-cell epigenomic techniques have created a growing demand for scATAC-seq analysis. One key analysis task is to determine cell type identity based on the epigenetic data. We introduce scATAnno, a Python package designed to automatically annotate scATAC-seq data using large-scale scATAC-seq reference atlases. This workflow generates the reference atlases from publicly available datasets, enabling accurate cell type annotation by integrating query data with reference atlases, without the use of scRNA-seq data. To enhance annotation accuracy, we have incorporated KNN-based and weighted distance-based uncertainty scores to effectively detect cell populations within the query data that are distinct from all cell types in the reference data. We compare and benchmark scATAnno against five other published approaches for cell annotation, demonstrating superior performance across multiple datasets and metrics. We showcase the utility of scATAnno across multiple datasets, including peripheral blood mononuclear cell (PBMC), triple negative breast cancer (TNBC), and basal cell carcinoma (BCC), and demonstrate that scATAnno accurately annotates cell types across conditions. Overall, scATAnno is a useful tool for scATAC-seq reference building and cell type annotation in scATAC-seq data and can aid in the interpretation of new scATAC-seq datasets in complex biological systems. scATAnno is available online at https://scatanno-main.readthedocs.io/.
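
The idea of a KNN-based uncertainty score can be sketched in a few lines. This is a generic illustration and not the scATAnno API or its exact scoring: the function `knn_annotate`, the toy two-dimensional embedding, and the definition of uncertainty as one minus the majority vote fraction are assumptions.

```python
import math
from collections import Counter


def knn_annotate(query, ref_points, ref_labels, k=3):
    """Label a query cell by majority vote among its k nearest reference cells.

    Returns (label, uncertainty), where uncertainty is the share of the
    neighbours that disagree with the winning label.
    """
    order = sorted(range(len(ref_points)),
                   key=lambda i: math.dist(query, ref_points[i]))
    nearest = order[:k]
    votes = Counter(ref_labels[i] for i in nearest)
    label, n_votes = votes.most_common(1)[0]
    return label, 1.0 - n_votes / len(nearest)


# Toy reference atlas: coordinates stand in for a shared low-dimensional embedding.
ref_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
ref_labels = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]

confident = knn_annotate((0.2, 0.2), ref_points, ref_labels, k=3)
borderline = knn_annotate((2.5, 2.5), ref_points, ref_labels, k=4)
```

A query cell whose neighbours split between labels gets a higher uncertainty. Cells that lie far from every reference cell would need a distance-based score as well, which is presumably the role of the weighted distance-based score mentioned in the abstract.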

Citations: 0

Enhanced Functional Potential of Pseudogene-associated lncRNA Genes in Mammals.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2025-11-24 DOI: 10.1093/gpbjnl/qzaf113

Ze-Hao Zhang, Bo-Han Li, Yin-Wei Wang, Sheng Hu Qian, Lu Chen, Meng-Wei Shi, Hao Zuo, Zhen-Xia Chen

The functional significance of long non-coding RNAs (lncRNAs) remains a subject of debate, largely due to the complexity and cost associated with their validation experiments. However, emerging evidence suggests that pseudogenes, once viewed as genomic relics, may contribute to the origin of functional lncRNA genes. In this study spanning eight species, we systematically identified pseudogene-associated lncRNA genes using our PacBio long-read sequencing data and published RNA-seq data. Our investigation revealed that pseudogene-associated lncRNA genes exhibit heightened functional attributes compared to their non-pseudogene-associated counterparts. Notably, these pseudogene-associated lncRNAs show protein-binding proficiency, positioning them as potent regulators of gene expression. In particular, pseudogene-associated sense lncRNAs retain protein-binding capabilities inherited from parent genes of pseudogenes, thereby demonstrating greater protein-binding proficiency. Through detailed functional characterization, we elucidated the unique advantages and conserved roles of pseudogene-associated lncRNA genes, particularly in the context of gene expression regulation and DNA repair. Leveraging cross-species expression profiling, we demonstrated the prominent contribution of pseudogene-associated lncRNA genes to aging-related transcriptome changes across nine human tissues and eight mouse tissues. Overall, our findings demonstrate enhanced functional attributes of pseudogene-associated lncRNA genes and shed light on their conserved and close association with aging.



Citations: 0

Spanve: A Statistical Method for Downstream-friendly Spatially Variable Genes in Large-scale Data.

IF 7.9

Genomics, proteomics & bioinformatics

Pub Date : 2025-11-24 DOI: 10.1093/gpbjnl/qzaf111

Guoxin Cai, Yichang Chen, Shuqing Chen, Xun Gu, Zhan Zhou

Depicting gene expression in a spatial context through spatial transcriptomics is beneficial for inferring cellular mechanisms. Identifying spatially variable genes is a crucial step in leveraging spatial transcriptome data to understand intricate spatial dynamics. In this study, we developed Spanve, a nonparametric statistical method for detecting spatially variable genes in large-scale spatial transcriptomics datasets by quantifying expression differences between each spot or cell and its local neighbors. This method offers a nonparametric approach for identifying spatial dependencies in gene expression without distributional assumptions. Compared with existing methods, Spanve yields fewer false positives, leading to more accurate identification of spatially variable genes. Furthermore, Spanve improves the performance of downstream spatial transcriptomics analyses including spatial domain detection and cell type deconvolution. These results show the broad application potential of Spanve in advancing our understanding of spatial gene expression patterns within complex tissue microenvironments. Spanve is publicly available at https://github.com/zjupgx/Spanve and https://ngdc.cncb.ac.cn/biocode/tool/BT7724.
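The principle of testing a gene for spatial variability by comparing each spot with its local neighbors, without distributional assumptions, can be sketched with a simple permutation test. This is not Spanve's actual statistic or code: the function names, the choice of mean absolute difference as the statistic, and the parameters `k`, `n_perm`, and `seed` are all assumptions made for illustration.

```python
import math
import random


def knn_indices(coords, k):
    """For each spot, return the indices of its k nearest spatial neighbors."""
    result = []
    for i, center in enumerate(coords):
        others = [j for j in range(len(coords)) if j != i]
        others.sort(key=lambda j: math.dist(center, coords[j]))
        result.append(others[:k])
    return result


def neighbor_diff(expr, neighbors):
    """Mean absolute expression difference between spots and their neighbors."""
    total = 0.0
    count = 0
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            total += abs(expr[i] - expr[j])
            count += 1
    return total / count


def permutation_pvalue(expr, coords, k=4, n_perm=200, seed=0):
    """Permutation p-value for one gene (illustrative, not Spanve's test).

    A spatially structured gene has neighbors that look alike, so its
    observed neighbor difference is small compared with the values obtained
    after shuffling expression across spots.
    """
    rng = random.Random(seed)
    neighbors = knn_indices(coords, k)
    observed = neighbor_diff(expr, neighbors)

    shuffled = list(expr)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between location and expression
        if neighbor_diff(shuffled, neighbors) <= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

On a small grid where a gene is expressed only in one half of the tissue, this sketch returns a small p-value, while a gene with constant expression returns 1.0. The brute-force neighbor search is quadratic in the number of spots, so a large-scale dataset would need a spatial index instead.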



Citations: 0