Kai Li, Ping Zhang, Jinsheng Xu, Zi Wen, Junying Zhang, Zhike Zi, Li Li
Chromatin compartmentalization and epigenomic modification are crucial in cell differentiation and disease development. However, precise mapping of chromatin compartmental patterns requires Hi-C or Micro-C data at high sequencing depth. Exploring the systematic relationship between epigenomic modifications and compartmental patterns remains challenging. To address these issues, we present COCOA, a deep neural network framework using convolution and attention mechanisms to infer fine-scale chromatin compartment patterns from six histone modification signals. COCOA extracts 1-D track features through bi-directional feature reconstruction after resolution-specific binning of epigenomic signals. These track features are then cross-fused with contact features using an attention mechanism and transformed into chromatin compartment patterns through residual feature reduction. COCOA demonstrates accurate inference of chromatin compartmentalization at a fine-scale resolution and exhibits stable performance on test sets. Additionally, we explored the impact of histone modifications on chromatin compartmentalization prediction through in silico epigenomic perturbation experiments. Unlike the obscure compartments observed with 1 kb resolution high-depth experimental data, COCOA generates clear and detailed compartmental patterns, highlighting its superior performance. Finally, we demonstrated that COCOA enables cell-type-specific prediction of unrevealed chromatin compartment patterns in various biological processes, making it an effective tool for gaining chromatin compartmentalization insights from epigenomics in diverse biological scenarios. The COCOA Python code is publicly available at https://github.com/onlybugs/COCOA.
{"title":"COCOA: A Framework for Fine-scale Mapping Cell-type-specific Chromatin Compartments with Epigenomic Information.","authors":"Kai Li, Ping Zhang, Jinsheng Xu, Zi Wen, Junying Zhang, Zhike Zi, Li Li","doi":"10.1093/gpbjnl/qzae091","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae091","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2024-12-26","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142901425"}
Tong Pan, Yue Bi, Xiaoyu Wang, Ying Zhang, Geoffrey I Webb, Robin B Gasser, Lukasz Kurgan, Jiangning Song
The accurate identification of catalytic residues contributes to our understanding of enzyme functions in biological processes and pathways. The increasing number of protein sequences necessitates computational tools for the automated prediction of catalytic residues in enzymes. Here, we introduce SCREEN, a graph neural network for the high-throughput prediction of catalytic residues via the integration of enzyme functional and structural information. SCREEN constructs residue representations based on spatial arrangements and incorporates enzyme function priors into such representations through contrastive learning. We demonstrate that SCREEN (i) consistently outperforms currently available predictors; (ii) provides accurate results when applied to inferred enzyme structures; and (iii) generalizes well to enzymes dissimilar from those in the training set. We also show that the putative catalytic residues predicted by SCREEN mimic key structural and biophysical characteristics of native catalytic residues. Moreover, using experimental data sets, we show that SCREEN's predictions can be used to distinguish residues with a high mutation tolerance from those likely to cause functional loss when mutated, indicating that this tool might be used to infer disease-associated mutations. SCREEN is publicly available at https://github.com/BioColLab/SCREEN and https://ngdc.cncb.ac.cn/biocode/tool/7580.
{"title":"SCREEN: A Graph-based Contrastive Learning Tool to Infer Catalytic Residues and Assess Enzyme Mutations.","authors":"Tong Pan, Yue Bi, Xiaoyu Wang, Ying Zhang, Geoffrey I Webb, Robin B Gasser, Lukasz Kurgan, Jiangning Song","doi":"10.1093/gpbjnl/qzae094","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae094","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2024-12-26","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142901428"}
Fangdong Geng, Xuedong Zhang, Jiayu Ma, Hengzhao Liu, Hang Ye, Fan Hao, Miaoqing Liu, Meng Dang, Huijuan Zhou, Mengdi Li, Peng Zhao
The genomic basis and biology of winged fruit are interesting issues in ecological and evolutionary biology. Chinese wingnut (Pterocarya stenoptera) is an important garden and economic tree species in China. The genomic resources of this hardwood tree could advance genomic studies of Juglandaceae and their evolutionary relationships. Here, we report a high-quality reference genome of P. stenoptera (N50 = 35.15 Mb) and provide a comparative analysis of Juglandaceae genomes. Paralogous relationships among the 16 chromosomes of the Chinese wingnut genome revealed eight main duplications representing the subgenome. Molecular dating suggested that the most recent common ancestor of P. stenoptera and Cyclocarya paliurus diverged from Juglans around 56.7 million years ago (Mya). The expanded and contracted gene families were associated with cutin, suberine, and wax biosynthesis, cytochrome P450, and anthocyanin biosynthesis. We identified large inversion blocks between the P. stenoptera genome and its relatives, which are enriched in genes related to lipid biosynthesis and metabolism, and starch and sucrose metabolism. Based on whole-genome resequencing data, the twenty-eight individuals were clearly clustered into three groups corresponding to three species, namely Pterocarya macroptera, Pterocarya hupehensis, and P. stenoptera. Morphological and gene expression analysis showed that CAD, COMT, LOX, and MADS-box play important roles during the five developmental stages of wingnuts. Our study highlights the evolutionary history of the P. stenoptera genome and supports P. stenoptera as an appropriate Juglandaceae model for studying winged fruits. These results provide a theoretical basis for the evolution, development, and diversity of woody plant winged fruits.
{"title":"Genome Assembly and Winged fruit Gene Regulation of Chinese Wingnut: Insights from Genomic and Transcriptomic Analyses.","authors":"Fangdong Geng, Xuedong Zhang, Jiayu Ma, Hengzhao Liu, Hang Ye, Fan Hao, Miaoqing Liu, Meng Dang, Huijuan Zhou, Mengdi Li, Peng Zhao","doi":"10.1093/gpbjnl/qzae087","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae087","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2024-12-12","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142820285"}
T cell receptors (TCRs) serve key roles in the adaptive immune system by enabling recognition and response to pathogens and irregular cells. Various methods have been developed for TCR construction from single-cell RNA sequencing (scRNA-seq) datasets, each with its unique characteristics. Yet, a comprehensive evaluation of their relative performance under different conditions remains elusive. In this study, we conducted a benchmark analysis utilizing experimental single-cell immune profiling datasets. Additionally, we introduced a novel simulator, YASIM-scTCR (Yet Another SIMulator for single-cell TCR), capable of generating scTCR-seq reads containing diverse TCR-derived sequences with different sequencing depths and read lengths. Our results consistently showed that TRUST4 and MiXCR outperformed others across multiple datasets, while DeRR also demonstrated considerable accuracy. We also discovered that the sequencing depth inherently imposes a critical constraint on successful TCR construction from scRNA-seq data. In summary, we present a benchmark study to aid researchers in choosing the appropriate method for reconstructing TCR from scRNA-seq data.
{"title":"Evaluation of T Cell Receptor Construction Methods from scRNA-Seq Data.","authors":"Ruonan Tian, Zhejian Yu, Ziwei Xue, Jiaxin Wu, Lize Wu, Shuo Cai, Bing Gao, Bing He, Yu Zhao, Jianhua Yao, Linrong Lu, Wanlu Liu","doi":"10.1093/gpbjnl/qzae086","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae086","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2024-12-12","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142820279"}
Rongrong Luo, Xiying Li, Ruyun Gao, Mengwei Yang, Juan Cai, Liyuan Dai, Nin Lou, Guangyu Fan, Haohua Zhu, Shasha Wang, Zhishang Zhang, Le Tang, Jiarui Yao, Di Wu, Yuankai Shi, Xiaohong Han
Autoantibodies hold promise for diagnosing lung cancer. However, their effectiveness in early-stage detection needs improvement. We investigated novel IgG and IgM autoantibodies for detection of early-stage lung adenocarcinoma (Early-LUAD) across three independent cohorts of 1246 individuals. A multi-step approach, including human proteome microarray (HuProt™) discovery, focused array verification, and ELISA validation, was conducted on 634 individuals with Early-LUAD (stage 0-I), 280 with benign lung disease (BLD), and 332 normal healthy controls (NHC). HuProt™ profiling discovered 417 IgG/IgM candidates, and the focused array verified 32 autoantibodies with distinct distributions in Early-LUAD and BLD/NHC. A novel panel of 10 autoantibodies (ELAVL4-IgM, GDA-IgM, GIMAP4-IgM, GIMAP4-IgG, MGMT-IgM, UCHL1-IgM, DCTPP1-IgM, KCMF1-IgM, UCHL1-IgG, and WWP2-IgM) demonstrated a sensitivity of 70.5% and specificities of 77.0% or 80.0% in detecting Early-LUAD from BLD or NHC in ELISA validation. The positive predictive value for distinguishing Early-LUAD from BLD in nodules with an imaging maximum diameter (IMD) ≤ 8 mm, 9 ≤ IMD ≤ 20 mm, and > 20 mm significantly increased from 47.27%, 52.00% and 62.90% [low-dose computed tomography (LDCT) alone] to 79.17%, 71.13% and 87.88% (10-autoantibody panel with LDCT), respectively. The combined risk score (CRS), based on the 10-autoantibody panel, sex, and IMD, effectively stratified risk for Early-LUAD. Individuals with scores of 10-25 and > 25 had a higher risk of Early-LUAD than the reference group (scores < 10), with adjusted odds ratios of 5.28 (95% CI: 3.18-8.76) and 9.05 (95% CI: 5.40-15.15), respectively. This novel panel of IgG and IgM autoantibodies offers a complementary approach to LDCT in distinguishing Early-LUAD from benign nodules.
{"title":"Novel IgG-IgM Autoantibody Panel Enhances Detection of Early-stage Lung Adenocarcinoma from Benign Nodules.","authors":"Rongrong Luo, Xiying Li, Ruyun Gao, Mengwei Yang, Juan Cai, Liyuan Dai, Nin Lou, Guangyu Fan, Haohua Zhu, Shasha Wang, Zhishang Zhang, Le Tang, Jiarui Yao, Di Wu, Yuankai Shi, Xiaohong Han","doi":"10.1093/gpbjnl/qzae085","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae085","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2024-12-11","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142815375"}
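The sensitivity, specificity, and positive predictive value quoted in the autoantibody abstract are all ratios taken from a 2x2 confusion table. The sketch below shows how they are computed; the function name and the counts are hypothetical illustrations and are not data from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic ratios from confusion-table counts.

    tp/fn: diseased individuals called positive/negative.
    tn/fp: non-diseased individuals called negative/positive.
    """
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true cases detected
        "specificity": tn / (tn + fp),  # fraction of non-cases correctly cleared
        "ppv": tp / (tp + fp),          # fraction of positive calls that are true cases
    }


# Hypothetical counts chosen only to make the arithmetic easy to follow.
toy = diagnostic_metrics(tp=70, fp=20, tn=80, fn=30)
print(toy)
```

Unlike sensitivity and specificity, the positive predictive value depends on how common the disease is in the tested group, which is one reason it can differ between nodule-size strata.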
Huimin Chen, Jiaxin Liu, Gege Tang, Gefei Hao, Guangfu Yang
Historically, there have been many outbreaks of viral diseases that have continued to claim millions of lives. Research on human-virus protein-protein interactions (PPIs) is vital to understanding the principles of human-virus relationships, providing an essential foundation for developing virus control strategies to combat diseases. The rapidly accumulating data on human-virus PPIs offer unprecedented opportunities for bioinformatics research around human-virus PPIs. However, available detailed analyses and summaries to help use these resources systematically and efficiently are lacking. Here, we comprehensively review the bioinformatic resources used in human-virus PPI research, and discuss and compare their functions, performance, and limitations. This review aims to provide researchers with a bioinformatic toolbox that will hopefully better facilitate the exploration of human-virus PPIs based on binding modes.
{"title":"Bioinformatic Resources for Exploring Human-virus Protein-protein Interactions Based on Binding Modes.","authors":"Huimin Chen, Jiaxin Liu, Gege Tang, Gefei Hao, Guangfu Yang","doi":"10.1093/gpbjnl/qzae075","DOIUrl":"10.1093/gpbjnl/qzae075","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2024-12-03","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11658832/pdf/","platform":"Semanticscholar","paperid":"142484009","PMCID":"OA"}
Multiplexing across donors has emerged as a popular strategy to increase throughput, reduce costs, overcome technical batch effects, and improve doublet detection in single-cell genomic studies. To eliminate additional experimental steps, endogenous nuclear genome variants are used for demultiplexing pooled single-cell RNA sequencing (scRNA-seq) data by several computational tools. However, these tools have limitations when applied to single-cell sequencing methods that do not cover nuclear genomic regions well, such as single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq). Here, we demonstrate that mitochondrial germline variants are an alternative, robust, and computationally efficient endogenous barcode for sample demultiplexing. We propose MitoSort, a tool that uses mitochondrial germline variants to assign cells to their donor origins and identify cross-genotype doublets in single-cell genomic datasets. We evaluate its performance by using in silico pooled mitochondrial scATAC-seq (mtscATAC-seq) libraries and experimentally multiplexed data with cell hashtags. MitoSort achieves high accuracy and efficiency in genotype clustering and doublet detection for mtscATAC-seq data, addressing the limitations of current computational techniques tailored for scRNA-seq data. Moreover, MitoSort exhibits versatility, and can be applied to various single-cell sequencing approaches beyond mtscATAC-seq provided that the mitochondrial variants are reliably detected. Furthermore, we demonstrate the application of MitoSort in a case study where B cells from eight donors were pooled and assayed by single-cell multi-omics sequencing. Altogether, our results demonstrate the accuracy and efficiency of MitoSort, which enables reliable sample demultiplexing in various single-cell genomic applications. MitoSort is available at https://github.com/tangzhj/MitoSort.
{"title":"MitoSort: Robust Demultiplexing of Pooled Single-cell Genomic Data Using Endogenous Mitochondrial Variants.","authors":"Zhongjie Tang, Weixing Zhang, Peiyu Shi, Sijun Li, Xinhui Li, Yueming Li, Yicong Xu, Yaqing Shu, Zheng Hu, Jin Xu","doi":"10.1093/gpbjnl/qzae073","DOIUrl":"10.1093/gpbjnl/qzae073","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2024-12-03","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11671100/pdf/","platform":"Semanticscholar","paperid":"142484015","PMCID":"OA"}
Chromatin organization is important for gene transcription in the pig genome. However, its three-dimensional (3D) structure and dynamics are much less investigated than those in humans. Here, we applied the long-read chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) method to map the whole-genome chromatin interactions mediated by CCCTC-binding factor (CTCF) and RNA polymerase II (RNAPII) in porcine macrophage cells before and after polyinosinic-polycytidylic acid [Poly(I:C)] induction. Our results reveal that Poly(I:C) induction impacts the 3D genome organization in the 3D4/21 cells at the fine-scale chromatin loop level rather than at the large-scale domain level. Furthermore, our findings underscore the pivotal role of CTCF-anchored chromatin interactions in reshaping chromatin architecture during immune responses. Knockout of the CTCF-binding locus further confirms that the CTCF-anchored enhancers are associated with the activation of immune genes via long-range interactions. Notably, the ChIA-PET data also support the spatial relationship between single nucleotide polymorphisms (SNPs) and related gene transcription from a 3D genome perspective. Our findings provide new clues and potential targets for exploring key elements related to diseases in pigs and are also likely to shed light on the chromatin organization and dynamics underlying mammalian infectious diseases.
{"title":"Virus Infection Induces Immune Gene Activation with CTCF-anchored Enhancers and Chromatin Interactions in Pig Genome.","authors":"Jianhua Cao, Ruimin Ren, Xiaolong Li, Xiaoqian Zhang, Yan Sun, Xiaohuan Tian, Ru Liu, Xiangdong Liu, Yijun Ruan, Guoliang Li, Shuhong Zhao","doi":"10.1093/gpbjnl/qzae062","DOIUrl":"10.1093/gpbjnl/qzae062","journal":{"name":"Genomics, proteomics & bioinformatics"},"publicationDate":"2024-12-03","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11725346/pdf/","platform":"Semanticscholar","paperid":"142309474","PMCID":"OA"}
Zhenyong Du, Gregory Gelembiuk, Wynne Moss, Andrew Tritt, Carol Eunmi Lee
Copepods are among the most abundant organisms on the planet and play critical roles in aquatic ecosystems. Among copepods, populations of the Eurytemora affinis species complex are numerically dominant in many coastal habitats and serve as food sources for major fisheries. Intriguingly, certain populations possess the unusual capacity to invade novel salinities on rapid time scales. Despite their ecological importance, high-quality genomic resources have been absent for calanoid copepods, limiting our ability to comprehensively dissect the genome architecture underlying the highly invasive and adaptive capacity of certain populations. Here, we present the first chromosome-level genome of a calanoid copepod, from the Atlantic clade (Eurytemora carolleeae) of the E. affinis species complex. This genome was assembled using high-coverage PacBio long-read and Hi-C sequences of an inbred line, generated through 30 generations of full-sib mating. The assembly, consisting of 529.3 Mb (contig N50 = 4.2 Mb, scaffold N50 = 140.6 Mb), was anchored onto four chromosomes. Genome annotation predicted 20,262 protein-coding genes, among which ion transport-related gene families were substantially expanded, based on comparative analyses with 12 additional arthropod genomes. We also found genome-wide signatures of historical gene body methylation of the ion transport-related genes and significant clustering of these genes on each chromosome. This genome represents one of the most contiguous copepod genomes to date and is among the highest-quality marine invertebrate genomes. As such, it provides an invaluable resource for gaining fundamental insights into the ability of this copepod to adapt to rapidly changing environments.
{"title":"The Genome Architecture of the Copepod Eurytemora carolleeae - the Highly Invasive Atlantic Clade of the Eurytemora affinis Species Complex.","authors":"Zhenyong Du, Gregory Gelembiuk, Wynne Moss, Andrew Tritt, Carol Eunmi Lee","doi":"10.1093/gpbjnl/qzae066","DOIUrl":"10.1093/gpbjnl/qzae066","url":null,"PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11706791/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142335111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zheng Fang, Mingming Dong, Hongqiang Qin, Mingliang Ye
Evaluation of identifications and dissemination of results are essential components of mass spectrometry-based proteomics analysis. The visualization of fragment ions in mass spectra provides strong evidence for peptide identification and modification localization. Here, we present an easy-to-use tool, named GP-Plotter, for ion annotation of tandem mass spectra and the corresponding image output. GP-Plotter accepts as input both identification result files from common search tools in the community and user-customized files. Multiple display modes and customizable parameters allow users to present annotated spectra of interest. Different image formats, especially vector graphic formats, are available for image generation, which facilitates data publication. Notably, GP-Plotter is also well-suited for the visualization and evaluation of glycopeptide spectrum assignments, with comprehensive annotation of glycan fragment ions. With a user-friendly graphical interface, GP-Plotter is expected to be a universal visualization tool for the community. GP-Plotter has been implemented in the latest version of Glyco-Decipher (v1.0.4), and the standalone GP-Plotter software is also freely available at https://github.com/DICP-1809.
{"title":"GP-Plotter: Flexible Spectral Visualization for Proteomics Data with Emphasis on Glycoproteomics Analysis.","authors":"Zheng Fang, Mingming Dong, Hongqiang Qin, Mingliang Ye","doi":"10.1093/gpbjnl/qzae069","DOIUrl":"10.1093/gpbjnl/qzae069","url":null,"PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11661977/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142396283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}