The rapid advancement of sequencing technologies poses challenges in managing the large volume and exponential growth of sequence data efficiently and in a timely manner. To address this issue, we present GenBase (https://ngdc.cncb.ac.cn/genbase), an open-access data repository that follows the International Nucleotide Sequence Database Collaboration (INSDC) data standards and structures, for efficient nucleotide sequence archiving, searching, and sharing. As a core resource within the National Genomics Data Center (NGDC) of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GenBase offers a bilingual submission pipeline and services, as well as local submission assistance in China. GenBase also provides a unique Excel format for metadata description and feature annotation of nucleotide sequences, along with a real-time data validation system to streamline sequence submissions. As of April 23, 2024, GenBase had received 68,251 nucleotide sequences and 689,574 annotated protein sequences across 414 species from 2319 submissions. Of these, 63,614 (93%) nucleotide sequences and 620,640 (90%) annotated protein sequences have been released and are publicly accessible through GenBase's web search system, File Transfer Protocol (FTP), and Application Programming Interface (API). Additionally, in collaboration with INSDC, GenBase has established an effective data exchange mechanism with GenBank and started sharing released nucleotide sequences. Furthermore, GenBase integrates all sequences from GenBank with daily updates, demonstrating its commitment to actively contributing to global sequence data management and sharing.
{"title":"GenBase: A Nucleotide Sequence Database.","authors":"Congfan Bu, Xinchang Zheng, Xuetong Zhao, Tianyi Xu, Xue Bai, Yaokai Jia, Meili Chen, Lili Hao, Jingfa Xiao, Zhang Zhang, Wenming Zhao, Bixia Tang, Yiming Bao","doi":"10.1093/gpbjnl/qzae047","DOIUrl":"10.1093/gpbjnl/qzae047","url":null,"abstract":"<p><p>The rapid advancement of sequencing technologies poses challenges in managing the large volume and exponential growth of sequence data efficiently and on time. To address this issue, we present GenBase (https://ngdc.cncb.ac.cn/genbase), an open-access data repository that follows the International Nucleotide Sequence Database Collaboration (INSDC) data standards and structures, for efficient nucleotide sequence archiving, searching, and sharing. As a core resource within the National Genomics Data Center (NGDC) of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GenBase offers bilingual submission pipeline and services, as well as local submission assistance in China. GenBase also provides a unique Excel format for metadata description and feature annotation of nucleotide sequences, along with a real-time data validation system to streamline sequence submissions. As of April 23, 2024, GenBase received 68,251 nucleotide sequences and 689,574 annotated protein sequences across 414 species from 2319 submissions. Out of these, 63,614 (93%) nucleotide sequences and 620,640 (90%) annotated protein sequences have been released and are publicly accessible through GenBase's web search system, File Transfer Protocol (FTP), and Application Programming Interface (API). Additionally, in collaboration with INSDC, GenBase has constructed an effective data exchange mechanism with GenBank and started sharing released nucleotide sequences. 
Furthermore, GenBase integrates all sequences from GenBank with daily updates, demonstrating its commitment to actively contributing to global sequence data management and sharing.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11434157/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning for AI Breeding in Plants.","authors":"Qian Cheng, Xiangfeng Wang","doi":"10.1093/gpbjnl/qzae051","DOIUrl":"10.1093/gpbjnl/qzae051","url":null,"abstract":"","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11479635/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141494585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Li Ren, Mengxue Luo, Jialin Cui, Xin Gao, Hong Zhang, Ping Wu, Zehong Wei, Yakui Tai, Mengdan Li, Kaikun Luo, Shaojun Liu
Intergeneric hybridization greatly reshapes regulatory interactions among allelic and non-allelic genes. However, the effects of these interactions on growth diversity remain poorly understood in animals. In this study, we conducted whole-genome sequencing and RNA sequencing (RNA-seq) analyses in diverse hybrid varieties resulting from the intergeneric hybridization of goldfish (Carassius auratus red var.) and common carp (Cyprinus carpio). These hybrid individuals were characterized by distinct mitochondrial genomes and copy number variations. Through a weighted gene correlation network analysis, we identified 3693 genes as candidate growth-regulated genes. Among them, the expression of 3672 genes in subgenome R (originating from goldfish) displayed negative correlations with growth rate, whereas 20 genes in subgenome C (originating from common carp) exhibited positive correlations. Notably, we observed intriguing patterns in the expression of slc2a12 in subgenome C, showing opposite correlations with body weight that changed with water temperatures, suggesting differential interactions between feeding activity and weight gain in response to seasonal changes for hybrid animals. In 40.31% of alleles, we observed dominant trans-regulatory effects in the regulatory interaction between distinct alleles from subgenomes R and C. Integrated analyses of allele-specific expression and DNA methylation data revealed that the influence of DNA methylation on both subgenomes shapes the relative contribution of allelic expression to the growth rate. These findings provide novel insights into the interaction of distinct subgenomes that underlies heterosis in growth traits and contribute to a better understanding of multiple allele traits in animals.
{"title":"Variation and Interaction of Distinct Subgenomes Contribute to Growth Diversity in Intergeneric Hybrid Fish.","authors":"Li Ren, Mengxue Luo, Jialin Cui, Xin Gao, Hong Zhang, Ping Wu, Zehong Wei, Yakui Tai, Mengdan Li, Kaikun Luo, Shaojun Liu","doi":"10.1093/gpbjnl/qzae055","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae055","url":null,"abstract":"<p><p>Intergeneric hybridization greatly reshapes regulatory interactions among allelic and non-allelic genes. However, their effects on growth diversity remain poorly understood in animals. In this study, we conducted whole-genome sequencing and RNA sequencing (RNA-seq) analyses in diverse hybrid varieties resulting from the intergeneric hybridization of goldfish (Carassius auratus red var.) and common carp (Cyprinus carpio). These hybrid individuals were characterized by distinct mitochondrial genomes and copy number variations. Through a weighted gene correlation network analysis, we identified 3693 genes as candidate growth-regulated genes. Among them, the expression of 3672 genes in subgenome R (originating from goldfish) displayed negative correlations with growth rate, whereas 20 genes in subgenome C (originating from common carp) exhibited positive correlations. Notably, we observed intriguing patterns in the expression of slc2a12 in subgenome C, showing opposite correlations with body weight that changed with water temperatures, suggesting differential interactions between feeding activity and weight gain in response to seasonal changes for hybrid animals. In 40.31% of alleles, we observed dominant trans-regulatory effects in the regulatory interaction between distinct alleles from subgenomes R and C. Integrating analyses of allelic-specific expression and DNA methylation data revealed that the influence of DNA methylation on both subgenomes shapes the relative contribution of allelic expression to the growth rate. 
These findings provide novel insights into the interaction of distinct subgenomes that underlie heterosis in growth traits and contribute to a better understanding of multiple allele traits in animals.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141750100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Saidur Rhaman, Muhammad Ali, Wenxiu Ye, Bosheng Li
Plants possess diverse cell types and intricate regulatory mechanisms to adapt to the ever-changing natural environment. Various strategies have been employed to study cell types and their developmental progressions, including single-cell sequencing methods, which provide high-dimensional catalogs to address biological questions. In recent years, single-cell sequencing technologies in transcriptomics, epigenomics, proteomics, metabolomics, and spatial transcriptomics have been increasingly used in plant science to reveal intricate biological relationships at the single-cell level. However, the application of single-cell technologies to plants is more limited due to the challenges posed by cell structure. This review outlines the advancements in single-cell omics technologies, their implications in plant systems, future research applications, and the challenges of single-cell omics in plant systems.
{"title":"Opportunities and Challenges in Advancing Plant Research with Single-cell Omics.","authors":"Mohammad Saidur Rhaman, Muhammad Ali, Wenxiu Ye, Bosheng Li","doi":"10.1093/gpbjnl/qzae026","DOIUrl":"10.1093/gpbjnl/qzae026","url":null,"abstract":"<p><p>Plants possess diverse cell types and intricate regulatory mechanisms to adapt to the ever-changing environment of nature. Various strategies have been employed to study cell types and their developmental progressions, including single-cell sequencing methods which provide high-dimensional catalogs to address biological concerns. In recent years, single-cell sequencing technologies in transcriptomics, epigenomics, proteomics, metabolomics, and spatial transcriptomics have been increasingly used in plant science to reveal intricate biological relationships at the single-cell level. However, the application of single-cell technologies to plants is more limited due to the challenges posed by cell structure. This review outlines the advancements in single-cell omics technologies, their implications in plant systems, future research applications, and the challenges of single-cell omics in plant systems.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":"22 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11423859/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141602353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yanni Li, Eline H van den Berg, Alexander Kurilshikov, Dasha V Zhernakova, Ranko Gacesa, Shixian Hu, Esteban A Lopera-Maya, Alexandra Zhernakova, Vincent E de Meijer, Serena Sanna, Robin P F Dullaart, Hans Blokzijl, Eleonora A M Festen, Jingyuan Fu, Rinse K Weersma
Genetic susceptibility to metabolic-associated fatty liver disease (MAFLD) is complex and poorly characterized. Accurate characterization of the genetic background of hepatic fat content would provide insights into disease etiology and causality of risk factors. We performed genome-wide association studies (GWASs) on two noninvasive definitions of hepatic fat content: magnetic resonance imaging proton density fat fraction (MRI-PDFF) in 16,050 participants and fatty liver index (FLI) in 388,701 participants from the United Kingdom (UK) Biobank (UKBB). Heritability, genetic overlap, and similarity between hepatic fat content phenotypes were analyzed, and replicated in 10,398 participants from the University Medical Center Groningen (UMCG) Genetics Lifelines Initiative (UGLI). Meta-analysis of GWASs of MRI-PDFF in UKBB revealed five statistically significant loci, including two novel genomic loci harboring CREB3L1 (rs72910057-T, P = 5.40E-09) and GCM1 (rs1491489378-T, P = 3.16E-09), respectively, as well as three previously reported loci: PNPLA3, TM6SF2, and APOE. GWAS of FLI in UKBB identified 196 genome-wide significant loci, of which 49 were replicated in UGLI, with top signals in ZPR1 (P = 3.35E-13) and FTO (P = 2.11E-09). A statistically significant genetic correlation (rg) between MRI-PDFF (UKBB) and FLI (UGLI) GWAS results was found (rg = 0.5276, P = 1.45E-03). Novel MRI-PDFF genetic signals (CREB3L1 and GCM1) were replicated in the FLI GWAS. We identified two novel genes for MRI-PDFF and 49 replicable loci for FLI. Despite a difference in hepatic fat content assessment between MRI-PDFF and FLI, a substantially similar genetic architecture was found. FLI is identified as an easy and reliable approach to study hepatic fat content at the population level.
{"title":"Genome-wide Studies Reveal Genetic Risk Factors for Hepatic Fat Content.","authors":"Yanni Li, Eline H van den Berg, Alexander Kurilshikov, Dasha V Zhernakova, Ranko Gacesa, Shixian Hu, Esteban A Lopera-Maya, Alexandra Zhernakova, Vincent E de Meijer, Serena Sanna, Robin P F Dullaart, Hans Blokzijl, Eleonora A M Festen, Jingyuan Fu, Rinse K Weersma","doi":"10.1093/gpbjnl/qzae031","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae031","url":null,"abstract":"<p><p>Genetic susceptibility to metabolic associated fatty liver disease (MAFLD) is complex and poorly characterized. Accurate characterization of the genetic background of hepatic fat content would provide insights into disease etiology and causality of risk factors. We performed genome-wide association study (GWAS) on two noninvasive definitions of hepatic fat content: magnetic resonance imaging proton density fat fraction (MRI-PDFF) in 16,050 participants and fatty liver index (FLI) in 388,701 participants from the United Kingdom (UK) Biobank (UKBB). Heritability, genetic overlap, and similarity between hepatic fat content phenotypes were analyzed, and replicated in 10,398 participants from the University Medical Center Groningen (UMCG) Genetics Lifelines Initiative (UGLI). Meta-analysis of GWASs of MRI-PDFF in UKBB revealed five statistically significant loci, including two novel genomic loci harboring CREB3L1 (rs72910057-T, P = 5.40E-09) and GCM1 (rs1491489378-T, P = 3.16E-09), respectively, as well as three previously reported loci: PNPLA3, TM6SF2, and APOE. GWAS of FLI in UKBB identified 196 genome-wide significant loci, of which 49 were replicated in UGLI, with top signals in ZPR1 (P = 3.35E-13) and FTO (P = 2.11E-09). Statistically significant genetic correlation (rg) between MRI-PDFF (UKBB) and FLI (UGLI) GWAS results was found (rg = 0.5276, P = 1.45E-03). Novel MRI-PDFF genetic signals (CREB3L1 and GCM1) were replicated in the FLI GWAS. 
We identified two novel genes for MRI-PDFF and 49 replicable loci for FLI. Despite a difference in hepatic fat content assessment between MRI-PDFF and FLI, a substantial similar genetic architecture was found. FLI is identified as an easy and reliable approach to study hepatic fat content at the population level.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":"22 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141984187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: m6A Profile Dynamics Indicates Regulation of Oyster Development by m6A-RNA Epitranscriptomes.","authors":"","doi":"10.1093/gpbjnl/qzae021","DOIUrl":"10.1093/gpbjnl/qzae021","url":null,"abstract":"","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":"22 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11233143/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141565411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Increasing the accuracy of nucleotide sequence alignment is an essential issue in genomics research. Although classic dynamic programming (DP) algorithms (e.g., Smith-Waterman and Needleman-Wunsch) are guaranteed to produce the optimal result, their time complexity hinders their application to large-scale sequence alignment. Optimization efforts that aim to accelerate the alignment process generally come from three perspectives: redesigning data structures [e.g., diagonal or striped Single Instruction Multiple Data (SIMD) implementations], increasing the degree of parallelism in SIMD operations (e.g., difference recurrence relation), or reducing the search space (e.g., banded DP). However, no method combines all three aspects to build an ultra-fast algorithm. In this study, we developed a Banded Striped Aligner (BSAlign) library that delivers accurate alignment results at an ultra-fast speed by knitting together a series of novel methods that take advantage of all three aforementioned perspectives, with highlights such as active F-loop in striped vectorization and striped move in banded DP. We applied our new acceleration design to both regular and edit distance pairwise alignment. BSAlign achieved a 2-fold speed-up over other SIMD-based implementations for regular pairwise alignment, and a 1.5-fold to 4-fold speed-up in edit distance-based implementations for long reads. BSAlign is implemented in the C programming language and is available at https://github.com/ruanjue/bsalign.
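The search-space reduction mentioned above (banded DP) can be illustrated with a minimal Python sketch of a banded edit distance. This is not BSAlign's implementation, which is written in C and uses striped SIMD vectorization; the function name `banded_edit_distance` and the `band` parameter are hypothetical and chosen only for illustration. The sketch restricts the DP to cells with |i - j| <= band, so the work per row is proportional to the band width rather than to the full sequence length.

```python
from typing import Optional


def banded_edit_distance(a: str, b: str, band: int) -> Optional[int]:
    """Edit distance restricted to DP cells with |i - j| <= band.

    Returns None when the end cell lies outside the band. The result
    equals the true edit distance whenever that distance is <= band;
    otherwise it is only an upper bound. (Hypothetical helper, not part
    of BSAlign.)
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None  # the final cell (n, m) is outside the band
    inf = n + m + 1  # larger than any real edit distance

    # Row 0: only columns 0..band are inside the band.
    prev = [inf] * (m + 1)
    for j in range(min(m, band) + 1):
        prev[j] = j

    for i in range(1, n + 1):
        # For simplicity each row is allocated in full; only the band is computed.
        cur = [inf] * (m + 1)
        lo = max(0, i - band)
        hi = min(m, i + band)
        if lo == 0:
            cur[0] = i  # deleting the first i characters of a
        for j in range(max(1, lo), hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # deletion from a
                cur[j - 1] + 1,      # insertion into a
            )
        prev = cur
    return prev[m]


print(banded_edit_distance("kitten", "sitting", 3))
```

A narrower band does less work but may miss the optimal path, which is why the result is exact only when the true distance fits within the band.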
{"title":"BSAlign: A Library for Nucleotide Sequence Alignment.","authors":"Haojing Shao, Jue Ruan","doi":"10.1093/gpbjnl/qzae025","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae025","url":null,"abstract":"<p><p>Increasing the accuracy of the nucleotide sequence alignment is an essential issue in genomics research. Although classic dynamic programming (DP) algorithms (e.g., Smith-Waterman and Needleman-Wunsch) guarantee to produce the optimal result, their time complexity hinders the application of large-scale sequence alignment. Many optimization efforts that aim to accelerate the alignment process generally come from three perspectives: redesigning data structures [e.g., diagonal or striped Single Instruction Multiple Data (SIMD) implementations], increasing the number of parallelisms in SIMD operations (e.g., difference recurrence relation), or reducing search space (e.g., banded DP). However, no methods combine all these three aspects to build an ultra-fast algorithm. In this study, we developed a Banded Striped Aligner (BSAlign) library that delivers accurate alignment results at an ultra-fast speed by knitting a series of novel methods together to take advantage of all of the aforementioned three perspectives with highlights such as active F-loop in striped vectorization and striped move in banded DP. We applied our new acceleration design on both regular and edit distance pairwise alignment. BSAlign achieved 2-fold speed-up than other SIMD-based implementations for regular pairwise alignment, and 1.5-fold to 4-fold speed-up in edit distance-based implementations for long reads. 
BSAlign is implemented in C programing language and is available at https://github.com/ruanjue/bsalign.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":"22 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142116457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent advances in high-throughput chromosome conformation capture (Hi-C) techniques have allowed us to map genome-wide chromatin interactions and uncover higher-order chromatin structures, thereby shedding light on the principles of genome architecture and functions. However, statistical methods for detecting changes in large-scale chromatin organization such as topologically associating domains (TADs) are still lacking. Here, we proposed a new statistical method, DiffGR, for detecting differentially interacting genomic regions at the TAD level between Hi-C contact maps. We utilized the stratum-adjusted correlation coefficient to measure the similarity of local TAD regions. We then developed a nonparametric approach to identify statistically significant changes in genomic interacting regions. Through simulation studies, we demonstrated that DiffGR can robustly and effectively discover differential genomic regions under various conditions. Furthermore, we successfully revealed cell type-specific changes in genomic interacting regions in both human and mouse Hi-C datasets, and illustrated that DiffGR yielded consistent and advantageous results compared with state-of-the-art differential TAD detection methods. The DiffGR R package is published under the GNU General Public License (GPL), version 2 or later, and is publicly available at https://github.com/wmalab/DiffGR.
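To give a feel for a stratum-based similarity measure between two contact maps, here is a simplified Python sketch. It is not the DiffGR code (an R package) and it is not the full stratum-adjusted correlation coefficient: as an assumption made for brevity, it omits smoothing and variance stabilization and weights each stratum by its size times the product of the raw standard deviations. The function name `stratum_weighted_correlation` and the parameter `max_dist` are hypothetical.

```python
from math import sqrt


def stratum_weighted_correlation(x, y, max_dist):
    """Weighted average of per-diagonal Pearson correlations.

    x, y: square contact matrices (lists of lists) of equal size.
    max_dist: largest diagonal offset (in bins) to include.
    Strata with fewer than two entries or zero variance are skipped.
    Simplified illustration only; assumptions are listed in the lead-in.
    """
    n = len(x)
    num = 0.0
    den = 0.0
    for k in range(min(max_dist, n - 1) + 1):
        # Stratum k: all contacts between bins exactly k apart.
        xs = [x[i][i + k] for i in range(n - k)]
        ys = [y[i][i + k] for i in range(n - k)]
        nk = len(xs)
        if nk < 2:
            continue
        mx = sum(xs) / nk
        my = sum(ys) / nk
        var_x = sum((v - mx) ** 2 for v in xs) / nk
        var_y = sum((v - my) ** 2 for v in ys) / nk
        if var_x == 0 or var_y == 0:
            continue  # correlation is undefined for a constant stratum
        cov = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / nk
        sd_prod = sqrt(var_x * var_y)
        rho_k = cov / sd_prod      # Pearson correlation within stratum k
        weight = nk * sd_prod      # assumed stratum weight
        num += weight * rho_k
        den += weight
    return num / den if den > 0 else float("nan")


contact_map = [
    [5, 3, 1, 0],
    [3, 6, 2, 1],
    [1, 2, 7, 3],
    [0, 1, 3, 8],
]
print(stratum_weighted_correlation(contact_map, contact_map, 3))
```

Stratifying by genomic distance keeps the strong distance-dependent decay of Hi-C contact frequency from dominating the similarity score, which is the motivation for comparing maps one diagonal at a time.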
{"title":"DiffGR: Detecting Differentially Interacting Genomic Regions from Hi-C Contact Maps.","authors":"Huiling Liu, Wenxiu Ma","doi":"10.1093/gpbjnl/qzae028","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae028","url":null,"abstract":"<p><p>Recent advances in high-throughput chromosome conformation capture (Hi-C) techniques have allowed us to map genome-wide chromatin interactions and uncover higher-order chromatin structures, thereby shedding light on the principles of genome architecture and functions. However, statistical methods for detecting changes in large-scale chromatin organization such as topologically associating domains (TADs) are still lacking. Here, we proposed a new statistical method, DiffGR, for detecting differentially interacting genomic regions at the TAD level between Hi-C contact maps. We utilized the stratum-adjusted correlation coefficient to measure similarity of local TAD regions. We then developed a nonparametric approach to identify statistically significant changes of genomic interacting regions. Through simulation studies, we demonstrated that DiffGR can robustly and effectively discover differential genomic regions under various conditions. Furthermore, we successfully revealed cell type-specific changes in genomic interacting regions in both human and mouse Hi-C datasets, and illustrated that DiffGR yielded consistent and advantageous results compared with state-of-the-art differential TAD detection methods. 
The DiffGR R package is published under the GNU General Public License (GPL) ≥ 2 license and is publicly available at https://github.com/wmalab/DiffGR.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":"22 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Small cell lung cancer (SCLC) is a highly malignant and heterogeneous cancer with limited therapeutic options and prognosis prediction models. Here, we analyzed formalin-fixed, paraffin-embedded (FFPE) samples of surgical resections by proteomic profiling, and stratified SCLC into three proteomic subtypes (S-I, S-II, and S-III) with distinct clinical outcomes and chemotherapy responses. The proteomic subtyping was an independent prognostic factor and performed better than current tumor-node-metastasis or Veterans Administration Lung Study Group staging methods. The subtyping results could be further validated using FFPE biopsy samples from an independent cohort, extending the analysis to both surgical and biopsy samples. The signatures of the S-II subtype in particular suggested potential benefits from immunotherapy. Differentially overexpressed proteins in S-III, the worst prognostic subtype, allowed us to nominate potential therapeutic targets, indicating that patient selection may bring new hope for previously failed clinical trials. Finally, analysis of an independent cohort of SCLC patients who had received immunotherapy validated the prediction that the S-II patients had better progression-free survival and overall survival after first-line immunotherapy. Collectively, our study provides the rationale for future clinical investigations to validate the current findings for more accurate prognosis prediction and precise treatments.
{"title":"Proteomic Stratification of Prognosis and Treatment Options for Small Cell Lung Cancer.","authors":"Zitian Huo, Yaqi Duan, Dongdong Zhan, Xizhen Xu, Nairen Zheng, Jing Cai, Ruifang Sun, Jianping Wang, Fang Cheng, Zhan Gao, Caixia Xu, Wanlin Liu, Yuting Dong, Sailong Ma, Qian Zhang, Yiyun Zheng, Liping Lou, Dong Kuang, Qian Chu, Jun Qin, Guoping Wang, Yi Wang","doi":"10.1093/gpbjnl/qzae033","DOIUrl":"10.1093/gpbjnl/qzae033","url":null,"abstract":"<p><p>Small cell lung cancer (SCLC) is a highly malignant and heterogeneous cancer with limited therapeutic options and prognosis prediction models. Here, we analyzed formalin-fixed, paraffin-embedded (FFPE) samples of surgical resections by proteomic profiling, and stratified SCLC into three proteomic subtypes (S-I, S-II, and S-III) with distinct clinical outcomes and chemotherapy responses. The proteomic subtyping was an independent prognostic factor and performed better than current tumor-node-metastasis or Veterans Administration Lung Study Group staging methods. The subtyping results could be further validated using FFPE biopsy samples from an independent cohort, extending the analysis to both surgical and biopsy samples. The signatures of the S-II subtype in particular suggested potential benefits from immunotherapy. Differentially overexpressed proteins in S-III, the worst prognostic subtype, allowed us to nominate potential therapeutic targets, indicating that patient selection may bring new hope for previously failed clinical trials. Finally, analysis of an independent cohort of SCLC patients who had received immunotherapy validated the prediction that the S-II patients had better progression-free survival and overall survival after first-line immunotherapy. 
Collectively, our study provides the rationale for future clinical investigations to validate the current findings for more accurate prognosis prediction and precise treatments.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":"22 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11423856/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141499987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}