Jinyeon Jo, Nayoung Ha, Yunmi Ji, Ahra Do, Je Hyun Seo, Bumjo Oh, Sungkyoung Choi, Eun Kyung Choe, Woojoo Lee, Jang Won Son, Sungho Won
East Asian populations exhibit a genetic predisposition to obesity, yet comprehensive research on these traits is limited. We conducted a genome-wide association study (GWAS) with 93,673 Korean subjects to uncover novel genetic loci linked to obesity, examining body mass index, waist circumference, body fat ratio, and abdominal fat ratio. Participants were categorized into non-obese, metabolically healthy obese (MHO), and metabolically unhealthy obese (MUO) groups. Using advanced computational methods, we developed a multifaceted polygenic risk score (PRS) model to predict obesity. Our GWAS identified significant genetic effects of distinct sizes and directions in the MHO and MUO groups relative to the non-obese group. Gene-based and gene-set analyses, along with cluster analysis, revealed heterogeneous patterns of significant genes on chromosomes 3 (MUO group) and 11 (MHO group). In analyses of differences in genetic predisposition by metabolic health, the odds ratios of high versus medium PRS were significant both for MUO versus non-obese and for MHO versus non-obese. Similar patterns were seen for low versus medium PRS. These findings were supported by the estimated genetic correlation (0.89 from bivariate GREML). Regional analyses highlighted significant local genetic correlations on chromosome 11, while single-variant approaches suggested widespread pleiotropic effects, especially on chromosome 11. In conclusion, our study identifies specific genetic loci and risks associated with obesity in the Korean population, emphasizing the heterogeneous genetic factors contributing to MHO and MUO.
{"title":"Genetic determinants of obesity in Korean populations: exploring genome-wide associations and polygenic risk scores.","authors":"Jinyeon Jo, Nayoung Ha, Yunmi Ji, Ahra Do, Je Hyun Seo, Bumjo Oh, Sungkyoung Choi, Eun Kyung Choe, Woojoo Lee, Jang Won Son, Sungho Won","doi":"10.1093/bib/bbae389","DOIUrl":"10.1093/bib/bbae389","url":null,"abstract":"<p><p>East Asian populations exhibit a genetic predisposition to obesity, yet comprehensive research on these traits is limited. We conducted a genome-wide association study (GWAS) with 93,673 Korean subjects to uncover novel genetic loci linked to obesity, examining metrics such as body mass index, waist circumference, body fat ratio, and abdominal fat ratio. Participants were categorized into non-obese, metabolically healthy obese (MHO), and metabolically unhealthy obese (MUO) groups. Using advanced computational methods, we developed a multifaceted polygenic risk scores (PRS) model to predict obesity. Our GWAS identified significant genetic effects with distinct sizes and directions within the MHO and MUO groups compared with the non-obese group. Gene-based and gene-set analyses, along with cluster analysis, revealed heterogeneous patterns of significant genes on chromosomes 3 (MUO group) and 11 (MHO group). In analyses targeting genetic predisposition differences based on metabolic health, odds ratios of high PRS compared with medium PRS showed significant differences between non-obese and MUO, and non-obese and MHO. Similar patterns were seen for low PRS compared with medium PRS. These findings were supported by the estimated genetic correlation (0.89 from bivariate GREML). Regional analyses highlighted significant local genetic correlations on chromosome 11, while single variant approaches suggested widespread pleiotropic effects, especially on chromosome 11. 
In conclusion, our study identifies specific genetic loci and risks associated with obesity in the Korean population, emphasizing the heterogeneous genetic factors contributing to MHO and MUO.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11359806/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142104464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
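The PRS categories used in the Korean obesity GWAS entry above can be illustrated with a minimal sketch. It assumes the common additive PRS definition (effect-allele dosages weighted by GWAS effect sizes) and, hypothetically, tertile cut-offs for the low, medium, and high groups; the study's actual weights, thresholds, and model are not stated here, and the toy data are invented.

```python
from statistics import quantiles


def polygenic_risk_score(dosages, weights):
    """Additive PRS: effect-allele dosages (0, 1 or 2) weighted by GWAS effect sizes."""
    return sum(d * w for d, w in zip(dosages, weights))


def categorize(scores):
    """Label scores 'low', 'medium' or 'high' using tertile cut-offs (an assumed scheme)."""
    low_cut, high_cut = quantiles(scores, n=3)
    labels = []
    for s in scores:
        if s < low_cut:
            labels.append("low")
        elif s <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


def odds_ratio(categories, is_case, exposed="high", reference="medium"):
    """Odds of being a case in the `exposed` PRS group relative to the `reference` group.

    Raises ZeroDivisionError if a 2x2 cell used in the denominator is empty.
    """
    def count(category, case):
        return sum(1 for c, y in zip(categories, is_case) if c == category and y == case)

    a, b = count(exposed, True), count(exposed, False)
    c, d = count(reference, True), count(reference, False)
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Toy data (invented): nine individuals with precomputed scores and case labels.
    scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    groups = categorize(scores)
    cases = [False, True, False, True, False, False, True, True, False]
    print(groups)
    print(odds_ratio(groups, cases))
```

Comparing each extreme group against the medium group, as the abstract does, keeps the reference category fixed so that the high and low odds ratios are directly comparable.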
Alternative polyadenylation in 3' untranslated regions (3'UTR-APA) has been extensively studied, but intronic polyadenylation (IPA) events remain largely unexplored. We characterized the profiles of 22 260 IPAs in 9679 patient samples across 32 cancer types from The Cancer Genome Atlas (TCGA) cohort. By comparing tumor and paired normal tissues, we identified 180 to 4645 dysregulated IPAs in 132 to 2249 genes in each of 690 patient tumors from 22 cancer types, with consistent patterns within individual cancer types. We selected 2741 genes that showed consistent patterns across cancer types, including 1834 pan-cancer tumor-enriched and 907 tumor-depleted IPA genes; the former were well represented in functional pathways such as DNA damage repair. Expression of IPA isoforms was associated with tumor mutation burden and patient characteristics (e.g. sex, race, cancer stage, and subtype) in cancer-specific and feature-specific manners, and could be a more accurate prognostic marker than gene-level expression (summarized over all isoforms). In summary, our study reveals the roles and clinical relevance of tumor-associated IPAs.
{"title":"A pan-cancer interrogation of intronic polyadenylation and its association with cancer characteristics.","authors":"Liang Liu, Peiqing Sun, Wei Zhang","doi":"10.1093/bib/bbae376","DOIUrl":"10.1093/bib/bbae376","url":null,"abstract":"<p><p>3'UTR-APAs have been extensively studied, but intronic polyadenylations (IPAs) remain largely unexplored. We characterized the profiles of 22 260 IPAs in 9679 patient samples across 32 cancer types from the Cancer Genome Atlas cohort. By comparing tumor and paired normal tissues, we identified 180 ~ 4645 dysregulated IPAs in 132 ~ 2249 genes in each of 690 patient tumors from 22 cancer types that showed consistent patterns within individual cancer types. We selected 2741 genes that showed consistently patterns across cancer types, including 1834 pan-cancer tumor-enriched and 907 tumor-depleted IPA genes; the former were amply represented in the functional pathways such as deoxyribonucleic acid damage repair. Expression of IPA isoforms was associated with tumor mutation burden and patient characteristics (e.g. sex, race, cancer stages, and subtypes) in cancer-specific and feature-specific manners, and could be a more accurate prognostic marker than gene expression (summary of all isoforms). In summary, our study reveals the roles and the clinical relevance of tumor-associated IPAs.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11289681/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141854880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Influenza viruses evolve rapidly to evade previously acquired human immunity. Maintaining vaccine efficacy requires continuous monitoring of antigenic differences among strains. Traditional serological methods for assessing these differences are labor-intensive and time-consuming, highlighting the need for efficient computational approaches. This paper proposes MetaFluAD, a meta-learning-based method for predicting quantitative antigenic distances among strains. It models the antigenic relationships between strains, each represented by its hemagglutinin (HA) sequence, as a weighted attributed network. Combining a graph neural network (GNN)-based encoder with a robust meta-learning framework, MetaFluAD learns strain representations in a unified space that encompasses both antigenic and genetic features. The meta-learning framework also enables knowledge transfer across influenza subtypes, allowing MetaFluAD to perform well with limited data. MetaFluAD shows strong performance and robustness across various influenza subtypes, including A/H3N2, A/H1N1, A/H5N1, B/Victoria, and B/Yamagata. By combining GNN-based encoding with meta-learning, MetaFluAD offers a promising approach for accurate antigenic distance prediction. Additionally, MetaFluAD can identify dominant antigenic clusters within seasonal influenza viruses, aiding the development of effective vaccines and the monitoring of viral evolution.
{"title":"MetaFluAD: meta-learning for predicting antigenic distances among influenza viruses.","authors":"Qitao Jia, Yuanling Xia, Fanglin Dong, Weihua Li","doi":"10.1093/bib/bbae395","DOIUrl":"10.1093/bib/bbae395","url":null,"abstract":"<p><p>Influenza viruses rapidly evolve to evade previously acquired human immunity. Maintaining vaccine efficacy necessitates continuous monitoring of antigenic differences among strains. Traditional serological methods for assessing these differences are labor-intensive and time-consuming, highlighting the need for efficient computational approaches. This paper proposes MetaFluAD, a meta-learning-based method designed to predict quantitative antigenic distances among strains. This method models antigenic relationships between strains, represented by their hemagglutinin (HA) sequences, as a weighted attributed network. Employing a graph neural network (GNN)-based encoder combined with a robust meta-learning framework, MetaFluAD learns comprehensive strain representations within a unified space encompassing both antigenic and genetic features. Furthermore, the meta-learning framework enables knowledge transfer across different influenza subtypes, allowing MetaFluAD to achieve remarkable performance with limited data. MetaFluAD demonstrates excellent performance and overall robustness across various influenza subtypes, including A/H3N2, A/H1N1, A/H5N1, B/Victoria, and B/Yamagata. MetaFluAD synthesizes the strengths of GNN-based encoding and meta-learning to offer a promising approach for accurate antigenic distance prediction. 
Additionally, MetaFluAD can effectively identify dominant antigenic clusters within seasonal influenza viruses, aiding in the development of effective vaccines and efficient monitoring of viral evolution.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11317534/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141916101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marwan Abdellah, Alessandro Foni, Juan José García Cantero, Nadir Román Guerrero, Elvis Boci, Adrien Fleury, Jay S Coggan, Daniel Keller, Judit Planas, Jean-Denis Courcol, Georges Khazen
Understanding the intracellular dynamics of brain cells requires three-dimensional molecular simulations that incorporate ultrastructural models capable of capturing cellular membrane geometries at nanometer scales. While an abundance of neuronal morphologies is available online, e.g. from NeuroMorpho.Org, converting these fairly abstract point-and-diameter representations into geometrically realistic and simulation-ready, i.e. watertight, manifolds is challenging. Many neuronal mesh reconstruction methods have been proposed; however, their resulting meshes are either biologically implausible or non-watertight. We present an effective and unconditionally robust method that generates geometrically realistic and watertight surface manifolds of spiny cortical neurons from their morphological descriptions. We assess its robustness on a mixed dataset of cortical neurons spanning a wide variety of morphological classes. The implementation is also extended to synthetic astrocytic morphologies, yielding meshes with biologically plausible detail. The resulting surface meshes are then used to create volumetric meshes with tetrahedral domains for scalable in silico reaction-diffusion simulations that reveal cellular structure-function relationships. Availability and implementation: Our method is implemented in NeuroMorphoVis, a neuroscience-specific open-source Blender add-on, making it freely accessible to neuroscience researchers.
{"title":"Synthesis of geometrically realistic and watertight neuronal ultrastructure manifolds for in silico modeling.","authors":"Marwan Abdellah, Alessandro Foni, Juan José García Cantero, Nadir Román Guerrero, Elvis Boci, Adrien Fleury, Jay S Coggan, Daniel Keller, Judit Planas, Jean-Denis Courcol, Georges Khazen","doi":"10.1093/bib/bbae393","DOIUrl":"10.1093/bib/bbae393","url":null,"abstract":"<p><p>Understanding the intracellular dynamics of brain cells entails performing three-dimensional molecular simulations incorporating ultrastructural models that can capture cellular membrane geometries at nanometer scales. While there is an abundance of neuronal morphologies available online, e.g. from NeuroMorpho.Org, converting those fairly abstract point-and-diameter representations into geometrically realistic and simulation-ready, i.e. watertight, manifolds is challenging. Many neuronal mesh reconstruction methods have been proposed; however, their resulting meshes are either biologically unplausible or non-watertight. We present an effective and unconditionally robust method capable of generating geometrically realistic and watertight surface manifolds of spiny cortical neurons from their morphological descriptions. The robustness of our method is assessed based on a mixed dataset of cortical neurons with a wide variety of morphological classes. The implementation is seamlessly extended and applied to synthetic astrocytic morphologies that are also plausibly biological in detail. Resulting meshes are ultimately used to create volumetric meshes with tetrahedral domains to perform scalable in silico reaction-diffusion simulations for revealing cellular structure-function relationships. 
Availability and implementation: Our method is implemented in NeuroMorphoVis, a neuroscience-specific open source Blender add-on, making it freely accessible for neuroscience researchers.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11317524/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141916105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
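The watertightness requirement in the neuronal mesh entry above has a simple combinatorial core, sketched below for a triangle mesh given as vertex-index triples: in a closed (watertight) 2-manifold surface every edge is shared by exactly two triangles. This is a minimal, necessary-but-not-sufficient check (it does not detect self-intersections or inconsistent face orientation), and it is not the validation procedure used by NeuroMorphoVis.

```python
from collections import Counter


def edge_face_counts(faces):
    """Count how many triangles use each undirected edge."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts


def is_closed_edge_manifold(faces):
    """True if every edge is shared by exactly two triangles.

    An edge used once is a boundary edge (a hole in the surface); an edge
    used more than twice is non-manifold. Either makes the mesh unsuitable
    for building a tetrahedral volume mesh.
    """
    counts = edge_face_counts(faces)
    return bool(counts) and all(n == 2 for n in counts.values())


if __name__ == "__main__":
    tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
    print(is_closed_edge_manifold(tetrahedron))       # closed surface
    print(is_closed_edge_manifold(tetrahedron[:-1]))  # one face removed, leaving a hole
```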
Benjamin Tam, Philip Naderev P Lagniton, Mariano Da Luz, Bojin Zhao, Siddharth Sinha, Chon Lok Lei, San Ming Wang
Somatic variation is a major type of genetic variation contributing to human diseases, including cancer. Of the vast numbers of somatic variants identified, the functional impact of many, in particular missense variants, remains unclear. This lack of functional information prevents the translation of rich variation data into clinical applications. We previously developed a method named Ramachandran Plot-Molecular Dynamics Simulations (RP-MDS) to predict the function of germline missense variants from their effects on protein structural stability, and successfully applied it to predict the deleteriousness of unclassified germline missense variants in multiple cancer genes. We hypothesized that, regardless of their different genetic origins, somatic and germline missense variants could have similar effects on the stability of the affected protein structure. If so, the RP-MDS method designed for germline missense variants should also be applicable to predicting the function of somatic missense variants. In the current study, we tested this hypothesis using somatic missense variants in TP53 as a model. Of the 397 somatic missense variants analyzed, RP-MDS predicted that 195 (49.1%) were deleterious because they significantly disturbed p53 structure. The results were largely validated using a p53-p21 promoter-green fluorescent protein (GFP) reporter gene assay. Our study demonstrates that deleterious somatic missense variants can be identified from their effects on protein structural stability.
{"title":"Comprehensive classification of TP53 somatic missense variants based on their impact on p53 structural stability.","authors":"Benjamin Tam, Philip Naderev P Lagniton, Mariano Da Luz, Bojin Zhao, Siddharth Sinha, Chon Lok Lei, San Ming Wang","doi":"10.1093/bib/bbae400","DOIUrl":"10.1093/bib/bbae400","url":null,"abstract":"<p><p>Somatic variation is a major type of genetic variation contributing to human diseases including cancer. Of the vast quantities of somatic variants identified, the functional impact of many somatic variants, in particular the missense variants, remains unclear. Lack of the functional information prevents the translation of rich variation data into clinical applications. We previously developed a method named Ramachandran Plot-Molecular Dynamics Simulations (RP-MDS), aiming to predict the function of germline missense variants based on their effects on protein structure stability, and successfully applied to predict the deleteriousness of unclassified germline missense variants in multiple cancer genes. We hypothesized that regardless of their different genetic origins, somatic missense variants and germline missense variants could have similar effects on the stability of their affected protein structure. As such, the RP-MDS method designed for germline missense variants should also be applicable to predict the function of somatic missense variants. In the current study, we tested our hypothesis by using the somatic missense variants in TP53 as a model. Of the 397 somatic missense variants analyzed, RP-MDS predicted that 195 (49.1%) variants were deleterious as they significantly disturbed p53 structure. The results were largely validated by using a p53-p21 promoter-green fluorescent protein (GFP) reporter gene assay. 
Our study demonstrated that deleterious somatic missense variants can be identified by referring to their effects on protein structural stability.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323084/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141975077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Min Zhang, Qi Cheng, Zhenyu Wei, Jiayu Xu, Shiwei Wu, Nan Xu, Chengkui Zhao, Lei Yu, Weixing Feng
The T cell receptor (TCR) repertoire is pivotal to the human immune system, and understanding it in detail can significantly enhance our ability to forecast cancer-related immune responses. However, existing methods often overlook the intra- and inter-sequence interactions of TCRs, limiting the development of sequence-based predictions of cancer-related immune status. To address this challenge, we propose BertTCR, a deep learning framework that predicts cancer-related immune status from TCRs. BertTCR combines a pre-trained protein large language model with deep learning architectures, enabling it to extract deeper contextual information from TCRs. Compared with three state-of-the-art sequence-based methods, BertTCR improves the AUC on an external validation set for thyroid cancer detection by 21 percentage points. The model was trained on over 2000 publicly available TCR libraries covering 17 cancer types and healthy samples, and its ability to distinguish cancer patients from healthy individuals has been validated on multiple public external datasets. BertTCR can also accurately classify various cancer types and healthy individuals. Overall, BertTCR advances TCR-based forecasting of cancer-related immune status and shows promise for a wide range of immune status prediction tasks.
{"title":"BertTCR: a Bert-based deep learning framework for predicting cancer-related immune status based on T cell receptor repertoire.","authors":"Min Zhang, Qi Cheng, Zhenyu Wei, Jiayu Xu, Shiwei Wu, Nan Xu, Chengkui Zhao, Lei Yu, Weixing Feng","doi":"10.1093/bib/bbae420","DOIUrl":"10.1093/bib/bbae420","url":null,"abstract":"<p><p>The T cell receptor (TCR) repertoire is pivotal to the human immune system, and understanding its nuances can significantly enhance our ability to forecast cancer-related immune responses. However, existing methods often overlook the intra- and inter-sequence interactions of T cell receptors (TCRs), limiting the development of sequence-based cancer-related immune status predictions. To address this challenge, we propose BertTCR, an innovative deep learning framework designed to predict cancer-related immune status using TCRs. BertTCR combines a pre-trained protein large language model with deep learning architectures, enabling it to extract deeper contextual information from TCRs. Compared to three state-of-the-art sequence-based methods, BertTCR improves the AUC on an external validation set for thyroid cancer detection by 21 percentage points. Additionally, this model was trained on over 2000 publicly available TCR libraries covering 17 types of cancer and healthy samples, and it has been validated on multiple public external datasets for its ability to distinguish cancer patients from healthy individuals. Furthermore, BertTCR can accurately classify various cancer types and healthy individuals. 
Overall, BertTCR is the advancing method for cancer-related immune status forecasting based on TCRs, offering promising potential for a wide range of immune status prediction tasks.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11342255/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142035260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The overwhelming majority of protein-protein interaction (PPI) studies are conducted in a select few model organisms, largely because of the time and cost of the associated 'wet lab' experiments. In silico PPI inference methods are ideal tools for overcoming these limitations, but they often struggle with cross-species predictions. We present INTREPPPID, a method that incorporates orthology data using a new 'quintuplet' neural network constructed from five parallel encoders with shared parameters. INTREPPPID combines a PPI classification task with an orthologous locality task. The latter learns embeddings in which orthologues lie at small Euclidean distances from one another and at large distances from all other proteins. INTREPPPID outperforms the other leading PPI inference methods tested on both intraspecies and cross-species tasks using strict evaluation datasets. We show that INTREPPPID's orthologous locality loss improves performance because of the biological relevance of the orthologue data, not because of some other specious aspect of the architecture. Finally, we introduce PPI.bio and PPI Origami, a web server interface for INTREPPPID and a software tool for creating strict evaluation datasets, respectively. Together, these two initiatives aim to make both the use and development of PPI inference tools more accessible to the community.
{"title":"INTREPPPID-an orthologue-informed quintuplet network for cross-species prediction of protein-protein interaction.","authors":"Joseph Szymborski, Amin Emad","doi":"10.1093/bib/bbae405","DOIUrl":"10.1093/bib/bbae405","url":null,"abstract":"<p><p>An overwhelming majority of protein-protein interaction (PPI) studies are conducted in a select few model organisms largely due to constraints in time and cost of the associated 'wet lab' experiments. In silico PPI inference methods are ideal tools to overcome these limitations, but often struggle with cross-species predictions. We present INTREPPPID, a method that incorporates orthology data using a new 'quintuplet' neural network, which is constructed with five parallel encoders with shared parameters. INTREPPPID incorporates both a PPI classification task and an orthologous locality task. The latter learns embeddings of orthologues that have small Euclidean distances between them and large distances between embeddings of all other proteins. INTREPPPID outperforms all other leading PPI inference methods tested on both the intraspecies and cross-species tasks using strict evaluation datasets. We show that INTREPPPID's orthologous locality loss increases performance because of the biological relevance of the orthologue data and not due to some other specious aspect of the architecture. Finally, we introduce PPI.bio and PPI Origami, a web server interface for INTREPPPID and a software tool for creating strict evaluation datasets, respectively. 
Together, these two initiatives aim to make both the use and development of PPI inference tools more accessible to the community.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11339867/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142016417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
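The orthologous locality task in the INTREPPPID entry can be illustrated with a margin-based (triplet-style) loss added to a classification loss. This is a minimal sketch under stated assumptions: INTREPPPID itself uses a quintuplet of five weight-shared encoders, and its exact loss form, margin, and task weighting are not given here. The three-embedding form and the `margin` and `weight` parameters below are hypothetical.

```python
import math


def orthologue_locality_loss(anchor, orthologue, other, margin=1.0):
    """Hinge loss that reaches zero once the orthologue is closer to the anchor
    than a non-orthologous protein by at least `margin` (Euclidean distance)."""
    d_pos = math.dist(anchor, orthologue)
    d_neg = math.dist(anchor, other)
    return max(0.0, d_pos - d_neg + margin)


def binary_cross_entropy(p, label, eps=1e-12):
    """Loss for the PPI classification head; p is the predicted interaction probability."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def combined_loss(p_interact, label, anchor, orthologue, other, weight=1.0, margin=1.0):
    """Hypothetical multi-task objective: classification loss plus weighted locality loss."""
    return binary_cross_entropy(p_interact, label) + weight * orthologue_locality_loss(
        anchor, orthologue, other, margin
    )


if __name__ == "__main__":
    # Toy 2-D embeddings (invented): the orthologue is close, the other protein is far.
    print(orthologue_locality_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0)))
    print(combined_loss(0.9, 1, (0.0, 0.0), (0.0, 1.0), (0.0, 1.5)))
```

Because of the hinge, non-orthologous proteins that are already far enough away contribute nothing, so training concentrates on the pairs that still violate the desired geometry.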
Yuefan Lin, Zixiang Pan, Yuansong Zeng, Yuedong Yang, Zhiming Dai
Recent advances in single-cell technologies have driven rapid growth in multi-omics data. Cell type annotation is a common task in analyzing single-cell data. A key challenge is that some cell types in the testing set are absent from the training set (i.e. unknown cell types). Most scATAC-seq cell type annotation methods assign each cell in the testing set to one of the known types in the training set and neglect unknown cell types. Here, we present OVAAnno, an automatic cell type annotation method that uses open-set domain adaptation to detect unknown cell types in scATAC-seq data. Comprehensive experiments show that OVAAnno successfully identifies both known and unknown cell types. Further experiments demonstrate that OVAAnno also performs well on scRNA-seq data. Our code is available online at https://github.com/lisaber/OVAAnno/tree/master.
{"title":"Detecting novel cell type in single-cell chromatin accessibility data via open-set domain adaptation.","authors":"Yuefan Lin, Zixiang Pan, Yuansong Zeng, Yuedong Yang, Zhiming Dai","doi":"10.1093/bib/bbae370","DOIUrl":"10.1093/bib/bbae370","url":null,"abstract":"<p><p>Recent advances in single-cell technologies enable the rapid growth of multi-omics data. Cell type annotation is one common task in analyzing single-cell data. It is a challenge that some cell types in the testing set are not present in the training set (i.e. unknown cell types). Most scATAC-seq cell type annotation methods generally assign each cell in the testing set to one known type in the training set but neglect unknown cell types. Here, we present OVAAnno, an automatic cell types annotation method which utilizes open-set domain adaptation to detect unknown cell types in scATAC-seq data. Comprehensive experiments show that OVAAnno successfully identifies known and unknown cell types. Further experiments demonstrate that OVAAnno also performs well on scRNA-seq data. Our codes are available online at https://github.com/lisaber/OVAAnno/tree/master.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11285170/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141787301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
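The open-set decision in the OVAAnno entry, assigning each cell either to a known type or to "unknown", can be sketched with the rejection rule used by one-vs-all open-set classifiers such as OVANet. Whether OVAAnno applies exactly this rule, and at this threshold, is an assumption; the probabilities and cell-type names below are hypothetical model outputs.

```python
def annotate_cell(closed_set_probs, inlier_probs, cell_types, threshold=0.5):
    """Return a known cell type, or 'unknown' if the matching one-vs-all head rejects it.

    closed_set_probs[k]: softmax probability of known type k from the closed-set classifier.
    inlier_probs[k]:     probability from type k's one-vs-all head that the cell
                         belongs to type k rather than to anything else.
    """
    best = max(range(len(closed_set_probs)), key=closed_set_probs.__getitem__)
    if inlier_probs[best] < threshold:
        return "unknown"
    return cell_types[best]


def annotate(batch, cell_types, threshold=0.5):
    """Apply annotate_cell to a list of (closed_set_probs, inlier_probs) pairs."""
    return [annotate_cell(c, o, cell_types, threshold) for c, o in batch]


if __name__ == "__main__":
    types = ["B cell", "T cell", "monocyte"]
    batch = [
        ([0.1, 0.7, 0.2], [0.9, 0.8, 0.3]),  # T-cell head accepts the cell
        ([0.1, 0.7, 0.2], [0.9, 0.2, 0.3]),  # T-cell head rejects it: treated as unseen
    ]
    print(annotate(batch, types))
```

Only the one-vs-all head of the top-scoring known type is consulted, so a cell can be rejected even when the closed-set classifier is fairly confident, which is what allows types absent from training to surface.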
The internal ribosome entry site (IRES) is a cis-regulatory element that can initiate translation in a cap-independent manner. IRESs are often involved in cellular processes and many diseases. Identifying IRESs is therefore important for understanding their mechanism and for finding potential therapeutic strategies for related diseases, but identifying IRES elements experimentally is time-consuming and laborious. Many bioinformatics tools have been developed to predict IRESs, but all of them are based on structural similarity or machine learning algorithms. Here, we introduce a deep learning model named DeepIRES for precisely identifying IRES elements in messenger RNA (mRNA) sequences. DeepIRES is a hybrid model incorporating dilated 1D convolutional neural network blocks, bidirectional gated recurrent units, and a self-attention module. Tenfold cross-validation results suggest that DeepIRES captures deeper relationships between sequence features and prediction outcomes than other baseline models. Further comparison on independent test sets shows that DeepIRES has stronger and more robust prediction capability than existing methods. Moreover, DeepIRES achieves high accuracy in predicting experimentally validated IRESs collected from recent studies. Using interpretability analysis of the deep learning model, we discover potential consensus motifs related to IRES activity. In summary, DeepIRES is a reliable tool for IRES prediction and provides insights into the mechanism of IRES elements.
{"title":"DeepIRES: a hybrid deep learning model for accurate identification of internal ribosome entry sites in cellular and viral mRNAs.","authors":"Jian Zhao, Zhewei Chen, Meng Zhang, Lingxiao Zou, Shan He, Jingjing Liu, Quan Wang, Xiaofeng Song, Jing Wu","doi":"10.1093/bib/bbae439","DOIUrl":"10.1093/bib/bbae439","url":null,"abstract":"<p><p>The internal ribosome entry site (IRES) is a cis-regulatory element that can initiate translation in a cap-independent manner. It is often related to cellular processes and many diseases. Thus, identifying the IRES is important for understanding its mechanism and finding potential therapeutic strategies for relevant diseases since identifying IRES elements by experimental method is time-consuming and laborious. Many bioinformatics tools have been developed to predict IRES, but all these tools are based on structure similarity or machine learning algorithms. Here, we introduced a deep learning model named DeepIRES for precisely identifying IRES elements in messenger RNA (mRNA) sequences. DeepIRES is a hybrid model incorporating dilated 1D convolutional neural network blocks, bidirectional gated recurrent units, and self-attention module. Tenfold cross-validation results suggest that DeepIRES can capture deeper relationships between sequence features and prediction results than other baseline models. Further comparison on independent test sets illustrates that DeepIRES has superior and robust prediction capability than other existing methods. Moreover, DeepIRES achieves high accuracy in predicting experimental validated IRESs that are collected in recent studies. With the application of a deep learning interpretable analysis, we discover some potential consensus motifs that are related to IRES activities. In summary, DeepIRES is a reliable tool for IRES prediction and gives insights into the mechanism of IRES elements.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11375421/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142131887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
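The DeepIRES abstract names dilated 1D convolutional blocks as one component of the model. The pure-Python sketch below is a rough illustration of that building block only, not the authors' code. It one-hot encodes an RNA sequence and applies a single dilated convolution filter. The ACGU channel order, the hand-set kernel, and the function names `one_hot` and `dilated_conv1d` are assumptions made for illustration. With dilation d, a k-tap filter spans (k - 1)·d + 1 nucleotides without adding weights.

```python
from typing import List

# Channel order for the one-hot encoding (an illustrative assumption).
ALPHABET = "ACGU"


def one_hot(seq: str) -> List[List[float]]:
    """Encode an RNA/DNA sequence as a length x 4 one-hot matrix.

    T is treated as U; ambiguous bases such as N become all-zero rows.
    """
    rows = []
    for base in seq.upper().replace("T", "U"):
        rows.append([1.0 if base == b else 0.0 for b in ALPHABET])
    return rows


def dilated_conv1d(x: List[List[float]], kernel: List[List[float]], dilation: int) -> List[float]:
    """Valid (unpadded) 1D convolution of one filter with a given dilation.

    x: length x channels input; kernel: k x channels weights.
    Output position i combines inputs i, i + d, ..., i + (k - 1) * d.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    k = len(kernel)
    span = (k - 1) * dilation + 1  # receptive field of this single layer
    out = []
    for i in range(len(x) - span + 1):
        total = 0.0
        for j in range(k):
            row = x[i + j * dilation]
            total += sum(w * v for w, v in zip(kernel[j], row))
        out.append(total)
    return out


# Hypothetical 2-tap filter that scores "A at position i, U at position i + d".
A_THEN_U = [[1, 0, 0, 0], [0, 0, 0, 1]]
```

For example, `dilated_conv1d(one_hot("AAUU"), A_THEN_U, 2)` returns `[2.0, 2.0]`: with dilation 2, each output pairs a base with the one two positions downstream. Stacking such layers with increasing dilation widens the receptive field quickly, which is the usual motivation for dilated convolutions on long sequences.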
Xuanjin Cheng, Murathan T Goktas, Laura M Williamson, Martin Krzywinski, David T Mulder, Lucas Swanson, Jill Slind, Jelena Sihvonen, Cynthia R Chow, Amy Carr, Ian Bosdet, Tracy Tucker, Sean Young, Richard Moore, Karen L Mungall, Stephen Yip, Steven J M Jones
Accurate assessment of fragment abundance within a genome is crucial in clinical genomics applications such as the analysis of copy number variation (CNV). However, this task is often hindered by biased coverage in regions with varying guanine-cytosine (GC) content. These biases are particularly exacerbated in hybridization capture sequencing due to GC effects on probe hybridization and polymerase chain reaction (PCR) amplification efficiency. Such GC content-associated variations can exert a negative impact on the fidelity of CNV calling within hybridization capture panels. In this report, we present panelGC, a novel metric, to quantify and monitor GC biases in hybridization capture sequencing data. We establish the efficacy of panelGC, demonstrating its proficiency in identifying and flagging potential procedural anomalies, even in situations where instrument and experimental monitoring data may not be readily accessible. Validation using real-world datasets demonstrates that panelGC enhances the quality control and reliability of hybridization capture panel sequencing.
{"title":"Enhancing clinical genomic accuracy with panelGC: a novel metric and tool for quantifying and monitoring GC biases in hybridization capture panel sequencing.","authors":"Xuanjin Cheng,Murathan T Goktas,Laura M Williamson,Martin Krzywinski,David T Mulder,Lucas Swanson,Jill Slind,Jelena Sihvonen,Cynthia R Chow,Amy Carr,Ian Bosdet,Tracy Tucker,Sean Young,Richard Moore,Karen L Mungall,Stephen Yip,Steven J M Jones","doi":"10.1093/bib/bbae442","DOIUrl":"https://doi.org/10.1093/bib/bbae442","url":null,"abstract":"Accurate assessment of fragment abundance within a genome is crucial in clinical genomics applications such as the analysis of copy number variation (CNV). However, this task is often hindered by biased coverage in regions with varying guanine-cytosine (GC) content. These biases are particularly exacerbated in hybridization capture sequencing due to GC effects on probe hybridization and polymerase chain reaction (PCR) amplification efficiency. Such GC content-associated variations can exert a negative impact on the fidelity of CNV calling within hybridization capture panels. In this report, we present panelGC, a novel metric, to quantify and monitor GC biases in hybridization capture sequencing data. We establish the efficacy of panelGC, demonstrating its proficiency in identifying and flagging potential procedural anomalies, even in situations where instrument and experimental monitoring data may not be readily accessible. Validation using real-world datasets demonstrates that panelGC enhances the quality control and reliability of hybridization capture panel sequencing.","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":9.5,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
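The panelGC abstract does not give the metric's formula, so the sketch below is not panelGC. It is a hypothetical, generic illustration of how GC bias in hybridization capture data can be summarized. Capture targets are split into GC-poor and GC-rich groups, and their mean read depths are compared. The thresholds (0.4 and 0.6), the (sequence, mean depth) input format, and the function names `gc_fraction` and `gc_bias_ratio` are all assumptions.

```python
from statistics import mean
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def gc_bias_ratio(targets: List[Tuple[str, float]], low: float = 0.4, high: float = 0.6) -> float:
    """Hypothetical GC-bias summary: mean depth of GC-rich / GC-poor targets.

    targets: (target sequence, mean read depth) pairs from a capture panel.
    A value near 1.0 suggests little GC bias; values well below (or above)
    1.0 suggest GC-rich regions are under- (or over-) covered.
    """
    rich = [depth for seq, depth in targets if gc_fraction(seq) >= high]
    poor = [depth for seq, depth in targets if gc_fraction(seq) <= low]
    if not rich or not poor:
        raise ValueError("need targets in both the GC-poor and GC-rich groups")
    return mean(rich) / mean(poor)


# Toy, made-up targets: GC-rich targets here receive lower depth.
example_targets = [
    ("ATATATAT", 100.0),  # GC 0.00
    ("AATTGCAT", 110.0),  # GC 0.25
    ("GCGCGCAT", 60.0),   # GC 0.75
    ("GGCCGGCC", 40.0),   # GC 1.00
]
```

On the toy data above, `gc_bias_ratio(example_targets)` is 50/105, roughly 0.48. A QC check could flag a run when such a ratio drifts far from 1.0 or from a lab's historical baseline. A single ratio discards the shape of the coverage-versus-GC curve, so a real tool would likely use a richer summary.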