Pub Date : 2023-12-12 DOI: 10.1093/bioinformatics/btad744
Adam Midlik, Sreenath Nair, Stephen Anyango, Mandar Deshpande, David Sehnal, Mihaly Varadi, Sameer Velankar
Summary: PDBImages is an innovative, open-source Node.js package that harnesses the power of the popular macromolecule structure visualization software Mol*. Designed for use by the scientific community, PDBImages provides a means to generate high-quality images for PDB and AlphaFold DB models. Its unique ability to render and save images directly to files in a browserless mode sets it apart, offering users a streamlined, automated process for macromolecular structure visualization. Here, we detail the implementation of PDBImages, enumerating its diverse image types and elaborating on its user-friendly setup. This powerful tool opens a new gateway for researchers to visualize, analyse, and share their work, fostering a deeper understanding of bioinformatics. Availability and Implementation: PDBImages is available as an npm package from https://www.npmjs.com/package/pdb-images. The source code is available from https://github.com/PDBeurope/pdb-images.
{"title":"PDBImages: A Command Line Tool for Automated Macromolecular Structure Visualization","authors":"Adam Midlik, Sreenath Nair, Stephen Anyango, Mandar Deshpande, David Sehnal, Mihaly Varadi, Sameer Velankar","doi":"10.1093/bioinformatics/btad744","DOIUrl":"https://doi.org/10.1093/bioinformatics/btad744","PeriodicalId":8903,"journal":{"name":"Bioinformatics","volume":"9 1"},"PeriodicalIF":5.8,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"138579294","RegionNum":3,"RegionCategory":"Biology","Total":0}
Pub Date : 2023-12-12 DOI: 10.1093/bioinformatics/btad750
Kaspar Märtens, Michele Bortolomeazzi, Lucia Montorsi, Jo Spencer, Francesca Ciccarelli, Christopher Yau
Motivation: Cell type identification plays an important role in the analysis and interpretation of single-cell data and can be carried out via supervised or unsupervised clustering approaches. Supervised methods are best suited where we can list all cell types and their respective marker genes a priori, while unsupervised clustering algorithms look for groups of cells with similar expression properties. This property permits the identification of both known and unknown cell populations, making unsupervised methods suitable for discovery. Success is dependent on the relative strength of the expression signature of each group as well as the number of cells. Rare cell types therefore present a particular challenge that is magnified when they are defined by differential expression of a small number of genes. Results: Typical unsupervised approaches fail to identify such rare sub-populations, and these cells tend to be absorbed into more prevalent cell types. In order to balance these competing demands, we have developed a novel statistical framework for unsupervised clustering, named Rarity, that enables the discovery process for rare cell types to be more robust, consistent and interpretable. We achieve this by devising a novel clustering method based on a Bayesian latent variable model in which we assign cells to inferred latent binary on/off expression profiles. This lets us achieve increased sensitivity to rare cell populations while also allowing us to control and interpret potential false positive discoveries. We systematically study the challenges associated with rare cell type identification and demonstrate the utility of Rarity on various IMC data sets. Availability: An implementation of Rarity, together with examples, is available from the GitHub repository (https://github.com/kasparmartens/rarity). Supplementary information: Supplementary data are available at Bioinformatics online.
{"title":"Rarity: Discovering rare cell populations from single-cell imaging data","authors":"Kaspar Märtens, Michele Bortolomeazzi, Lucia Montorsi, Jo Spencer, Francesca Ciccarelli, Christopher Yau","doi":"10.1093/bioinformatics/btad750","DOIUrl":"https://doi.org/10.1093/bioinformatics/btad750","PeriodicalId":8903,"journal":{"name":"Bioinformatics","volume":"19 1"},"PeriodicalIF":5.8,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"138683421","RegionNum":3,"RegionCategory":"Biology","Total":0}
Pub Date : 2023-12-12 DOI: 10.1093/bioinformatics/btad747
Herui Liao, Jiayu Shang, Yanni Sun
Motivation: With advances in metagenomic sequencing technologies, a growing number of studies reveal associations between the human gut microbiome and some human diseases. These associations shed light on using gut microbiome data to distinguish case and control samples of a specific disease, which is also called host disease status classification. Importantly, using learning-based models to distinguish the disease and control samples is expected to identify important biomarkers more accurately than abundance-based statistical analysis. However, available tools have not fully addressed two challenges associated with this task: limited labeled microbiome data and decreased accuracy in cross-study settings. Confounding factors such as diet and technical biases in sample collection/sequencing across different studies/cohorts often jeopardize the generalization of the learning model. Results: To address these challenges, we develop a new tool, GDmicro, which combines semi-supervised learning and domain adaptation to achieve a more generalized model using limited labeled samples. We evaluated GDmicro on human gut microbiome data from 11 cohorts covering 5 different diseases. The results show that GDmicro has better performance and robustness than state-of-the-art tools. In particular, it improves the AUC from 0.783 to 0.949 in identifying inflammatory bowel disease. Furthermore, GDmicro can identify potential biomarkers with greater accuracy than abundance-based statistical analysis methods. It also reveals the contribution of these biomarkers to the host’s disease status. Availability and implementation: https://github.com/liaoherui/GDmicro Supplementary information: Supplementary data are available at Bioinformatics online.
{"title":"GDmicro: classifying host disease status with GCN and Deep adaptation network based on the human gut microbiome data","authors":"Herui Liao, Jiayu Shang, Yanni Sun","doi":"10.1093/bioinformatics/btad747","DOIUrl":"https://doi.org/10.1093/bioinformatics/btad747","PeriodicalId":8903,"journal":{"name":"Bioinformatics","volume":"103 1"},"PeriodicalIF":5.8,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"138579537","RegionNum":3,"RegionCategory":"Biology","Total":0}
Pub Date : 2023-12-09 DOI: 10.1093/bioinformatics/btad734
Yang Li, Zihou Guo, Xin Gao, Guohua Wang
Motivation: Cancer is a complex disease that results in a significant number of global fatalities. Treatment strategies can vary among patients, even if they have the same type of cancer. The application of precision medicine in cancer shows promise for treating different types of cancer, reducing healthcare expenses, and improving recovery rates. To achieve personalized cancer treatment, machine learning models have been developed to predict drug responses based on tumor and drug characteristics. However, current studies focus on constructing either homogeneous networks from a single data source or heterogeneous networks from multi-omics data. While multi-omics data have shown potential in predicting drug responses in cancer cell lines, there is still a lack of research that effectively utilizes insights from different modalities. Furthermore, effectively utilizing the multi-modal knowledge of cancer cell lines poses a challenge due to the heterogeneity inherent in these modalities. Results: To address these challenges, we introduce MMCL-CDR, a multi-modal approach for cancer drug response prediction that integrates copy number variation, gene expression, morphology images of cell lines and chemical structure of drugs. The objective of MMCL-CDR is to align cancer cell lines across different data modalities by learning cell line representations from omic and image data, and to combine them with structural drug representations to enhance the prediction of Cancer Drug Responses (CDR). We have carried out comprehensive experiments and show that our model significantly outperforms other state-of-the-art methods in CDR prediction. The experimental results also show that the model can learn more accurate cell line representations by integrating multi-omics and morphological data from cell lines, thereby improving the accuracy of CDR prediction. In addition, the ablation study and qualitative analysis confirm the effectiveness of each part of our proposed model. Last but not least, MMCL-CDR opens up a new dimension for cancer drug response prediction through multimodal contrastive learning, pioneering a novel approach that integrates multi-omics and multi-modal drug and cell line modeling. Availability and Implementation: MMCL-CDR is available at https://github.com/catly/MMCL-CDR.
{"title":"MMCL-CDR: Enhancing Cancer Drug Response Prediction with Multi-Omics and Morphology Images Contrastive Representation Learning","authors":"Yang Li, Zihou Guo, Xin Gao, Guohua Wang","doi":"10.1093/bioinformatics/btad734","DOIUrl":"https://doi.org/10.1093/bioinformatics/btad734","PeriodicalId":8903,"journal":{"name":"Bioinformatics","volume":"26 1"},"PeriodicalIF":5.8,"publicationDate":"2023-12-09","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"138563967","RegionNum":3,"RegionCategory":"Biology","Total":0}
Pub Date : 2023-12-09 DOI: 10.1093/bioinformatics/btad715
Cuifang Xu, Jiating Huang, Yongqiang Gao, Weixing Zhao, Yiqi Shen, Feihong Luo, Gang Yu, Feng Zhu, Yan Ni
Motivation: Gut dysbiosis is closely associated with obesity and related metabolic diseases including type 2 diabetes (T2D) and non-alcoholic fatty liver disease (NAFLD). Gut microbial features and biomarkers have been increasingly investigated, but the findings require further validation because of the limited sample sizes and various confounding factors that may affect microbial compositions in a single study. So far, there has been no comprehensive bioinformatics pipeline that provides automated statistical analysis and integrates multiple independent studies for cross-validation. Results: OBMeta aims to streamline standard metagenomics data analysis, from diversity analysis, comparative analysis, and functional analysis to co-abundance network analysis. In addition, a curated database has been established with a total of 90 public research projects, covering three different phenotypes (Obesity, T2D, and NAFLD) and more than five different intervention strategies (exercise, diet, probiotics, medication, and surgery). With OBMeta, users can not only analyze their research projects but also search and match public datasets for cross-validation. Moreover, OBMeta provides cross-phenotype and cross-intervention-based advanced validation that maximally supports preliminary findings from an individual study. To summarize, OBMeta is a comprehensive web server to analyze and validate gut microbial features and biomarkers for obesity-associated metabolic diseases. Availability: OBMeta is freely available at: http://obmeta.met-bioinformatics.cn/. Supplementary information: Supplementary data are available at Bioinformatics online.
{"title":"OBMeta: a comprehensive web server to analyze and validate gut microbial features and biomarkers for obesity-associated metabolic diseases","authors":"Cuifang Xu, Jiating Huang, Yongqiang Gao, Weixing Zhao, Yiqi Shen, Feihong Luo, Gang Yu, Feng Zhu, Yan Ni","doi":"10.1093/bioinformatics/btad715","DOIUrl":"https://doi.org/10.1093/bioinformatics/btad715","PeriodicalId":8903,"journal":{"name":"Bioinformatics","volume":"13 1"},"PeriodicalIF":5.8,"publicationDate":"2023-12-09","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"138574789","RegionNum":3,"RegionCategory":"Biology","Total":0}
Pub Date : 2023-12-09 DOI: 10.1093/bioinformatics/btad745
Hikmet Emre Kaya, Kevin J Naidoo
Motivation: Target discovery and drug evaluation for diseases with complex mechanisms call for a streamlined chemical systems analysis platform. Currently available tools lack an emphasis on reaction kinetics, access to relevant databases, and algorithms to visualize perturbations on a chemical scale providing quantitative details, as well as streamlined visual data analytics functionality. Results: CytoCopasi, a Maven-based application for Cytoscape that combines the chemical systems analysis features of COPASI with the visualization and database access tools of Cytoscape and its plugin applications, has been developed. The diverse functionality of CytoCopasi, including ab initio model construction and model construction via the pathway and parameter databases KEGG and BRENDA, is presented. The comparative systems biology visualization analysis toolset is illustrated through a drug competence study on the cancerous RAF/MEK/ERK pathway. Availability: The COPASI files, simulation data, native libraries, and the manual are available at https://github.com/scientificomputing/CytoCopasi Supplementary information: Supplementary data are available at Bioinformatics online.
{"title":"CytoCopasi: A Chemical Systems Biology Target and Drug Discovery Visual Data Analytics Platform","authors":"Hikmet Emre Kaya, Kevin J Naidoo","doi":"10.1093/bioinformatics/btad745","DOIUrl":"https://doi.org/10.1093/bioinformatics/btad745","PeriodicalId":8903,"journal":{"name":"Bioinformatics","volume":"29 1"},"PeriodicalIF":5.8,"publicationDate":"2023-12-09","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"138562731","RegionNum":3,"RegionCategory":"Biology","Total":0}
Pub Date : 2023-12-09 DOI: 10.1093/bioinformatics/btad737
Jian Liu, Fubin Ma, Yongdi Zhu, Naiqian Zhang, Lingming Kong, Jia Mi, Haiyan Cong, Rui Gao, Mingyi Wang, Yusen Zhang
Motivation: Cancer is caused by the accumulation of somatic mutations in multiple pathways, in which driver mutations typically show high coverage and high exclusivity across patients. Identifying cancer driver genes plays a pivotal role in understanding the mechanisms of oncogenesis and treatment. Results: Here, we introduce MaxCLK, an algorithm for identifying cancer driver genes, which was developed by an integrated analysis of somatic mutation data and protein–protein interaction (PPI) networks and further improved by an information entropy (IE) index. Tested on pancancer and single cancers, MaxCLK outperformed other existing methods with higher accuracy. For pancancer, we predicted 154 driver genes and 787 driver modules. The analysis of co-occurrence and exclusivity between modules and pathways reveals the correlation of their combinations. Overall, our study has deepened the understanding of driver mechanisms in PPI topology and found novel driver genes. Availability: The source code for MaxCLK is freely available at https://github.com/ShandongUniversityMasterMa/MaxCLK-main. Supplementary information: Supplementary data are available at Bioinformatics online.
{"title":"MaxCLK: discovery of cancer driver genes via maximal clique and information entropy of modules","authors":"Jian Liu, Fubin Ma, Yongdi Zhu, Naiqian Zhang, Lingming Kong, Jia Mi, Haiyan Cong, Rui Gao, Mingyi Wang, Yusen Zhang","doi":"10.1093/bioinformatics/btad737","DOIUrl":"https://doi.org/10.1093/bioinformatics/btad737","PeriodicalId":8903,"journal":{"name":"Bioinformatics","volume":"49 1"},"PeriodicalIF":5.8,"publicationDate":"2023-12-09","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"138562604","RegionNum":3,"RegionCategory":"Biology","Total":0}
Pub Date : 2023-12-09 DOI: 10.1093/bioinformatics/btad743
Dani Korpela, Emmi Jokinen, Alexandru Dumitrescu, Jani Huuhtanen, Satu Mustjoki, Harri Lähdesmäki
Motivation: T cells play an essential role in the adaptive immune system to fight pathogens and cancer but may also give rise to autoimmune diseases. The recognition of a peptide-MHC (pMHC) complex by a T cell receptor (TCR) is required to elicit an immune response. Many machine learning models have been developed to predict the binding, but generalizing predictions to pMHCs outside the training data remains challenging. Results: We have developed a new machine learning model that utilizes information about the TCR from both α and β chains, epitope sequence, and MHC. Our method uses ProtBERT embeddings for the amino acid sequences of both chains and the epitope, as well as convolution and multi-head attention architectures. We show the importance of each input feature as well as the benefit of including epitopes with only a few TCRs in the training data. We evaluate our model on existing databases and show that it compares favorably against other state-of-the-art models. Code availability: https://github.com/DaniTheOrange/EPIC-TRACE Supplementary information: Supplementary data are available at Bioinformatics online.
{"title":"EPIC-TRACE: predicting TCR binding to unseen epitopes using attention and contextualized embeddings","authors":"Dani Korpela, Emmi Jokinen, Alexandru Dumitrescu, Jani Huuhtanen, Satu Mustjoki, Harri Lähdesmäki","doi":"10.1093/bioinformatics/btad743","DOIUrl":"https://doi.org/10.1093/bioinformatics/btad743","PeriodicalId":8903,"journal":{"name":"Bioinformatics","volume":"1 1"},"PeriodicalIF":5.8,"publicationDate":"2023-12-09","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"138563087","RegionNum":3,"RegionCategory":"Biology","Total":0}
Pub Date : 2023-12-08 DOI: 10.1093/bioinformatics/btad748
Xinliang Sun, Xiao Jia, Zhangli Lu, Jing Tang, Min Li
Motivation Drug repositioning is an effective strategy to identify new indications for existing drugs, providing the quickest possible transition from bench to bedside. With the rapid development of deep learning, graph convolutional networks (GCNs) have been widely adopted for drug repositioning tasks. However, prior GCN-based methods are limited in how deeply they integrate node features and topological structures, which may hinder their capability. Results In this study, we propose an adaptive graph convolutional network approach, termed AdaDR, for drug repositioning by deeply integrating node features and topological structures. Unlike conventional graph convolutional networks, AdaDR models the interaction between node features and topological structures with an adaptive graph convolution operation, which enhances the expressive power of the model. Concretely, AdaDR simultaneously extracts embeddings from node features and topological structures and then uses an attention mechanism to learn adaptive importance weights for the embeddings. Experimental results show that AdaDR achieves better performance than multiple baselines for drug repositioning. Moreover, a case study offers exploratory analyses for finding novel drug-disease associations. Availability and implementation The implementation of AdaDR and the preprocessed data are available at: https://github.com/xinliangSun/AdaDR. Supplementary information Supplementary data are available at Bioinformatics online.
Title : Drug repositioning with adaptive graph convolutional networks
Journal : Bioinformatics
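The AdaDR abstract describes learning attention-based importance weights for two embeddings of the same node, one from node features and one from topology. The sketch below illustrates that general idea only; it is not taken from the AdaDR code. The function name `attention_fuse`, the scoring rule (dot product of an attention vector `q` with `tanh` of each embedding), and the fixed example values are all assumptions made for illustration. In a trained model `q` would be a learned parameter.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(z_feature, z_topology, q):
    """Fuse two embeddings of one node using softmax attention weights.

    Hypothetical scoring rule (an assumption, not AdaDR's): each view is
    scored by the dot product of q with tanh(view).
    """
    views = [z_feature, z_topology]
    scores = [sum(qi * math.tanh(zi) for qi, zi in zip(q, z)) for z in views]
    weights = softmax(scores)
    # The fused embedding is the weighted sum of the two views.
    fused = [sum(w * z[i] for w, z in zip(weights, views)) for i in range(len(q))]
    return fused, weights


# Illustrative values only; q stands in for a learned attention vector.
z_f = [0.9, -0.2, 0.4]   # embedding from node features
z_t = [0.1, 0.8, -0.5]   # embedding from topological structure
q = [1.0, 0.5, -0.3]

fused, weights = attention_fuse(z_f, z_t, q)
print("weights:", weights)
print("fused:", fused)
```

Because the weights come from a softmax, they are positive and sum to one, so each coordinate of the fused embedding lies between the corresponding coordinates of the two input embeddings.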
Pub Date : 2023-12-08 DOI: 10.1093/bioinformatics/btad746
Herve Emissah, Bengt Ljungquist, Giorgio A Ascoli
Summary Neural morphology, the branching geometry of brain cells, is an essential cellular substrate of nervous system function and pathology. Despite the accelerating production of digital reconstructions of neural morphology, the public accessibility of data remains a core issue in neuroscience. Deficiencies in the availability of existing data create redundancy of research efforts and limit synergy. We carried out a comprehensive bibliometric analysis of neural morphology publications to quantify the impact of data sharing in the neuroscience community. Our findings demonstrate that sharing digital reconstructions of neural morphology via NeuroMorpho.Org leads to a significant increase in citations to the original article, thus directly benefiting authors. The rate of data reuse remains constant for at least 16 years after sharing (the whole period analyzed), altogether nearly doubling the peer-reviewed discoveries in the field. Furthermore, the recent availability of larger and more numerous datasets fostered integrative applications, which accrue on average twice the citations of re-analyses of individual datasets. We also released an open-source citation tracking web service that allows researchers to monitor reuse of their datasets in independent peer-reviewed reports. These results and tools can facilitate the recognition of shared data reuse for merit evaluations and funding decisions. Availability and Implementation The application is available at: http://cng-nmo-dev3.orc.gmu.edu:8181/. The source code is available at https://github.com/HerveEmissah/nmo-authors-app and https://github.com/HerveEmissah/nmo-bibliometric-analysis. Supplementary information Supplementary data are available at Bioinformatics online.
Title : Bibliometric analysis of neuroscience publications quantifies the impact of data sharing
Journal : Bioinformatics