Pub Date: 2023-10-12; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1227193
Kejue Jia, Mesih Kilinc, Robert L Jernigan
Understanding protein sequences and how they relate to protein function is extremely important. Sequence alignment is one of the most basic operations in bioinformatics, and usually the first thing learned from an alignment is which positions are most conserved; these are often critical parts of the structure, such as enzyme active-site residues. In addition, the contact pairs in a protein usually correspond closely to correlations between residue positions in the multiple sequence alignment, and these pairs usually change in a systematic and coordinated way: if one position changes, the other member of the pair also changes to compensate. In the present work, these correlated pairs are taken as anchor points for a new type of sequence alignment. The main advantage of the method is that it combines the remote homolog detection of our method PROST with the pairwise sequence substitutions of the rigorous method of Kleinjung et al. We show a few examples of the resulting sequence alignments and how they can lead to improved alignments for function, even for a disordered protein.
{"title":"New alignment method for remote protein sequences by the direct use of pairwise sequence correlations and substitutions.","authors":"Kejue Jia, Mesih Kilinc, Robert L Jernigan","doi":"10.3389/fbinf.2023.1227193","DOIUrl":"10.3389/fbinf.2023.1227193","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3","pages":"1227193"},"PeriodicalIF":0.0,"publicationDate":"2023-10-12","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10602800/pdf/","platform":"Semanticscholar","paperid":"71415730","RegionNum":0,"PMCID":"OA","Total":0}
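The idea of correlated residue pairs in a multiple sequence alignment, as described in the abstract above, can be illustrated with a small sketch. The abstract does not say which correlation measure the authors use, so mutual information between two alignment columns is an assumption chosen here only for illustration; the function names and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def column(msa, i):
    """Return the residues found at alignment position i, one per sequence."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    Assumption: MI is used here as a generic covariation score; it is not
    claimed to be the measure used in the paper.
    """
    n = len(msa)
    col_i, col_j = column(msa, i), column(msa, j)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


# Hypothetical toy alignment: positions 1 and 3 change together (K/E vs R/D),
# while positions 0 and 2 are fully conserved.
toy_msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
]

covarying = mutual_information(toy_msa, 1, 3)   # compensating pair
conserved = mutual_information(toy_msa, 0, 1)   # conserved column carries no signal
print(covarying, conserved)
```

In this toy case the co-varying pair scores 1 bit and the pair involving a conserved column scores 0, which is why conserved positions and correlated pairs give complementary information about an alignment.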
Pub Date: 2023-10-11; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1277923
Thomas Krannich, Marina Herrera Sarrias, Hiba Ben Aribi, Moustafa Shokrof, Alfredo Iacoangeli, Ammar Al-Chalabi, Fritz J Sedlazeck, Ben Busby, Ahmad Al Khleifat
Motivation: For a number of neurological diseases, such as Alzheimer's disease, amyotrophic lateral sclerosis, and many others, certain genes are known to be involved in the disease mechanism. A common question is whether a structural variant in any such gene may be related to drug response in clinical trials and how this relationship can contribute to the lifecycle of drug development. Results: To this end, we introduce VariantSurvival, a tool that identifies changes in survival relative to structural variants within target genes. VariantSurvival matches annotated structural variants with genes that are clinically relevant to neurological diseases. A Cox regression model determines the change in survival between the placebo and clinical trial groups with respect to the number of structural variants in the drug target genes. We demonstrate the functionality of our approach with the exemplary case of the SETX gene. VariantSurvival has a user-friendly and lightweight graphical user interface built on the shiny web application package.
{"title":"VariantSurvival: a tool to identify genotype-treatment response.","authors":"Thomas Krannich, Marina Herrera Sarrias, Hiba Ben Aribi, Moustafa Shokrof, Alfredo Iacoangeli, Ammar Al-Chalabi, Fritz J Sedlazeck, Ben Busby, Ahmad Al Khleifat","doi":"10.3389/fbinf.2023.1277923","DOIUrl":"10.3389/fbinf.2023.1277923","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3","pages":"1277923"},"PeriodicalIF":0.0,"publicationDate":"2023-10-11","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10598652/pdf/","platform":"Semanticscholar","paperid":"54232718","RegionNum":0,"PMCID":"OA","Total":0}
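The Cox regression mentioned in the abstract above can be made concrete with a minimal sketch of the Cox partial log-likelihood. This is not VariantSurvival's code (the tool is built on the shiny package, whereas this sketch is plain Python); the covariate layout `(treated, sv_count)`, the function name, and all data values are hypothetical, and tied event times are assumed absent.

```python
from math import exp, log


def cox_partial_loglik(times, events, covariates, beta):
    """Cox partial log-likelihood, assuming no tied event times.

    times      : follow-up time per subject
    events     : 1 if the event was observed, 0 if censored
    covariates : one tuple of covariate values per subject
    beta       : one coefficient per covariate
    """
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    loglik = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += scores[i] - log(risk)
    return loglik


# Hypothetical toy data, NOT trial data: (treated, structural-variant count).
times = [5, 8, 12, 15, 20]
events = [1, 1, 0, 1, 1]
covariates = [(0, 2), (0, 1), (1, 2), (1, 0), (1, 1)]

null_loglik = cox_partial_loglik(times, events, covariates, [0.0, 0.0])
alt_loglik = cox_partial_loglik(times, events, covariates, [-1.0, 0.5])
print(null_loglik, alt_loglik)
```

A fitting routine would maximise this quantity over `beta`; the exponentiated coefficients are then hazard ratios, which is how a change in survival per additional structural variant would typically be read off.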
RNA accessibility is a useful RNA secondary structural feature for predicting RNA-RNA interactions and translation efficiency in prokaryotes. However, conventional accessibility calculation tools, such as Raccess, are computationally expensive and require considerable time to perform transcriptome-scale analysis. In this study, we developed DeepRaccess, which predicts RNA accessibility based on deep learning methods. DeepRaccess was trained to take artificial RNA sequences as input and to predict the accessibility of these sequences as calculated by Raccess. Simulation and empirical dataset analyses showed that the accessibility predicted by DeepRaccess was highly correlated with the accessibility calculated by Raccess. In addition, we confirmed that DeepRaccess could predict protein abundance in E. coli with moderate accuracy from the sequences around the start codon. We also demonstrated that DeepRaccess achieved a speed-up of tens to hundreds of times in a GPU environment. The source code and the trained models of DeepRaccess are freely available at https://github.com/hmdlab/DeepRaccess.
{"title":"DeepRaccess: high-speed RNA accessibility prediction using deep learning.","authors":"Kaisei Hara, Natsuki Iwano, Tsukasa Fukunaga, Michiaki Hamada","doi":"10.3389/fbinf.2023.1275787","DOIUrl":"10.3389/fbinf.2023.1275787","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3","pages":"1275787"},"PeriodicalIF":0.0,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10597636/pdf/","platform":"Semanticscholar","paperid":"50163995","RegionNum":0,"PMCID":"OA","Total":0}
Pub Date: 2023-09-20; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1178600
David Bryant, Daniel H Huson
NeighborNet constructs phylogenetic networks to visualize distance data. It is a popular method used in a wide range of applications. While several studies have investigated its mathematical features, here we focus on computational aspects. The algorithm operates in three steps. We present a new simplified formulation of the first step, which aims at computing a circular ordering. We provide the first technical description of the second step, the estimation of split weights. We review the third step, constructing and drawing the network. Finally, we discuss how the networks might best be interpreted, review related approaches, and present some open questions.
{"title":"NeighborNet: improved algorithms and implementation.","authors":"David Bryant, Daniel H Huson","doi":"10.3389/fbinf.2023.1178600","DOIUrl":"10.3389/fbinf.2023.1178600","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3","pages":"1178600"},"PeriodicalIF":0.0,"publicationDate":"2023-09-20","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10548196/pdf/","platform":"Semanticscholar","paperid":"41161536","RegionNum":0,"PMCID":"OA","Total":0}
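The link between the first two steps in the NeighborNet abstract above (a circular ordering, then split weights) can be sketched by enumerating the splits that a given circular ordering allows. This is only an illustration of that combinatorial relationship, not the authors' algorithm: it neither computes an ordering from distances nor estimates any weights, and the function name and taxon labels are hypothetical.

```python
def circular_splits(ordering):
    """List every split (bipartition) compatible with a circular ordering.

    A split is obtained by cutting the cycle at two of its n gaps, so one
    side is always a contiguous arc of the ordering.
    """
    n = len(ordering)
    everything = frozenset(ordering)
    splits = []
    for i in range(n):
        for j in range(i + 1, n):
            arc = frozenset(ordering[i + 1 : j + 1])  # elements between the two cuts
            splits.append((arc, everything - arc))
    return splits


# Hypothetical circular ordering of five taxa.
ordering = ["a", "b", "c", "d", "e"]
splits = circular_splits(ordering)

# Trivial splits separate a single taxon from all the others.
trivial = [s for s in splits if min(len(s[0]), len(s[1])) == 1]
print(len(splits), len(trivial))
```

For n taxa there are n(n-1)/2 such splits, which bounds the number of weights the second step has to estimate once an ordering has been fixed.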
Pub Date: 2023-09-19; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1248732
Niloofar Shirvanizadeh, Mauno Vihinen
{"title":"VariBench, new variation benchmark categories and data sets.","authors":"Niloofar Shirvanizadeh, Mauno Vihinen","doi":"10.3389/fbinf.2023.1248732","DOIUrl":"10.3389/fbinf.2023.1248732","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3","pages":"1248732"},"PeriodicalIF":2.8,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10546188/pdf/","platform":"Semanticscholar","paperid":"41167306","RegionNum":0,"PMCID":"OA","Total":0}
Pub Date: 2023-09-13; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1287407
Mohammed Zidane, Ahmad Makky, Matthias Bruhns, Alexander Rochwarger, Sepideh Babaei, Manfred Claassen, Christian M Schürch
[This corrects the article DOI: 10.3389/fbinf.2023.1159381.].
{"title":"Corrigendum: A review on deep learning applications in highly multiplexed tissue imaging data analysis.","authors":"Mohammed Zidane, Ahmad Makky, Matthias Bruhns, Alexander Rochwarger, Sepideh Babaei, Manfred Claassen, Christian M Schürch","doi":"10.3389/fbinf.2023.1287407","DOIUrl":"10.3389/fbinf.2023.1287407","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3","pages":"1287407"},"PeriodicalIF":0.0,"publicationDate":"2023-09-13","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10534973/pdf/","platform":"Semanticscholar","paperid":"41170250","RegionNum":0,"PMCID":"OA","Total":0}
Peptide informatics is a rapidly growing field that is at the intersection of bioinformatics, chemistry, and biology. Peptides are short chains of amino acids that play important roles in a wide variety of biological processes, such as protein folding, signal transduction, and immune function. Peptide informatics is the use of computational methods to study peptides and their sequence, structure, function, and interactions. Recent advances in peptide informatics have led to a number of new discoveries and applications. For example, new methods have been developed to predict the structure of peptides, which can be used to design new drugs and therapies. New methods for identifying peptide-protein interactions have also been introduced, which can be used to understand the molecular basis of disease.
{"title":"Editorial: Recent advances in peptide informatics: challenges and opportunities.","authors":"Rahul Kumar, Kumardeep Chaudhary, Sandeep Kumar Dhanda","doi":"10.3389/fbinf.2023.1271932","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1271932","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3","pages":"1271932"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10523389/pdf/","platform":"Semanticscholar","paperid":"41155909","RegionNum":0,"PMCID":"OA","Total":0}
Pub Date: 2023-09-01; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1233281
Jack M Craig, Sudhir Kumar, S Blair Hedges
The origin of eukaryotes was among the most important events in the history of life, spawning a new evolutionary lineage that led to all complex multicellular organisms. However, the timing of this event, crucial for understanding its environmental context, has been difficult to establish. The fossil and biomarker records are sparse and molecular clocks have thus far not reached a consensus, with dates spanning 2.1-0.91 billion years ago (Ga) for critical nodes. Notably, molecular time estimates for the last common ancestor of eukaryotes are typically hundreds of millions of years younger than the Great Oxidation Event (GOE, 2.43-2.22 Ga), leading researchers to question the presumptive link between eukaryotes and oxygen. We obtained a new time estimate for the origin of eukaryotes using genetic data of both archaeal and bacterial origin, the latter rarely used in past studies. We also avoided potential calibration biases that may have affected earlier studies. We obtained a conservative interval of 2.2-1.5 Ga, with an even narrower core interval of 2.0-1.8 Ga, for the origin of eukaryotes, a period closely aligned with the rise in oxygen. We further reconstructed the history of biological complexity across the tree of life using three universal measures: cell types, genes, and genome size. We found that the rise in complexity was temporally consistent with and followed a pattern similar to the rise in oxygen. This suggests a causal relationship stemming from the increased energy needs of complex life fulfilled by oxygen.
{"title":"The origin of eukaryotes and rise in complexity were synchronous with the rise in oxygen.","authors":"Jack M Craig, Sudhir Kumar, S Blair Hedges","doi":"10.3389/fbinf.2023.1233281","DOIUrl":"10.3389/fbinf.2023.1233281","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3","pages":"1233281"},"PeriodicalIF":2.8,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10505794/pdf/","platform":"Semanticscholar","paperid":"41142624","RegionNum":0,"PMCID":"OA","Total":0}
Conventional dimensionality reduction methods like Multidimensional Scaling (MDS) are sensitive to the presence of orthogonal outliers, leading to significant defects in the embedding. We introduce a robust MDS method, called DeCOr-MDS (Detection and Correction of Orthogonal outliers using MDS), based on the geometry and statistics of simplices formed by data points, that detects orthogonal outliers and subsequently reduces dimensionality. We validate our method using synthetic datasets, and further show how it can be applied to a variety of large real biological datasets, including cancer image cell data, human microbiome project data and single cell RNA sequencing data, to address the task of data cleaning and visualization.
{"title":"Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets.","authors":"Wanxin Li, Jules Mirone, Ashok Prasad, Nina Miolane, Carine Legrand, Khanh Dao Duc","doi":"10.3389/fbinf.2023.1211819","DOIUrl":"10.3389/fbinf.2023.1211819","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3","pages":"1211819"},"PeriodicalIF":2.8,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448701/pdf/","platform":"Semanticscholar","paperid":"10100807","RegionNum":0,"PMCID":"OA","Total":0}
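The phrase "geometry and statistics of simplices formed by data points" in the abstract above can be illustrated with the simplest possible case. The sketch below uses only triangles (2-simplices) and a median statistic, both of which are simplifying assumptions; it is not the DeCOr-MDS procedure, and the function names and toy points are hypothetical. The point it makes is that heights of simplices can be computed from pairwise distances alone, and that a point displaced orthogonally from the data subspace has consistently large heights.

```python
from itertools import combinations
from math import dist, sqrt
from statistics import median


def triangle_height(a, b, base):
    """Height over side `base` of a triangle with side lengths a, b, base.

    Uses Heron's formula, so only distances are needed, not coordinates.
    """
    s = (a + b + base) / 2
    area_sq = max(0.0, s * (s - a) * (s - b) * (s - base))  # clamp rounding error
    return 2 * sqrt(area_sq) / base


def median_heights(points):
    """For each point, the median height over all bases formed by two other points."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    scores = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        heights = [
            triangle_height(d[i][j], d[i][k], d[j][k])
            for j, k in combinations(others, 2)
        ]
        scores.append(median(heights))
    return scores


# Hypothetical toy data: five points on a line plus one point lifted off it.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (2, 3)]
scores = median_heights(points)
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(scores, outlier_index)
```

Here the lifted point has a median height close to its orthogonal displacement of 3, while the collinear points score 0 because most of their triangles are degenerate.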