ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine: latest articles
As more atomic structures are determined from cryo-electron microscopy (cryo-EM) density maps, validating such structures becomes an important task. We report how cryo-EM structures have changed by comparing those released by December 2016 with those released between 2017 and 2019. The cryo-EM models built from density maps with resolution better than 6 Å were divided into six data sets. A histogram-based outlier score (HBOS) was implemented, and validation reports were collected from the Protein Data Bank. The results suggest that the overall quality of EM structures released after December 2016 is better than that of structures released before 2017. The conformation quality of most residue types appears to have improved, except for leucine, phenylalanine, and serine in the high-resolution data sets (resolution better than 4 Å). We observe that structures solved from 0-4 Å resolution density maps have an HBOS profile almost identical to that of structures derived from 4-6 Å resolution density maps.
A Histogram-based Outlier Profile for Atomic Structures Derived from Cryo-Electron Microscopy. Lin Chen, Jing He. ACM-BCB, published 2019-09-01. DOI: 10.1145/3307339.3343865. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9279010/pdf/nihms-1662219.pdf
Ryan A Hoffman, Janani Venugopalan, Li Qu, Hang Wu, May D Wang
Accurate reporting of causes of death on death certificates is essential for national health-protection institutions such as the Centers for Disease Control and Prevention (CDC) to formulate appropriate disease control, prevention, and emergency response. In this study, we use publicly available expert-formulated rules for the cause of death to determine how far the death certificates in national mortality data are discordant with the expert knowledge base. We also report the most commonly occurring invalid causal pairs that physicians enter on death certificates. We use sequence rule mining to find the patterns that are most frequent on death certificates and compare them with the rules from the expert knowledge base. Based on our results, 20.1% of the common patterns derived from entries on death certificates were discordant. The most probable causes of these discordant or invalid rules are missing steps and non-specific ICD-10 codes on the death certificates.
Improving Validity of Cause of Death on Death Certificates. Ryan A Hoffman, Janani Venugopalan, Li Qu, Hang Wu, May D Wang. ACM-BCB, published 2018-08-01. DOI: 10.1145/3233547.3233581
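The comparison of mined patterns against expert rules described in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the certificates, the ICD-10 codes chosen, the set of "valid" pairs, and the support threshold are toy values, and the pair counting is a simplified stand-in for the sequence rule mining used in the study.

```python
from collections import Counter
from itertools import combinations

# Toy certificates: each is the chain of ICD-10 codes in the order written.
# Codes and "valid" pairs are illustrative assumptions, not data from the study.
certificates = [
    ["I21", "I25"],
    ["I21", "I25", "E11"],
    ["J18", "R99"],
    ["I21", "R99"],
    ["J18", "R99"],
]
valid_pairs = {("I21", "I25"), ("I25", "E11"), ("I21", "E11")}

def frequent_ordered_pairs(chains, min_support):
    """Count ordered code pairs (a listed before b) once per certificate."""
    counts = Counter()
    for chain in chains:
        counts.update(set(combinations(chain, 2)))
    return {pair: n for pair, n in counts.items() if n >= min_support}

frequent = frequent_ordered_pairs(certificates, min_support=2)
# A frequent pattern is discordant when no expert rule covers it.
discordant = sorted(pair for pair in frequent if pair not in valid_pairs)
discordance_rate = len(discordant) / len(frequent)
print(discordant, discordance_rate)
```

In this toy data the pair ending in the non-specific code R99 is frequent but not covered by any rule, which mirrors the kind of discordance the abstract attributes to non-specific ICD-10 codes.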
Cryo-electron microscopy (cryo-EM) has become a major technique for protein structure determination. Many atomic structures have been derived from cryo-EM density maps of about 3 Å resolution. Side-chain conformations are well determined in density maps with super-resolutions such as 1-2 Å. It is desirable to have a statistical method to detect anomalous side-chains without a super-resolution density map. In this study, we analyzed structures derived from X-ray density maps with resolution better than 1.5 Å and structures derived from cryo-EM density maps with 2-4 Å and 4-6 Å resolutions. We introduce a histogram-based outlier score (HBOS) for anomaly detection in protein models built from cryo-EM density maps. This method uses the statistics derived from the X-ray data set (<1.5 Å) as the reference and combines five features: the distal block distance, side-chain length, phi, psi, and first chi angle of the residue. Higher percentages of anomalies were observed in the cryo-EM models than in the super-resolution X-ray models. Lower percentages of anomalies were observed in cryo-EM models derived after January 2017 than in those derived before 2017.
Using Combined Features to Analyze Atomic Structures Derived from Cryo-EM Density Maps. Lin Chen, Jing He. ACM-BCB, published 2018-08-01. DOI: 10.1145/3233547.3233709
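A histogram-based outlier score of the general kind named in the abstract above can be sketched as follows. This is a generic HBOS under stated assumptions, not the authors' implementation: it uses two toy features (phi and chi1) instead of the five listed, a randomly generated reference set in place of the X-ray statistics, equal-width bins, and it treats angles as plain numbers without handling their periodicity. The bin count and the `floor` value are hypothetical parameters.

```python
import math
import random

def build_histogram(values, n_bins=20):
    """Equal-width histogram whose heights are scaled so the tallest bin is 1.0."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    peak = max(counts)
    return lo, width, [c / peak for c in counts]

def hbos(point, histograms, floor=1e-3):
    """Sum over features of log(1 / bin height); sparsely populated bins add more."""
    score = 0.0
    for x, (lo, width, heights) in zip(point, histograms):
        hi = lo + width * len(heights)
        if lo <= x <= hi:
            height = heights[min(int((x - lo) / width), len(heights) - 1)]
        else:
            height = 0.0  # outside the reference range entirely
        score += math.log(1.0 / max(height, floor))
    return score

# Toy reference set standing in for high-resolution X-ray statistics (assumption):
# two features per residue, phi and chi1 in degrees.
rng = random.Random(0)
reference = [(rng.gauss(-63, 10), rng.gauss(60, 15)) for _ in range(500)]
histograms = [build_histogram(column) for column in zip(*reference)]

typical = hbos((-63.0, 60.0), histograms)
unusual = hbos((100.0, -170.0), histograms)
print(f"typical residue: {typical:.2f}, unusual residue: {unusual:.2f}")
```

A residue whose features fall in well-populated reference bins receives a score near zero, while one that falls in empty bins is capped by `floor` so that a single unseen value cannot produce an infinite score.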
Identifying the target genes of transcription factors (TFs) is a key step in understanding transcriptional regulation. However, our understanding of genome-wide TF targeting profiles is limited by the cost of large-scale experiments and by the intrinsic complexity of the problem. Computational methods are therefore useful for predicting unobserved associations. Here, we developed a new one-class collaborative filtering algorithm, tREMAP, that is based on regularized, weighted nonnegative matrix tri-factorization. The algorithm predicts unobserved target genes for TFs using known gene-TF associations and a protein-protein interaction network. Our benchmark study shows that tREMAP significantly outperforms its counterpart REMAP, a bi-factorization-based algorithm, for transcription factor target gene prediction in all four performance metrics: AUC, MAP, MPR, and HLU. When evaluated on independent data sets, the prediction accuracy is 37.8% on the top 495 predicted associations, an enrichment factor of 4.19 over random guessing. Furthermore, many of the novel associations predicted by tREMAP are supported by evidence from the literature. Although we only use canonical TF-target gene interaction data in this study, tREMAP can be directly applied to tissue-specific data sets. tREMAP provides a framework for integrating multiple omics data to further improve TF target gene prediction. Thus, tREMAP is a potentially useful tool for studying gene regulatory networks. The benchmark data set and the source code of tREMAP are freely available at https://github.com/hansaimlim/REMAP/tree/master/TriFacREMAP.
Target Gene Prediction of Transcription Factor Using a New Neighborhood-regularized Tri-factorization One-class Collaborative Filtering Algorithm. Hansaim Lim, Lei Xie. ACM-BCB, published 2018-08-01. DOI: 10.1145/3233547.3233551
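The accuracy and enrichment figures quoted in the abstract above can be related with a short piece of arithmetic. The three input numbers come from the abstract; the two derived quantities are implied values under the assumption that the enrichment factor is simply accuracy divided by the random-guess hit rate, and they are not reported by the authors.

```python
top_n = 495        # top-ranked predictions evaluated (from the abstract)
accuracy = 0.378   # fraction confirmed by independent data (from the abstract)
enrichment = 4.19  # fold improvement over random guessing (from the abstract)

# Derived values (assumption: enrichment = accuracy / random hit rate).
implied_random_accuracy = accuracy / enrichment
approx_confirmed = round(accuracy * top_n)

print(f"implied random hit rate: {implied_random_accuracy:.1%}")
print(f"approximately confirmed predictions: {approx_confirmed}")
```

Under that assumption, a random guess would be right about 9% of the time, and roughly 187 of the 495 top predictions were confirmed.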
Cryo-electron microscopy (cryo-EM) is an emerging biophysical technique for structural determination of protein complexes. However, accurate detection of secondary structures is still challenging when cryo-EM density maps are at medium resolutions (5-10 Å). Most existing methods are image processing methods that do not fully utilize the images available in the cryo-EM database. In this paper, we present a deep learning approach to segment secondary structure elements, namely helices and β-sheets, from medium-resolution density maps. The proposed 3D convolutional neural network is shown to detect secondary structure locations with an F1 score between 0.79 and 0.88 for six simulated test cases. The architecture was also applied to an experimentally derived cryo-EM density map with good accuracy.
Exploratory Studies Detecting Secondary Structures in Medium Resolution 3D Cryo-EM Images Using Deep Convolutional Neural Networks. Devin Haslam, Tao Zeng, Rongjian Li, Jing He. ACM-BCB, published 2018-08-01. DOI: 10.1145/3233547.3233704
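The F1 score used to report segmentation quality in the abstract above is the harmonic mean of precision and recall. The sketch below computes it per class from paired labels; the label strings are hypothetical voxel labels invented for illustration, and the abstract does not state whether the authors scored voxels, residues, or some other unit.

```python
def f1_score(predicted, actual, label):
    """F1 for one class over paired labels (harmonic mean of precision and recall)."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == label and a == label for p, a in pairs)
    fp = sum(p == label and a != label for p, a in pairs)
    fn = sum(p != label and a == label for p, a in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: H = helix, S = sheet, B = background.
actual = list("HHHHSSSSBB")
predicted = list("HHHBSSSHBB")

helix_f1 = f1_score(predicted, actual, "H")
sheet_f1 = f1_score(predicted, actual, "S")
print(f"helix F1 = {helix_f1:.3f}, sheet F1 = {sheet_f1:.3f}")
```

Here one helix position is missed and one sheet position is mislabeled as helix, giving a helix F1 of 0.750 and a sheet F1 of about 0.857.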
Alexander S Maher, Kenneth A Rostowsky, Nahian F Chowdhury, Andrei Irimia
Connectomic alterations associated with subtle forms of cerebrovascular neuropathology, such as cerebral microbleeds (CMBs), can result in substantial neurological and/or cognitive deficits in victims of traumatic brain injury (TBI). Quantifying CMB-related connectome changes in mild TBI (mTBI) patients requires ingenious neuroinformatics to integrate structural magnetic resonance imaging (sMRI) with diffusion-weighted imaging (DWI) for patient-tailored profiling while preserving the data scientist's ability to implement population studies. Such solutions, however, can assist the refinement of rehabilitation protocols and streamline large-scale analysis while accommodating the heterogeneity of mTBI. This study describes a pipeline for the multimodal integration of sMRI, DWI, and diffusion tensor imaging (DTI) to quantify white matter (WM) neural network circuitry alterations associated with mTBI-related CMBs. The approach incorporates WM streamline matching, topology-compliant streamline prototyping, and along-tract analysis within a unified framework. When applied to neuroimaging data acquired from both mTBI and healthy control volunteers, the approach facilitates the identification of patient-specific CMB-related connectomic changes while retaining the ability to perform group analyses. This pipeline for the identification and profiling of connectopathies can assist the adaptation of clinical rehabilitation protocols to patients' individual needs.
Neuroinformatics and Analysis of Connectomic Alterations Due to Cerebral Microhemorrhages in Geriatric Mild Neurotrauma. Alexander S Maher, Kenneth A Rostowsky, Nahian F Chowdhury, Andrei Irimia. ACM-BCB, published 2018-08-01. DOI: 10.1145/3233547.3233598
In biological sequences, tandem repeats consist of tens to hundreds of residues of a repeated pattern, such as atgatgatgatgatg ('atg' repeated), often the result of replication slippage. Over time, these repeats decay so that the original sharp pattern of repetition is somewhat obscured, but even degenerate repeats pose a problem for sequence annotation: when two sequences both contain shared patterns of similar repetition, the result can be a false signal of sequence homology. We describe an implementation of a new hidden Markov model for detecting tandem repeats that shows substantially improved sensitivity to labeling decayed repetitive regions, presents low and reliable false annotation rates across a wide range of sequence composition, and produces scores that follow a stable distribution. On typical genomic sequence, the time and memory requirements of the resulting tool (ULTRA) are competitive with the most heavily used tool for repeat masking (TRF). ULTRA is released under an open source license and lays the groundwork for inclusion of the model in sequence alignment tools and annotation pipelines.
ULTRA: A Model Based Tool to Detect Tandem Repeats. Daniel Olson, Travis Wheeler. ACM-BCB, published 2018-08-01. DOI: 10.1145/3233547.3233604
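To make the notion of a tandem repeat in the abstract above concrete, here is a naive scanner for exact repeats only. It is deliberately not the hidden Markov model that ULTRA implements and cannot find the decayed repeats the abstract is concerned with; the function name and the `max_period` and `min_copies` parameters are hypothetical.

```python
def find_exact_tandem_repeats(seq, max_period=6, min_copies=3):
    """Return (start, unit, copies) for exact tandem runs, scanning left to right."""
    hits = []
    i = 0
    while i < len(seq):
        best = None  # (unit, copies) covering the most residues from position i
        for period in range(1, max_period + 1):
            unit = seq[i:i + period]
            if len(unit) < period:
                break  # ran off the end of the sequence
            copies = 1
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            span = copies * period
            if copies >= min_copies and (best is None or span > len(best[0]) * best[1]):
                best = (unit, copies)
        if best:
            hits.append((i, best[0], best[1]))
            i += len(best[0]) * best[1]  # skip past the reported run
        else:
            i += 1
    return hits

print(find_exact_tandem_repeats("ccatgatgatgatgatgtt"))
```

On the example sequence this reports the `atg` unit from the abstract, repeated five times starting at offset 2. A single substitution inside the run would split or hide it, which is the limitation a probabilistic model is meant to address.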
Alex Nord, Kaitlin Carey, Peter Hornbeck, Travis Wheeler
Multiple sequence alignment (MSA) is a classic problem in computational genomics. In typical use, MSA software is expected to align a collection of homologous genes, such as orthologs from multiple species or duplication-induced paralogs within a species. Recent focus on the importance of alternatively-spliced isoforms in disease and cell biology has highlighted the need to create MSAs that more effectively accommodate isoforms. MSAs are traditionally constructed using scoring criteria that prefer alignments with occasional mismatches over alignments with long gaps. Alternatively spliced protein isoforms effectively contain exon-length insertions or deletions (indels) relative to each other, and demand an alternative approach. Some improvements can be achieved by making indel penalties much smaller, but this is merely a patchwork solution. In this work we present Mirage, a novel MSA software package for the alignment of alternatively spliced protein isoforms. Mirage aligns isoforms to each other by first mapping each protein sequence to its encoding genomic sequence, and then aligning isoforms to one another based on the relative genomic coordinates of their constitutive codons. Mirage is highly effective at mapping proteins back to their encoding exons, and these protein-genome mappings lead to extremely accurate intra-species alignments; splice site information in these alignments is used to improve the accuracy of inter-species alignments of isoforms. Mirage alignments have also revealed the ubiquity of dual-coding exons, in which an exon conditionally encodes multiple open reading frames as overlapping spliced segments of frame-shifted genomic sequence.
Splice-Aware Multiple Sequence Alignment of Protein Isoforms. Alex Nord, Kaitlin Carey, Peter Hornbeck, Travis Wheeler. ACM-BCB, published 2018-08-01. DOI: 10.1145/3233547.3233592. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6508070/pdf/nihms-993818.pdf
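The idea of aligning isoforms through the genomic coordinates of their codons, as described in the abstract above, can be sketched in a few lines. This is a simplified illustration and not Mirage's algorithm: it assumes each residue has already been mapped to a single codon start coordinate, and it ignores strand, codons split across exon boundaries, and the dual-coding exons mentioned in the abstract. The gene and coordinates are invented.

```python
def align_by_genomic_position(isoforms):
    """isoforms: name -> list of (codon_start, amino_acid). Returns name -> gapped row."""
    # One alignment column per distinct codon start seen in any isoform.
    columns = sorted({pos for residues in isoforms.values() for pos, _ in residues})
    aligned = {}
    for name, residues in isoforms.items():
        by_pos = dict(residues)
        aligned[name] = "".join(by_pos.get(pos, "-") for pos in columns)
    return aligned

# Hypothetical gene: isoform B skips the middle exon (codons starting at 200 and 203).
isoforms = {
    "A": [(100, "M"), (103, "K"), (200, "L"), (203, "V"), (300, "E"), (303, "R")],
    "B": [(100, "M"), (103, "K"), (300, "E"), (303, "R")],
}

alignment = align_by_genomic_position(isoforms)
for name, row in alignment.items():
    print(name, row)
```

Because columns are defined by genome position rather than by residue similarity, the skipped exon appears as one exon-length gap instead of a string of forced mismatches, which is the behavior the abstract contrasts with traditional scoring.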
The electron cryo-microscopy (cryo-EM) technique produces density maps that are 3-dimensional (3D) images of molecules. It is challenging to derive atomic structures of proteins from 3D images of medium resolution. The twist of a β-strand has been studied extensively, yet little of the known information has been obtained directly from the 3D image of a β-sheet. We describe a method to characterize the twist of β-strands from the 3D image of a protein. An analysis of 11 β-sheet images shows that the Averaged Minimum Twist (AMT) angle is larger for a close set than for a far set of β-traces.
Analysis of β-strand Twist from the 3-dimensional Image of a Protein. Tunazzina Islam, Michael Poteat, Jing He. ACM-BCB, published 2017-08-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9279011/pdf/nihms967628.pdf
Modern high-resolution mass spectrometry (MS) instruments can generate millions of spectra in a single systems biology experiment. Each spectrum consists of thousands of peaks, but only a small number of peaks actively contribute to the deduction of peptides. Therefore, pre-processing of MS data to detect noisy and non-useful peaks is an active area of research. Most sequential noise-reducing algorithms are impractical as a pre-processing step because of their high time complexity. In this paper, we present a GPU-based dimensionality-reduction algorithm, called G-MSR, for MS2 spectra. Our proposed algorithm uses novel data structures that optimize memory use and computational operations inside the GPU. These data structures include Binary Spectra and Quantized Indexed Spectra (QIS). The former helps communicate essential information between the CPU and GPU using a minimum amount of data, while the latter enables us to store and process a complex 3-D data structure as a 1-D array while maintaining the integrity of the MS data. Our proposed algorithm also takes into account the limited memory of GPUs and switches between in-core and out-of-core modes based on the size of the input data. G-MSR achieves a peak speed-up of 386x over its sequential counterpart and is shown to process over a million spectra in just 32 seconds. The code for this algorithm is available as GPL-licensed open source on GitHub: https://github.com/pcdslab/G-MSR.
An Out-of-Core GPU based dimensionality reduction algorithm for Big Mass Spectrometry Data and its application in bottom-up Proteomics. Muaaz Gul Awan, Fahad Saeed. ACM-BCB, published 2017-08-01. DOI: 10.1145/3107411.3107466
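The abstract above mentions storing a nested spectra structure as a 1-D array. One common way to do that, shown here as a generic sketch, is to pack all peaks into flat arrays and keep an offsets index marking where each spectrum begins. This is an assumption-laden illustration of the general flattening idea, not the Quantized Indexed Spectra layout from G-MSR, whose details the abstract does not give; the function names and sample peaks are hypothetical.

```python
def flatten_spectra(spectra):
    """Pack variable-length peak lists into flat arrays plus an offsets index."""
    mz, intensity, offsets = [], [], [0]
    for peaks in spectra:
        for m, i in peaks:
            mz.append(m)
            intensity.append(i)
        offsets.append(len(mz))  # end of this spectrum, start of the next
    return mz, intensity, offsets

def get_spectrum(mz, intensity, offsets, k):
    """Recover spectrum k from the flat arrays without touching the others."""
    start, end = offsets[k], offsets[k + 1]
    return list(zip(mz[start:end], intensity[start:end]))

# Hypothetical spectra as lists of (m/z, intensity) peaks; the second one is empty.
spectra = [[(100.1, 5.0), (200.2, 9.0)], [], [(150.5, 2.0)]]
mz, intensity, offsets = flatten_spectra(spectra)
print(offsets)
print(get_spectrum(mz, intensity, offsets, 2))
```

Flat arrays of this kind can be copied to device memory in a single transfer, and the offsets let each worker locate its own spectrum, which is why layouts like this are often preferred over nested structures on GPUs.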