ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine: latest publications

A Histogram-based Outlier Profile for Atomic Structures Derived from Cryo-Electron Microscopy.

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2019-09-01 DOI: 10.1145/3307339.3343865

Lin Chen, Jing He

As more atomic structures are determined from cryo-electron microscopy (cryo-EM) density maps, validating these structures has become an important task. We report findings from a comparison of cryo-EM structures released by December 2016 with those released between 2017 and 2019. Cryo-EM models built from density maps with resolution better than 6 Å were divided into six data sets. A histogram-based outlier score (HBOS) was implemented, and validation reports were collected from the Protein Data Bank. The results suggest that the overall quality of EM structures released after December 2016 is better than that of structures released before 2017. The conformational quality of most residue types may have improved, except for leucine, phenylalanine, and serine in the high-resolution data sets (resolution better than 4 Å). We observe that structures solved from 0-4 Å resolution density maps have almost the same HBOS profile as structures derived from density maps with 4-6 Å resolution.

Citations: 0

Improving Validity of Cause of Death on Death Certificates.

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2018-08-01 DOI: 10.1145/3233547.3233581

Ryan A Hoffman, Janani Venugopalan, Li Qu, Hang Wu, May D Wang

Accurate reporting of causes of death on death certificates is essential for national health-protection institutions, such as the Centers for Disease Control and Prevention (CDC), to formulate appropriate disease control, prevention, and emergency response. In this study, we use publicly available, expert-formulated rules for cause of death to determine the extent to which death certificates in national mortality data are discordant with this expert knowledge base. We also report the most commonly occurring invalid causal pairs that physicians enter on death certificates. We use sequence rule mining to find the most frequent patterns on death certificates and compare them with the rules in the expert knowledge base. Based on our results, 20.1% of the common patterns derived from death certificate entries were discordant. The most probable causes of these discordant or invalid rules are missing steps and non-specific ICD-10 codes on the death certificates.
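The comparison of mined certificate patterns against expert rules can be illustrated with a much-simplified sketch. It assumes each certificate is an ordered list of ICD-10 codes (immediate cause first, underlying cause last) and counts only adjacent "condition due to cause" pairs, a stand-in for full sequential rule mining rather than the authors' method. The certificates and the `valid_rules` set are hypothetical toy data, not the expert knowledge base used in the study.

```python
from collections import Counter


def frequent_causal_pairs(certificates, min_support):
    """Count adjacent (condition, due_to) pairs; keep those with enough support.

    Support is the number of certificates containing the pair (each pair is
    counted at most once per certificate).
    """
    counts = Counter()
    for cert in certificates:
        pairs = {(cert[i], cert[i + 1]) for i in range(len(cert) - 1)}
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_support}


def discordance_rate(frequent_pairs, valid_rules):
    """Fraction of frequent pairs that do not appear in the expert rule set."""
    if not frequent_pairs:
        return 0.0
    invalid = [pair for pair in frequent_pairs if pair not in valid_rules]
    return len(invalid) / len(frequent_pairs)


# Hypothetical toy data: codes ordered from immediate to underlying cause.
certificates = [
    ["I21", "I25", "I10"],
    ["I21", "I25", "I10"],
    ["J18", "R99"],
    ["J18", "R99"],
    ["I21", "I10"],
]
# Hypothetical "expert" rules: (condition, due_to) pairs considered valid.
valid_rules = {("I21", "I25"), ("I25", "I10"), ("I21", "I10")}

frequent = frequent_causal_pairs(certificates, min_support=2)
rate = discordance_rate(frequent, valid_rules)
print(frequent, rate)
```

Here the pair ("J18", "R99"), pneumonia attributed to an ill-defined cause, is frequent but absent from the toy rule set, so one of three frequent pairs is discordant.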

Citations: 13

Using Combined Features to Analyze Atomic Structures Derived from Cryo-EM Density Maps.

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2018-08-01 DOI: 10.1145/3233547.3233709

Lin Chen, Jing He

Cryo-electron microscopy (cryo-EM) has become a major technique for protein structure determination. Many atomic structures have been derived from cryo-EM density maps at about 3 Å resolution. Side-chain conformations are well determined in density maps with super-resolution, such as 1-2 Å. A statistical method that detects anomalous side chains without requiring a super-resolution density map is therefore desirable. In this study, we analyzed structures derived from X-ray density maps with resolution better than 1.5 Å, and structures derived from cryo-EM density maps with 2-4 Å and 4-6 Å resolution, respectively. We introduce a histogram-based outlier score (HBOS) for anomaly detection in protein models built from cryo-EM density maps. The method uses statistics derived from the X-ray data set (<1.5 Å) as the reference and combines five features: the distal block distance, the side-chain length, and the phi, psi, and first chi angles of the residue. Higher percentages of anomalies were observed in the cryo-EM models than in the super-resolution X-ray models. Lower percentages of anomalies were observed in cryo-EM models derived after January 2017 than in those derived before 2017.
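A generic histogram-based outlier score can be sketched as follows: one equal-width histogram per feature is fitted on reference data (here standing in for the high-resolution X-ray residues), bin heights are normalized so the tallest bin is 1, and a sample's score is the sum over features of log(1 / height). The bin count, the `eps` floor for empty or out-of-range bins, and the toy two-feature data are assumptions; the paper's binning is not given here, and angular features such as phi, psi, and chi would really need wrap-around handling, which this sketch omits.

```python
import math


class HBOS:
    """Minimal histogram-based outlier score sketch (higher = more anomalous)."""

    def __init__(self, n_bins=10, eps=1e-6):
        self.n_bins = n_bins
        self.eps = eps  # floor applied to empty or out-of-range bins

    def fit(self, reference):
        """Build one equal-width histogram per feature from reference rows."""
        self.ranges = []
        self.heights = []
        for j in range(len(reference[0])):
            column = [row[j] for row in reference]
            lo, hi = min(column), max(column)
            width = (hi - lo) / self.n_bins or 1.0  # avoid zero-width bins
            counts = [0] * self.n_bins
            for v in column:
                counts[min(int((v - lo) / width), self.n_bins - 1)] += 1
            peak = max(counts)
            self.ranges.append((lo, hi, width))
            self.heights.append([c / peak for c in counts])
        return self

    def score(self, sample):
        """Sum of log(1 / normalized bin height) over all features."""
        total = 0.0
        for j, v in enumerate(sample):
            lo, hi, width = self.ranges[j]
            if lo <= v <= hi:
                idx = min(int((v - lo) / width), self.n_bins - 1)
                h = self.heights[j][idx]
            else:
                h = 0.0  # value never seen in the reference range
            total += -math.log(max(h, self.eps))
        return total


# Toy reference data: two features, each uniform over 0..9.
reference = [(i % 10, (i * 3) % 10) for i in range(100)]
model = HBOS(n_bins=10).fit(reference)
print(model.score((5, 5)), model.score((50, 5)))
```

Because features are scored independently and summed, HBOS is fast, but it ignores correlations between features such as phi and psi.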

Citations: 3

Target Gene Prediction of Transcription Factor Using a New Neighborhood-regularized Tri-factorization One-class Collaborative Filtering Algorithm.

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2018-08-01 DOI: 10.1145/3233547.3233551

Hansaim Lim, Lei Xie

Identifying the target genes of transcription factors (TFs) is key to understanding transcriptional regulation. However, our understanding of genome-wide TF targeting profiles is limited by the cost of large-scale experiments and by intrinsic complexity. Computational methods are therefore useful for predicting unobserved associations. Here, we developed tREMAP, a new one-class collaborative filtering algorithm based on regularized, weighted nonnegative matrix tri-factorization. The algorithm predicts unobserved target genes of TFs using known gene-TF associations and a protein-protein interaction network. Our benchmark study shows that tREMAP significantly outperforms its counterpart REMAP, a bi-factorization-based algorithm, for TF target gene prediction on all four performance metrics: AUC, MAP, MPR, and HLU. When evaluated on independent data sets, the prediction accuracy is 37.8% for the top 495 predicted associations, an enrichment factor of 4.19 over random guessing. Furthermore, many of the novel associations predicted by tREMAP are supported by evidence from the literature. Although we use only canonical TF-target gene interaction data in this study, tREMAP can be applied directly to tissue-specific data sets. tREMAP provides a framework for integrating multiple omics data to further improve TF target gene prediction, making it a potentially useful tool for studying gene regulatory networks. The benchmark data set and the source code of tREMAP are freely available at https://github.com/hansaimlim/REMAP/tree/master/TriFacREMAP.
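The general idea of weighted, neighborhood-regularized nonnegative matrix tri-factorization can be sketched with NumPy. This is a generic formulation, not tREMAP's exact objective or update rules (see the repository above for those): it minimizes sum(W * (X - U S V^T)^2) + lam * trace(U^T (D - A) U), where X is a gene-by-TF association matrix, W gives observed entries a higher weight than unobserved ones (the one-class setting), and A is a gene-gene adjacency matrix such as a PPI network. The ranks, weights, `lam`, and random toy data are all assumptions.

```python
import numpy as np


def weighted_nmtf(X, W, A, k1=5, k2=5, lam=0.1, n_iter=200, seed=0, eps=1e-10):
    """Approximate X ~ U @ S @ V.T with nonnegative U, S, V.

    Multiplicative updates come from splitting each gradient into its
    positive and negative parts; the graph Laplacian D - A pulls the latent
    factors of neighboring genes (rows) toward each other.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k1))
    S = rng.random((k1, k2))
    V = rng.random((m, k2))
    D = np.diag(A.sum(axis=1))  # degree matrix of the gene network
    WX = W * X
    for _ in range(n_iter):
        R = U @ S @ V.T
        U *= (WX @ V @ S.T + lam * A @ U) / ((W * R) @ V @ S.T + lam * D @ U + eps)
        R = U @ S @ V.T
        V *= (WX.T @ U @ S) / ((W * R).T @ U @ S + eps)
        R = U @ S @ V.T
        S *= (U.T @ WX @ V) / (U.T @ (W * R) @ V + eps)
    return U, S, V


def objective(X, W, A, U, S, V, lam=0.1):
    """Weighted reconstruction error plus graph regularization."""
    L = np.diag(A.sum(axis=1)) - A
    R = U @ S @ V.T
    return float(np.sum(W * (X - R) ** 2) + lam * np.trace(U.T @ L @ U))


# Toy data: 30 genes x 8 TFs, sparse positives, random symmetric gene network.
rng = np.random.default_rng(1)
X = (rng.random((30, 8)) < 0.2).astype(float)
W = np.where(X > 0, 1.0, 0.1)  # one-class weighting: unobserved != negative
A = np.triu((rng.random((30, 30)) < 0.1).astype(float), k=1)
A = A + A.T

U0, S0, V0 = weighted_nmtf(X, W, A, n_iter=0)  # random initialization only
U, S, V = weighted_nmtf(X, W, A, n_iter=200)
scores = U @ S @ V.T  # rank unobserved gene-TF pairs by these scores
print(objective(X, W, A, U0, S0, V0), objective(X, W, A, U, S, V))
```

Compared with bi-factorization (X ~ U V^T), the middle factor S lets the number of gene and TF latent dimensions differ and models their interactions explicitly.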

Citations: 6

Exploratory Studies Detecting Secondary Structures in Medium Resolution 3D Cryo-EM Images Using Deep Convolutional Neural Networks.

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2018-08-01 DOI: 10.1145/3233547.3233704

Devin Haslam, Tao Zeng, Rongjian Li, Jing He

Cryo-electron microscopy (cryo-EM) is an emerging biophysical technique for determining the structures of protein complexes. However, accurately detecting secondary structures remains challenging when cryo-EM density maps are at medium resolution (5-10 Å). Most existing methods are image-processing methods that do not fully use the images available in the cryo-EM database. In this paper, we present a deep learning approach that segments secondary structure elements, namely helices and β-sheets, from medium-resolution density maps. The proposed 3D convolutional neural network detects secondary structure locations with F1 scores between 0.79 and 0.88 on six simulated test cases. The architecture was also applied, with good accuracy, to an experimentally derived cryo-EM density map.
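The F1 scores above combine precision and recall for a detected class. A minimal sketch of a per-class F1 over voxel labels follows; it assumes the 3D segmentation has been flattened to one label per voxel (0 = background, 1 = helix, 2 = β-sheet are hypothetical codes), since the abstract does not specify how the reported F1 was computed.

```python
def voxel_f1(predicted, truth, label):
    """F1 score for one class over flattened per-voxel labels."""
    pairs = list(zip(predicted, truth))
    tp = sum(p == label and t == label for p, t in pairs)
    fp = sum(p == label and t != label for p, t in pairs)
    fn = sum(p != label and t == label for p, t in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight voxels: 0 = background, 1 = helix, 2 = sheet.
truth = [1, 1, 1, 0, 0, 2, 2, 0]
pred = [1, 1, 0, 0, 1, 2, 0, 0]
print(voxel_f1(pred, truth, 1), voxel_f1(pred, truth, 2))
```

For the helix class this gives precision 2/3 and recall 2/3, so F1 = 2/3; for the sheet class, precision 1 and recall 1/2 also give F1 = 2/3.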

Citations: 6

Neuroinformatics and Analysis of Connectomic Alterations Due to Cerebral Microhemorrhages in Geriatric Mild Neurotrauma.

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2018-08-01 DOI: 10.1145/3233547.3233598

Alexander S Maher, Kenneth A Rostowsky, Nahian F Chowdhury, Andrei Irimia

Connectomic alterations associated with subtle forms of cerebrovascular neuropathology, such as cerebral microbleeds (CMBs), can result in substantial neurological and/or cognitive deficits in victims of traumatic brain injury (TBI). Quantifying CMB-related connectome changes in patients with mild TBI (mTBI) requires neuroinformatics approaches that integrate structural magnetic resonance imaging (sMRI) with diffusion-weighted imaging (DWI) for patient-tailored profiling, while preserving the data scientist's ability to carry out population studies. Such solutions can help refine rehabilitation protocols and streamline large-scale analysis while accommodating the heterogeneity of mTBI. This study describes a pipeline for the multimodal integration of sMRI, DWI, and DTI to quantify alterations in white matter (WM) neural network circuitry associated with mTBI-related CMBs. The approach combines WM streamline matching, topology-compliant streamline prototyping, and along-tract analysis within a unified framework. When applied to neuroimaging data acquired from both mTBI patients and healthy control volunteers, the approach facilitates the identification of patient-specific, CMB-related connectomic changes while retaining the ability to perform group analyses. This pipeline for identifying and profiling connectopathies can help adapt clinical rehabilitation protocols to patients' individual needs.
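Streamline matching needs a distance between streamlines. The abstract does not say which one the pipeline uses; a common choice in tractography is the minimum average direct-flip (MDF) distance, sketched below as an assumption. Both streamlines are resampled to the same number of points equally spaced along their arc length, and the mean point-to-point distance is taken in both orientations, because a streamline has no inherent start or end.

```python
import math


def resample(streamline, n_points):
    """Resample a 3-D polyline (>= 2 points) to n_points equally spaced by arc length."""
    seg = [math.dist(a, b) for a, b in zip(streamline, streamline[1:])]
    total = sum(seg)
    out, i, walked = [], 0, 0.0
    for k in range(n_points):
        t = total * k / (n_points - 1)  # target arc length for this point
        while i < len(seg) - 1 and walked + seg[i] < t:
            walked += seg[i]
            i += 1
        frac = 0.0 if seg[i] == 0 else (t - walked) / seg[i]
        frac = min(max(frac, 0.0), 1.0)  # guard against rounding drift
        a, b = streamline[i], streamline[i + 1]
        out.append(tuple(a[d] + frac * (b[d] - a[d]) for d in range(3)))
    return out


def mdf(s1, s2, n_points=20):
    """Minimum average direct-flip distance between two streamlines."""
    a, b = resample(s1, n_points), resample(s2, n_points)
    direct = sum(math.dist(p, q) for p, q in zip(a, b)) / n_points
    flipped = sum(math.dist(p, q) for p, q in zip(a, reversed(b))) / n_points
    return min(direct, flipped)


# Toy streamlines: the second is parallel, 1 unit away, and traced in reverse.
line = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
shifted = [(10.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
print(mdf(line, shifted))
```

The same resampling step is also a typical starting point for along-tract analysis, where a scalar such as fractional anisotropy is sampled at each resampled point.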

Citations: 5

ULTRA: A Model Based Tool to Detect Tandem Repeats.

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2018-08-01 DOI: 10.1145/3233547.3233604

Daniel Olson, Travis Wheeler

In biological sequences, tandem repeats consist of tens to hundreds of residues of a repeated pattern, such as atgatgatgatgatg ('atg' repeated), often the result of replication slippage. Over time, these repeats decay so that the original sharp pattern of repetition is somewhat obscured, but even degenerate repeats pose a problem for sequence annotation: when two sequences both contain shared patterns of similar repetition, the result can be a false signal of sequence homology. We describe an implementation of a new hidden Markov model for detecting tandem repeats that shows substantially improved sensitivity to labeling decayed repetitive regions, presents low and reliable false annotation rates across a wide range of sequence composition, and produces scores that follow a stable distribution. On typical genomic sequence, the time and memory requirements of the resulting tool (ULTRA) are competitive with the most heavily used tool for repeat masking (TRF). ULTRA is released under an open source license and lays the groundwork for inclusion of the model in sequence alignment tools and annotation pipelines.
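The core idea of labeling repeats with a hidden Markov model can be shown with a deliberately tiny two-state model, which is not ULTRA's actual model: a background state emits any nucleotide with probability 1/4, and a repeat state of fixed period emits a nucleotide that matches the one `period` positions earlier with high probability. Viterbi decoding in log space then labels each position. The period, match probability, and transition probabilities are assumptions chosen for illustration.

```python
import math


def viterbi_repeat(seq, period, p_match=0.9, enter=0.01, leave=0.05):
    """Label each position of a non-empty seq 'B' (background) or 'R' (repeat)."""
    log = math.log
    trans = {
        ("B", "B"): log(1 - enter), ("B", "R"): log(enter),
        ("R", "R"): log(1 - leave), ("R", "B"): log(leave),
    }

    def emit(state, i):
        if state == "B" or i < period:
            return log(0.25)
        if seq[i] == seq[i - period]:
            return log(p_match)  # character repeats the one `period` back
        return log((1 - p_match) / 3)  # decayed (mismatched) repeat copy

    states = ("B", "R")
    score = {"B": log(1 - enter) + emit("B", 0), "R": log(enter) + emit("R", 0)}
    backptrs = []
    for i in range(1, len(seq)):
        new_score, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + trans[(r, s)])
            new_score[s] = score[prev] + trans[(prev, s)] + emit(s, i)
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


seq = "cgtacgtaagtc" + "atg" * 8 + "gcatcgattcag"
labels = viterbi_repeat(seq, period=3)
print(seq)
print(labels)
```

A real tool must also handle unknown periods, insertions and deletions inside repeats, and calibrated scores, which this sketch omits.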

Citations: 18

Splice-Aware Multiple Sequence Alignment of Protein Isoforms.

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2018-08-01 DOI: 10.1145/3233547.3233592

Alex Nord, Kaitlin Carey, Peter Hornbeck, Travis Wheeler

Multiple sequence alignment (MSA) is a classic problem in computational genomics. In typical use, MSA software is expected to align a collection of homologous genes, such as orthologs from multiple species or duplication-induced paralogs within a species. Recent focus on the importance of alternatively-spliced isoforms in disease and cell biology has highlighted the need to create MSAs that more effectively accommodate isoforms. MSAs are traditionally constructed using scoring criteria that prefer alignments with occasional mismatches over alignments with long gaps. Alternatively spliced protein isoforms effectively contain exon-length insertions or deletions (indels) relative to each other, and demand an alternative approach. Some improvements can be achieved by making indel penalties much smaller, but this is merely a patchwork solution. In this work we present Mirage, a novel MSA software package for the alignment of alternatively spliced protein isoforms. Mirage aligns isoforms to each other by first mapping each protein sequence to its encoding genomic sequence, and then aligning isoforms to one another based on the relative genomic coordinates of their constitutive codons. Mirage is highly effective at mapping proteins back to their encoding exons, and these protein-genome mappings lead to extremely accurate intra-species alignments; splice site information in these alignments is used to improve the accuracy of inter-species alignments of isoforms. Mirage alignments have also revealed the ubiquity of dual-coding exons, in which an exon conditionally encodes multiple open reading frames as overlapping spliced segments of frame-shifted genomic sequence.
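Once every residue has been mapped to the genomic coordinate of its codon (the hard step, which this sketch assumes is already done), aligning isoforms of one gene reduces to placing residues that share a coordinate in the same column. The sketch below is a simplified illustration of that idea, not Mirage's implementation; the input format, the plus-strand assumption (columns in ascending coordinate order), and the toy exons are hypothetical.

```python
def align_by_genomic_coordinate(isoforms):
    """Align isoforms using the genomic start coordinate of each residue's codon.

    isoforms: dict mapping isoform name -> list of (codon_start, amino_acid).
    Returns dict mapping isoform name -> aligned string, '-' marking gaps.
    Assumes a plus-strand gene, so columns follow ascending coordinates.
    """
    columns = sorted({coord for residues in isoforms.values() for coord, _ in residues})
    aligned = {}
    for name, residues in isoforms.items():
        by_coord = dict(residues)
        aligned[name] = "".join(by_coord.get(c, "-") for c in columns)
    return aligned


# Toy gene: exon 1 codons at 100-108, exon 2 at 200-205, exon 3 at 300-305.
exon1 = [(100, "M"), (103, "K"), (106, "T")]
exon2 = [(200, "L"), (203, "V")]
exon3 = [(300, "Q"), (303, "R")]
isoforms = {
    "isoform_A": exon1 + exon2 + exon3,
    "isoform_B": exon1 + exon3,  # skips exon 2
}
aligned = align_by_genomic_coordinate(isoforms)
for name, row in aligned.items():
    print(name, row)
```

The skipped exon appears as a single exon-length gap rather than being scattered as mismatches. A dual-coding exon read in two frames would produce codon starts offset by one or two nucleotides, and therefore separate columns.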

Citations: 0

Analysis of β-strand Twist from the 3-dimensional Image of a Protein.

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2017-08-01

Tunazzina Islam, Michael Poteat, Jing He

The electron cryo-microscopy (cryo-EM) technique produces density maps, which are three-dimensional (3D) images of molecules. Deriving the atomic structure of a protein from a 3D image at medium resolution is challenging. The twist of β-strands has been studied extensively, but little of what is known has been obtained directly from the 3D image of a β-sheet. We describe a method to characterize the twist of β-strands from the 3D image of a protein. An analysis of 11 β-sheet images shows that the Averaged Minimum Twist (AMT) angle is larger for a close set of β-traces than for a far set.

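The abstract does not define how the AMT angle is computed. As a rough illustration only, the Python sketch below measures a twist-like angle between two adjacent β-traces, each assumed to be an ordered list of 3D points traced along a strand in the density map. For every segment of the first trace it picks the segment of the second trace with the nearest midpoint, takes the angle between the two segment axes with polarity ignored (so parallel and antiparallel pairs are treated alike), and averages the results. The function names and this averaging rule are hypothetical and are not the authors' AMT definition.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _sub(p: Point, q: Point) -> Point:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _unit(v: Point) -> Point:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if n == 0:
        raise ValueError("degenerate segment of zero length")
    return (v[0] / n, v[1] / n, v[2] / n)


def _midpoint(p: Point, q: Point) -> Point:
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, (p[2] + q[2]) / 2)


def axis_angle_deg(u: Point, v: Point) -> float:
    """Angle between two unit axes, ignoring direction; result lies in [0, 90] degrees."""
    dot = abs(u[0] * v[0] + u[1] * v[1] + u[2] * v[2])
    return math.degrees(math.acos(min(1.0, dot)))  # clamp guards against rounding above 1


def mean_local_twist_deg(trace_a: Sequence[Point], trace_b: Sequence[Point]) -> float:
    """Hypothetical twist proxy between two adjacent beta-traces (not the published AMT).

    For each segment of trace_a, find the segment of trace_b whose midpoint is
    closest, measure the angle between the two segment axes, and average.
    """
    if len(trace_a) < 2 or len(trace_b) < 2:
        raise ValueError("each trace needs at least two points")
    # Precompute (midpoint, unit direction) for every segment of trace_b.
    segs_b = [(_midpoint(p, q), _unit(_sub(q, p))) for p, q in zip(trace_b, trace_b[1:])]
    angles = []
    for p, q in zip(trace_a, trace_a[1:]):
        mid_a, dir_a = _midpoint(p, q), _unit(_sub(q, p))
        _, dir_b = min(segs_b, key=lambda seg: math.dist(seg[0], mid_a))
        angles.append(axis_angle_deg(dir_a, dir_b))
    return sum(angles) / len(angles)
```

For example, a straight trace along the x-axis and a second straight trace offset 4.8 Å along y and tilted 20° about the y-axis give a mean local angle of about 20°, whichever direction the second trace runs.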

Citations: 0

An Out-of-Core GPU based dimensionality reduction algorithm for Big Mass Spectrometry Data and its application in bottom-up Proteomics.

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2017-08-01 DOI: 10.1145/3107411.3107466

Muaaz Gul Awan, Fahad Saeed

Modern high-resolution mass spectrometry (MS) instruments can generate millions of spectra in a single systems biology experiment. Each spectrum consists of thousands of peaks, but only a small number of them actively contribute to the deduction of peptides. Pre-processing MS data to detect noisy and non-useful peaks is therefore an active area of research. Most sequential noise-reduction algorithms are impractical as a pre-processing step because of their high time complexity. In this paper, we present G-MSR, a GPU-based dimensionality-reduction algorithm for MS2 spectra. The algorithm uses novel data structures that optimize memory and computational operations on the GPU. These data structures include Binary Spectra and Quantized Indexed Spectra (QIS). The former communicates essential information between the CPU and GPU using a minimal amount of data, while the latter lets complex 3-D data structures be stored and processed as a 1-D array while maintaining the integrity of the MS data. The algorithm also accounts for the limited memory of GPUs, switching between in-core and out-of-core modes based on the size of the input data. G-MSR achieves a peak speed-up of 386x over its sequential counterpart and processes over a million spectra in just 32 seconds. The code is available as GPL-licensed open source on GitHub: https://github.com/pcdslab/G-MSR.
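The abstract describes QIS only as a way to hold a complex 3-D structure in a 1-D array. One common way to do that, sketched here under the assumption that the three dimensions are spectra × peaks × (m/z, intensity), is a CSR-style layout: m/z values are quantized to integer bins, all peaks are concatenated into flat arrays, and an offsets array records where each spectrum starts. The bin width of 0.01 and the function names are illustrative assumptions, not G-MSR's actual format.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def quantize_mz(mz: float, bin_width: float = 0.01) -> int:
    """Map an m/z value to an integer bin index (bin_width is an assumed resolution)."""
    return int(round(mz / bin_width))


def flatten_spectra(
    spectra: Sequence[Sequence[Peak]], bin_width: float = 0.01
) -> Tuple[List[int], List[int], List[float]]:
    """Pack a ragged spectra x peaks x (m/z, intensity) structure into flat 1-D arrays.

    Returns (offsets, mz_bins, intensities); the peaks of spectrum i occupy
    positions offsets[i] .. offsets[i + 1] - 1 of the two flat arrays.
    """
    offsets: List[int] = [0]
    mz_bins: List[int] = []
    intensities: List[float] = []
    for spectrum in spectra:
        for mz, inten in sorted(spectrum):  # keep each spectrum's peaks ordered by m/z
            mz_bins.append(quantize_mz(mz, bin_width))
            intensities.append(inten)
        offsets.append(len(mz_bins))  # end of this spectrum = start of the next
    return offsets, mz_bins, intensities


def peaks_of(
    i: int,
    offsets: List[int],
    mz_bins: List[int],
    intensities: List[float],
    bin_width: float = 0.01,
) -> List[Peak]:
    """Recover spectrum i, with m/z reconstructed only to bin resolution."""
    lo, hi = offsets[i], offsets[i + 1]
    return [(mz_bins[j] * bin_width, intensities[j]) for j in range(lo, hi)]
```

A flat layout like this lets each GPU thread or block locate its spectrum with two offset lookups instead of following per-spectrum pointers, which is the usual motivation for such designs.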

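The switch between in-core and out-of-core modes can be illustrated with a small planning step: if all of the input fits in the available device memory, process it in one batch; otherwise split the spectra into consecutive batches that each fit. This Python sketch assumes per-spectrum sizes in bytes and a single memory budget; the greedy grouping and the name `plan_batches` are hypothetical and are not taken from the G-MSR code.

```python
from typing import List, Sequence, Tuple


def plan_batches(sizes: Sequence[int], budget: int) -> Tuple[str, List[Tuple[int, int]]]:
    """Choose in-core or out-of-core processing for per-spectrum sizes in bytes.

    In-core: everything fits within the (assumed) device budget, so one batch.
    Out-of-core: greedily group consecutive spectra into half-open index
    ranges [start, end) whose total size stays within the budget.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    if any(s > budget for s in sizes):
        raise ValueError("a single spectrum exceeds the memory budget")
    if sum(sizes) <= budget:
        return "in-core", [(0, len(sizes))]
    batches: List[Tuple[int, int]] = []
    start, used = 0, 0
    for i, s in enumerate(sizes):
        if used + s > budget:  # spectrum i would overflow the batch: close it
            batches.append((start, i))
            start, used = i, 0
        used += s
    batches.append((start, len(sizes)))  # flush the final partial batch
    return "out-of-core", batches
```

For instance, sizes `[40, 40, 40, 10]` with a budget of 100 yield out-of-core mode with batches `(0, 2)` and `(2, 4)`, while sizes `[10, 20, 30]` fit in-core as a single batch.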


Citations: 8