A Parallel Implementation for Large-Scale TSR-based 3D Structural Comparisons of Protein and Amino Acid
Authors: Feng Chen, Tarikul I. Milon, Poorya Khajouie, Antoinette Myers, Wu Xu
Journal: Current Bioinformatics (impact factor 2.4; JCR Q3, Biochemical Research Methods)
Published: 2024-08-02
DOI: 10.2174/0115748936306625240724102438 (https://doi.org/10.2174/0115748936306625240724102438)
Abstract
Background: Proteins play a vital role in sustaining life, and they must form specific 3D structures to carry out their essential biological functions. Structure comparison techniques benefit from the ever-expanding repositories of the Protein Data Bank. Computational tools for comparing the 3D structures of proteins and amino acids play an important role in understanding protein function. The Triangular Spatial Relationship (TSR)-based method was developed for this purpose.

Methods: A parallelization strategy was developed and implemented on high-performance clusters using distributed- and shared-memory programming models, together with multi-core CPUs and many-core GPU accelerators. In the TSR-based method, the 3D structure of a protein or amino acid is represented by an integer vector. The strategy is designed for large-scale TSR-based 3D structural comparisons of proteins and amino acids, and it can be adapted to other applications that use a vector-type data structure.

Results: Because the TSR-based method represents protein and amino acid structures as vectors, the comparison algorithm is well suited to parallelization on large-scale supercomputers. Performance studies on representative datasets were conducted to demonstrate the efficiency of the parallelization strategy, which allows comparisons of large 3D protein or amino acid structure datasets to finish within a reasonable amount of time.

Conclusion: Case studies run with the parallelized code demonstrate that applying either mirror image or feature selection in the TSR-based algorithms improves the classification of protein and amino acid 3D structures. TSR keys also have the advantage of enabling structure-based BLAST searches. The parallelized code could serve as a reference for similar future studies.
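The Methods and Results above describe an all-against-all comparison of integer vectors whose pairs can be computed independently. The sketch below illustrates only that decomposition. It is not the authors' code: the generalized Jaccard similarity, the representation of each structure as a `Counter` mapping integer keys to counts, the round-robin split of pairs, and all function names are assumptions made for illustration. A Python thread pool stands in for the MPI ranks, OpenMP threads, or GPU blocks that a cluster implementation would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def generalized_jaccard(a: Counter, b: Counter) -> float:
    """Assumed similarity measure: sum of per-key minima over sum of maxima."""
    keys = a.keys() | b.keys()
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 1.0


def split_pairs(n_items: int, n_workers: int):
    """Deal the upper-triangle index pairs round-robin into n_workers chunks."""
    pairs = list(combinations(range(n_items), 2))
    return [pairs[w::n_workers] for w in range(n_workers)]


def compare_chunk(vectors, chunk):
    """Work done by one worker: score every pair assigned to it."""
    return [(i, j, generalized_jaccard(vectors[i], vectors[j])) for i, j in chunk]


def pairwise_similarity(vectors, n_workers: int = 4):
    """Build the symmetric similarity matrix from independently computed chunks."""
    n = len(vectors)
    sim = [[1.0] * n for _ in range(n)]
    chunks = split_pairs(n, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for result in pool.map(lambda c: compare_chunk(vectors, c), chunks):
            for i, j, s in result:
                sim[i][j] = sim[j][i] = s
    return sim


# Hypothetical toy data: integer keys mapped to occurrence counts.
structures = [
    Counter({101: 2, 205: 1, 309: 1}),
    Counter({101: 1, 205: 1, 410: 2}),
    Counter({101: 2, 205: 1, 309: 1}),
]
matrix = pairwise_similarity(structures, n_workers=2)
```

Because no pair depends on any other, the chunks need no communication until the final gather, which is the property that makes this kind of comparison straightforward to distribute. CPython threads do not run this arithmetic in parallel, so the thread pool here shows the work split rather than a speedup.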
About the journal:
Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth/mini-reviews, research papers and guest edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science.
The journal focuses on advances in computational molecular and structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.