Pub Date: 2021-11-01
DOI: 10.1109/mlhpc54614.2021.00010
Mu Gao, Peik Lund-Andersen, Alex Morehead, Sajid Mahmud, Chen Chen, Xiao Chen, Nabin Giri, Raj S Roy, Farhan Quadir, T Chad Effler, Ryan Prout, Subil Abraham, Wael Elwasif, N Quentin Haas, Jeffrey Skolnick, Jianlin Cheng, Ada Sedova
Computational biology is one of many scientific disciplines ripe for innovation and acceleration with the advent of high-performance computing (HPC). In recent years, the field of machine learning has also benefited significantly from adopting HPC practices. In this work, we present a novel HPC pipeline that incorporates various machine-learning approaches for structure-based functional annotation of proteins on the scale of whole genomes. Our pipeline makes extensive use of deep learning and provides computational insights into best practices for training advanced deep-learning models on high-throughput data such as proteomics data. We showcase the methodologies our pipeline currently supports and detail future tasks for it to encompass, including large-scale sequence comparison using SAdLSA and prediction of protein tertiary structures using AlphaFold2.
Title: High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function
Published in: Workshop on Machine Learning in HPC Environments, vol. 2021, pp. 46-57
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8802329/pdf/nihms-1769610.pdf