Hina Ghafoor, Ahtisham Fazeel Abbasi, Muhammad Nabeel Asim, Andreas Dengel
Informatics in Medicine Unlocked, Volume 50, Article 101578 (2024). DOI: 10.1016/j.imu.2024.101578
CTD-Global (CTD-G): A novel composition, transition, and distribution based peptide sequence encoder for hormone peptide prediction
Hormone peptides are small signaling molecules that regulate key cellular processes such as cell growth and differentiation. Identifying hormone peptides is important for understanding their potential associations with diseases such as attention deficit hyperactivity disorder, diabetes, and psychiatric disorders. A comprehensive understanding of their roles in cellular signaling and immune regulation can provide insights into their therapeutic potential. Hormone peptides are currently identified through wet-lab approaches, which are resource-intensive, costly, and difficult to scale. To replace experimental approaches with computational predictors, researchers have turned to machine learning (ML) classifiers. These classifiers depend on statistical vectors generated by extracting distinctive amino acid patterns from peptide sequences, and they use these vectors to discriminate between hormone and non-hormone peptides. However, the performance of current predictors is limited by their inability to extract discriminative amino acid patterns effectively. To address this need, this paper presents a novel sequence encoder, CTD-G, that transforms peptide sequences into statistical vectors by extracting three types of amino acid patterns: composition, transition, and distribution. On a public benchmark dataset, the proposed CTD-G encoder is compared with 56 existing encoders under two evaluation strategies, intrinsic and extrinsic. In the intrinsic evaluation, t-SNE-based visualization shows less overlap between clusters of hormone and non-hormone peptides with the proposed encoder's statistical vectors than with those of existing encoders. In the extrinsic evaluation, 7 out of 11 ML classifiers achieve better performance with the proposed encoder's statistical vectors than with those from existing encoders. Furthermore, the proposed predictor outperforms existing hormone peptide classification predictors by 1.5% in accuracy, 5.36% in sensitivity, 1.80% in specificity, and 2.62% in MCC. To support the scientific community, a web application is available at https://sds_genetic_analysis.opendfki.de/.
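The abstract does not spell out how CTD-G computes its features, so the following is only a minimal sketch of the classic composition, transition, and distribution (CTD) idea that the encoder's name refers to, not the authors' method. The three-class hydrophobicity grouping, the function name `ctd_encode`, and the quantile convention used for the distribution descriptor are all assumptions chosen for illustration.

```python
from itertools import combinations
import math

# Illustrative three-class hydrophobicity grouping (an assumption,
# not taken from the paper).
GROUPS = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}
CLASS_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def ctd_encode(sequence):
    """Return a 21-value classic CTD vector: 3 C + 3 T + 15 D values.

    Hypothetical helper for illustration; this is not CTD-G itself.
    """
    sequence = sequence.upper()
    unknown = sorted(set(sequence) - set(CLASS_OF))
    if unknown:
        raise ValueError(f"unsupported residues: {unknown}")
    if len(sequence) < 2:
        raise ValueError("sequence must contain at least two residues")

    labels = [CLASS_OF[aa] for aa in sequence]
    n = len(labels)
    names = list(GROUPS)

    # Composition: fraction of residues that fall in each class.
    composition = [labels.count(name) / n for name in names]

    # Transition: fraction of adjacent residue pairs that switch
    # between two given classes, in either direction.
    transition = []
    for a, b in combinations(names, 2):
        switches = sum(
            1 for x, y in zip(labels, labels[1:]) if {x, y} == {a, b}
        )
        transition.append(switches / (n - 1))

    # Distribution: relative position of the first, 25%, 50%, 75%
    # and last occurrence of each class (one possible convention).
    distribution = []
    for name in names:
        positions = [i + 1 for i, lab in enumerate(labels) if lab == name]
        if not positions:
            distribution.extend([0.0] * 5)
            continue
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            k = max(1, math.ceil(frac * len(positions)))
            distribution.append(positions[k - 1] / n)

    return composition + transition + distribution


vector = ctd_encode("RKGACL")
print(len(vector))
```

A vector of this kind is what an ML classifier would consume as input; a full encoder would typically repeat the computation over several physicochemical groupings and concatenate the results.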
About the journal:
Informatics in Medicine Unlocked (IMU) is an international gold open access journal covering a broad spectrum of topics within medical informatics, including (but not limited to) papers focusing on imaging, pathology, teledermatology, public health, ophthalmological, nursing and translational medicine informatics. The full papers that are published in the journal are accessible to all who visit the website.