Ion channel classification through machine learning and protein language model embeddings.

Journal of Integrative Bioinformatics (IF 1.5, JCR Q3, Mathematical & Computational Biology). Pub Date: 2024-11-25. eCollection Date: 2024-12-01. DOI: 10.1515/jib-2023-0047
Hamed Ghazikhani, Gregory Butler
PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11698620/pdf/
Citations: 0

Abstract

Ion channels are membrane proteins that regulate ion flux across cellular membranes and thereby influence numerous biological functions. Because traditional wet-lab experiments for identifying ion channels are resource-intensive, computational techniques have received increasing emphasis. This study extends our previous work on protein language models for ion channel prediction, advancing both the methodology and the performance. We employ a range of machine learning algorithms, including k-Nearest Neighbors, Random Forest, Support Vector Machines, and Feed-Forward Neural Networks, alongside a novel Convolutional Neural Network (CNN) approach. These methods use fine-tuned embeddings from ProtBERT, ProtBERT-BFD, and MembraneBERT to differentiate ion channels from non-ion channels. Our empirical findings show that TooT-BERT-CNN-C, which combines features from ProtBERT-BFD with a CNN, substantially surpasses existing benchmarks. On our original dataset, it achieves a Matthews Correlation Coefficient (MCC) of 0.8584 and an accuracy of 98.35%. On a newly curated, larger dataset (DS-Cv2), it attains an MCC of 0.9492 and an ROC AUC of 0.9968 on the independent test set. These results highlight the value of integrating protein language models with deep learning for ion channel classification, and they underscore the importance of using up-to-date, comprehensive datasets in bioinformatics tasks. Our approach advances computational methods for ion channel identification, with potential implications for accelerating research in ion channel biology and aiding drug discovery efforts.
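The abstract reports both accuracy and MCC, and the two can diverge sharply when one class is rare. The following is a minimal sketch of how both metrics are computed from binary predictions, using only the Python standard library. The labels and predictions are hypothetical toy data chosen for illustration; they are not taken from the paper's datasets, and the function names are assumptions of this sketch.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = ion channel."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; 0.0 by convention if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced test set: 10 ion channels, 190 non-ion channels.
y_true = [1] * 10 + [0] * 190
# Hypothetical classifier: finds 7 of 10 channels, with 1 false positive.
y_pred = [1] * 7 + [0] * 3 + [1] * 1 + [0] * 189
# Trivial baseline that always predicts "non-ion channel".
y_baseline = [0] * 200

model_counts = confusion_counts(y_true, y_pred)
baseline_counts = confusion_counts(y_true, y_baseline)

print("model:    acc=%.4f mcc=%.4f" % (accuracy(*model_counts), mcc(*model_counts)))
print("baseline: acc=%.4f mcc=%.4f" % (accuracy(*baseline_counts), mcc(*baseline_counts)))
```

On this toy data the always-negative baseline still reaches 95% accuracy but has an MCC of 0, while the hypothetical classifier reaches 98% accuracy with an MCC of about 0.77. This is one reason a high accuracy figure is usually read together with MCC for imbalanced tasks of this kind.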

Source journal: Journal of Integrative Bioinformatics (Medicine: Medicine (all))
CiteScore: 3.10
Self-citation rate: 5.30%
Articles published: 27
Review time: 12 weeks