Robust ensemble of handcrafted and learned approaches for DNA-binding proteins

IF 12.3 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Applied Computing and Informatics Pub Date : 2021-05-04 DOI:10.1108/ACI-03-2021-0051
L. Nanni, S. Brahnam
{"title":"Robust ensemble of handcrafted and learned approaches for DNA-binding proteins","authors":"L. Nanni, S. Brahnam","doi":"10.1108/ACI-03-2021-0051","DOIUrl":null,"url":null,"abstract":"PurposeAutomatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create the most optimal and universal system for DNA-BP classification, one that performs competitively across several DNA-BP classification tasks.Design/methodology/approachEfficient DNA-BP classifier systems require the discovery of powerful protein representations and feature extraction methods. Experiments were performed that combined and compared descriptors extracted from state-of-the-art matrix/image protein representations. These descriptors were trained on separate support vector machines (SVMs) and evaluated. Convolutional neural networks with different parameter settings were fine-tuned on two matrix representations of proteins. Decisions were fused with the SVMs using the weighted sum rule and evaluated to experimentally derive the most powerful general-purpose DNA-BP classifier system.FindingsThe best ensemble proposed here produced comparable, if not superior, classification results on a broad and fair comparison with the literature across four different datasets representing a variety of DNA-BP classification tasks, thereby demonstrating both the power and generalizability of the proposed system.Originality/valueMost DNA-BP methods proposed in the literature are only validated on one (rarely two) datasets/tasks. In this work, the authors report the performance of our general-purpose DNA-BP system on four datasets representing different DNA-BP classification tasks. The excellent results of the proposed best classifier system demonstrate the power of the proposed approach. These results can now be used for baseline comparisons by other researchers in the field.","PeriodicalId":37348,"journal":{"name":"Applied Computing and Informatics","volume":"ahead-of-print 1","pages":""},"PeriodicalIF":12.3000,"publicationDate":"2021-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Computing and Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1108/ACI-03-2021-0051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

PurposeAutomatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create the most optimal and universal system for DNA-BP classification, one that performs competitively across several DNA-BP classification tasks.Design/methodology/approachEfficient DNA-BP classifier systems require the discovery of powerful protein representations and feature extraction methods. Experiments were performed that combined and compared descriptors extracted from state-of-the-art matrix/image protein representations. These descriptors were trained on separate support vector machines (SVMs) and evaluated. Convolutional neural networks with different parameter settings were fine-tuned on two matrix representations of proteins. Decisions were fused with the SVMs using the weighted sum rule and evaluated to experimentally derive the most powerful general-purpose DNA-BP classifier system.FindingsThe best ensemble proposed here produced comparable, if not superior, classification results on a broad and fair comparison with the literature across four different datasets representing a variety of DNA-BP classification tasks, thereby demonstrating both the power and generalizability of the proposed system.Originality/valueMost DNA-BP methods proposed in the literature are only validated on one (rarely two) datasets/tasks. In this work, the authors report the performance of our general-purpose DNA-BP system on four datasets representing different DNA-BP classification tasks. The excellent results of the proposed best classifier system demonstrate the power of the proposed approach. These results can now be used for baseline comparisons by other researchers in the field.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
强大的手工制作和学习方法的dna结合蛋白集合
目的dna结合蛋白(DNA-BP)自动分类是目前一项重要的蛋白质组学技术。不幸的是,文献中报道的许多系统只在一个或两个数据集/任务上进行了测试。本研究的目的是创建最优和通用的DNA-BP分类系统,一个在多个DNA-BP分类任务中执行竞争性的系统。设计/方法/方法高效的DNA-BP分类系统需要发现强大的蛋白质表示和特征提取方法。实验进行了结合和比较从最先进的矩阵/图像蛋白质表示中提取的描述符。这些描述符在单独的支持向量机(svm)上进行训练并进行评估。不同参数设置的卷积神经网络对蛋白质的两种矩阵表示进行了微调。使用加权和规则将决策与支持向量机融合并进行评估,以实验得出最强大的通用DNA-BP分类器系统。在与代表多种DNA-BP分类任务的四个不同数据集的文献进行广泛和公平的比较后,本文提出的最佳集成产生了相当的分类结果,如果不是更好的话,从而证明了所提出系统的能力和通用性。文献中提出的大多数DNA-BP方法仅在一个(很少两个)数据集/任务上得到验证。在这项工作中,作者报告了我们的通用DNA-BP系统在代表不同DNA-BP分类任务的四个数据集上的性能。所提出的最佳分类器系统的优异结果证明了所提出方法的强大功能。这些结果现在可以用于该领域其他研究人员的基线比较。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Applied Computing and Informatics
Applied Computing and Informatics Computer Science-Information Systems
CiteScore
12.20
自引率
0.00%
发文量
0
审稿时长
39 weeks
期刊介绍: Applied Computing and Informatics aims to be timely in disseminating leading-edge knowledge to researchers, practitioners and academics whose interest is in the latest developments in applied computing and information systems concepts, strategies, practices, tools and technologies. In particular, the journal encourages research studies that have significant contributions to make to the continuous development and improvement of IT practices in the Kingdom of Saudi Arabia and other countries. By doing so, the journal attempts to bridge the gap between the academic and industrial community, and therefore, welcomes theoretically grounded, methodologically sound research studies that address various IT-related problems and innovations of an applied nature. The journal will serve as a forum for practitioners, researchers, managers and IT policy makers to share their knowledge and experience in the design, development, implementation, management and evaluation of various IT applications. Contributions may deal with, but are not limited to: • Internet and E-Commerce Architecture, Infrastructure, Models, Deployment Strategies and Methodologies. • E-Business and E-Government Adoption. • Mobile Commerce and their Applications. • Applied Telecommunication Networks. • Software Engineering Approaches, Methodologies, Techniques, and Tools. • Applied Data Mining and Warehousing. • Information Strategic Planning and Recourse Management. • Applied Wireless Computing. • Enterprise Resource Planning Systems. • IT Education. • Societal, Cultural, and Ethical Issues of IT. • Policy, Legal and Global Issues of IT. • Enterprise Database Technology.
期刊最新文献
Brain tumor classification using ResNet50-convolutional block attention module Automatic measurement of cardiothoracic ratio in chest x-ray images with ProGAN-generated dataset Cyber threat: its origins and consequence and the use of qualitative and quantitative methods in cyber risk assessment Detecting and staging diabetic retinopathy in retinal images using multi-branch CNN A lightweight deep learning approach to mouth segmentation in color images
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1