Classification of cohesin family using class specific motifs

Ercument M. Eser, B. Arslan, U. Sezerman
Published in: 2013 8th International Symposium on Health Informatics and Bioinformatics
Publication date: 2013-11-14
DOI: 10.1109/HIBIT.2013.6661687 (https://doi.org/10.1109/HIBIT.2013.6661687)
Citations: 2

Abstract

Motif extraction from protein sequences has long been a challenging task for bioinformaticians. Class-specific motifs, which occur frequently in one class but only rarely in the others, can be used for highly accurate classification of protein sequences. In this study, we present a new scoring-based method for class-specific n-gram motif selection using reduced amino acid alphabets. Cohesin protein sequences, which interact with dockerin modules to construct the cellulosome (the multi-enzyme complex that degrades cellulose, the most common and abundant organic polymer), are used for class-specific motif selection, and the selected motifs are then given to the J48 and SVM algorithms as features. Classification results are examined across various n-gram sizes, reduced amino acid alphabets, and numbers of features. The best result, a training accuracy of 98.61% and a test accuracy of 94.54%, was obtained using the Gbmr14 alphabet, 5 features per family, 4-gram motifs, and the J48 algorithm. The proposed technique can be generalized to other protein families.
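The pipeline described in the abstract (reduce the alphabet, extract n-grams, score them per class, keep the top few per family as features) can be illustrated with a minimal Python sketch. The abstract does not give the scoring function or the Gbmr14 grouping, so everything specific here is an assumption: `TOY_GROUPS` is a made-up reduced alphabet, not Gbmr14, and the score (fraction of in-class sequences containing the n-gram minus the highest such fraction in any other class) is only one plausible way to capture "frequent in one class, rare in the others". The function names are hypothetical.

```python
from collections import Counter

# Hypothetical reduced alphabet for illustration only; this is NOT Gbmr14.
# Each amino acid is replaced by the first letter of its group.
TOY_GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KRH", "G", "P"]
REDUCE = {aa: group[0] for group in TOY_GROUPS for aa in group}


def reduce_sequence(seq):
    """Map a protein sequence onto the reduced alphabet ('X' for unknowns)."""
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def ngrams(seq, n):
    """Return the set of distinct n-grams that occur in seq."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def select_motifs(sequences_by_class, n=4, k=5):
    """Pick the k highest-scoring n-grams for each class.

    Assumed score: in-class presence fraction minus the largest
    presence fraction observed in any other class.
    """
    # Fraction of sequences in each class that contain each n-gram.
    freq = {}
    for label, seqs in sequences_by_class.items():
        counts = Counter()
        for s in seqs:
            counts.update(ngrams(reduce_sequence(s), n))
        total = max(len(seqs), 1)  # avoid division by zero for empty classes
        freq[label] = {g: c / total for g, c in counts.items()}

    selected = {}
    for label, table in freq.items():
        scores = {}
        for gram, f_in in table.items():
            f_out = max(
                (freq[other].get(gram, 0.0) for other in freq if other != label),
                default=0.0,
            )
            scores[gram] = f_in - f_out
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda g: (-scores[g], g))
        selected[label] = ranked[:k]
    return selected


def featurize(seq, motifs, n=4):
    """Binary feature vector: 1 if the motif occurs in the reduced sequence."""
    grams = ngrams(reduce_sequence(seq), n)
    return [1 if m in grams else 0 for m in motifs]
```

Feature vectors built this way could then be passed to a decision-tree or SVM classifier. The defaults `n=4` and `k=5` mirror the best-performing setting reported in the abstract, but the scores this sketch produces are not expected to reproduce the reported accuracies.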