Comparative analysis on features extraction strategies for GPCR classification

Safia Bekhouche, Yamina Mohamed Ben Ali
Published in: 2018 4th International Conference on Computer and Technology Applications (ICCTA), 2018-05-01
DOI: 10.1109/CATA.2018.8398676 (https://doi.org/10.1109/CATA.2018.8398676)
Citations: 1

Abstract

A protein is a sequence of amino acids written as letters, a form that data mining and machine learning algorithms cannot process directly because they require numerical input. Feature extraction strategies transform the letter sequence into a feature vector that represents the properties of the sequence, but each method produces a vector whose size and properties differ from the others. Our work compares three of the most widely used feature extraction strategies, amino acid composition (AAC), pseudo amino acid composition (PseAAC), and dipeptide composition (DC), using five selected machine learning algorithms deployed on the Weka platform. The strategies are evaluated on accuracy, F-measure, Matthews correlation coefficient (MCC), and error rate. This comparison helps decide which feature extraction strategy is best suited when applying computationally expensive machine learning algorithms to protein sequence data. Experiments suggested that AAC, PseAAC, and DC would all be optimal for GPCR classification at the sub-sub-family level when used with the multilayer perceptron (MLP) algorithm. The other classifiers would be optimal only when the data subset is not very large and therefore does not contain a large number of classes. Hence this study concludes that better performance is reached when a good classifier is established.
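To make the size difference between the feature vectors concrete, the following is a minimal Python sketch of two of the three strategies named in the abstract: AAC (20 values) and DC (400 values). It is an illustration under stated assumptions, not the authors' implementation, which ran on the Weka platform. The names `AMINO_ACIDS`, `aac`, and `dc` are hypothetical, and silently dropping non-standard residue letters is a simplifying assumption. PseAAC is omitted because it depends on physicochemical property tables and parameters that the abstract does not specify.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in one-letter code (illustrative ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _clean(sequence: str) -> str:
    """Upper-case the sequence and drop non-standard letters (an assumption)."""
    return "".join(r for r in sequence.upper() if r in AMINO_ACIDS)


def aac(sequence: str) -> List[float]:
    """Amino acid composition: relative frequency of each of the 20 residues."""
    seq = _clean(sequence)
    if not seq:
        raise ValueError("sequence contains no standard residues")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def dc(sequence: str) -> List[float]:
    """Dipeptide composition: relative frequency of each of the 400 adjacent pairs."""
    seq = _clean(sequence)
    if len(seq) < 2:
        raise ValueError("sequence needs at least two standard residues")
    total = len(seq) - 1  # number of overlapping adjacent pairs
    pairs = Counter(seq[i:i + 2] for i in range(total))
    return [pairs[a + b] / total for a in AMINO_ACIDS for b in AMINO_ACIDS]


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # made-up toy sequence, not real GPCR data
    print(len(aac(example)), len(dc(example)))
```

Both functions return fixed-length vectors regardless of sequence length, which is what allows sequences of different lengths to be passed to the same classifier. The DC vector is twenty times longer than the AAC vector, which is one reason the choice of strategy matters for computationally expensive classifiers.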