Many QoS-aware service selection approaches assume that QoS attributes are crisp values and do not take actual user requirements into consideration when service-oriented applications are constructed. As a result, the search results returned to users may be neither correct nor satisfactory, because the data contain uncertainties, and solutions that are optimal overall but violate some requirements may be unacceptable to some users. In this paper, we propose to use Fuzzy Set Theory (FST) and a fuzzy genetic algorithm (FGA) for QoS-based service selection. FST is applied to specify a triangular fuzzy-valued description of the QoS properties. An FGA is proposed to solve the QoS-aware service composition problem; it considers the actual QoS requirements from users during the selection process. Empirical comparisons with two algorithms on composite services of different scales indicate that the FGA is highly competitive with regard to search capability.
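The core data structure can be sketched briefly. Below is a minimal illustration, assuming (since the paper's exact formulas are not given here) triangular fuzzy numbers `(a, b, c)` for QoS values, additive aggregation along a sequential composition, centroid defuzzification, and a hypothetical penalty when even the optimistic bound violates the user's requirement:

```python
# Sketch of triangular fuzzy QoS handling in a GA fitness function.
# The aggregation and penalty scheme are illustrative assumptions,
# not the paper's exact formulation.

def tfn_add(x, y):
    """Add two triangular fuzzy numbers (a, b, c), e.g. response times
    accumulated along a sequential composition."""
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2])

def defuzzify(x):
    """Centroid of a triangular fuzzy number: (a + b + c) / 3."""
    return sum(x) / 3.0

def fitness(candidate_qos, requirement):
    """Score a composite plan: lower defuzzified total response time is
    better; plans whose optimistic bound already exceeds the user
    requirement are heavily penalized (hypothetical penalty scheme)."""
    total = (0.0, 0.0, 0.0)
    for q in candidate_qos:
        total = tfn_add(total, q)
    score = 1.0 / (1.0 + defuzzify(total))
    if total[0] > requirement:       # even the best case violates the limit
        score *= 0.1                 # penalize but keep the individual alive
    return score
```

A GA would then evolve plans (one concrete service per abstract task) under this fitness, so requirement-violating plans are disfavored rather than silently accepted.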
{"title":"Towards fuzzy QoS driven service selection with user requirements","authors":"Jiajun Xu, Lin Guo, Ruxia Zhang, Yin Zhang, Hualang Hu, Fei Wang, Zhiyuan Pei","doi":"10.1109/PIC.2017.8359548","DOIUrl":"https://doi.org/10.1109/PIC.2017.8359548","url":null,"abstract":"Many QoS-aware service selection approaches assume that the QoS attributes are crisp values and the actual user requirements are not taken into consideration, when the service-oriented applications are constructed. As a result, users searching result may not be correct and good, because there are uncertainties in the data and the optimal solutions but not satisfying some requirements may not be acceptable to some users. In this paper, we propose to use Fuzzy Set Theory (FST) and fuzzy genetic algorithm (FGA) for QoS-based service selection. FST is applied to specify the triangular fuzzy-valued description of the QoS properties. A FGA is proposed to solve the QoS-aware service composition problem, which considers the actual QoS requirements from users in the selection process. Empirical comparisons with two algorithms on different scales of composite service indicate that FGA is highly competitive regards to searching capability.","PeriodicalId":370588,"journal":{"name":"2017 International Conference on Progress in Informatics and Computing (PIC)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126170774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-12-01DOI: 10.1109/PIC.2017.8359512
Guoqing Xia, Yao Shen, Qian-Xiang Lin
Word segmentation is in most cases the basis for text analysis and is vital to the accuracy of subsequent natural language processing (NLP) tasks. While word segmentation for general text has been intensively studied and quite a few algorithms have been proposed, these algorithms do not work well in specialized fields such as clinical text analysis. Moreover, most state-of-the-art methods have difficulty identifying out-of-vocabulary (OOV) words. For these two reasons, we propose a semi-supervised CRF (semi-CRF) algorithm for Chinese clinical text word segmentation. The semi-CRF is implemented by modifying the learning objective to accommodate partially labeled data. Training data are obtained by applying a bidirectional lexicon-matching scheme. A modified Viterbi algorithm that uses the lexicon-matching scheme is also proposed for segmenting raw sentences. Experiments show that our model achieves a precision of 93.88% on test data and outperforms two popular open-source Chinese word segmentation tools, HanLP and THULAC. By using a lexicon, our model can be adapted to word segmentation for text in other domains.
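The bidirectional lexicon-matching idea can be sketched as follows. This is a simplified illustration (the paper's actual matching scheme and lexicon are not specified here): forward and backward maximum matching are run independently, and only positions on which both segmentations agree receive labels, yielding the partially labeled data a semi-CRF can train on.

```python
# Simplified bidirectional maximum matching; agreement between the two
# passes produces partial (trusted) boundary labels.

def max_match(text, lexicon, max_len=4, backward=False):
    """Greedy maximum matching over a lexicon; returns a list of words."""
    words, s = [], text
    while s:
        n_range = range(min(max_len, len(s)), 0, -1)
        for n in n_range:
            w = s[-n:] if backward else s[:n]
            if n == 1 or w in lexicon:
                if backward:
                    words.insert(0, w)
                    s = s[:-n]
                else:
                    words.append(w)
                    s = s[n:]
                break
    return words

def agreed_boundaries(text, lexicon):
    """Character offsets where forward and backward matching agree on a
    word boundary; only these positions would be labeled for training."""
    def bounds(words):
        b, i = set(), 0
        for w in words:
            i += len(w)
            b.add(i)
        return b
    fwd = bounds(max_match(text, lexicon))
    bwd = bounds(max_match(text, lexicon, backward=True))
    return fwd & bwd
```

Characters outside the agreed boundaries stay unlabeled, which is exactly where the modified (partial-label) learning objective comes in.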
{"title":"Lexicon-based semi-CRF for Chinese clinical text word segmentation","authors":"Guoqing Xia, Yao Shen, Qian-Xiang Lin","doi":"10.1109/PIC.2017.8359512","DOIUrl":"https://doi.org/10.1109/PIC.2017.8359512","url":null,"abstract":"Word segmentation is in most cases a base for text analysis and absolutely vital to the accuracy of subsequent natural language processing (NLP) tasks. While word segmentation for normal text has been intensively studied and quite a few algorithms have been proposed, these algorithms however do not work well in special fields, e.g., in clinical text analysis. Besides, most state-of-the-art methods have difficulties in identifying out-of-vocabulary (OOV) words. For these two reasons, in this paper, we propose a semi-supervised CRF (semi-CRF) algorithm for Chinese clinical text word segmentation. Semi-CRF is implemented by modifying the learning objective so as to adapt for partial labeled data. Training data are obtained by applying a bidirectional lexicon matching scheme. A modified Viterbi algorithm using lexicon matching scheme is also proposed for word segmentation on raw sentences. Experiments show that our model has a precision of 93.88% on test data and outperforms two popular open source Chinese word segmentation tools i.e., HanLP and THULAC. 
By using lexicon, our model is able to be adapted for other domain text word segmentation.","PeriodicalId":370588,"journal":{"name":"2017 International Conference on Progress in Informatics and Computing (PIC)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125916527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-12-01DOI: 10.1109/PIC.2017.8359537
Jingjin Guo, Lizhen Liu, Wei Song, Chao Du, Xinlei Zhao
Research continues in image classification, a core area of computer vision. Although perfect classification accuracy is the goal, many experimental results fall short because of numerous complicating factors. To identify these factors and improve classification accuracy, we describe classification methods based on logistic regression and the support vector machine algorithm, then discuss how the choice of method affects classification results and which factors affect accuracy within a single method. Before that, we briefly introduce image feature extraction, which plays a necessary role in image classification.
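To make the "factors within a single method" point concrete, here is a minimal logistic-regression classifier in plain Python (illustrative only, not the paper's experimental setup): the learning rate and epoch count are exactly the kind of knobs whose settings affect the accuracy one obtains from a fixed method.

```python
import math

# Minimal logistic regression trained by stochastic gradient descent.
# Hyperparameters lr and epochs are examples of accuracy-affecting
# factors within one classification method.

def train_logreg(X, y, lr=0.1, epochs=200):
    w = [0.0] * (len(X[0]) + 1)          # feature weights + bias (last slot)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                   # gradient of the log-loss
            for j, xj in enumerate(xi):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return w

def predict(w, xi):
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
    return 1 if z >= 0 else 0
```

On a linearly separable toy set this converges quickly; with too small an `lr` or too few `epochs` the same method misclassifies, which is the in-method sensitivity the paper discusses.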
{"title":"The study of image feature extraction and classification","authors":"Jingjin Guo, Lizhen Liu, Wei Song, Chao Du, Xinlei Zhao","doi":"10.1109/PIC.2017.8359537","DOIUrl":"https://doi.org/10.1109/PIC.2017.8359537","url":null,"abstract":"As we all know, research continues in the areas of image classification development in computer vision. People eagerly hope to achieve a perfect classification accuracy, however, many results of these experiments are less than satisfactory because of many complex factors. Therefore, in order to find these factors and improve the classification accuracy, we describe the details of classification methods with logistic regression and support vector machine algorithm, then discuss the impact of different methods on classification results and the factors that affect the classification accuracy in one method. Before that we briefly introduce the image feature extraction which plays a necessary role of image classification.","PeriodicalId":370588,"journal":{"name":"2017 International Conference on Progress in Informatics and Computing (PIC)","volume":"23 23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128440355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-12-01DOI: 10.1109/PIC.2017.8359526
Xiu Lian-cun, Zheng Zhizhong, Che Chunxia, Gao Yang
In recent years, near-infrared spectroscopy measurement technology in China has made considerable progress; it has been widely used in agriculture, chemistry, geology, and medicine. Near-infrared spectroscopy is very sensitive to the characteristics of C-H groups (methyl, methylene, methoxy, carboxyl, aryl, etc.), hydroxyl O-H, mercapto S-H, and amino N-H. It can therefore distinguish the crystallinity of single minerals (clay minerals, chlorite, serpentine, etc.), hydroxyl-bearing silicate minerals (epidote, amphibole, etc.), sulfate minerals (alunite, pyrite, potassium alum, gypsum, etc.), and carbonate minerals (calcite, dolomite, etc.) among the layered silicates. These spectral characteristics are one of the important guarantees for developing a small portable near-infrared mineral analyzer and for its fast, accurate identification of rock samples. In this paper, we focus on the near-infrared spectral characteristics, identification method, and quantitative analysis method for low-temperature alteration minerals; we also present a portable near-infrared spectrometer and its operating principle. In the experiment, spectral parameters were extracted from the characteristic spectra of altered minerals, and the relationship between altered minerals and ore formation was established. Measured spectral data from drill cores at the Zijinshan Mine in Fujian province are used as an example to illustrate the credibility of the proposed method.
{"title":"Mineral identification and geological mapping using near-infrared spectroscopy analysis","authors":"Xiu Lian-cun, Zheng Zhizhong, Che Chunxia, Gao Yang","doi":"10.1109/PIC.2017.8359526","DOIUrl":"https://doi.org/10.1109/PIC.2017.8359526","url":null,"abstract":"In recent years, the near-infrared spectroscopy measurement technology in China has made considerable progress; it has been widely used in the fields of agriculture, chemistry, geology, and medicine measurement. The near-infrared spectroscopy is very sensitive to the characteristics of C-H (methyl, a methylene group, a methoxy group, a carboxyl group, an aryl group, etc.), hydroxy O-H, mercapto S-H, and amino N-H. So, it can distinguish the crystallinity of single mineral (clay minerals, chlorite, serpentine, etc.), containing hydroxy silicate minerals (epidote, amphibole, etc.), sulfate minerals (alunite, pyritepotassium alum, gypsum, etc.), and carbonate minerals (calcite, dolomite, etc.) in the layered silicate. The character of near-infrared spectroscopy is one of the important guarantees for the instrument development of the small portable near-infrared mineral analyzer, and for its fast, accurate identification of rock samples. In this paper, we focus on the near-infrared spectral characteristics, identification method, and quantitative analysis method for the low-temperature alteration minerals; at the same time, a portable near infrared spectrometer and its principle were presented. In the experiment, spectral parameters were acquired from the characteristic spectra of altered minerals, and the relationship between altered minerals and ore-forming were established. 
The measured spectral data from the drilling rock cores in Zijinshan Mine in Fujian province was used as an example to illustrate the credibility of the proposed method.","PeriodicalId":370588,"journal":{"name":"2017 International Conference on Progress in Informatics and Computing (PIC)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132012909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-12-01DOI: 10.1109/PIC.2017.8359570
Cuilan Wang, Deji Wang
This paper analyzes the complexity of Internet-supported innovation using scale-free network theory. The evolutionary model is also numerically simulated. The conclusion is that the model exhibits both small-world and scale-free features. The small-world effect indicates a wide range of highly efficient Internet resource integration, while the scale-free feature indicates that a few core nodes become central. Therefore, we should not only exploit these characteristics to enhance the positive effect on innovation performance, but also avoid the risks of lock-in and vulnerability.
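The generic mechanism that produces scale-free structure can be sketched directly. The paper's own evolutionary model is not detailed here, so the following shows the general preferential-attachment (Barabási-Albert-style) growth rule as an assumption: each new node links to `m` existing nodes chosen with probability proportional to their current degree, so a few early "core" nodes accumulate most of the links.

```python
import random

# Preferential-attachment growth: the degree-weighted urn holds one
# "ticket" per edge endpoint, so sampling from it is sampling nodes
# with probability proportional to degree.

def preferential_attachment(n, m=2, seed=0):
    rng = random.Random(seed)
    # start from a small complete core of m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    urn = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:              # m distinct degree-biased picks
            targets.add(rng.choice(urn))
        for t in targets:
            edges.append((new, t))
            urn += [new, t]
    return edges
```

Measuring the resulting degree distribution shows the heavy tail (a few hubs) that motivates the paper's warning about lock-in and vulnerability: removing a hub disconnects far more than removing a random node.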
{"title":"Research on the complexity of technological innovation supported by Internet","authors":"Cuilan Wang, Deji Wang","doi":"10.1109/PIC.2017.8359570","DOIUrl":"https://doi.org/10.1109/PIC.2017.8359570","url":null,"abstract":"This paper analyzes the complexity of Internet innovation using scale-free network theory. The evolutionary model has also been numerically simulated. The conclusion is that the model has small-world and scale-free features. The small-world effect shows that there is a wide range of high efficiency Internet resource integration. Scale-free features indicate that a few core nodes become central. Therefore, we should not only see the characteristics to enhance the positive effect of innovation performance, but also avoid the risk of lock-in and vulnerability.","PeriodicalId":370588,"journal":{"name":"2017 International Conference on Progress in Informatics and Computing (PIC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134354441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-12-01DOI: 10.1109/PIC.2017.8359583
Abshukirov Zhandos, Jian Guo
This paper presents the Combined Cycle Power Plant (CCPP) and decision tree methods. A CCPP is considered among the most effective power suppliers owing to the large temperature gradient between its gas turbine inlet and the environment or the cooling process, which engineers can exploit to manage the operating temperature optimally. In this paper we compare four decision tree algorithms: Decision Stump, Hoeffding Tree, logistic model trees (LMT), and J48. Using these algorithms, we analyze the behavior of a CCPP, focusing on its temperature, which serves as the target variable among the others; the temperature was therefore divided into three classes. Theoretical analysis and experimental results show that, among the four algorithms, J48 predicts the attributes of given instances most precisely. Based on our findings, the J48 algorithm can accurately predict the temperature of a CCPP, classifying 97.1676% of cases correctly.
{"title":"An approach based on decision tree for analysis of behavior with combined cycle power plant","authors":"Abshukirov Zhandos, Jian Guo","doi":"10.1109/PIC.2017.8359583","DOIUrl":"https://doi.org/10.1109/PIC.2017.8359583","url":null,"abstract":"This paper presents about Combined Cycle Power Plant (CCPP) and decision tree. CCPP considered as the best effective power suppliers to the large temperature incline between its gas turbine passage and the environment or the cooling process, and to help of their engineers, who are able to optimally venture the present temperature level. Moreover, in this paper we did comparison of four types of decision tree algorithms like Decision Stump, Hoeffding Tree, logistic model trees (LMT) and J48. Based on these algorithms we analyzed the behavior of Combined Cycle Power Plant (CCPP), particularly its temperature. The temperature is target variable among other variables. This is why temperature was divided three classes. Theoretical analysis and experimental results have shown that the J48 algorithm is the best algorithm which predict attributes of given instances precisely among other three algorithms. Based on our findings, J48 algorithm can predict precisely the temperature of Combined Cycle Power Plant and predict 97.1676% cases correctly.","PeriodicalId":370588,"journal":{"name":"2017 International Conference on Progress in Informatics and Computing (PIC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115381119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-12-01DOI: 10.1109/PIC.2017.8359565
Shuang Li, Shizune Takahashi, Keizo Yamada, Masanori Takagi, Jun Sasaki
As the number of foreign travelers to Japan rapidly increases, the Japanese government expects inbound tourism to improve the economic growth of rural areas as a whole. However, some rural areas are difficult for foreign travelers to access because the areas are little known, present language problems, and are inconvenient to reach by transportation, among other factors. Thus, a tourism support system for foreigners traveling in rural areas is necessary. To investigate the most effective way to support foreign travelers, we propose a viewpoint that recommends tourism resources to foreign travelers according to differences in nationality and season. We confirm the feasibility of this system by analyzing social networking service photo data taken by foreign tourists. Finally, we propose a tourism information system that considers several needs of foreign travelers.
{"title":"Analysis of SNS photo data taken by foreign tourists to Japan and a proposed adaptive tourism recommendation system","authors":"Shuang Li, Shizune Takahashi, Keizo Yamada, Masanori Takagi, Jun Sasaki","doi":"10.1109/PIC.2017.8359565","DOIUrl":"https://doi.org/10.1109/PIC.2017.8359565","url":null,"abstract":"As the number of foreign travelers to Japan rapidly increases, the Japanese government expects the economic growth of whole rural areas to improve via inbound tourism. However, some rural areas are difficult to access for foreign travelers because of unknown areas, language problems, and transportation inconvenience, among other factors. Thus, developing a tourism support system for foreigners' traveling in the rural areas is necessary. To investigate the most effective way to support foreign travelers, we propose a viewpoint that recommends tourism resources for foreign travelers according to differences in nationality and season. We confirm the feasibility of this system by analyzing social networking service photo data taken by foreign tourists. Finally, we propose a tourism information system considering several needs of foreign travelers.","PeriodicalId":370588,"journal":{"name":"2017 International Conference on Progress in Informatics and Computing (PIC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124669134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-12-01DOI: 10.1109/PIC.2017.8359550
Hengxun Li, Ning Wang, Guangjun Hu, Weiqing Yang
In the field of information retrieval, with the rapid growth in the number of questions and answers, automatic question answering has become a hot research direction. Such a system consists of three procedures: question classification, information retrieval, and answer extraction. Question classification is the first and most important part of the whole task. Currently, two kinds of algorithms are employed: rule-based algorithms and statistical-model-based algorithms. Rule-based algorithms perform well in accuracy and pertinence but rely on professional knowledge and scale poorly. Statistical-model-based algorithms learn classification models from a training dataset; they extract syntactic features heuristically and provide better scalability, so most question classification algorithms are based on statistical models. However, semantic features have largely been overlooked in existing statistical-model-based question classification algorithms. In this paper, we propose a context-aware hybrid model that combines a statistical model, PGM, with a semantic language model, word2vec. Experimental evaluations demonstrate the capability of the proposed model.
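One plausible way to combine the two signals is linear interpolation. The combination rule below is an assumption for illustration, not the paper's formula: a statistical model's class score is blended with a semantic score, obtained as the cosine similarity between the question's averaged word vectors and a class prototype vector, as a word2vec-style embedding model would provide.

```python
import math

# Hypothetical hybrid scorer: alpha weights the statistical score
# against a word2vec-style semantic similarity score.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def avg_vector(words, embeddings):
    """Average the embeddings of in-vocabulary words in the question."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return [0.0] * len(next(iter(embeddings.values())))
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]

def hybrid_score(stat_score, question, class_proto, embeddings, alpha=0.5):
    sem = cosine(avg_vector(question, embeddings), class_proto)
    return alpha * stat_score + (1 - alpha) * sem
```

The classifier would evaluate `hybrid_score` for each candidate class and pick the argmax, letting semantics break ties the purely syntactic features cannot.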
{"title":"PGM-WV: A context-aware hybrid model for heuristic and semantic question classification in question-answering system","authors":"Hengxun Li, Ning Wang, Guangjun Hu, Weiqing Yang","doi":"10.1109/PIC.2017.8359550","DOIUrl":"https://doi.org/10.1109/PIC.2017.8359550","url":null,"abstract":"In the field of information retrieval, with the rapid growth of the amount of questions and answers, automatic question-answering system comes up to be a hot research direction, which consists of three procedures: question classification, information retrieval and answer extraction. Question classification is the first and most important part of the whole task. Currently, two kinds of algorithms are employed, rule-based algorithms and statistical-model-based algorithms. Rule-based algorithms have good performance in accuracy and pertinence with the shortcoming of relying on professional knowledge and poor scalability. Statistical-model-based algorithms get classification models from training dataset, these methods extract syntax features heuristically and provide better scalability and thus most question classification algorithms are based on statistical-model. However, semantic features have largely been overlooked in existing statistical-model-based question classification algorithms. In this paper, we propose a context-aware hybrid model based on a statistical-model PGM and a semantic language model word2vec. 
The experimental evaluations demonstrate the capability of the proposed model.","PeriodicalId":370588,"journal":{"name":"2017 International Conference on Progress in Informatics and Computing (PIC)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126363446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-12-01DOI: 10.1109/PIC.2017.8359586
Guimin Huang, Min Tan, Sirui Huang, Ruyu Mo, Ya Zhou
Many attempts to improve the entity grid model have been made, boosting research on text coherence. However, few of them have been applied to noisy-data domains or to students' essays. We therefore propose a novel discourse coherence model, based on the entity grid model, to evaluate the coherence of Chinese students' essays. Because coherent student essays frequently rely on lexical repetition and coreference, we designed a coreference module, instead of a clustering algorithm or knowledge-base search, to integrate with the enhanced coherence model. In addition, we fully merged the coreference feature into the similarity assessment of adjacent sentences and into semantic coherence. Experiments show that our model outperforms the entity-based model and LSA methods and is effective for automatic assessment of students' essays.
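For readers unfamiliar with the underlying representation, the standard entity-grid computation can be sketched in a few lines (simplified from the usual formulation; role inventories and smoothing vary): the grid maps sentences × entities to syntactic roles ('S' subject, 'O' object, 'X' other, '-' absent), and local coherence is summarized by the distribution of role transitions between adjacent sentences, read down each entity's column.

```python
from collections import Counter

# Entity-grid transition statistics: count role bigrams per entity
# column, then normalize into a probability distribution.

def transition_probs(grid):
    """grid[i][j] = role of entity j in sentence i."""
    counts = Counter()
    for col in zip(*grid):                   # one column per entity
        for a, b in zip(col, col[1:]):       # adjacent-sentence role pairs
            counts[(a, b)] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}
```

A coherence classifier then uses this distribution as a feature vector; the paper's contribution is to fold coreference links into how entity columns are formed, rather than relying on clustering or knowledge-base lookup.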
{"title":"A discourse coherence model for analyzing Chinese students' essay","authors":"Guimin Huang, Min Tan, Sirui Huang, Ruyu Mo, Ya Zhou","doi":"10.1109/PIC.2017.8359586","DOIUrl":"https://doi.org/10.1109/PIC.2017.8359586","url":null,"abstract":"Many attempts on improving entity grid model have been made and boost the research on text coherence. However, only a few of them applied it to noisy data domain and students' essay. Thus, we proposed a novel discourse coherence model, which is based on entity grid model, to evaluate coherence on Chinese students' essay. In allusion to the feature of Chinese students' essay, frequently using lexical repetition and coreference methods in coherence essay, we designed a coreference module, instead of clustering algorithm or knowledge base search methods, to integrate with the enhanced coherence model. In addition, we fully merged coreference feature into similarity assessment of adjacent sentences, and the semantic coherence. Experiments show that our model outperforms entity-based model and LSA methods and has an ideal effect on students' essay automatic assessment.","PeriodicalId":370588,"journal":{"name":"2017 International Conference on Progress in Informatics and Computing (PIC)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132299475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-12-01DOI: 10.1109/PIC.2017.8359520
Liang Qiao, Dongqing Xie
Calcium ions (Ca2+) are crucial for protein function. They participate in enzyme catalysis, play regulatory roles, and help maintain protein structure. Accurately recognizing Ca2+-binding sites is therefore of significant importance for protein function analysis. Although much progress has been made, challenges remain, especially in the post-genome era, in which large volumes of proteins without functional annotation accumulate quickly. In this study, we design a new ab initio predictor, CaSite, to identify Ca2+-binding residues from protein sequence. CaSite represents each residue sample with features drawn from evolutionary information, predicted secondary structure, predicted solvent accessibility, and Jensen-Shannon divergence. A mean ensemble classifier built from support vector machines (SVMs) trained on multiple random under-samplings is used as the final prediction model, which effectively relieves the negative influence of the imbalance between positive and negative training samples.
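The random under-sampling mean-ensemble idea is independent of the base learner and can be sketched compactly. In this illustration the SVM is replaced by a toy nearest-centroid scorer purely to keep the sketch self-contained (an assumption, not the paper's classifier): each ensemble member sees all positives plus an equal-sized random subset of negatives, and the members' scores are averaged.

```python
import random

# Random under-sampling ensemble: balance each member's training set,
# then average member scores (mean ensemble).

def train_member(pos, neg_subset):
    """Toy base learner: remember the two class centroids
    (a stand-in for training one SVM)."""
    def centroid(rows):
        dim = len(rows[0])
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
    return centroid(pos), centroid(neg_subset)

def member_score(model, x):
    cp, cn = model
    dp = sum((a - b) ** 2 for a, b in zip(x, cp))
    dn = sum((a - b) ** 2 for a, b in zip(x, cn))
    return 1.0 if dp < dn else 0.0           # 1 = predicted binding residue

def ensemble_predict(pos, neg, x, n_members=5, seed=0):
    rng = random.Random(seed)
    scores = []
    for _ in range(n_members):
        subset = rng.sample(neg, len(pos))   # under-sample the majority class
        scores.append(member_score(train_member(pos, subset), x))
    return sum(scores) / len(scores)         # mean ensemble score
```

Because every member trains on a balanced set, no single classifier is swamped by the abundant non-binding residues, while averaging over many random subsets recovers information from the full negative pool.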
{"title":"Sequence-based protein-Ca2+ binding site prediction using SVM classifier ensemble with random under-sampling","authors":"Liang Qiao, Dongqing Xie","doi":"10.1109/PIC.2017.8359520","DOIUrl":"https://doi.org/10.1109/PIC.2017.8359520","url":null,"abstract":"Calcium ions (Ca2+) are crucial for protein function. They participate in enzyme catalysis, play regulatory roles, and help maintain protein structure. Accurately recognizing Ca2+-binding sites is of significant importance for protein function analysis. Although much progress has been made, challenges remain, especially in the post-genome era where large volume of proteins without being functional annotated are quickly accumulated. In this study, we design a new ab initio predictor, CaSite, to identify Ca2+-binding residues from protein sequence. CaSite first uses evolutionary information, predicted secondary structure, predicted solvent accessibility, and Jensen-Shannon divergence information to represent each residue sample feature. A mean ensemble classifier constructed based on support vector machines (SVM) from multiple random under-samplings is used as the final prediction model, which is effective for relieving the negative influence of the imbalance phenomenon between positive and negative training samples. 
Experimental results demonstrate that the proposed CaSite achieves a better prediction performance and outperforms the existing sequence-based predictor, Targets.","PeriodicalId":370588,"journal":{"name":"2017 International Conference on Progress in Informatics and Computing (PIC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133552634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}