Pub Date: 2012-04-30 | DOI: 10.3745/KIPSTD.2012.19D.2.203
Yingzhe Ma, Eun-Man Choi
Capture-and-replay is a common technique for automated GUI testing. On the Android platform, however, it cannot be applied directly: the testing framework already provided and supported by Google does not automatically link GUI elements to the actions that handle widget events. Without capture-and-replay tools, testers must design and implement test scenarios from the specification and link every GUI element to its event-handling code by hand. This paper proposes an improved, optimized alternative to common capture-and-replay for automated testing of Android GUI widgets. XML is used to extract GUI elements from applications, based on tracing the actions that handle widget events. After click events are traced by monitoring in the capture phase, test cases are created in the replay phase by communicating the status of the activated widgets through API events.
Title: "Functional Test Automation for Android GUI Widgets Using XML" (The KIPS Transactions: Part D)
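The paper's approach extracts GUI elements from XML. As an illustration of the kind of extraction involved, the sketch below parses a hypothetical Android layout file with Python's standard library and lists each widget's id and declared click handler; the layout snippet and function names are invented here, and the paper's actual tooling is not reproduced.

```python
import xml.etree.ElementTree as ET

# android:id and android:onClick are standard Android layout attributes;
# this particular layout is invented for illustration.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

LAYOUT = """<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android">
  <Button android:id="@+id/login" android:onClick="onLogin"/>
  <EditText android:id="@+id/username"/>
</LinearLayout>"""

def extract_widgets(layout_xml):
    """Return (widget_type, widget_id, onClick handler or None) for each
    element in the layout that declares an android:id."""
    root = ET.fromstring(layout_xml)
    widgets = []
    for elem in root.iter():
        wid = elem.get(ANDROID_NS + "id")
        if wid is not None:
            widgets.append((elem.tag, wid, elem.get(ANDROID_NS + "onClick")))
    return widgets
```

A capture tool could use such a listing to know which widgets have event handlers worth tracing.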
Pub Date: 2012-04-30 | DOI: 10.3745/KIPSTD.2012.19D.2.133
Taeseok Lee, Do-Heon Jeong, Young-Su Moon, Minsoo Park, mi-hwan Hyun
Queries entered in a search box are the result of users actively seeking information; search logs are therefore important data that represent users' information needs. The purpose of this study is to examine whether there is a relationship between automatically classified queries and the subject categories of the documents accessed. Search sessions were identified in the 2009 NDSL (National Discovery for Science Leaders) log dataset of KISTI (Korea Institute of Science and Technology Information), and the queries and items used were extracted per session. The queries were processed with an automatic classifier, and the resulting categories were compared with the subject categories of the items used. The average similarity was 58.8% for the automatic classification of the top 100 queries. Interestingly, this is lower than the 76.8% obtained when the queries were evaluated by experts. The difference suggests that the terms used as queries are newly emerging as topics of concern in other fields of research.
Title: "An Analytic Study on the Categorization of Query through Automatic Term Classification" (The KIPS Transactions: Part D)
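The study's headline numbers compare automatically assigned query categories against the subject categories of the documents actually used in each session. Below is a minimal sketch of one such session-level agreement measure; the paper does not specify its exact similarity computation, so this counting rule is an assumption for illustration.

```python
def session_similarity(query_cats, item_cats):
    """Fraction of sessions in which the classifier's category for the
    query appears among the subject categories of the items used.

    query_cats: one predicted category per session.
    item_cats:  per session, the set of subject categories of items used.
    """
    hits = sum(1 for q, items in zip(query_cats, item_cats) if q in items)
    return hits / len(query_cats)
```

For example, a classifier that matches the accessed documents in one of two sessions scores 0.5 under this measure.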
Pub Date: 2012-04-30 | DOI: 10.3745/KIPSTD.2012.19D.2.187
M. Kim, H. La, Soo Dong Kim
As a representative reuse paradigm, service-oriented computing (SOC) centers largely on publishing and subscribing to reusable services; here, SOC is used as a term covering both service-oriented architecture and cloud computing. Service providers can generate high profits from reusable services, and service consumers can develop their applications with less time and effort by reusing them. Design patterns (DPs) are reusable methods for resolving commonly occurring design problems, providing design structures that address those problems while following the open/closed principle. However, since DPs were mainly proposed for building object-oriented systems and there are clear differences between the object-oriented paradigm and SOC, applying DPs to SOC design problems is challenging. Hence, DPs need to be customized with two aspects in mind: enabling service providers to design highly reusable services that reflect the providers' unique characteristics, and enabling service consumers to develop their target applications as quickly as possible by reusing and customizing services. We therefore propose a set of DPs customized for SOC. With the proposed DPs, we believe service providers can effectively develop highly reusable services, and service consumers can efficiently adapt services for their applications.
Title: "Methods to Apply GoF Design Patterns in Service-Oriented Computing" (The KIPS Transactions: Part D)
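As one example of customizing a GoF pattern to the consumer side of SOC, the sketch below applies the Adapter pattern: the consumer adapts a provider's published service interface to the interface its application expects, without modifying the service itself. The service, interfaces, and unit conventions are invented for illustration; the paper's own customized patterns may differ.

```python
class RemoteTaxService:
    """Stands in for a provider's published service with its own interface
    (hypothetical: works in integer cents)."""
    def compute_tax(self, amount_cents):
        return amount_cents * 10 // 100  # flat 10% rate, for illustration

class TaxPort:
    """The interface the consumer application was written against
    (works in currency units)."""
    def tax_for(self, amount):
        raise NotImplementedError

class TaxServiceAdapter(TaxPort):
    """Consumer-side Adapter: reuses the published service as-is and
    translates between the two interfaces."""
    def __init__(self, service):
        self._service = service
    def tax_for(self, amount):
        return self._service.compute_tax(int(amount * 100)) / 100
```

The consumer codes against `TaxPort` only, so swapping providers means writing a new adapter, not rewriting the application.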
Pub Date: 2012-04-30 | DOI: 10.3745/KIPSTD.2012.19D.2.151
Hyun-Hwa Choi, Kyuchul Lee
Most distributed high-dimensional indexing structures provide reasonable search performance when the dataset is uniformly distributed; when the dataset is clustered or skewed, however, search performance gradually degrades. We propose a method for improving the k-nearest-neighbor search performance of the distributed vector approximation tree on strongly clustered or skewed datasets. The basic idea is to compute the volumes of the leaf nodes in the top-tree of a distributed vector approximation tree and to assign a different number of bits to each, so as to preserve the discriminating power of the vector approximation; in other words, more bits are assigned to high-density clusters. We conducted experiments comparing search performance against the distributed hybrid spill-tree and the distributed vector approximation tree on synthetic and real datasets. The results show that our scheme consistently delivers significant performance improvements over the distributed vector approximation tree for strongly clustered or skewed datasets.
Title: "Performance Enhancement of a DVA-tree by the Independent Vector Approximation" (The KIPS Transactions: Part D)
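The core idea above, giving denser leaf nodes more approximation bits, can be sketched as a simple allocation rule. The rule below (a fixed base plus a density-proportional share of extra bits) is an illustrative stand-in, not the paper's actual formula.

```python
def assign_bits(leaf_counts, leaf_volumes, base_bits=4, extra_bits=4):
    """Give each leaf node `base_bits` plus a share of `extra_bits` that
    grows with its point density (count / volume): denser clusters need a
    finer vector approximation to keep candidate vectors distinguishable."""
    densities = [c / v for c, v in zip(leaf_counts, leaf_volumes)]
    top = max(densities)
    return [base_bits + round(extra_bits * d / top) for d in densities]
```

A leaf ten times denser than another (at equal volume) ends up with the full extra allocation, while the sparse leaf keeps only the base.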
Pub Date: 2012-04-30 | DOI: 10.3745/KIPSTD.2012.19D.2.127
Jong Soo Park
In the Seoul metropolitan bus network, a passenger may need to pick up a parcel purchased through e-commerce at a convenient bus stop on the way home or to the office. The flow-capturing location-allocation model can be applied to select such bus stops as pickup points so that they maximize the captured passenger flows, where each flow represents an origin-destination (O-D) pair of a passenger trip. In this paper, we propose a fast heuristic algorithm that selects pickup points using a large O-D matrix extracted from five million transportation-card transactions. The experimental results characterize the bus stops chosen as pickup points in terms of passenger flow and capture ratio, and illustrate the spatial distribution of the top 20 pickup points on a map.
Title: "Application of the Flow-Capturing Location-Allocation Model to the Seoul Metropolitan Bus Network for Selecting Pickup Points" (The KIPS Transactions: Part D)
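A standard greedy heuristic for the flow-capturing model picks, at each step, the stop that captures the most still-uncaptured O-D flow. The sketch below simplifies "capturing" to a stop being the origin or destination of a trip; the paper's algorithm and its capture definition may differ.

```python
def greedy_pickup_points(od_flows, k):
    """Greedily choose k pickup stops from an O-D flow table.

    od_flows: dict mapping (origin, destination) -> passenger count.
    A flow counts as captured once a chosen stop is its origin or
    destination (a simplification of path-based capture).
    """
    remaining = dict(od_flows)
    chosen = []
    for _ in range(k):
        # Score each stop by the uncaptured flow it would capture.
        gain = {}
        for (o, d), n in remaining.items():
            gain[o] = gain.get(o, 0) + n
            gain[d] = gain.get(d, 0) + n
        if not gain:
            break
        best = max(gain, key=gain.get)
        chosen.append(best)
        # Flows touching the chosen stop are now captured.
        remaining = {od: n for od, n in remaining.items() if best not in od}
    return chosen
```

With flows A-B:5, A-C:3, B-C:2, stop A captures 8 passengers and is picked first; the second pick then covers the remaining B-C flow.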
Pub Date: 2012-04-30 | DOI: 10.3745/KIPSTD.2012.19D.2.147
C. Park
While LDA is a supervised dimension reduction method that finds projective directions maximizing separability between classes, its performance degrades severely when the number of labeled samples is small. Semi-supervised dimension reduction methods have recently been proposed that exploit abundant unlabeled data to overcome the shortage of labeled data. However, the matrix computation typically used in statistical dimension reduction makes it difficult to utilize a large amount of unlabeled data, and beyond a point the extra information from unlabeled data may not be worth the increase in processing time. To solve these problems, we propose an ensemble approach to semi-supervised dimension reduction. Extensive experiments on text classification demonstrate the effectiveness of the proposed method.
Title: "A Semi-supervised Dimension Reduction Method Using Ensemble Approach" (The KIPS Transactions: Part D)
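The ensemble structure, in which each member sees only part of the unlabeled pool so that no single reducer must factor a matrix over all of it, can be sketched as follows. Plain PCA stands in for the semi-supervised reducer, and member projections are concatenated; this illustrates the ensemble idea only, not the paper's method. (Requires NumPy.)

```python
import numpy as np

def ensemble_reduce(X_lab, X_unlab, n_members=3, dim=2, seed=0):
    """Each member fits a projection on the labeled data plus a different
    random subset of the unlabeled pool, then the labeled data's member
    projections are concatenated into the final reduced representation."""
    rng = np.random.default_rng(seed)
    parts = []
    for _ in range(n_members):
        idx = rng.choice(len(X_unlab), size=len(X_unlab) // n_members,
                         replace=False)
        X = np.vstack([X_lab, X_unlab[idx]])
        Xc = X - X.mean(axis=0)
        # Top-`dim` principal directions of this member's sample.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        parts.append((X_lab - X.mean(axis=0)) @ Vt[:dim].T)
    return np.hstack(parts)
```

Each member's matrix computation stays small (labeled set plus one subset), which is exactly the cost concern the abstract raises.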
Pub Date: 2012-04-30 | DOI: 10.3745/KIPSTD.2012.19D.2.139
Bok-Il Seo, Jae-In Kim, Bu-Hyun Hwang
Data mining is widely used to discover knowledge in many fields. Although there are many methods for discovering association rules, most are frequency-based and therefore ill-suited to the stream environment, where event data are generated continuously and storing all of the data is expensive. In this paper, we propose a new method for discovering association rules in a stream environment. Our method uses variable windows to extract data items; a window's size varies with the gap between occurrences of the same target event. Data are extracted using the COBJ (count object) calculation method, and FPMDSTN (frequent pattern mining over data streams using terminal nodes) discovers association rules from the extracted data items. Experiments show that our method is better suited to the stream environment than conventional methods.
Title: "A Method for Frequent Itemsets Mining from Data Stream" (The KIPS Transactions: Part D)
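The two steps above, cutting the stream into variable-size windows that end at the target event and counting itemsets across windows, can be sketched as follows. Ending each window exactly at the target event and restricting to 1- and 2-itemsets are simplifications; COBJ and FPMDSTN themselves are not reproduced here.

```python
from itertools import combinations

def windows_by_target(stream, target):
    """Split an event stream into variable-size windows, each ending at an
    occurrence of the target event, so window size follows the gap
    between occurrences of the same target event."""
    win, out = [], []
    for ev in stream:
        win.append(ev)
        if ev == target:
            out.append(win)
            win = []
    return out

def frequent_itemsets(windows, min_support):
    """Count 1- and 2-itemsets per window and keep those whose window
    count reaches min_support."""
    counts = {}
    for w in windows:
        items = sorted(set(w))
        for size in (1, 2):
            for iset in combinations(items, size):
                counts[iset] = counts.get(iset, 0) + 1
    return {iset for iset, c in counts.items() if c >= min_support}
```

Because each window is bounded by the target event, the stream never needs to be stored in full, only the current window and the itemset counts.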
Pub Date: 2012-03-01 | DOI: 10.3745/KIPSTD.2012.19D.3.247
Chang-Soo Lee, Sang-Kyoon Park, Jaehong Ahn
It is not easy for medium or large libraries to effectively manage their vast collections of books and media. Recently, in place of magnetic stripes and barcodes, RFID technology has been applied on a small scale to simple book management and theft prevention. The development of RFID- and USN-based systems and technology has led to their use in a diverse range of industrial fields, including library book management systems. Using these technologies, the intelligent book management system proposed in this paper provides a more practical, effective, content-rich, and convenient way to manage books.
Title: "Intelligent Library Management System using RFID and USN" (The KIPS Transactions: Part D)
Pub Date: 2012-02-29 | DOI: 10.3745/KIPSTD.2012.19D.1.113
Yun-Jung Lee, In-Jun Jung, G. Woo
Today, Internet users can easily create digital content and share it with others through online content-sharing services such as YouTube, so many portal sites are flooded with user-created content (UCC) in media ranging from text to video. Estimating the popularity of UCC is a crucial concern for both users and site administrators. This paper proposes a method to predict the popularity of Internet articles, a kind of UCC, from the dynamics of the content itself. To analyze these dynamics, we regarded the access counts of Internet posts as their popularity and analyzed how those counts vary over time. We derived a model, based on an exponential function, that predicts the popularity of a post represented by its time series of access counts. In our experiments, the difference between the actual and predicted access counts was at most 10 for 20,532 posts, about 90.7% of the test set.
Title: "A Model to Predict Popularity of Internet Posts on Internet Forum Sites" (The KIPS Transactions: Part D)
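A model "based on an exponential function" for cumulative access counts can be illustrated with a saturating form N(t) = N_inf * (1 - exp(-b*t)), fit here to two early observations by bisection on b. This functional form and the two-point fitting procedure are assumptions for illustration; the paper's model is fit to the full time series.

```python
import math

def saturating(t, n_inf, b):
    """Exponential saturation model for cumulative access counts."""
    return n_inf * (1.0 - math.exp(-b * t))

def fit_and_predict(t1, n1, t2, n2, t_pred):
    """Fit N(t) = n_inf*(1 - exp(-b*t)) to observations (t1, n1), (t2, n2)
    with t1 < t2, then extrapolate to t_pred.  The ratio n1/n2 depends
    only on b and increases monotonically with it, so b can be found by
    bisection; n_inf then follows from either observation."""
    def ratio(b):
        return (1 - math.exp(-b * t1)) / (1 - math.exp(-b * t2))
    target = n1 / n2
    lo, hi = 1e-6, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    b = (lo + hi) / 2
    n_inf = n1 / (1 - math.exp(-b * t1))
    return saturating(t_pred, n_inf, b)
```

Feeding back two points generated from a known curve recovers the curve's later values closely, which is the sense in which early access counts can predict eventual popularity.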
Pub Date: 2012-02-29 | DOI: 10.3745/KIPSTD.2012.19D.1.021
Sung-Hee Park, B. Hansen
Prediction of protein interaction sites for monomer structures can reduce the search space for protein docking, and is regarded as very significant for inferring the unknown functions of proteins from interacting proteins whose functions are known. On the other hand, prediction of interaction sites has been limited by the difficulty of crystallizing weakly interacting complexes, which are transient and not stable enough for experimental structures to be obtained by crystallization, or even NMR, for many of the most important protein-protein interactions. This work reports the calculation of 3D surface patches of complex structures and their properties, and a machine learning approach that uses a support vector machine to build a model distinguishing 3D surface patches at interaction sites from those at non-interaction sites. To overcome the classification problems posed by class-imbalanced data, we employed an under-sampling technique. Nine properties of the patches were calculated from amino acid compositions and secondary structure elements. With 10-fold cross-validation, the SVM model achieved an accuracy of 92.7% in classifying 3D patches into interaction and non-interaction sites across 147 complexes.
Title: "Prediction of Protein-Protein Interaction Sites Based on 3D Surface Patches Using SVM" (The KIPS Transactions: Part D)
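The class-imbalance fix mentioned above, under-sampling, can be sketched independently of the SVM itself: randomly drop majority-class samples until every class is the size of the smallest. The sketch below is a generic random under-sampler, not the paper's specific procedure.

```python
import random

def undersample(X, y, seed=0):
    """Balance a dataset by randomly keeping only as many samples of each
    class as the smallest class has; used before training a classifier
    (such as an SVM) so the majority class does not dominate."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(v) for v in by_class.values())
    Xb, yb = [], []
    for label, samples in by_class.items():
        for xi in rng.sample(samples, n_min):
            Xb.append(xi)
            yb.append(label)
    return Xb, yb
```

On surface-patch data, where non-interaction patches vastly outnumber interaction patches, this keeps the classifier from trivially predicting the majority class.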