This paper proposes an improvement of a recently proposed semantic-based crossover, Semantic Similarity-based Crossover (SSC). The new crossover, called the Most Semantic Similarity-based Crossover (MSSC), is tested with Genetic Programming (GP) on a real-world problem: predicting the tide in the Venice Lagoon, Italy. The results are compared with those of GP using Standard Crossover (SC) and GP using validation sets. The comparison shows that while using validation sets gives only a limited effect, using semantic-based crossovers, especially MSSC, remarkably improves the ability of GP to predict time series for the tested problem. Further analysis of GP code bloat helps to explain the reason behind this superiority of MSSC.
{"title":"Predicting the Tide with Genetic Programming and Semantic-based Crossovers","authors":"Nguyen Quang Uy, M. O’Neill, N. X. Hoai","doi":"10.1109/KSE.2010.7","DOIUrl":"https://doi.org/10.1109/KSE.2010.7","url":null,"abstract":"This paper proposes an improvement of a recently proposed semantic-based crossover, Semantic Similarity-based Crossover (SSC). The new crossover, called the Most Semantic Similarity-based Crossover (MSSC), is tested with Genetic Programming (GP) on a real world problem, as in predicting the tide in Venice Lagoon, Italy. The results are compared with GP using Standard Crossover (SC) and GP using validation sets. The comparative results show that while using validation sets give only limited effect, using semantic-based crossovers, especially MSSC, remarkably improve the ability of GP to predict time series for the tested problem. Further analysis on GP code bloat helps to explain the reason behind this superiority of MSSC.","PeriodicalId":158823,"journal":{"name":"2010 Second International Conference on Knowledge and Systems Engineering","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129739765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
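The sampling-based semantic machinery behind SSC-style crossovers can be sketched as follows. This is a minimal illustration assuming subtrees are represented as callables of one variable; the function names and the `lower`/`upper` thresholds are hypothetical, not the paper's exact parameters:

```python
def sampling_semantics(expr, points):
    """Approximate a subtree's semantics by its outputs on sample points."""
    return [expr(x) for x in points]

def semantic_distance(expr_a, expr_b, points):
    """Mean absolute difference of outputs -- the sampling-based measure
    that semantic crossovers use to compare two subtrees."""
    sa = sampling_semantics(expr_a, points)
    sb = sampling_semantics(expr_b, points)
    return sum(abs(a - b) for a, b in zip(sa, sb)) / len(points)

def similar_enough(expr_a, expr_b, points, lower=1e-4, upper=0.4):
    """SSC-style acceptance test: swap subtrees only when their semantic
    distance lies inside a band -- different enough to matter, close
    enough to preserve behavior (threshold values are illustrative)."""
    d = semantic_distance(expr_a, expr_b, points)
    return lower <= d <= upper
```

Under this measure, MSSC would additionally prefer, among candidate subtree pairs, the one with the smallest qualifying distance.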
Ontology integration is an important task that needs to be performed when several information systems share or exchange knowledge. Most previous methods of ontology integration merely involve blind or exhaustive matching among all concepts belonging to different ontologies. As a result, semantic mismatches, logical inconsistencies and conceptual conflicts between ontologies are unavoidable, and the computational complexity increases rapidly when integrating large ontologies. This research investigates an effective methodology for ontology integration in which an inconsistency-propagation algorithm is proposed to reduce complexity and mismatching. A reconciliation algorithm is also suggested to generate the best representation from conflicting concepts. In the evaluation, we compare the complexity of our algorithms and the accuracy of their results with those of previous approaches.
{"title":"An Effective Method for Ontology Integration by Propagating Inconsistency","authors":"Trong Hai Duong, Sang-Jin Cha, Geun-Sik Jo","doi":"10.1109/KSE.2010.36","DOIUrl":"https://doi.org/10.1109/KSE.2010.36","url":null,"abstract":"Ontology integration is an important task which needs to be performed when several information systems share or exchange knowledge. We consider that most of previous methods of ontology integration merely involve blind or exhaustive matching among all concepts belonging to different ontologies. Therefore, semantic mismatches, logical inconsistencies and conceptual conflicts between ontologies are unavoidable. Additionally, the computational complexity increases rapidly in integrating large ontologies. This research aims at investigating an effective methodology for ontology integration, in which, a propagating inconsistency algorithm has proposed to reduce complexity and mismatching in ontology integration. A reconciled algorithm is suggested to generate a best representation from conflict concepts. In evaluation, we compare our complexity of the algorithms and accuracy of the results with previous approaches’.","PeriodicalId":158823,"journal":{"name":"2010 Second International Conference on Knowledge and Systems Engineering","volume":"51 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114091294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It is a challenge to find a suitable algorithm for broadcasting information securely and authentically to only the target users. Many schemes based on public-key and symmetric-key cryptography have been investigated. However, designing an efficient scheme that provides both confidentiality and public ciphertext authenticity is still an open problem. In this paper, we present an identity-based broadcast signcryption scheme with short ciphertext size and public ciphertext authenticity. The security of the scheme is proved under computational assumptions in the random oracle model. Experimental results are also provided and compared with several schemes in terms of both computation and communication cost.
{"title":"An Efficient Identity-Based Broadcast Signcryption Scheme","authors":"Dang Thu Hien, T. N. Tien, Truong Thi Thu Hien","doi":"10.1109/KSE.2010.17","DOIUrl":"https://doi.org/10.1109/KSE.2010.17","url":null,"abstract":"It is a challenge to find out a suitable algorithm for broadcasting information securely and authentically to only target users. Many schemes based on public and symmetric key cryptography have been investigated. However, modeling an efficient scheme that provides both confidentiality and public cipher text authenticity is still an open problem. In this paper, we present an identity-based broadcast signcryption scheme with short cipher text size and public cipher text authenticity. The security of this scheme is proved under computational assumptions and in the random oracle model. Experimental results are also provided and compared with several schemes in both computation and communication cost.","PeriodicalId":158823,"journal":{"name":"2010 Second International Conference on Knowledge and Systems Engineering","volume":"2674 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124910904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A focused crawler traverses the web, selecting relevant pages according to a predefined topic. While browsing the internet, it is difficult to identify relevant pages and predict which links lead to high-quality pages. In this paper, we propose a crawler system that uses a genetic algorithm to improve its crawling performance. Apart from estimating the best path to follow, our system also expands its initial keywords with the genetic algorithm during the crawling process. To crawl Vietnamese web pages, we apply a hybrid word-segmentation approach that combines automata and part-of-speech tagging techniques for the Vietnamese text classifier. We evaluate our algorithm on Vietnamese websites, and experimental results are reported to show the efficiency of our system.
{"title":"Crawl Topical Vietnamese Web Pages Using Genetic Algorithm","authors":"Nguyen Quoc Nhan, Vu Tuan Son, Huynh Thi Thanh Binh, Tran Duc Khanh","doi":"10.1109/KSE.2010.25","DOIUrl":"https://doi.org/10.1109/KSE.2010.25","url":null,"abstract":"A focused crawler traverses the web selecting out relevant pages according to a predefined topic. While browsing the internet it is difficult to identify relevant pages and predict which links lead to high quality pages. In this paper, we propose a crawler system using genetic algorithm to improve its crawling performance. Apart from estimating the best path to follow, our system also expands its initial keywords by using genetic algorithm during the crawling process. To crawl Vietnamese web pages, we apply a hybrid word segmentation approach which consists of combining automata and part of speech tagging techniques for the Vietnamese text classifier. We experiment our algorithm on Vietnamese websites. Experimental results are reported to show the efficiency of our system.","PeriodicalId":158823,"journal":{"name":"2010 Second International Conference on Knowledge and Systems Engineering","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128685258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
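The keyword-expansion idea can be sketched with a toy genetic-algorithm step. The fitness definition below (overlap with terms from pages already judged on-topic) is a hypothetical stand-in for the paper's actual relevance measure, and the operators are textbook GA operators rather than the authors' exact design:

```python
import random

def fitness(keywords, on_topic_terms):
    """Hypothetical relevance score: fraction of a chromosome's keywords
    that also occur in pages already judged on-topic."""
    if not keywords:
        return 0.0
    return len(set(keywords) & set(on_topic_terms)) / len(keywords)

def crossover(parent_a, parent_b):
    """One-point crossover over two keyword-list chromosomes."""
    point = random.randint(1, min(len(parent_a), len(parent_b)) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(keywords, candidate_pool, rate=0.1):
    """Replace each keyword with a random candidate term with probability `rate`,
    letting the crawler discover topic vocabulary it did not start with."""
    return [random.choice(candidate_pool) if random.random() < rate else k
            for k in keywords]
```

A crawl loop would score frontier links with `fitness` of the surrounding text and evolve the keyword population between crawling batches.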
A solution for integrating CAM (Computer-Aided Manufacturing) systems into multi-axis nutating-table CNC (Computerized Numerical Control) machines is presented in this paper. The strategy of the method is to build a CL (cutter location) data processing algorithm, so that CL data in ISO format produced by any CAM system can be transformed and translated into G-code files (Numerical Control files) for controlling CNC machines. An implementation of the integration and real tests performed on an industrial 5-axis DMU 50e CNC machine at Dong Anh Mechanical Comp. verify the research results.
{"title":"Integration of CAM Systems into Multi-axes Computerized Numerical Control Machines","authors":"C. My","doi":"10.1109/KSE.2010.30","DOIUrl":"https://doi.org/10.1109/KSE.2010.30","url":null,"abstract":"A solution for integrating CAM (Computer - Aided Manufacturing) systems into multi-axis nutating table CNC (Computerized Numerical Control ) machines is presented in this paper. The strategy of the method is to build a CL data processing algorithm. Thus, the CL data in ISO format produced by every CAMs can be transformed and translated into G-codes files (Numerical Control files) for controlling CNC machines. An implementation of the integration and real tests performing on industrial 5-axis DMU 50e CNC machine at Dong Anh Mechanical Comp. are carried out to verify the research results.","PeriodicalId":158823,"journal":{"name":"2010 Second International Conference on Knowledge and Systems Engineering","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133033381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
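The CL-data-to-G-code translation step can be illustrated with a minimal post-processing sketch. This is illustrative only: a real post-processor for a nutating-table machine must also resolve the rotary-axis kinematics, feed rates, and tool compensation, all omitted here:

```python
def cl_to_gcode(cl_lines):
    """Translate ISO CL-data motion records of the form GOTO/x,y,z into
    linear G-code moves (G01). Non-motion records are ignored in this sketch."""
    gcode = []
    for line in cl_lines:
        line = line.strip()
        if line.startswith("GOTO/"):
            # Take the first three fields as Cartesian tool-tip coordinates.
            x, y, z = (float(v) for v in line[len("GOTO/"):].split(",")[:3])
            gcode.append(f"G01 X{x:.3f} Y{y:.3f} Z{z:.3f}")
    return gcode
```

For a 5-axis record carrying a tool-axis vector, the post-processor would additionally solve the machine's inverse kinematics to emit the two rotary-axis words.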
This paper presents a semi-supervised learning method for Vietnamese part-of-speech tagging. We take into account two powerful tagging models, Conditional Random Fields (CRFs) and the Guided Online-Learning models (GLs), as base learning models. We then propose a semi-supervised tagging model for both the CRF and GL methods. The main idea is to use a word-cluster model as an associated source to enrich the feature space of the discriminative learning models in both training and decoding. Experimental results on the Vietnamese Treebank data (VTB) show that the proposed method is effective: our best model achieved an accuracy of 94.10% when tested on the VTB, and 92.60% on an independent test set.
{"title":"A Semi-supervised Learning Method for Vietnamese Part-of-Speech Tagging","authors":"Le-Minh Nguyen, Bach Ngo Xuan, C. Viet, Pham Quang Nhat Minh, Akira Shimazu","doi":"10.1109/KSE.2010.35","DOIUrl":"https://doi.org/10.1109/KSE.2010.35","url":null,"abstract":"This paper presents a semi-supervised learning method for Vietnamese part of speech tagging. We take into account two powerful tagging models including Conditional Random Fields (CRFs)and the Guided Online-Learning models (GLs) as base learning models. We then propose a semi-supervised learning tagging model for both CRFs and GLs methods. The main idea is to use of a word-cluster model as an associate source for enrich the feature space of discriminate learning models for both training and decoding processes. Experimental results on Vietnamese Tree-bank data (VTB) showed that the proposed method is effective. Our best model achieved accuracy of 94.10% when tested on VTB, and 92.60% an independent test.","PeriodicalId":158823,"journal":{"name":"2010 Second International Conference on Knowledge and Systems Engineering","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128583797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
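The word-cluster enrichment idea can be sketched as a feature-extraction step. The feature names and templates below are illustrative, not the paper's exact feature set; the cluster ids would come from an external word-cluster model trained on unlabeled text:

```python
def extract_features(words, clusters, i):
    """Build the feature map for position i: the surface word plus the
    cluster ids of the current and previous words. Unknown words fall
    back to a shared UNK cluster, which is where the semi-supervised
    signal helps most at decoding time."""
    feats = {
        "word": words[i],
        "cluster": clusters.get(words[i], "UNK"),
    }
    if i > 0:
        feats["prev_cluster"] = clusters.get(words[i - 1], "UNK")
    return feats
```

A CRF or guided online learner would consume these maps in place of (or alongside) its purely lexical feature templates.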
Automated cell counting is a necessary task that helps examiners evaluate blood smears. A problem is that clumped cells usually appear in images with various degrees of overlap. This study presents a new method for effectively splitting clumped cells, using values from the distance transform of the image to quickly detect central points. Additionally, a boundary-covering degree for each point is applied to select the best-fit points. A complementary method for cell-size estimation based on single-cell extraction is also employed. With the average cell size and the central points with their boundary-covering degrees, overlapping cells in the image can be split correctly and rapidly. The robustness and effectiveness of our method have been assessed through comparison with more than 400 images, labeled manually by experts and exhibiting various clumped cells. As a result, the F-measure generally reaches 93.5%, and more than 82% of clumped cells can be tolerated under the condition of non-distorted shapes and well-focused images.
{"title":"A New Method for Splitting Clumped Cells in Red Blood Images","authors":"Ngoc-Tung Nguyen, A. Duong, Hai-Quan Vu","doi":"10.1109/KSE.2010.27","DOIUrl":"https://doi.org/10.1109/KSE.2010.27","url":null,"abstract":"Automated cell counting is a required task which helps examiners in evaluating blood smears. A problem is that clumped cells usually appear in images with various degree of overlapping. This study presents a new method for effectively splitting clumped cells using value in distance transform of image to quickly detect central point. Additionally, a boundary-covering degree of each point is applied to select the best fit points. Another way to cell size estimation based on single cell extraction is also employed. With results from average cell size, central points with their boundary-covering degree, over-lapping cells in the image can be split correctly and rapidly. The robustness and effectiveness of our method have been assessed through the comparison with more than 400 images labeled manually by experts and exhibiting various clumped cell. As the result, the F-measure generally reaches 93.5% and more than 82% clumped cells can be tolerated in the condition of non-distorted shape and well-focused images.","PeriodicalId":158823,"journal":{"name":"2010 Second International Conference on Knowledge and Systems Engineering","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123413655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
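The distance-transform step the method builds on can be illustrated with a small BFS-based version over a binary mask. Real implementations would use an optimized Euclidean distance transform; the peak-picking and boundary-covering-degree steps are omitted here:

```python
from collections import deque

def distance_transform(mask):
    """4-connected distance from each foreground cell (1) to the nearest
    background cell (0), by multi-source BFS. Cells deep inside a clump get
    large values, so local maxima are candidate cell centers."""
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    for r in range(h):                      # seed the BFS with all background cells
        for c in range(w):
            if not mask[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist
```

On a real smear image, the mask would come from thresholding, and each local maximum of `dist` would be scored by its boundary-covering degree before splitting.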
The Sender Policy Framework (SPF) is an open standard specifying a technical method to prevent sender address forgery. This technique requires network administrators to create SPF records for their domains. A philosophical issue, which may limit the deployment of SPF, is that in order to use SPF a network administrator needs to configure the local DNS, but others, not the administrator, will benefit from that configuration. We therefore propose the Dynamic Sender Policy Framework (DSPF) approach, in which the legitimate IP addresses of email-sending servers are collected and provided by a third party. The database of SPF records can be updated automatically and shared among email servers and email gateways. Using DSPF, clients may check SPF records without any extra configuration of their DNS. Results show that the system is able to filter 98% of spam and 100% of phishing messages. The collecting and updating processes of the database are described, and factors that influence the database's performance are discussed.
{"title":"Spam Filter Based on Dynamic Sender Policy Framework","authors":"N. Anh, T. Q. Anh, Nguyen Thang","doi":"10.1109/KSE.2010.11","DOIUrl":"https://doi.org/10.1109/KSE.2010.11","url":null,"abstract":"The Sender Policy Framework (SPF) is an open standard specifying a technical method to prevent sender address forgery. This technique requires network administrators to create SPF records for their domains. A philosophic issue, which may limit the deployment of SPF, is that in order to use SPF, a network administrator needs to configure local DNS, but others, not himself, will take benefits from that configuration. Therefore, we proposed the Dynamic Sender Policy Framework (DSPF) approach, in which, the legal IP addresses of servers which send emails are collected and provided by a third-party. The database of SPF records can be updated automatically and can also be used among other email servers and email gateways. Using DSPF, clients may check the SPF records without any extra configuration of their DNS. Results showed that the system is able to filter 98% spam and 100% phishing. Collecting and updating processes of the database are described. Factors that influence database’s performance are discussed.","PeriodicalId":158823,"journal":{"name":"2010 Second International Conference on Knowledge and Systems Engineering","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124088692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
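The division of labor between the third-party database and the mail client can be sketched as a simple lookup; the record format and function names here are hypothetical:

```python
def update_record(records, domain, ip):
    """Third-party side: register a legitimate sending IP for a domain.
    In DSPF this happens automatically, without the domain's own
    administrator touching DNS."""
    records.setdefault(domain, set()).add(ip)

def dspf_check(records, sender_domain, connecting_ip):
    """Client side: accept mail only if the connecting IP is registered
    for the claimed sender domain."""
    return connecting_ip in records.get(sender_domain, set())
```

A message whose envelope-from domain has no matching registered IP would be scored as likely spam or phishing.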
Within the context of privacy-preserving data mining, several solutions for privacy-preserving classification-rule learning, such as association-rule mining, have been proposed. Each solution was designed for a horizontally or vertically distributed scenario. The aim of this work is to study privacy-preserving classification-rule learning on two-dimension distributed data, which is a generalisation of both horizontally and vertically distributed data. In this paper, we develop a cryptographic solution for classification-rule learning methods. The crucial step in the proposed solution is the privacy-preserving computation of the frequencies of a tuple of values, which ensures each participant's privacy without loss of accuracy. We illustrate the applicability of the method by using it to build privacy-preserving protocols for association-rule mining and ID3 decision-tree learning.
{"title":"Privacy Preserving Classification in Two-Dimension Distributed Data","authors":"Luong The Dung, H. Bao, Nguyễn Thế Bình, T. Hoang","doi":"10.1109/KSE.2010.38","DOIUrl":"https://doi.org/10.1109/KSE.2010.38","url":null,"abstract":"Within the context of privacy preserving data mining, several solutions for privacy-preserving classification rules learning such as association rules mining have been proposed. Each solution was provided for horizontally or vertically distributed scenario. The aim of this work is to study privacy-preserving classification rules learning in two-dimension distributed data, which is a generalisation of both horizontally and vertically distributed data. In this paper, we develop a cryptographic solution for classification rules learning methods. The crucial step in the proposed solution is the privacy-preserving computation of frequencies of a tuple of values, which can ensure each participant's privacy without loss of accuracy. We illustrate the applicability of the method by using it to build the privacy preserving protocol for association rules mining and ID3 decision tree learning","PeriodicalId":158823,"journal":{"name":"2010 Second International Conference on Knowledge and Systems Engineering","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127120490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
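The frequency-computation step can be illustrated with a standard additive-secret-sharing sketch (a common building block, not necessarily the paper's exact protocol): each party masks its private count with a random share, and only the aggregate total is revealed:

```python
import random

def make_shares(count, modulus=2**31):
    """Split one party's private count into two additive shares mod `modulus`.
    Each share alone is uniformly random, so neither reveals the count."""
    r = random.randrange(modulus)
    return r, (count - r) % modulus

def aggregate_frequency(private_counts, modulus=2**31):
    """Collect every party's shares and sum them; the random masks cancel,
    leaving the exact total frequency -- no accuracy loss."""
    shares = [make_shares(c, modulus) for c in private_counts]
    return sum(a + b for a, b in shares) % modulus
```

These exact tuple frequencies are all that association-rule mining (support counts) and ID3 (information-gain counts) need, which is why one secure-frequency primitive supports both learners.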
This paper proposes a method for the specification of concurrency and timing constraints of real-time systems. The key idea of the proposed method is to extend Mazurkiewicz traces with time in order to capture the concurrency and timing constraints among the services of systems. The method is formal, effective for abstraction, and supports automated checking.
{"title":"An Extension of Mazukiewicz Traces and their Applications in Specification of Real-Time Systems","authors":"Do Van Chieu, D. Hung","doi":"10.1109/KSE.2010.39","DOIUrl":"https://doi.org/10.1109/KSE.2010.39","url":null,"abstract":"This paper proposes a method for specification of concurrency and timing constraints of real-time systems. The key idea of the proposed method is to extend the Mazurkiewicz Traces with time in order to capture the concurrency and timing constraints among the services of systems. The method is formal, effective for abstracting and supporting automated checking.","PeriodicalId":158823,"journal":{"name":"2010 Second International Conference on Knowledge and Systems Engineering","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123155952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
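A toy illustration of the timed-trace idea: events carry timestamps, and a timing constraint between dependent (non-concurrent) actions can be checked mechanically. This sketch is a simplification for intuition, not the paper's formalism:

```python
def respects_timing(trace, dependent, max_gap):
    """Check a timed trace -- a list of (action, timestamp) pairs -- against
    a simple constraint: whenever two consecutive actions form a dependent
    pair, they must occur within max_gap time units of each other.
    Independent (concurrent) actions are unconstrained, mirroring how
    Mazurkiewicz traces only order dependent actions."""
    for (a, ta), (b, tb) in zip(trace, trace[1:]):
        if (a, b) in dependent and tb - ta > max_gap:
            return False
    return True
```

Automated checking in this style is what makes the timed-trace extension useful as a specification tool.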