Association rule mining plays an important role in data mining. With the rapid growth of datasets, the required memory increases sharply and operating efficiency declines rapidly. Cloud computing provides efficient and inexpensive solutions for analyzing and implementing association rule mining algorithms in parallel. This paper proposes an improved association rule mining algorithm based on the power set and the MapReduce programming model, which can process massive datasets on a cluster of machines running the Hadoop platform. The results of the numerical experiments show that the proposed algorithm achieves higher efficiency in association rule mining.
{"title":"An Improved Association Rules Mining Algorithm Based on Power Set and Hadoop","authors":"W. Mao, Weibin Guo","doi":"10.1109/ISCC-C.2013.39","DOIUrl":"https://doi.org/10.1109/ISCC-C.2013.39","url":null,"abstract":"The association rules mining has an very important impact in data mining. As the rapid growth of datasets, the required memory increase seriously and the operating efficiency declines rapidly. Cloud computing provides efficient and cheap solutions to analyze and implement the association rules mining algorithms in parallel. This paper proposes an improved association mining algorithm based on power set and MapReduce programming model, which can process massive datasets with a cluster of machines on Hadoop platform. The results of the numerical experiments show that the proposed algorithm can achieve higher efficiency in the association rules mining.","PeriodicalId":313511,"journal":{"name":"2013 International Conference on Information Science and Cloud Computing Companion","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129109040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
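The paper does not reproduce its pseudocode; a minimal single-machine sketch of the power-set/MapReduce idea — emit every subset of each transaction in the map phase, sum counts and filter by minimum support in the reduce phase — might look like the following (function names are hypothetical, not from the paper):

```python
from itertools import combinations
from collections import Counter

def power_set(transaction):
    """Yield every non-empty subset (candidate itemset) of a transaction."""
    items = sorted(transaction)
    for r in range(1, len(items) + 1):
        for subset in combinations(items, r):
            yield subset

def map_phase(transactions):
    """Map step: emit (itemset, 1) for every subset of every transaction."""
    for t in transactions:
        for itemset in power_set(t):
            yield itemset, 1

def reduce_phase(pairs, min_support):
    """Reduce step: sum counts per itemset and keep only the frequent ones."""
    counts = Counter()
    for itemset, c in pairs:
        counts[itemset] += c
    return {s: n for s, n in counts.items() if n >= min_support}

transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
frequent = reduce_phase(map_phase(transactions), min_support=2)
```

On Hadoop the two phases would be separate Mapper and Reducer classes and the itemset tuples would be serialized as keys; the subset enumeration is exponential in transaction length, which is exactly why the paper distributes the work across a cluster.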
Based on computer image processing technology, this paper investigates a statistical method for estimating the housing vacancy rate from night-time images of residential buildings. The method consists of three steps. The first step is image preprocessing: the night-scene building image is enhanced, denoised, and corrected using histogram equalization, the wavelet transform, the Radon transform, and connection points. The second step is threshold segmentation: dark and bright windows are segmented with a fixed-threshold method and an improved between-class variance method. The third step uses image fusion: the centroid coordinates of closed regions are sorted from large to small along the horizontal and vertical axes to determine the location and number of dark and bright windows, from which the vacancy rate is finally derived. These functions are implemented through hybrid programming of Matlab and Visual C++ using the Matrix application. We compare the conclusions obtained by this method with those of presently common methods and verify the feasibility of the proposed approach.
{"title":"Recognition Methods of Housing Vacancy Based on Digital Image Processing","authors":"Wei Yao, Guifa Teng, Hui Li","doi":"10.1109/ISCC-C.2013.106","DOIUrl":"https://doi.org/10.1109/ISCC-C.2013.106","url":null,"abstract":"This paper, based on computer image processing technology, researches the statistical method to the housing vacancy rate, making use of residential building at night images. This method needs three steps, the first step is image preprocessing, to enhance, denoise and correct the building image in the night scene, using the methods of histogram equalization, wavelet transform, the Radon transform and the connection point. The second step is the image threshold segmentation, to segment the images of dark and bright windows with the fixed threshold method and improve the between-cluster variance method. The third step is through the image fusion technology, making use of closed area centroid coordinates in the horizontal and vertical coordinates from big to small order, then determining the location and the number of dark and bright windows, and finally concluding the vacancy rate. Finally, to achieve the hybrid programming of Matlab and Visual c++ by using the application of Matrix, we realize the above functions. 
We make comparative analysis to the conclusions from this method, and by comparing with the present commonly used methods, we verify the feasibility of the proposed method in this paper.","PeriodicalId":313511,"journal":{"name":"2013 International Conference on Information Science and Cloud Computing Companion","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127751138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
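The "between-class variance method" the abstract refers to is commonly known as Otsu's method. A minimal sketch of it on a synthetic night-scene image (dark facade, bright windows) could look like this; the image values and function name are illustrative, not taken from the paper:

```python
import numpy as np

def otsu_threshold(gray):
    """Between-class variance (Otsu) threshold for an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0        # dark-class mean
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1   # bright-class mean
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# A synthetic "night scene": dark background with two bright 2x2 windows.
img = np.full((10, 10), 20, dtype=np.uint8)
img[2:4, 2:4] = 200
img[6:8, 6:8] = 210
t = otsu_threshold(img)
bright = img > t   # bright-window mask
```

The paper's improvement to this method is not specified in the abstract; the sketch shows only the baseline criterion being improved upon.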
In this paper, we describe an Instant-Based Qur'an Memorizer Application Interface, which aims to provide a unifying framework for building Qur'an memorizer applications. It includes all the features needed for memorizing the Qur'an and is feasible to use on the latest handsets. Its unique instant-creation feature allows the user to memorize any Surah or Ayah from the Qur'an. We describe the core components and design patterns of the proposed memorizer with emphasis on key design criteria. These criteria aim at providing the necessary scalability and performance on the one hand, and quality assurance of the Qur'an text on the other.
{"title":"An Instant-Based Qur'an Memorizer Application Interface","authors":"Z. Adhoni, H. Al Hamad, A. A. Siddiqi","doi":"10.1109/ISCC-C.2013.14","DOIUrl":"https://doi.org/10.1109/ISCC-C.2013.14","url":null,"abstract":"In this paper, we describe an Instant-Based Qur'an Memorizer Application Interface, which aims at providing a unifying framework for building Qur'an memorizer application. It includes all features for memorizing the Qur'an, and it is feasible to be used in latest handsets. Its unique feature of instant creation gives the user to memorize any Surah or Ayah from the Qur'an. We describe the core components and design patterns of the proposed memorizer with emphasis on key design criteria. These criteria aim at providing the necessary scalability and performance on the one hand, and quality assurance of the Qur'an text on the other.","PeriodicalId":313511,"journal":{"name":"2013 International Conference on Information Science and Cloud Computing Companion","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114065188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper uses the strategic diagram technique to detect research themes and reveal their evolutionary trends in a scientific field using bibliometric data in a practical application. Keywords are selected not only from author-provided and machine-indexed keywords but are also extracted from the full text so as to eliminate the "indexer effect". The keywords are then clustered to detect research themes, which are classified into four categories in a strategic diagram to reveal their research situations according to their strategic positions. Moreover, strategic diagrams based on temporal dynamics are used to trace thematic evolution: a similarity index detects similar themes in adjacent phases, and provenance and influence indexes evaluate the interactions of similar themes. Experimental results show that the method is effective and useful in revealing research themes and their evolutionary trends in a scientific field.
{"title":"Revealing Research Themes and their Evolutionary Trends Using Bibliometric Data Based on Strategic Diagrams","authors":"H. Han, Jie Gui, Shuo Xu","doi":"10.1109/ISCC-C.2013.121","DOIUrl":"https://doi.org/10.1109/ISCC-C.2013.121","url":null,"abstract":"The paper aims to use strategic diagram technique to detect research themes and reveal their evolutionary trends in a scientific field using bibliometric data under practical application. Keywords are selected not only from author-provided and machine-indexed keywords, but also extracted from the full text so as to eliminate the \"indexer effect\". The keywords are then clustered to detect research themes, which are classified into four categories in a strategic diagram to reveal the research situations according to their strategic positions. Moreover, the strategic diagrams based on analysis of temporal dynamics are used to find out the thematic evolution through the similarity index to detect similar themes of adjacent phases, and the provenance and influence indexes to evaluate interactions of similar themes. Experimental results showed that the method is effective and useful in revealing research themes and their evolutionary trends in a scientific field.","PeriodicalId":313511,"journal":{"name":"2013 International Conference on Information Science and Cloud Computing Companion","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132556284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
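In the usual strategic-diagram convention, a theme's position is given by its centrality (external links) and density (internal cohesion), and the four quadrants are split at the medians. The quadrant names below follow that common convention, not wording from the paper, and the theme scores are invented for illustration:

```python
def classify_theme(centrality, density, c_median, d_median):
    """Place a research theme into one of the four strategic-diagram quadrants."""
    if centrality >= c_median and density >= d_median:
        return "motor"                 # central and well developed
    if centrality >= c_median:
        return "basic"                 # central but underdeveloped
    if density >= d_median:
        return "specialized"           # developed but peripheral
    return "emerging_or_declining"     # neither central nor developed

# Hypothetical (centrality, density) scores for clustered keyword themes.
themes = {
    "cloud computing": (0.9, 0.8),
    "rough sets":      (0.3, 0.9),
    "image fusion":    (0.8, 0.2),
    "qr memorizing":   (0.1, 0.1),
}
c_med, d_med = 0.5, 0.5
labels = {name: classify_theme(c, d, c_med, d_med)
          for name, (c, d) in themes.items()}
```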
Subcellular localization of proteins is an important attribute in bioinformatics, closely related to protein function, signal transduction, and biological processes. Great progress has been made in this research field in recent years, but the prediction methods still have shortcomings: the extracted features are often not complete enough to achieve a high prediction accuracy, and important protein information and correlations within the amino acid sequence are usually ignored. Moreover, some proteins are not restricted to a single location; they may reside in two, three, or even more locations, yet they are commonly treated as having only one. In this study, we divide a protein sequence into two parts according to its N-terminal sorting signals and extract pseudo amino acid composition features from each part. We then use the multi-label KNN algorithm (ML-KNN) to handle proteins with two or more locations. The jackknife test yields satisfactory results.
{"title":"Predicting the Subcellular Localization of Proteins with Multiple Sites Based on N-Terminal Signals","authors":"Xumi Qu, Yuehui Chen, Shanping Qiao","doi":"10.1109/ISCC-C.2013.101","DOIUrl":"https://doi.org/10.1109/ISCC-C.2013.101","url":null,"abstract":"Sub cellular localization of proteins is an important attribute in bioinformatics, closely related to its functions, signal transduction and biological process. In this research field, great progress has been made in recent years. However, some shortcomings still exist in the prediction methods. Such as the extracted features information is not complete enough to achieve a higher prediction accuracy rate, some important protein information and the correlation of the amino acid sequence are usually ignored and so on. Some proteins do not have only one location, they may have two locations or three and even more, but were considered to have only one location. In this study, we divide a protein sequence into two parts according to its N-terminal sorting signals and extract their pseudo amino acid composition features respectively. And then we use the multi-label KNN, shorted for ML-KNN to deal with the proteins which have two, three or even more locations. The results are satisfied by Jack Knife test.","PeriodicalId":313511,"journal":{"name":"2013 International Conference on Information Science and Cloud Computing Companion","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131756745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
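A simplified sketch of the feature-extraction step: split the sequence at an assumed N-terminal signal length and compute amino acid composition for each half (the paper uses pseudo amino acid composition, which adds sequence-order terms; plain composition is shown here, and the split length and function names are assumptions):

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids in a sequence segment."""
    counts = Counter(seq)
    total = max(len(seq), 1)
    return [counts.get(a, 0) / total for a in AMINO_ACIDS]

def split_features(seq, n_terminal_len=30):
    """Split a protein at its (assumed) N-terminal signal region and
    concatenate the composition features of both parts."""
    head, tail = seq[:n_terminal_len], seq[n_terminal_len:]
    return aa_composition(head) + aa_composition(tail)

# Toy sequence: 10 Met + 20 Lys in the head, 10 Lys + 20 Ala in the tail.
feats = split_features("M" * 10 + "K" * 30 + "A" * 20, n_terminal_len=30)
```

The resulting 40-dimensional vectors would then be fed to ML-KNN, which assigns each protein the label set voted for by its nearest neighbours rather than a single class.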
To address the shortage of storage space and computing power that traditional spatial data mining algorithms face when processing massive spatial data, this paper combines rough set theory with a distributed framework. Based on the basic theory of rough sets and the efficient, inexpensive Map/Reduce framework, the traditional rough set algorithm for spatial data mining is improved to run in parallel. A spatial data example then demonstrates the feasibility of the improved parallel algorithm. Empirical results show that the improved parallel rough set algorithm not only improves efficiency but also meets the need to process massive spatial data, which the traditional rough set algorithm can hardly handle. It thus effectively solves the storage and computing-power shortage in massive spatial data mining.
{"title":"Improvement of the Data Mining Algorithm of Rough Set under the Framework of Map/Reduce","authors":"Ying Wang, Jiqing Liu, Qiongqiong Liu","doi":"10.1109/ISCC-C.2013.80","DOIUrl":"https://doi.org/10.1109/ISCC-C.2013.80","url":null,"abstract":"In order to solve the problem that there is a shortage of space and computing power of the traditional spatial data mining algorithm during the processing for massive spatial data information, a combination of Rough set and distributed framework is used in the process of spatial data mining. In this paper, parallel improvement is taken into the algorithm of the traditional Rough set for spatial data mining based on the basic theory of rough set and the Map/Reduce framework, which is efficient and cheap. Then, a spatial data example is utilized to show the feasibility of the improved parallel algorithm. Empirical results show that the improved parallel algorithm of Rough set for spatial data mining can not only effectively improve the efficiency of the algorithm but also meet the need of people to deal with massive spatial data which is hardly to the algorithm of traditional Rough set. Improved Rough set parallel algorithm for spatial data mining can effectively solve the problem of shortage for massive spatial data storage and computing power mining.","PeriodicalId":313511,"journal":{"name":"2013 International Conference on Information Science and Cloud Computing Companion","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131154024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
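The rough-set core of such an algorithm is computing lower and upper approximations of a target concept from equivalence classes induced by condition attributes. A minimal sketch on a toy attribute table (the attribute names and values are invented for illustration):

```python
def partition(universe, attrs, table):
    """Group objects into equivalence classes by their values on attrs."""
    classes = {}
    for obj in universe:
        key = tuple(table[obj][a] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())

def approximations(universe, attrs, table, target):
    """Rough-set lower and upper approximations of the target set."""
    lower, upper = set(), set()
    for eq in partition(universe, attrs, table):
        if eq <= target:       # class entirely inside the target
            lower |= eq
        if eq & target:        # class overlaps the target
            upper |= eq
    return lower, upper

# Toy spatial decision table: objects described by two condition attributes.
table = {
    1: {"soil": "clay", "slope": "low"},
    2: {"soil": "clay", "slope": "low"},
    3: {"soil": "sand", "slope": "high"},
    4: {"soil": "sand", "slope": "low"},
}
lower, upper = approximations({1, 2, 3, 4}, ["soil"], table, target={1, 2, 3})
```

Each equivalence class can be computed independently, which is what makes the partition step a natural fit for a Map/Reduce job: mappers emit (attribute-value key, object) pairs and reducers assemble the classes.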
Compared with traditional data warehouse applications, big data analysis is characterized by its large data size and complex query analysis. To design a data warehouse architecture suitable for big data analysis, this paper analyzes and summarizes the current mainstream implementation platforms: parallel databases, MapReduce, and hybrid architectures based on the two. It presents their respective advantages and disadvantages, surveys existing research on big data analysis along with the authors' own efforts, and offers prospects for future study.
{"title":"Present Situation and Prospect of Data Warehouse Architecture under the Background of Big Data","authors":"Lihua Sun, Mu Hu, K. Ren, Mingming Ren","doi":"10.1109/ISCC-C.2013.102","DOIUrl":"https://doi.org/10.1109/ISCC-C.2013.102","url":null,"abstract":"Compared with the traditional data warehouse applications, the big data analysis is characterized by its large data size and complex query analysis. In order to design the data warehouse architecture suitable for the big data analysis, this paper analyzes and summarizes the current mainstream implementation platform-parallel database, MapReduce and the hybrid architecture based on the above-mentioned two architectures. Moreover, it presents respectively their advantages and disadvantages and describes various researches of and the author's efforts on the big data analysis to make prospects for the future study.","PeriodicalId":313511,"journal":{"name":"2013 International Conference on Information Science and Cloud Computing Companion","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114512606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article starts by introducing the essence of embedded system reliability. Characteristics of embedded systems such as failure rate, reliability, and mean time to failure are introduced to analyze embedded system reliability, and models of a single system, a series system, and a parallel system are established. The models are simulated with Simulink. Finally, the simulation results and example validations indicate that a series-parallel hybrid structure is necessary to improve the reliability of an embedded system and give it a long service life.
{"title":"The Reliability Analysis of Embedded Systems","authors":"Zhongzheng You","doi":"10.1109/ISCC-C.2013.142","DOIUrl":"https://doi.org/10.1109/ISCC-C.2013.142","url":null,"abstract":"This article starts with the introduction of the essence of the reliability of embedded system. By introducing some characteristics of embedded system such as failure rate, reliability and mean time to failure to analyze the reliability of embedded system, and set up the model of a single system, series system and parallel system. The models founded were simulated with Simulink software. Finally, the results of the simulation and the example validations indicate that series-parallel hybrid structure is very necessary in order to improve the reliability of embedded system and make the system has a long service life.","PeriodicalId":313511,"journal":{"name":"2013 International Conference on Information Science and Cloud Computing Companion","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121446355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
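The standard reliability formulas behind these models are simple: with a constant failure rate λ, component reliability is R(t) = e^(−λt) and MTTF = 1/λ; a series system multiplies reliabilities, while a parallel system multiplies unreliabilities. A small sketch (the specific λ and reliability values are illustrative):

```python
import math

def reliability_exponential(failure_rate, t):
    """R(t) = exp(-lambda * t) for a constant failure rate; MTTF = 1/lambda."""
    return math.exp(-failure_rate * t)

def series(reliabilities):
    """A series system works only if every component works."""
    r = 1.0
    for ri in reliabilities:
        r *= ri
    return r

def parallel(reliabilities):
    """A parallel system fails only if every component fails."""
    q = 1.0
    for ri in reliabilities:
        q *= (1.0 - ri)
    return 1.0 - q

r = reliability_exponential(0.001, 100)   # lambda = 1e-3 failures/h, t = 100 h
r_series = series([0.9, 0.9])             # worse than either component alone
r_parallel = parallel([0.9, 0.9])         # redundancy improves reliability
r_hybrid = series([parallel([0.9, 0.9]), 0.95])  # series-parallel hybrid
```

The numbers illustrate the article's conclusion: two 0.9 components in series drop to 0.81, in parallel rise to 0.99, and a hybrid places the redundant pair in series with the remaining stage.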
To solve name ambiguity problems and improve the performance of person name disambiguation, we propose a three-stage clustering algorithm in this paper. In the first stage, organizations and locations (OLs) are used to cluster documents about the same person, so that texts with greater resemblance are assigned to the same category; this stage is simply document clustering based on the similarity of OLs. In the second stage, the clustered documents are used as a new data source from which novel features (such as co-author names) are extracted, and these features drive an additional round of clustering. Meanwhile, we propose a method that resolves name ambiguity by constructing social networks from the relationships among co-authors. In the third stage, Web pages are further clustered using a content-based hierarchical agglomerative clustering (HAC) algorithm, analyzing useful content including titles, abstracts, and keywords (TAKs) to disambiguate the ambiguous names. Experimental results show that our three-stage clustering algorithm effectively enhances the performance of person name disambiguation.
{"title":"A Three-Stage Clustering Framework Based on Multiple Feature Combination for Chinese Person Name Disambiguation","authors":"Fei Wang, Yi Yang, Zhaocai Ma, Lian Li","doi":"10.1109/ISCC-C.2013.33","DOIUrl":"https://doi.org/10.1109/ISCC-C.2013.33","url":null,"abstract":"To solve name ambiguity problems and improve the performance of person name disambiguation, we propose a three-stage clustering algorithm in the paper. In the first stage, organizations and locations (OLs) are used to cluster documents about the same person, therefore some texts with more resemblance will be assigned to one category. This stage is simply document clustering based on the similarity of OLs. In the second stage, the clustered documents are used as a new data source from which some novel features (like co-author names) are extracted. We used these new extracted features to make additional clustering between documents. Meanwhile, a method was proposed to solve name ambiguity problems by using social networks construction based on the relationships among co-authors. In the third stage, Web pages are further clustered using content-based hierarchical agglomerative clustering (HAC) algorithm, then analyzing the useful content including title and abstract and keywords (TAKs) to disambiguate the ambiguous names. 
Experimental results show that our three-stage clustering algorithm can availably enhance the performance of person name disambiguation.","PeriodicalId":313511,"journal":{"name":"2013 International Conference on Information Science and Cloud Computing Companion","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117071730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
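The third stage's HAC step can be sketched as average-linkage agglomerative clustering over cosine similarities of TAK term vectors; the toy vectors and the similarity threshold below are assumptions for illustration, not values from the paper:

```python
import math

def cosine(u, v):
    """Cosine similarity of two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hac(vectors, sim_threshold):
    """Average-linkage agglomerative clustering: repeatedly merge the most
    similar pair of clusters until no pair exceeds sim_threshold."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = sum(cosine(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                s /= len(clusters[i]) * len(clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < sim_threshold:
            break
        i, j = pair
        clusters[i] += clusters.pop(j)
    return clusters

docs = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 0.0, 1.0)]  # toy TAK vectors
clusters = hac(docs, sim_threshold=0.5)
```

Documents 0 and 1 share nearly the same terms and merge into one cluster (one person), while document 2 stays separate, mirroring how the third stage splits pages among namesakes by content.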
This paper introduces the signal noise difference method and applies it to commodity futures price prediction. Based on prediction rules mined from the data of 25 potential prediction indicators of the SHFE CU contract, a corresponding trading strategy is established. We test this strategy on market data from 2009 to 2013, obtaining an annual yield of 147.85%. In addition, several improvements are discussed to optimize the model.
{"title":"Commodity Futures Price Prediction and Trading Strategies -- A Signal Noise Difference Approach","authors":"Jinhao Zheng, Shoukang Peng","doi":"10.1109/ISCC-C.2013.60","DOIUrl":"https://doi.org/10.1109/ISCC-C.2013.60","url":null,"abstract":"This paper introduces the signal noise difference method and applies this method into the commodity futures price prediction. Based on the prediction rules mined from the data of 25 potential prediction indicators of SHFE CU, a corresponding transaction strategy is established. And we use the market data from 2009 to 2013 to test our transaction strategy, which obtains a result of 147.85% annual yield. In addition, several improvements are discussed to optimize this model.","PeriodicalId":313511,"journal":{"name":"2013 International Conference on Information Science and Cloud Computing Companion","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123226319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}