Xiongpai Qin, Yueguo Chen, Guodong Jin, Yang Liu, Yiming Cong, Xiaoyong Du
Real-time, fine-grained analysis of log data can yield personalized business insights. For example, real-time analysis of e-commerce logs reveals recent changes in the browsing and shopping behavior of specific customers, which makes personalized recommendations possible. Such analysis requires log data to be loaded into the data warehouse quickly and without loss. This paper proposes a no-loss staging and fast loading solution for log data. Building on open-source tools such as Kafka, HDFS, and Spark, we have designed and implemented an entity-fiber-based log data partitioning and staging method, as well as a parallel loading algorithm. Our scheme achieves a staging throughput of around 390,000 records/s and a loading throughput of around 160,000 records/s.
{"title":"Entity Fiber Based Partitioning, No Loss Staging and Fast Loading of Log Data","authors":"Xiongpai Qin, Yueguo Chen, Guodong Jin, Yang Liu, Yiming Cong, Xiaoyong Du","doi":"10.1109/PDCAT.2016.052","DOIUrl":"https://doi.org/10.1109/PDCAT.2016.052","journal":{"name":"2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)"},"publicationDate":"2016-12-01"}
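The abstract above does not define an "entity fiber", so the following is only a minimal sketch under the assumption that it means keeping all log records of one entity (for example, one customer) together and in arrival order. The function names `partition_for` and `stage` are hypothetical, and the in-memory dictionary stands in for whatever staging store (Kafka partitions, HDFS files) the authors actually use.

```python
import zlib
from collections import defaultdict


def partition_for(entity_id: str, num_partitions: int) -> int:
    """Map an entity ID to a partition index with a stable hash (CRC32)."""
    return zlib.crc32(entity_id.encode("utf-8")) % num_partitions


def stage(records, num_partitions):
    """Group (entity_id, payload) records by partition, preserving arrival order.

    Assumption: records of the same entity must land in the same partition so
    that a loader can process each entity's history without a shuffle.
    """
    staged = defaultdict(list)
    for entity_id, payload in records:
        staged[partition_for(entity_id, num_partitions)].append((entity_id, payload))
    return dict(staged)
```

A stable hash (rather than Python's per-process randomized `hash`) is used so that the same entity maps to the same partition across runs and across machines.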
Cuili Wang, Chao Yang, Lin Liu, Ping Wang, Heng Liu
To meet the growing demand for higher data rates, next-generation (5G) mobile networks are becoming more heterogeneous, irregular, and complex, which leads to a more complex interference environment. The traditional fixed-geometry hexagonal model is therefore no longer applicable. To evaluate the performance of 5G heterogeneous networks more accurately, we propose analyzing a downlink two-tier heterogeneous cellular network (HCN) using stochastic geometry, taking into account the inter-layer and intra-layer spatial correlation between base stations (BSs). We present an empirical study of average SINR and average throughput in edge and hotspot areas. Compared with the traditional fixed-geometry hexagonal model, the stochastic geometry model is more suitable and accurate for real 5G heterogeneous cellular networks.
{"title":"Stochastic Geometry Interference Model for 5G Heterogeneous Network","authors":"Cuili Wang, Chao Yang, Lin Liu, Ping Wang, Heng Liu","doi":"10.1109/PDCAT.2016.067","DOIUrl":"https://doi.org/10.1109/PDCAT.2016.067","journal":{"name":"2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)"},"publicationDate":"2016-12-01"}
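To illustrate what a stochastic geometry model looks like in practice, the sketch below draws base stations from a single-tier, independent Poisson point process and computes one SINR sample at the origin. This is the textbook baseline, not the correlated two-tier model of the abstract above; the function name `sample_sinr`, the path-loss exponent, the Rayleigh fading, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np


def sample_sinr(density, radius, alpha=4.0, noise=1e-13, seed=0):
    """Draw one SINR sample for a user at the origin.

    density: base stations per unit area; radius: radius of the simulated disc.
    Assumptions: unit transmit power, path loss r**(-alpha), exponential
    (Rayleigh) fading gains, and association with the nearest base station.
    """
    rng = np.random.default_rng(seed)
    # Poisson number of base stations; forced to at least 2 so that both a
    # serving station and an interferer exist (a simplification).
    n = max(2, int(rng.poisson(density * np.pi * radius**2)))
    r = radius * np.sqrt(rng.uniform(size=n))  # distances, uniform over the disc
    power = rng.exponential(size=n) * r ** (-alpha)  # received powers
    serving = int(np.argmin(r))
    interference = power.sum() - power[serving]
    return float(power[serving] / (interference + noise))
```

Averaging many such samples over different seeds gives a Monte Carlo estimate of the average SINR that a fixed hexagonal layout cannot capture for irregular deployments.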
S. Ding, Yidong Li, Xiaolin Xu, Hongwei Xing, Zhen Wang, Liang Chen, G. Wang, Yu Meng
This paper presents a system that predicts the load status of distribution transformers in a smart grid. At present, distribution transformer load status is generally handled only in a post-processing stage, and forecasting of transformer operation and load status is lacking. To reduce costs, ensure the security of the power supply, and improve emergency response capabilities, we present a prediction system that uses data mining algorithms to predict the load status of distribution transformers. The system also provides a platform for managing and maintaining power grid information, in which users can conveniently manage large and varied data sets.
{"title":"A Learning-Based System for Monitoring Electrical Load in Smart Grid","authors":"S. Ding, Yidong Li, Xiaolin Xu, Hongwei Xing, Zhen Wang, Liang Chen, G. Wang, Yu Meng","doi":"10.1109/PDCAT.2016.080","DOIUrl":"https://doi.org/10.1109/PDCAT.2016.080","journal":{"name":"2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)"},"publicationDate":"2016-12-01"}
C. R. Valêncio, C. D. Medeiros, L. A. Neves, G. F. D. Zafalon, Rogéria Cristiane Gratão de Souza, A. Colombini
Spatial clustering has been widely studied because of its applications in several areas. However, spatial clustering algorithms still need to overcome several challenges to achieve satisfactory results in a timely manner. This work presents a spatial clustering algorithm based on CHSMST that clusters data by both distance and similarity, which makes it possible to correlate spatial and non-spatial data; requires no user interaction; and uses multithreading to improve performance. The algorithm was tested on a real database from the health domain.
{"title":"CHSMST+: An Algorithm for Spatial Clustering","authors":"C. R. Valêncio, C. D. Medeiros, L. A. Neves, G. F. D. Zafalon, Rogéria Cristiane Gratão de Souza, A. Colombini","doi":"10.1109/PDCAT.2016.081","DOIUrl":"https://doi.org/10.1109/PDCAT.2016.081","journal":{"name":"2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)"},"publicationDate":"2016-12-01"}
Imbalanced datasets are common in real life, and identifying the minority class tends to be the focus of classification on such data. The twin support vector machine (TWSVM), an enhanced variant of the SVM, provides an effective technique for data classification. In this paper, we propose combining a re-sampling technique, which uses over-sampling and under-sampling to balance the training data, with TWSVM to handle imbalanced data classification. Experimental results show that the proposed approach outperforms other state-of-the-art methods.
{"title":"Combining Re-Sampling with Twin Support Vector Machine for Imbalanced Data Classification","authors":"Lu Cao, Hong-Mei Shen","doi":"10.1109/PDCAT.2016.076","DOIUrl":"https://doi.org/10.1109/PDCAT.2016.076","journal":{"name":"2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)"},"publicationDate":"2016-12-01"}
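The re-sampling step mentioned in the abstract above can be sketched as plain random under-sampling of the majority class combined with random over-sampling (with replacement) of the minority class. The abstract does not say which re-sampling scheme the authors use, so the function `rebalance`, its `target` parameter, and the purely random selection are assumptions; the TWSVM training that would follow is not shown.

```python
import random


def rebalance(majority, minority, target, seed=0):
    """Return (under, over): both classes resized toward `target` samples.

    Assumption: len(minority) <= target <= len(majority). The majority class
    is under-sampled without replacement; the minority class keeps all of its
    original samples and is topped up by drawing duplicates at random.
    """
    rng = random.Random(seed)
    under = rng.sample(majority, target) if len(majority) > target else list(majority)
    shortfall = max(0, target - len(minority))
    over = list(minority) + [rng.choice(minority) for _ in range(shortfall)]
    return under, over
```

Keeping every original minority sample before adding duplicates avoids discarding scarce information from the class that matters most.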
An improved fuzzy c-means (FCM) algorithm based on a genetic algorithm is used to extract user needs in mobile cloud computing. Its fast clustering groups users with similar attributes and behavior patterns into the same category; a similarity-based recommendation algorithm is then applied so that similar user requests can be answered quickly. The resulting algorithm (GAFCM-CF) is proposed to address user attribute collection and user request processing for mobile cloud users in small and medium-sized networks. Simulation experiments compare it with the traditional MIN-MIN scheduling algorithm and verify its effectiveness and efficiency.
{"title":"Research of User Request Algorithm in Mobile Cloud Computing Based on Improved FCM and Collaborative Filtering","authors":"Wu Hong-qiang, Li Xiao-yong, Fang Bin-xing","doi":"10.1109/PDCAT.2016.024","DOIUrl":"https://doi.org/10.1109/PDCAT.2016.024","journal":{"name":"2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)"},"publicationDate":"2016-12-01"}
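For readers unfamiliar with FCM, the sketch below shows the standard fuzzy c-means membership update for a single point, which is the step that lets a user belong partially to several clusters. It is the classical formula only; the genetic-algorithm improvement and the collaborative filtering stage of GAFCM-CF are not described in the abstract above and are not reproduced here. The function name `fcm_memberships` is hypothetical.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of `point` in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the Euclidean
    distance from the point to center i and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # The point coincides with one or more centers: split membership
        # evenly among them and give the other clusters zero.
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
```

For example, with `m=2.0`, a point at distance 1 from one center and 3 from another receives memberships of 0.9 and 0.1.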
P. Duan, Kun Wang, Xiaoshan Yu, Liangkai Liu, Huaxi Gu, Yantao Guo
Recently, many energy-aware routing algorithms have been proposed to reduce the energy consumption of data center networks. However, these methods ignore the effect of working time on energy consumption. In this paper, we analyze the energy consumption model and propose an energy-aware routing algorithm that jointly considers power consumption and working time. In simulations, the flow-driven energy-aware routing algorithm saves nearly 50 percent of energy compared with the flow-preemption energy-aware routing algorithm when the transmission rate is limited by the available bandwidth, and about 58.3 percent compared with the flow-aggregation energy-aware routing algorithm when the transmission rate is limited by the forwarding rate of the server's NIC.
{"title":"Flow Driven Energy-Aware Routing Algorithm in Data Center Network","authors":"P. Duan, Kun Wang, Xiaoshan Yu, Liangkai Liu, Huaxi Gu, Yantao Guo","doi":"10.1109/PDCAT.2016.066","DOIUrl":"https://doi.org/10.1109/PDCAT.2016.066","journal":{"name":"2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)"},"publicationDate":"2016-12-01"}
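The point that working time matters as much as power draw can be made concrete with the basic relation energy = power × time. The sketch below is not the paper's energy model; the function `transfer_energy_joules` and all figures in the usage example are hypothetical and chosen only to show how a higher-power path can still cost less energy if it finishes sooner.

```python
def transfer_energy_joules(power_watts, flow_bits, rate_bps):
    """Energy spent keeping a path active while one flow is transmitted.

    Assumption: the devices on the path draw constant power for the whole
    transfer, so energy is power multiplied by transfer duration.
    """
    duration_s = flow_bits / rate_bps
    return power_watts * duration_s


# Hypothetical comparison for an 8 Gbit flow:
slow_path = transfer_energy_joules(100.0, 8e9, 1e9)  # 100 W for 8 s
fast_path = transfer_energy_joules(150.0, 8e9, 2e9)  # 150 W for 4 s
```

Here the faster path draws 50 percent more power but uses 600 J instead of 800 J, which is the kind of trade-off a routing algorithm that considers only instantaneous power would miss.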
Hao Jiang, Peibing Du, Tao Sun, Housen Li, Lizhi Cheng, Canqun Yang
Fast singular value decomposition (SVD) algorithms are of great interest in applications. The random direct SVD (RSVD) provides a fast scheme for computing a good approximate SVD through unilateral randomized sampling. In this paper, we present an efficient randomized algorithm based on bilateral sampling. We also prove that the proposed algorithms admit good bounds and have lower computational complexity than RSVD when the target matrix is approximately square. Numerical experiments on graph Laplacian and Hilbert matrices demonstrate the efficiency and stability of the proposed methods.
{"title":"Bilateral Sampling Randomized Singular Value Decomposition","authors":"Hao Jiang, Peibing Du, Tao Sun, Housen Li, Lizhi Cheng, Canqun Yang","doi":"10.1109/PDCAT.2016.027","DOIUrl":"https://doi.org/10.1109/PDCAT.2016.027","journal":{"name":"2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)"},"publicationDate":"2016-12-01"}
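As background for the abstract above, the following is a minimal NumPy sketch of the widely used one-sided (unilateral) randomized SVD: sample the column space with a Gaussian test matrix, orthonormalize, and take an exact SVD of the small projected matrix. It is the baseline scheme only, not the bilateral sampling algorithm the paper proposes; the function name `randomized_svd` and the `oversample` default are assumptions.

```python
import numpy as np


def randomized_svd(A, rank, oversample=10, seed=0):
    """Approximate rank-`rank` SVD of A by one-sided random sampling."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    k = min(rank + oversample, n)        # number of random probe vectors
    omega = rng.standard_normal((n, k))  # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ omega)       # orthonormal basis for the sampled range
    B = Q.T @ A                          # small projected matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small                      # lift left singular vectors back
    return U[:, :rank], s[:rank], Vt[:rank, :]
```

The expensive dense SVD is applied only to the small matrix `B`, which is where the speed-up over a full decomposition of `A` comes from.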
Recent 5G research shows that the most feasible way to improve network capacity is the ultra-dense deployment of small cells. Inter-cell interference is the key factor limiting system performance in such dense deployments. In this paper, we focus on interference analysis and modeling for heterogeneous small-cell networks (HSCN). The ability to analyze and accurately predict the impact of interference with an interference model is essential to improving network performance. After establishing the heterogeneous network structure, we analyze the statistical characteristics of both the strongest interference and the total interference, show that the total interference is determined by the strong interference, and establish a model of the relationship between them. Finally, we derive analytical conditions on the parameters of the interference system model and determine the distributions under which the statistical properties of the total interference power can be accurately modeled by several strong interferers.
{"title":"Interference Modeling and Analysis in Heterogeneous Small-Cell Networks","authors":"Chao Yang, Heng Liu, Lin Liu","doi":"10.1109/PDCAT.2016.069","DOIUrl":"https://doi.org/10.1109/PDCAT.2016.069","journal":{"name":"2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)"},"publicationDate":"2016-12-01"}
Liu Na, Lu Ying, Tang Xiao-jun, Wang Hai-wen, Xiao Peng, Li Ming-xia
Collaborative filtering algorithms use interaction ratings between users and items to generate recommendations. Similarity among users or items is mostly calculated from ratings, without considering the explicit properties of the users or items involved. In this paper, we propose a collaborative filtering algorithm that uses a topic model. We treat the user-item matrix as a document-word matrix: users are represented as random mixtures over items, and each item is characterized by a distribution over users. Experiments show that the proposed algorithm outperforms other state-of-the-art algorithms on the MovieLens data sets.
{"title":"Improved Collaborative Filtering Algorithm Using Topic Model","authors":"Liu Na, Lu Ying, Tang Xiao-jun, Wang Hai-wen, Xiao Peng, Li Ming-xia","doi":"10.1109/PDCAT.2016.079","DOIUrl":"https://doi.org/10.1109/PDCAT.2016.079","journal":{"name":"2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)"},"publicationDate":"2016-12-01"}
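The rating-based similarity that the abstract above describes as the conventional approach can be sketched as cosine similarity over the items two users have both rated. This illustrates the baseline the paper improves on, not its topic-model method; the function name `cosine_similarity` and the dictionary layout (item ID mapped to rating) are assumptions.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity of two users over their co-rated items.

    Each argument maps item IDs to numeric ratings. Returns 0.0 when the users
    share no rated items or when either shared rating vector is all zeros.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Because the score depends only on rating values, two users with no overlapping items are treated as unrelated, which is the kind of limitation that motivates bringing in additional structure such as a topic model.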