Pub Date: 2019-07-01; DOI: 10.1109/ICMLC48188.2019.8949212
Lina Zhao, Fang Ma, Hongwei Yang
Robust recovery of multiple subspace structures from noisy high-dimensional data has received considerable attention in computer vision and pattern recognition. Low-Rank Representation (LRR), a typical method, has achieved satisfactory results in subspace clustering. Latent Low-Rank Representation (LLRR) is an advanced version of LRR that considers both the rows and the columns of the data to address the problem of insufficient samples. However, both methods fail to exploit the local structure of the data. To address this problem, Latent Sparse Low-Rank Representation (LSLRR) is proposed to capture both the local and global structures of the data by imposing sparse and low-rank constraints simultaneously. In this way, LSLRR not only solves the clustering problem but also extracts significant features for classification. The Inexact Augmented Lagrange Multiplier (IALM) method is used to solve its objective function. Experimental results on subspace clustering and salient feature extraction demonstrate that the proposed LSLRR performs favorably.
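For orientation, the two baseline objectives that the abstract builds on can be written as below, in the form they are commonly stated in the LRR and LLRR literature. This is a reference sketch only: the symbols are assumptions of the sketch (X is the data matrix with samples as columns, Z the representation matrix, L the latent row-space projection, E the noise term, \|\cdot\|_{*} the nuclear norm, and \lambda > 0 a trade-off parameter). The abstract does not state the LSLRR objective itself, so it is not reproduced here.

```latex
% LRR: low-rank self-representation Z with column-wise sparse noise E
\[
\min_{Z,E}\ \|Z\|_{*} + \lambda \|E\|_{2,1}
\quad \text{s.t.} \quad X = XZ + E
\]

% LLRR: adds a row-space term LX to compensate for insufficient samples
\[
\min_{Z,L,E}\ \|Z\|_{*} + \|L\|_{*} + \lambda \|E\|_{1}
\quad \text{s.t.} \quad X = XZ + LX + E
\]
```

In both problems the nuclear norm acts as a convex surrogate for rank, which is what encodes the global (low-rank) structure; the sparse constraint mentioned in the abstract would be an additional term on top of this.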
Title: Subspace Clustering and Feature Extraction Based on Latent Sparse Low-Rank Representation. Published in: 2019 International Conference on Machine Learning and Cybernetics (ICMLC).
Pub Date: 2019-07-01; DOI: 10.1109/ICMLC48188.2019.8949232
Lu-ning Zhang, Xin Zuo, Jian-wei Liu, Weimin Li, Nobuyasu Ito
In this comment on the article [1], we obtain tighter upper bounds on the expected regret in Theorems 1 and 4. The proof of Theorem 2 also contains some critical incorrect statements; we correct them here and present a corrected version of Theorem 2.
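As background for the regret bounds discussed here, the following is a minimal sketch of the UCB1 index policy, assuming [1] is the finite-time analysis paper named in this entry's title, where Theorem 1 concerns UCB1. The function names `ucb1` and `bernoulli`, the two-arm setup, and the reward probabilities are illustrative assumptions; the tightened bounds and the corrected Theorem 2 from the comment are not reproduced.

```python
import math
import random


def ucb1(arms, horizon):
    """Play `horizon` rounds of UCB1 and return how often each arm was pulled.

    `arms` is a list of zero-argument callables returning a reward in [0, 1].
    """
    k = len(arms)
    pulls = [0] * k      # n_j: number of times arm j has been played
    totals = [0.0] * k   # cumulative reward collected from arm j
    for t in range(1, horizon + 1):
        if t <= k:
            j = t - 1    # initialization: play every arm once
        else:
            n = t - 1    # total number of plays so far
            # Pick the arm with the largest index: mean + sqrt(2 ln n / n_j).
            j = max(
                range(k),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2.0 * math.log(n) / pulls[a]),
            )
        totals[j] += arms[j]()
        pulls[j] += 1
    return pulls


def bernoulli(p, rng):
    """Hypothetical reward source: pays 1 with probability p, else 0."""
    return lambda: 1.0 if rng.random() < p else 0.0


rng = random.Random(0)  # fixed seed so the run is reproducible
pulls = ucb1([bernoulli(0.2, rng), bernoulli(0.8, rng)], horizon=2000)
```

The expected regret studied in such bounds is the gap between always playing the best arm and what the policy collects; in this sketch it grows with the number of pulls spent on the 0.2 arm.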
Title: Comments on “Finite-Time Analysis of the Multiarmed Bandit Problem”. Published in: 2019 International Conference on Machine Learning and Cybernetics (ICMLC).
Pub Date: 2019-07-01; DOI: 10.1109/ICMLC48188.2019.8949209
Xingchen Yang, J. Yan, Yi-Zhao, Honghai Liu
Despite the rapid development of ultrasound-based human-machine interfaces, their reliability in practical applications has not yet been evaluated. This paper explores the limb position effect on ultrasound-based gesture recognition, using wearable A-mode ultrasound instead of its cumbersome B-mode counterpart. An online experiment under eight different limb positions, with eight able-bodied subjects, is conducted to validate the performance of ultrasound-based gesture recognition. Results show that the influence of limb movement on ultrasound-based gesture recognition is not significant. Overall, the real-time motion completion rate and motion recognition accuracy are 97.1% and 94.5%, respectively, across different limb positions, even though training was performed only at a natural limb position. Moreover, the system takes only 177 ms to successfully recognize the intended motions across various limb positions. These results demonstrate the reliability of ultrasound-based gesture interaction, paving the way for its practical application.
Title: Exploring the Limb Position Effect on Wearable-Ultrasound-Based Gesture Recognition. Published in: 2019 International Conference on Machine Learning and Cybernetics (ICMLC).
Pub Date: 2019-07-01; DOI: 10.1109/ICMLC48188.2019.8949308
K. Matsubayashi, T. Anjiki, Shunji Maeda
Garbage power generation is expected to play an important role as a renewable source of stable power. Steam is produced from the heat of combustion and is used to generate electrical energy. To supply power continuously, the combustion must be controlled so as to create a stable steam flow. In this report, we analyze sensor data from the furnace and infrared images of it, with the aim of stabilizing the steam flow.
Title: Analyzing Causal Relationships of Sensor Data and Infrared Images to Stabilize Garbage Power Generation. Published in: 2019 International Conference on Machine Learning and Cybernetics (ICMLC).
Pub Date: 2019-07-01; DOI: 10.1109/ICMLC48188.2019.8949255
Wendi Li, Yi Zhu, Ting Wang, Wing W. Y. Ng
In many cases, the same or a similar network architecture is used to deal with related but different tasks, where the tasks come from different statistical distributions in the sample input space and share some common features. Multi-Task Learning (MTL) trains on multiple related tasks at the same time in order to learn a feature representation shared among them. However, it is difficult to improve each task when the statistical distributions of the related tasks differ greatly, because extracting an effective, generalizable feature representation from multiple tasks becomes difficult. This also slows down the convergence of MTL. Therefore, we propose an MTL method based on the Localized Generalization Error Model (L-GEM). The L-GEM improves the generalization capability of the trained model by minimizing an upper bound on its generalization error with respect to unseen samples that are similar to the training samples. It also helps to narrow the gap between tasks that arises from their different statistical distributions in MTL. Experimental results show that the L-GEM speeds up the training process while significantly improving the final convergence results.
Title: Multi-Task Learning With Localized Generalization Error Model. Published in: 2019 International Conference on Machine Learning and Cybernetics (ICMLC).
Pub Date: 2019-07-01; DOI: 10.1109/ICMLC48188.2019.8949320
D. Hassan
Predicting the emergence of future research collaborations between authors in an academic social network is a good illustration of the link prediction problem, which refers to predicting the potential existence or absence of a link between a pair of nodes in a social network (SN). Since most previous work on link prediction considered only predictor variables (i.e., features) extracted from the SN structure, this paper investigates the impact of using other types of predictor variables for link prediction in co-authorship networks. It proposes a new method for supervised link prediction in co-authorship networks using predictors obtained by computing, for each pair of author nodes in the network, the similarity between their research interests, the similarity between their affiliations, the sum of their research performance indices, and the similarity between the two author nodes themselves. Preliminary results show that the sum of the research performance indices of two author nodes has the greatest impact on the performance of supervised link prediction, which motivates further analysis of this predictor.
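The four predictors listed in the abstract can be sketched as a feature vector for one author pair. This is a hedged illustration, not the paper's implementation: Jaccard similarity, the `h_index` field as the research performance index, the co-author-set overlap as the node similarity, and all names and sample data are assumptions of the sketch.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(u, v, coauthors):
    """Build the predictor vector for the author pair (u, v).

    Order: interest similarity, affiliation similarity,
    sum of performance indices, node (neighbourhood) similarity.
    """
    return [
        jaccard(u["interests"], v["interests"]),
        jaccard(u["affiliations"], v["affiliations"]),
        u["h_index"] + v["h_index"],  # assumed performance index
        jaccard(coauthors.get(u["name"], ()), coauthors.get(v["name"], ())),
    ]


# Hypothetical sample data for illustration only.
alice = {"name": "alice", "interests": {"ml", "nlp"},
         "affiliations": {"Univ X"}, "h_index": 10}
bob = {"name": "bob", "interests": {"ml", "vision"},
       "affiliations": {"Univ X"}, "h_index": 5}
coauthors = {"alice": {"carol", "dave"}, "bob": {"carol"}}

features = pair_features(alice, bob, coauthors)
```

Vectors of this kind, labelled by whether the pair later co-authored a paper, could then be passed to any supervised classifier.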
Title: Supervised Link Prediction in Co-Authorship Networks Based on Research Performance and Similarity of Research Interests and Affiliations. Published in: 2019 International Conference on Machine Learning and Cybernetics (ICMLC).
Pub Date: 2019-07-01; DOI: 10.1109/ICMLC48188.2019.8949221
Z. Chaczko, Peter Wajs-Chaczko, David Tien, Y. Haidar
Monitoring the presence of microplastics in human and animal habitats is fast becoming an important research theme because of the need to preserve healthy ecosystems. Microplastics pollute the environment and can pose a serious threat to biological organisms, including humans, as they can be inadvertently consumed through the food chain. To perceive and understand the level of threat posed by microplastic pollution, reliable methodologies and tools are needed that can detect and classify the different types of microplastics. This paper presents the results of our exploration of methods and techniques for detecting suspicious objects in hyperspectral images of their ecosystems and then classifying these objects with neural networks.
Title: Detection of Microplastics Using Machine Learning. Published in: 2019 International Conference on Machine Learning and Cybernetics (ICMLC).
Pub Date: 2019-07-01; DOI: 10.1109/ICMLC48188.2019.8949230
Sheng-An Yang, Meng-Han Yang
In recent years, the use of electronic health records (EHR) has increased dramatically, and mining hidden knowledge from EHR “big data” has become a subject worthy of exploration. Many recent applications have used deep artificial neural networks (ANN) to analyze EHR data with strong performance. Accordingly, this study developed functional models using deep ANN and sought to validate the effectiveness of this method on a regression problem and a classification problem. On datasets downloaded from the UC Irvine Machine Learning Repository, the mean squared error of 0.840 for the regression analysis was within the range of one variance, and the prediction accuracy on the testing data was 73.0% for the classification problem. Another focus of this study was identifying critical attributes with the layer-wise relevance propagation (LRP) algorithm to improve the interpretability of deep ANN. The evaluation indicates that the identified features match those recognized by univariate analysis. In summary, this study validated the effectiveness of deep ANN and LRP on application problems.
Title: Developing the Interpretability of Deep Artificial Neural Network on Application Problems. Published in: 2019 International Conference on Machine Learning and Cybernetics (ICMLC).
Pub Date: 2019-07-01; DOI: 10.1109/ICMLC48188.2019.8949196
Pengfei Guo, Han Liu, Shuangyin Liu, Longqin Xu
Dissolved oxygen in aquaculture is an important measure of the quality of the culture environment and of how aquatic products have grown. In the machine learning context, this measure can be estimated by defining a regression problem that aims at numerical prediction of the dissolved oxygen status. In general, the vast majority of popular machine learning algorithms were designed for classification tasks. To adopt these algorithms effectively for numerical prediction, this paper proposes a two-stage training approach that transforms a regression problem into a classification problem and then transforms it back into a regression problem. In the first stage, unsupervised discretization of continuous attributes transforms the target (numeric) attribute into a discrete (nominal) one with several intervals, so that popular machine learning algorithms can predict, as a classification task, the interval to which an instance belongs. In the second stage, based on the classification result of the first stage, some of the instances within the predicted interval are selected for training towards numerical prediction of the target attribute value of each instance. An experimental study investigates the general effectiveness of popular learning algorithms on the numerical prediction task and analyzes how increasing the number of training instances selected at the second stage affects the final prediction performance. The results show that decision tree learning and neural networks lead to better and more stable performance than Naive Bayes, K Nearest Neighbours and Support Vector Machine.
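The two-stage idea described above can be sketched in a few lines. This is an illustrative stand-in under stated assumptions, not the authors' pipeline: equal-width binning is assumed as the unsupervised discretization, a 1-nearest-neighbour rule stands in for the stage-one classifier, and averaging the `k` nearest instances inside the predicted interval stands in for the stage-two regressor. All function names and the sample data are hypothetical.

```python
def equal_width_bins(ys, n_bins):
    """Map each numeric target to the index of its equal-width interval."""
    lo, hi = min(ys), max(ys)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(ys)  # all targets identical: a single interval
    return [min(int((y - lo) / width), n_bins - 1) for y in ys]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def predict_two_stage(X, y, x_new, n_bins=3, k=2):
    """Predict a numeric target by classifying its interval, then regressing."""
    bins = equal_width_bins(y, n_bins)
    # Stage 1 (classification): 1-NN predicts the interval of x_new.
    nearest = min(range(len(X)), key=lambda i: sq_dist(X[i], x_new))
    interval = bins[nearest]
    # Stage 2 (regression): select the k closest training instances that
    # fall in the predicted interval and average their numeric targets.
    members = [i for i in range(len(X)) if bins[i] == interval]
    members.sort(key=lambda i: sq_dist(X[i], x_new))
    chosen = members[:k]
    return sum(y[i] for i in chosen) / len(chosen)


# Hypothetical one-feature data set for illustration only.
X = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y = [1.0, 1.2, 1.4, 7.0, 7.5, 8.0]
high = predict_two_stage(X, y, [9.2])  # lands in the top interval
low = predict_two_stage(X, y, [0.4])   # lands in the bottom interval
```

Raising `k` corresponds to selecting more training instances at the second stage, which is the factor whose effect on prediction performance the abstract says the experiments examine.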
Title: Numeric Prediction of Dissolved Oxygen Status Through Two-Stage Training for Classification-Driven Regression. Published in: 2019 International Conference on Machine Learning and Cybernetics (ICMLC).
Pub Date: 2019-07-01; DOI: 10.1109/ICMLC48188.2019.8949260
Jian Zhang, Zijiang Yang, Y. Benslimane
The combination of data mining and machine learning with web-based education systems is becoming an important research area for enhancing the quality of education beyond traditional approaches. With the rapid worldwide growth of Information and Communication Technology (ICT), data arrive in large volume, at high velocity, and in wide variety. In this paper, four popular data mining methods are applied on Apache Spark to large datasets from Online Cognitive Learning Systems to explore the scalability and efficiency of Spark. Datasets of various sizes are tested on Spark MLlib with different run configurations and parameter tunings. The paper presents useful strategies for allocating and tuning computing resources to take full advantage of the in-memory system of Apache Spark for data mining and machine learning tasks on educational datasets.
Title: Exploring and Evaluating the Scalability and Efficiency of Apache Spark Using Educational Datasets. Published in: 2019 International Conference on Machine Learning and Cybernetics (ICMLC).