As data volumes grow, traditional single-machine processing methods struggle to handle massive datasets, especially for iterative clustering algorithms that require frequent read and write operations. Building on the Spark framework, this paper proposes a distributed possibilistic c-means algorithm based on in-memory computing, called Spark-PCM. The proposed method improves the handling of distributed matrix operations and is implemented on the Spark platform. Experimental results show that the proposed Spark-PCM algorithm scales linearly with the number of nodes, which indicates good scalability and adaptability to large-scale data.
{"title":"A Distributed PCM Clustering Algorithm Based on Spark","authors":"Yong Zhang, Hao Liu, Tianzhen Chen, Di Tang","doi":"10.1145/3318299.3318315","DOIUrl":"https://doi.org/10.1145/3318299.3318315","url":null,"abstract":"With the large-scale growth of data, traditional single-machine data processing methods are difficult to deal with massive data, especially iterative clustering algorithms that require frequent reading and writing operations. On the basis of Spark framework, this paper proposes a distributed possibilistic c-means algorithm based on memory computing, called Spark-PCM. The proposed method improves the related processing of distributed matrix operation and is implemented on the Spark platform. Experimental results show that the proposed Spark-PCM algorithm runs in a linear relationship with the number of nodes and has a good scalability, which indicates that it has higher scalability and adaptability to large-scale data.","PeriodicalId":164987,"journal":{"name":"International Conference on Machine Learning and Computing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126870304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
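For readers unfamiliar with possibilistic c-means, the single-machine sketch below shows one iteration of the standard PCM updates (typicalities first, then centers) in plain Python. It is not the Spark-PCM implementation described in the abstract. The function names, the fuzzifier m = 2, and the fixed scale parameters eta are assumptions made for illustration.

```python
# Single-machine sketch of one possibilistic c-means (PCM) iteration.
# Not the paper's Spark-PCM code: names, m=2.0 and the eta values are illustrative.

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def update_typicality(points, centers, eta, m=2.0):
    """t[i][j] = 1 / (1 + (d_ij^2 / eta_i)^(1/(m-1))) for cluster i and point j."""
    exponent = 1.0 / (m - 1.0)
    return [
        [1.0 / (1.0 + (sq_dist(p, c) / e) ** exponent) for p in points]
        for c, e in zip(centers, eta)
    ]

def update_centers(points, typicality, m=2.0):
    """Each center is the mean of all points, weighted by t_ij^m."""
    dim = len(points[0])
    centers = []
    for row in typicality:
        weights = [t ** m for t in row]
        total = sum(weights)
        centers.append([
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)
        ])
    return centers

def pcm_step(points, centers, eta, m=2.0):
    """One full iteration: recompute typicalities, then centers."""
    typicality = update_typicality(points, centers, eta, m)
    return update_centers(points, typicality, m), typicality

if __name__ == "__main__":
    # Toy data, chosen only to exercise the functions.
    data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    centers, t = pcm_step(data, centers=[[0.0, 0.5], [10.0, 10.5]], eta=[1.0, 1.0])
    print(centers)
```

The weighted sums in `update_centers` are ordinary sums over points, so they can be computed per data partition and then combined. That property is what makes this kind of update a natural fit for a distributed framework, although how Spark-PCM organizes its matrix operations is not specified here.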
As technology advances, wide-area measurement plays an important role in real-time analysis of low-frequency oscillation, transient stability control, and voltage phase angle and amplitude measurement in the modern smart grid. Wide-area measurement generates a huge amount of data, and how to store that data is an important issue. This paper presents a storage method for wide-area measurement data, studies the load distribution, adopts the Magician dynamic migration framework, which is built on Xen, and proposes a hierarchical copy algorithm and a compression algorithm to optimize mass data storage in wide-area measurement. With this approach, the problem of excessive data in wide-area measurement can be better addressed, wide-area measurement data can be stored sensibly, and wide-area measurement can better fulfil its role in the smart grid.
{"title":"Application of Load Balancing Technology Based on Dynamic Migration in Wide Area Measurement Data Storage","authors":"Allam Maalla","doi":"10.1145/3318299.3318361","DOIUrl":"https://doi.org/10.1145/3318299.3318361","url":null,"abstract":"With the increasing number of technology, Wide area measurement plays an important role in real-time analysis of low frequency oscillation, transient stability control, voltage phase angle and amplitude measurement in modern smart grid. Wide area measurement has huge amount of data and how to store data is an important issue in wide area measurement. This paper presents the storage method of wide-area measurement data, studies the load distribution, adopts the Magician dynamic migration framework, which is developed based on Xen, and proposes the hierarchical copy algorithm and compression algorithm to optimize the mass data storage in wide-area measurement. Through the research, the problem of excessive data in wide-area measurement can be better solved, the reasonable storage of wide-area measurement data can be realized, and the role of wide-area measurement in smart grid can be better played.","PeriodicalId":164987,"journal":{"name":"International Conference on Machine Learning and Computing","volume":"178 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132341220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This report provides an overview of machine learning and data analysis, explaining the advantages and disadvantages of different methods. I also demonstrate a practical implementation of the described methods on a dataset of real estate prices.
{"title":"A Review of Methods Used in Machine Learning and Data Analysis","authors":"Qingyang Wu","doi":"10.1145/3318299.3318300","DOIUrl":"https://doi.org/10.1145/3318299.3318300","url":null,"abstract":"This report provides an overview of machine learning and data analysis with explanation of the advantages and disadvantages of different methods. I also demonstrate a practical implementation of the described methods on a dataset of real estate prices.","PeriodicalId":164987,"journal":{"name":"International Conference on Machine Learning and Computing","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131423577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A classification application is usually a long, dynamic process. Over the course of an application, classification performance can be improved by continuously enlarging and adjusting the training dataset, for example by correcting wrong labels of the original instances. For such dynamic classification applications, a neglected issue that is worth studying is how to build an interpretable classifier that helps domain experts understand the meaning of each label as reflected in the dataset, compare it against their own domain knowledge, and then adjust and optimize the training set to improve classification performance. This paper therefore proposes an interpretable classification model based on characteristic element extraction. The classifier is constructed by extracting, for every class label, positive and negative characteristic elements that intuitively reflect the label's intrinsic characteristics. It is therefore highly interpretable and can effectively help domain experts improve classification performance. Experimental results also show that the classifier achieves higher accuracy than other classical classifiers. The proposed classification model is thus effective and efficient, especially in practical applications.
{"title":"An Interpretable Classification Model Based on Characteristic Element Extraction","authors":"Mingwei Zhang, Xiuxiu He, Bin Zhang","doi":"10.1145/3318299.3318370","DOIUrl":"https://doi.org/10.1145/3318299.3318370","url":null,"abstract":"The process of a classification application is usually dynamic and long. During the process of an application, better classification application effect can be acquired by enlarging and adjusting the training dataset continuously, for example, modifying the wrong labels of original instances. For this kind of dynamic classification applications, how to build an interpretable classifier which can help domain experts to understand each label's meanings reflected from the dataset, then to compare and discriminate them with their own mastered domain knowledge, and finally to adjust and optimize the training set to enhance the effect of classification applications, is a neglected but worth studying issue. Therefore, an interpretable classification model based on characteristic element extraction is proposed in this paper. The proposed classifier is constructed by extracting positive and negative characteristic elements for all class labels which can intuitively reflect their instinct characteristics. Thus, it has high interpretability obviously and can effectively help domain experts optimize classification effect. At the same time, experiment results show that our classifier also has higher accuracy compared with other kinds of classical classifiers. Consequently, the classification model proposed in this paper is effective and efficient, especially in practical applications.","PeriodicalId":164987,"journal":{"name":"International Conference on Machine Learning and Computing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131515828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xubo Gao, Qiusheng Zheng, F. Verri, Rafael D. Rodrigues, Liang Zhao
Multilayer complex networks are suitable models for representing high-dimensional heterogeneous systems and are of special importance in the big data era. Because of interlayer connections, the community structure of a multilayer network can differ drastically from that of the set of isolated monolayer networks composed of the same nodes. For this reason, community detection in multilayer networks, as an unsupervised learning task, has become an interesting research topic in data mining and the analysis of complex systems. In this paper, we propose a modified version of the particle competition model for multilayer network community detection. The original model was designed for community detection in monolayer, unweighted, and undirected networks. The modified version presented in this paper can in turn be applied to multilayer, weighted, and/or directed networks. Moreover, we propose a localized measure to determine the optimal number of particles, which corresponds to the correct number of detected communities. Computer simulations show that the proposed technique outperforms state-of-the-art ones.
{"title":"Particle Competition for Multilayer Network Community Detection","authors":"Xubo Gao, Qiusheng Zheng, F. Verri, Rafael D. Rodrigues, Liang Zhao","doi":"10.1145/3318299.3318320","DOIUrl":"https://doi.org/10.1145/3318299.3318320","url":null,"abstract":"Multilayer complex networks are suitable models to represent high-dimensional heterogeneous systems with special importance in big data era. Community structures in a multilayer network can be drastically changed in comparison to the set of isolated monolayer networks composited by the same sets of nodes due to the existence of interlayer connections. For this reason, community detection in multilayer networks, as an unsupervised learning task, has turned out to be an interesting research topic in data mining and analysis in complex systems. In this paper, we propose a modified version of the particle competition model for multilayer network community detection. The original model was designed to community detection in monolayer unweighted and undirected networks. The modified version presented in this paper can be in turn applied to multilayer, weighted, and/or directed networks. Moreover, we also propose a localized measure to determine the optimal number of particles corresponding to the correct number of detected communities. Computer simulations shows the better performance of the proposed technique over the state of the art ones.","PeriodicalId":164987,"journal":{"name":"International Conference on Machine Learning and Computing","volume":"33 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114039517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To minimize makespan (the end time of the final machine) in flexible job shop scheduling problems (FJSP), this paper proposes a hybrid quantum-behaved particle swarm optimization algorithm based on Lévy flights. First, the algorithm uses the quantum probability amplitude coding method to establish a relationship between the process sequence and the particle position, which solves the job process sequencing sub-problem. It then uses global selection, local selection, and probabilistic random selection to choose a machine for each process. Finally, Lévy flights are used to improve the variation scheme and strengthen its effect, and an elitist strategy combined with neighborhood search is applied after each iteration to improve the quality of the results. Experiments on a classical case show that the algorithm is effective and feasible for solving flexible job shop scheduling problems.
{"title":"An Improved Hybrid Quantum Particle Swarm Optimization Algorithm for FJSP","authors":"Qiwen Zhang, Songqi Hu","doi":"10.1145/3318299.3318359","DOIUrl":"https://doi.org/10.1145/3318299.3318359","url":null,"abstract":"Aiming at minimizing makespan (the end time of the final machine) in flexible job shop scheduling problems (FJSP), a hybrid quantum behaved particle swarm optimization algorithm based on Lévy flights is proposed in this paper. Firstly, the algorithm uses the quantum probability amplitude coding method to establish a relationship between the process sequence and the particle position to solve job process sequencing sub-problem. Then uses the global selection, local selection and probability random selection to select the machine for each process. Finally, the Lévy flights is used to improve variant mode and enhance the effect of variation, the elitist strategy combined with neighborhood search is used after each iteration to improve the quality of the results. Experiments in a classical case show that the algorithm is effective and feasible for solving flexible job shop scheduling problems.","PeriodicalId":164987,"journal":{"name":"International Conference on Machine Learning and Computing","volume":"89 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115983899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
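The Lévy-flight perturbation mentioned in the abstract above can be sketched with Mantegna's algorithm, a common way to draw heavy-tailed step lengths. This is an illustration under stated assumptions, not the authors' operator: the stability index beta = 1.5, the step scale, and the helper names `levy_step` and `perturb` are all hypothetical.

```python
import math
import random

def mantegna_sigma(beta):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)

def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step: u / |v|^(1/beta), with u and v Gaussian."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / beta)

def perturb(position, scale=0.01, beta=1.5, rng=random):
    """Add an independent Lévy step to every coordinate (hypothetical helper)."""
    return [x + scale * levy_step(beta, rng) for x in position]

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs give the same numbers
    print(perturb([0.2, 0.5, 0.8], rng=rng))
```

Most steps drawn this way are small, with occasional large jumps, which is why Lévy flights are often used to help a swarm escape local optima.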
Owing to the popularization of social networking services (SNS) and the growing number of web pages, many documents can be obtained from the internet. However, it is difficult to process a huge set of document data manually. Therefore, various classification methods based on machine learning have been proposed. This paper proposes a classification method that can visualize the relationships among documents using Word2Vec and Spherical SOM, and examines its performance through visualization experiments and a numerical evaluation of classification accuracy.
{"title":"The Classification of the Documents Based on Word Embedding and 2-layer Spherical Self Organizing Maps","authors":"Koki Yoshioka, H. Dozono","doi":"10.1145/3318299.3318378","DOIUrl":"https://doi.org/10.1145/3318299.3318378","url":null,"abstract":"Due to a popularization of SNS and increase of web pages, many documents can be obtained from the internet. However, it is difficult to process a huge set of document data manually. Therefore, various classification methods based on machine learning have been proposed. In this paper, a classification method which can visualize the relationship among the documents using Word2Vec and Spherical SOM is proposed, and the performance is examined in experiments of visualization and numerical evaluation of classification accuracy.","PeriodicalId":164987,"journal":{"name":"International Conference on Machine Learning and Computing","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115091299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recurrent neural networks (RNNs) have been widely studied in recent years because they are capable of modeling significant nonlinear dynamical systems. The echo state network (ESN) is a novel type of RNN with an interconnected reservoir that models the temporal dynamics of complex sequential information. In this paper, a novel ESN structure is developed and employed to conduct fault prognosis. Fault prognosis is vital in predictive maintenance, a prevalent research area that mainly concentrates on predicting the remaining useful life of a machine and reducing the machine's downtime. An attention model is integrated into a typical ESN so that input elements of differing importance can be treated adaptively. To further enhance the generalization of the prediction model, a genetic algorithm is applied to adaptively optimize the parameters of the attention-based ESN. The proposed prognostic approach is verified on NASA's turbofan benchmark dataset. Experimental results show that the attention-based ESN not only achieves superior prediction accuracy but also obtains a substantial improvement in stability.
{"title":"Attention Based Echo State Network: A Novel Approach for Fault Prognosis","authors":"Chongdang Liu, Rong Yao, Linxuan Zhang, Yuan Liao","doi":"10.1145/3318299.3318325","DOIUrl":"https://doi.org/10.1145/3318299.3318325","url":null,"abstract":"Recurrent neural networks (RNNs) are widely studied in recent years, since RNNs are capable of modeling the significant nonlinear dynamical systems. Echo state network (ESN) is a novel type of RNN with an interconnected reservoir to model temporal dynamics of complex sequential information. In this paper, a novel ESN structure is developed and employed to conduct fault prognosis. Fault prognosis is vital in predictive maintenance, which is a prevalent research area that mainly concentrates on predicting the remaining useful life of a machine and reducing the machine's downtime. Attention model is integrated to a typical ESN and thus different importance levels of different input elements can be adaptively treated. To further enhance the generalization of the prediction model, genetic algorithm is applied to adaptively optimize the parameters of the attention-based ESN. The proposed prognostic approach is verified on the NASA's turbofan benchmark dataset. Experimental results show that the attention-based ESN can not only achieve superior prediction accuracy but also obtain substantial improvement on stability.","PeriodicalId":164987,"journal":{"name":"International Conference on Machine Learning and Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124431119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
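As a rough illustration of the ideas in the abstract above, the sketch below combines the standard ESN reservoir update, x(t) = tanh(W_in u(t) + W x(t-1)), with a softmax re-weighting of the input elements. The abstract does not say how attention is integrated, so the re-weighting step, the fixed `scores`, and all names and sizes here are assumptions rather than the authors' architecture.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def esn_step(state, inputs, w_in, w_res, scores):
    """One reservoir update with attention-weighted inputs (assumed scheme)."""
    attention = softmax(scores)
    weighted = [a * u for a, u in zip(attention, inputs)]
    drive = matvec(w_in, weighted)
    recur = matvec(w_res, state)
    return [math.tanh(d + r) for d, r in zip(drive, recur)]

def random_matrix(rows, cols, scale, rng):
    """Matrix with entries drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

if __name__ == "__main__":
    rng = random.Random(0)
    n_res, n_in = 5, 3
    w_in = random_matrix(n_res, n_in, 0.5, rng)
    # Entries in [-0.1, 0.1] keep every absolute row sum below 1,
    # a crude way to keep the reservoir dynamics stable.
    w_res = random_matrix(n_res, n_res, 0.1, rng)
    state = [0.0] * n_res
    for u in ([1.0, 0.0, -1.0], [0.5, 0.5, 0.5]):
        state = esn_step(state, u, w_in, w_res, scores=[0.0, 1.0, 2.0])
    print(state)
```

In a conventional ESN the input and reservoir weights stay fixed and only a linear readout on the state is trained. The readout and the genetic-algorithm tuning are omitted from this sketch.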
In this paper, we present a low-dimensional vector representation method for the concepts and instances of an ontology. The main idea is to transform the ontological entities into data that machine learning and deep learning algorithms, which accept only numerical inputs, can consume. The generated vectors represent the semantics contained in the source ontology. We use the semantic relationships connecting the concepts as a landmark to train expert neural networks with the noise contrastive estimation technique, projecting the concepts into a vector space specific to each relationship with weightings that depend on their frequency. The resulting vectors are then combined and fed into an autoencoder to generate a denser representation. The generated vectors can be used to find semantically similar ontology entities, allowing a semantic network to be created automatically. Semantically similar ontology entities thus have relatively close vector representations in the projection space.
{"title":"An Ontology Embedding Approach Based on Multiple Neural Networks","authors":"Achref Benarab, Fahad Rafique, Jianguo Sun","doi":"10.1145/3318299.3318365","DOIUrl":"https://doi.org/10.1145/3318299.3318365","url":null,"abstract":"In this paper, we present a low-dimensional vector representation method for the concepts and instances of an ontology. The main idea is to transform the ontological entities into digestible data for machine learning and deep learning algorithms that only use digital inputs. The generated vectors will represent the semantics contained in the source ontology. We use the semantic relationships connecting the concepts as a landmark to train expert neural networks using the noise contrastive estimation technique to project them into a vector space specific to this relationship with weightings dependent on their frequency. The resulting vectors are then combined and fed into an autoencoder to generate a denser representation. The generated representation vectors can be used to find the semantically similar ontology entities, allowing creating a semantic network automatically. Thus, semantically similar ontology entities will have relatively close corresponding vector representations in the projection space.","PeriodicalId":164987,"journal":{"name":"International Conference on Machine Learning and Computing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132884433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
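The closing claim of the abstract above, that semantically similar entities end up close together in the vector space, is typically exploited with a nearest-neighbour query. The sketch below uses cosine similarity for that query. The choice of cosine similarity and the function names are assumptions for illustration and are not taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(name, embeddings, top_k=2):
    """Return the top_k entities closest to `name`, best match first."""
    query = embeddings[name]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != name
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

if __name__ == "__main__":
    # Made-up toy vectors, used only to exercise the functions.
    toy = {
        "Cat": [0.9, 0.1, 0.0],
        "Dog": [0.8, 0.2, 0.1],
        "Car": [0.0, 0.1, 0.9],
    }
    print(most_similar("Cat", toy, top_k=1))
```

Linking each entity to its nearest neighbours in this way is one simple route to the automatically built semantic network that the abstract mentions.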
Green travel, low-carbon travel, harmony, and livability have become the main objectives of urban development. A public transport-oriented urban development mode can effectively alleviate traffic congestion, reduce energy consumption, and reduce environmental pollution. Considering the influence of spatial-temporal heterogeneity on urban residents' choice of travel mode, a cross-classification selection model is constructed based on hierarchical modeling theory to capture the spatial-temporal heterogeneity quantitatively. A Bayesian estimation method is used to estimate the model parameters, revealing the factors that influence urban residents' travel mode choice behavior. Using typical cases, this paper compares the model results under two scenarios, one neglecting and one considering spatial-temporal heterogeneity, so as to provide a scientific basis for public transport-oriented urban planning.
{"title":"Analysis Method of Travel Mode Choice of Urban Residents Based on Spatial-temporal Heterogeneity","authors":"K. Zhou, Xiao Peng, Zhong Guo","doi":"10.1145/3318299.3318333","DOIUrl":"https://doi.org/10.1145/3318299.3318333","url":null,"abstract":"Green travel, low-carbon travel, harmonious and livable have become the main objectives of urban development. Public transport-oriented urban development mode can effectively alleviate traffic congestion, reduce energy consumption, reduce environmental pollution. Considering the influence of spatial-temporal heterogeneity on the choice of urban residents' travel modes, a cross-classification selection model is constructed based on hierarchical modeling theory to capture the spatial-temporal heterogeneity quantitatively. Bayesian estimation method is selected to estimate the model parameters, and then the influencing factors of urban residents' travel mode choice behavior are revealed. Combining with typical cases, this paper compares and analyzes the differences between the results of the model analysis under the two scenarios of neglecting spatial-temporal heterogeneity and considering spatial-temporal heterogeneity, so as to provide a scientific basis for public transport-oriented urban planning.","PeriodicalId":164987,"journal":{"name":"International Conference on Machine Learning and Computing","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134643297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}