In this paper, we present a prognosis architecture that allows the computation of the Remaining Useful Life (RUL) of a failing process. A process subject to an incipient fault experiences a slowly developing degradation. Sensor measurements and Condition Monitoring (CM) data extracted from the system make it possible to track the process drift. The prognosis architecture we propose uses a dynamical clustering algorithm to model the data in a feature space. This algorithm uses a sliding window scheme on which the model is iteratively updated. Metrics applied to the parameters of this model are used to compute a drift severity indicator, which is also an indicator of the health of the system. The prognosis architecture is applied to a wind turbine benchmark. This benchmark was constructed to serve as a realistic wind turbine model and was used in a global-scale fault diagnosis and fault-tolerant control competition. The benchmark also proposes a drifting fault scenario, which we used to test our approach.
{"title":"Prognosis Based on Handling Drifts in Dynamical Environments: Application to a Wind Turbine Benchmark","authors":"Antoine Chammas, E. Duviella, S. Lecoeuche","doi":"10.1109/ICMLA.2012.131","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.131","url":null,"abstract":"In this paper, we present a prognosis architecture that allows the computation of the Remaining Useful Life (RUL) of a failing process. A process subject to an incipient fault experiments slowly developing degradation. Sensor measurements and Condition Monitoring (CM) data extracted from the system allow to follow up the process drift. The prognosis architecture we propose makes use of a dynamical clustering algorithm to model the data in a feature space. This algorithm uses a sliding window scheme on which the model is iteratively updated. Metrics applied on the parameters of this model are used to compute a drift severity indicator, which is also an indicator of the health of the system. The architecture for prognosis is applied on a benchmark of wind turbine. The used benchmark has been constructed to serve as a realistic wind turbine model. It was used in the context of a global scale fault diagnosis and fault tolerant control competition. The benchmark also proposed a drifting fault scenario that we used to test our approach.","PeriodicalId":157399,"journal":{"name":"2012 11th International Conference on Machine Learning and Applications","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128094781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
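The sliding-window drift indicator described in the prognosis abstract above can be illustrated with a small sketch. This is not the authors' dynamical clustering algorithm: it assumes a single cluster whose center is simply the mean of the current window, a known nominal center, and a hypothetical failure center that defines severity 1. The names DriftMonitor and estimate_rul, and the linear extrapolation used for the RUL, are assumptions made for illustration only.

```python
from collections import deque
from math import dist


class DriftMonitor:
    """Track how far the sliding-window center has drifted from nominal.

    Assumption: the 'model' is one cluster summarized by the window mean.
    """

    def __init__(self, nominal_center, failure_center, window_size):
        self.nominal = tuple(nominal_center)
        # Distance between nominal and (hypothetical) failure centers;
        # used to normalize the indicator to the range [0, 1].
        self.span = dist(nominal_center, failure_center)
        self.window = deque(maxlen=window_size)

    def update(self, x):
        """Add one feature vector and return the drift severity in [0, 1]."""
        self.window.append(tuple(x))
        n = len(self.window)
        center = tuple(sum(col) / n for col in zip(*self.window))
        return min(1.0, dist(center, self.nominal) / self.span)


def estimate_rul(severities, step=1.0):
    """Linearly extrapolate the last two severity values to severity 1.

    Returns None when there is too little history or no positive drift rate.
    """
    if len(severities) < 2:
        return None
    rate = (severities[-1] - severities[-2]) / step
    if rate <= 0:
        return None
    return (1.0 - severities[-1]) / rate


monitor = DriftMonitor(nominal_center=(0.0, 0.0), failure_center=(10.0, 0.0), window_size=2)
history = [monitor.update(x) for x in [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]]
remaining = estimate_rul(history)  # remaining time, in units of `step`
```

A real implementation would replace the window mean with the updated cluster parameters and would likely smooth the severity signal before extrapolating, since a two-point slope is very sensitive to noise.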
We tackle the problem of answering maximum probabilistic top-k tuple set queries. We use a sliding-window model on uncertain data streams and present an efficient algorithm for processing sliding-window queries on such streams. In each sliding window, the algorithm selects the k tuples with the highest probabilities from sets of different numbers of the tuples with the highest scores. Then, the algorithm computes the existential probability of the top-k tuples and chooses the set with the highest probability as the top-k query result. We theoretically prove the correctness of the algorithm. Our experimental results show that our algorithm requires less time and space than other existing algorithms.
{"title":"An Efficient Algorithm for top-k Queries on Uncertain Data Streams","authors":"Caiyan Dai, Ling Chen, Yixin Chen, Keming Tang","doi":"10.1109/ICMLA.2012.57","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.57","url":null,"abstract":"We tackle the problem of answering maximum probabilistic top-k tuple set queries. We use a sliding-window model on uncertain data streams and present an efficient algorithm for processing sliding-window queries on uncertain streams. In each sliding window, the algorithm selects the k tuples with the highest probabilities from sets of different numbers of the tuples with the highest scores. Then, the algorithm computes existential probability of the top-k tuples, and chooses the set with the highest probability as the top-k query result. We theoretically prove the correctness of the algorithm. Our experimental results show that our algorithm requires lower time and space complexity than other existing algorithms.","PeriodicalId":157399,"journal":{"name":"2012 11th International Conference on Machine Learning and Applications","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133318543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
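To make the top-k query in the abstract above concrete, here is a brute-force sketch of finding the most probable top-k set in each sliding window. It is not the efficient algorithm of the paper. It assumes that tuples exist independently, that scores within a window are distinct, and that each stream item is a (tuple_id, score, probability) triple; the function names are hypothetical. For every possible lowest-ranked member, it keeps that tuple, adds the k-1 most probable tuples ranked above it, and multiplies in the probability that all remaining higher-scored tuples are absent.

```python
from collections import deque
from math import prod


def best_topk_set(window, k):
    """Return (probability, ids) of the most probable top-k set in one window.

    window: iterable of (tuple_id, score, prob), existence assumed independent.
    """
    ranked = sorted(window, key=lambda t: t[1], reverse=True)
    best = (0.0, ())
    for m in range(k, len(ranked) + 1):
        last = ranked[m - 1]      # lowest-scored member of the candidate set
        above = ranked[:m - 1]    # every tuple with a higher score
        chosen = sorted(above, key=lambda t: t[2], reverse=True)[:k - 1]
        chosen_ids = {t[0] for t in chosen}
        # The set is the top-k answer iff its members exist and every other
        # higher-scored tuple does not.
        p = (last[2]
             * prod(t[2] for t in chosen)
             * prod(1.0 - t[2] for t in above if t[0] not in chosen_ids))
        if p > best[0]:
            best = (p, tuple(sorted(chosen_ids | {last[0]})))
    return best


def sliding_topk(stream, window_size, k):
    """Yield the most probable top-k set for each full sliding window."""
    window = deque(maxlen=window_size)
    for item in stream:
        window.append(item)
        if len(window) == window_size:
            yield best_topk_set(window, k)


stream = [("a", 10, 0.1), ("b", 9, 0.9), ("c", 8, 0.8), ("d", 7, 0.5)]
results = list(sliding_topk(stream, window_size=3, k=2))
```

Each window is re-sorted from scratch here, which costs O(n log n) per candidate; the appeal of an incremental algorithm such as the one the abstract describes is precisely to avoid that repeated work.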
Atlas selection plays an important role in multi-atlas based image segmentation. Among atlas selection methods, manifold learning based techniques have recently emerged as very promising. However, due to the complexity of anatomical structures in raw images, it is difficult to get accurate atlas selection results by measuring only the distance between raw images on the manifolds. In this paper, we tackle this problem by proposing a label image constrained atlas selection (LICAS) method to exploit the shape and size information of the regions to be segmented from the label images. Constrained by the label images, a new manifold projection method is developed to help uncover the intrinsic similarity between the regions of interest across images. Compared with other existing methods, the experimental results of segmentation on 60 Magnetic Resonance (MR) images showed that the selected atlases are closer to the target structure and that more accurate segmentation can be obtained by using the proposed method.
{"title":"Multi-atlas Based Image Selection with Label Image Constraint","authors":"Yihui Cao, Xuelong Li, Pingkun Yan","doi":"10.1109/ICMLA.2012.232","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.232","url":null,"abstract":"Atlas selection plays an important role in multiatlas based image segmentation. In atlas selection methods, manifold learning based techniques have recently emerged as very promisingly. However, due to the complexity of anatomical structures in raw images, it is difficult to get accurate atlas selection results by measuring only the distance between raw images on the manifolds. In this paper, we tackle this problem by proposing a label image constrained atlas selection (LICAS) method to exploit the shape and size information of the regions to be segmented from the label images. Constrained by the label images, a new manifold projection method is developed to help uncover the intrinsic similarity between the regions of interest across images. Compared with other existing methods, the experimental results of segmentation on 60 Magnetic Resonance (MR) images showed that the selected atlases are closer to the target structure and more accurate segmentation can be obtained by using the proposed method.","PeriodicalId":157399,"journal":{"name":"2012 11th International Conference on Machine Learning and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122631087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent approaches in image super-resolution suggest learning dictionary pairs to model the relationship between low-resolution and high-resolution image patches with sparsity constraints on the patch representation. Most of the previous approaches in this direction assume for simplicity that the sparse codes for a low-resolution patch are equal to those of the corresponding high-resolution patch. However, this invariance assumption is not quite accurate, especially for large scaling factors where the optimal weights and indices of representative features are not fixed across the scaling transformation. In this paper, we propose an augmented coupled dictionary learning scheme that compensates for the inaccuracy of the invariance assumption. First, we learn a dictionary for the low-resolution image space. Then, we compute an augmented dictionary in the high-resolution image space where novel augmented dictionary atoms are inferred from the training error of the low-resolution dictionary. For a low-resolution test image, the sparse codes of the low-resolution patches and the low-resolution dictionary training error are combined with the trained high-resolution dictionary to produce a high-resolution image. Our experimental results compare favourably with the non-augmented scheme.
{"title":"Augmented Coupled Dictionary Learning for Image Super-Resolution","authors":"M. Rushdi, J. Ho","doi":"10.1109/ICMLA.2012.52","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.52","url":null,"abstract":"Recent approaches in image super-resolution suggest learning dictionary pairs to model the relationship between low-resolution and high-resolution image patches with sparsity constraints on the patch representation. Most of the previous approaches in this direction assume for simplicity that the sparse codes for a low-resolution patch are equal to those of the corresponding high-resolution patch. However, this invariance assumption is not quite accurate especially for large scaling factors where the optimal weights and indices of representative features are not fixed across the scaling transformation. In this paper, we propose an augmented coupled dictionary learning scheme that compensates for the inaccuracy of the invariance assumption. First, we learn a dictionary for the low-resolution image space. Then, we compute an augmented dictionary in the high-resolution image space where novel augmented dictionary atoms are inferred from the training error of the low-resolution dictionary. For a low-resolution test image, the sparse codes of the low-resolution patches and the lowresolution dictionary training error are combined with the trained high-resolution dictionary to produce a high-resolution image. Our experimental results compare favourably with the non-augmented scheme.","PeriodicalId":157399,"journal":{"name":"2012 11th International Conference on Machine Learning and Applications","volume":"201 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123017726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present a framework which enables medical decision making in the presence of partial information. At its core is ontology-based automated reasoning; machine learning techniques are integrated to enhance existing patient datasets in order to address the issue of missing data. Our approach supports interoperability between different health information systems. This is illustrated in a sample implementation that combines three separate datasets (patient data, drug-drug interactions and drug prescription rules) to demonstrate the effectiveness of our algorithms in producing effective medical decisions. In short, we demonstrate the potential for machine learning to support a task for which medical professionals have a critical need, by coping with missing or noisy patient data and enabling the use of multiple medical datasets.
{"title":"Integrating Machine Learning Into a Medical Decision Support System to Address the Problem of Missing Patient Data","authors":"Atif Khan, J. Doucette, R. Cohen, D. Lizotte","doi":"10.1109/ICMLA.2012.82","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.82","url":null,"abstract":"In this paper, we present a framework which enables medical decision making in the presence of partial information. At its core is ontology-based automated reasoning, machine learning techniques are integrated to enhance existing patient datasets in order to address the issue of missing data. Our approach supports interoperability between different health information systems. This is clarified in a sample implementation that combines three separate datasets (patient data, drug-drug interactions and drug prescription rules) to demonstrate the effectiveness of our algorithms in producing effective medical decisions. In short, we demonstrate the potential for machine learning to support a task where there is a critical need from medical professionals by coping with missing or noisy patient data and enabling the use of multiple medical datasets.","PeriodicalId":157399,"journal":{"name":"2012 11th International Conference on Machine Learning and Applications","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127940325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Dittman, T. Khoshgoftaar, Randall Wald, Amri Napolitano
Ensemble feature selection has recently become a topic of interest for researchers, especially in the area of bioinformatics. The benefits of ensemble feature selection include increased feature (gene) subset stability and usefulness as well as comparable (or better) classification performance compared to using a single feature selection method. However, existing work on ensemble feature selection has concentrated on data diversity (using a single feature selection method on multiple datasets or sampled data from a single dataset), neglecting two other potential sources of diversity. We present these two new approaches for gene selection: functional diversity (using multiple feature selection techniques on a single dataset) and hybrid (a combination of data and functional diversity). To demonstrate the value of these new approaches, we measure the similarity between the feature subsets created by each of the three approaches across twenty-six datasets and ten feature selection techniques (or an ensemble of these techniques as appropriate). We also compare the classification performance of models built using each of the three ensembles. Our results show that the similarity between the functional diversity and hybrid approaches is much higher than the similarity between either of those and data diversity, with the distinction between data diversity and our new approaches being particularly strong for hard-to-learn datasets. In addition to having the highest similarity, functional and hybrid diversity generally show greater classification performance than data diversity, especially when selecting small feature subsets. These results demonstrate both that these new approaches can provide a different feature subset than the existing approach and that the resulting novel feature subset is potentially of interest to researchers. To our knowledge, there has been no study which explores these new approaches to ensemble feature selection within the domain of bioinformatics.
{"title":"Comparing Two New Gene Selection Ensemble Approaches with the Commonly-Used Approach","authors":"D. Dittman, T. Khoshgoftaar, Randall Wald, Amri Napolitano","doi":"10.1109/ICMLA.2012.175","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.175","url":null,"abstract":"Ensemble feature selection has recently become a topic of interest for researchers, especially in the area of bioinformatics. The benefits of ensemble feature selection include increased feature (gene) subset stability and usefulness as well as comparable (or better) classification performance compared to using a single feature selection method. However, existing work on ensemble feature selection has concentrated on data diversity (using a single feature selection method on multiple datasets or sampled data from a single dataset), neglecting two other potential sources of diversity. We present these two new approaches for gene selection, functional diversity (using multiple feature selection technique on a single dataset) and hybrid (a combination of data and functional diversity). To demonstrate the value of these new approaches, we measure the similarity between the feature subsets created by each of the three approaches across twenty-six datasets and ten feature selection techniques (or an ensemble of these techniques as appropriate). We also compare the classification performance of models built using each of the three ensembles. Our results show that the similarity between the functional diversity and hybrid approaches is much higher than the similarity between either of those and data diversity, with the distinction between data diversity and our new approaches being particularly strong for hard-to-learn datasets. In addition to having the highest similarity, functional and hybrid diversity generally show greater classification performance than data diversity, especially when selecting small feature subsets. These results demonstrate that these new approaches can both provide a different feature subset than the existing approach and that the resulting novel feature subset is potentially of interest to researchers. To our knowledge there has been no study which explores these new approaches to ensemble feature selection within the domain of bioinformatics.","PeriodicalId":157399,"journal":{"name":"2012 11th International Conference on Machine Learning and Applications","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129193549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose a novel Artificial Neural Network (ANN) to predict software effort from use case diagrams based on the Use Case Point (UCP) model. The inputs of this model are software size, productivity and complexity, while the output is the predicted software effort. A multiple linear regression model with three independent variables (the same inputs as the ANN) and one dependent variable (effort) is also introduced. Our data repository contains 240 data points, of which 214 are industrial and 26 are educational projects. Both the regression and ANN models were trained using 168 data points and tested using 72 data points. The ANN model was evaluated using the MMER and PRED criteria against the regression model, as well as the UCP model that estimates effort from use cases. Results show that the ANN model is competitive with the regression model and can be used as an alternative to predict software effort based on the UCP method.
{"title":"Estimating Software Effort Using an ANN Model Based on Use Case Points","authors":"A. B. Nassif, Luiz Fernando Capretz, D. Ho","doi":"10.1109/ICMLA.2012.138","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.138","url":null,"abstract":"In this paper, we propose a novel Artificial Neural Network (ANN) to predict software effort from use case diagrams based on the Use Case Point (UCP) model. The inputs of this model are software size, productivity and complexity, while the output is the predicted software effort. A multiple linear regression model with three independent variables (same inputs of the ANN) and one dependent variable (effort) is also introduced. Our data repository contains 240 data points in which, 214 are industrial and 26 are educational projects. Both the regression and ANN models were trained using 168 data points and tested using 72 data points. The ANN model was evaluated using the MMER and PRED criteria against the regression model, as well as the UCP model that estimates effort from use cases. Results show that the ANN model is a competitive model with respect to other regression models and can be used as an alternative to predict software effort based on the UCP method.","PeriodicalId":157399,"journal":{"name":"2012 11th International Conference on Machine Learning and Applications","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116681097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
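The two evaluation criteria named in the effort-estimation abstract above can be written down in a few lines. The sketch below uses the definitions common in the effort-estimation literature, which may differ in detail from those in the paper: MMER is the mean of |actual - predicted| / predicted, and PRED(x) is the fraction of projects whose magnitude of relative error |actual - predicted| / actual is at most x. The function names and the sample effort values are made up for illustration.

```python
def mmer(actual, predicted):
    """Mean Magnitude of Error Relative to the estimate (lower is better)."""
    return sum(abs(a - p) / p for a, p in zip(actual, predicted)) / len(actual)


def pred(actual, predicted, level=0.25):
    """Fraction of projects whose relative error |a - p| / a is <= level."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= level)
    return hits / len(actual)


# Hypothetical efforts (e.g. person-hours) for three test projects.
actual_effort = [100.0, 200.0, 400.0]
predicted_effort = [80.0, 250.0, 400.0]

model_mmer = mmer(actual_effort, predicted_effort)
model_pred = pred(actual_effort, predicted_effort, level=0.10)
```

Comparing two models then amounts to computing both criteria on the same held-out projects (72 test points in the abstract) and preferring the model with lower MMER and higher PRED.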
In this paper, a previously proposed method of choosing auxiliary fitness functions is applied to adaptive selection of helper-objectives. Helper-objectives are used in evolutionary computation to enhance the optimization of the primary objective. The method, which uses reinforcement learning to choose between objectives of a single-objective evolutionary algorithm, is briefly described. It is tested on a model problem. From the results of the experiment, it can be concluded that the method makes it possible to automatically select the most effective helper-objectives and ignore the ineffective ones. It is also shown that the proposed method outperforms the multi-objective evolutionary algorithms that were originally used with helper-objectives.
{"title":"Adaptive Selection of Helper-Objectives with Reinforcement Learning","authors":"Arina Buzdalova, M. Buzdalov","doi":"10.1109/ICMLA.2012.159","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.159","url":null,"abstract":"In this paper a previously proposed method of choosing auxiliary fitness functions is applied to adaptive selection of helper-objectives. Helper-objectives are used in evolutionary computation to enhance the optimization of the primary objective. The method based on choosing between objectives of a single-objective evolutionary algorithm with reinforcement learning is briefly described. It is tested on a model problem. From the results of the experiment, it can be concluded that the method allows to automatically select the most effective helper-objectives and ignore the ineffective ones. It is also shown that the proposed method outperforms multi-objective evolutionary algorithms, that were used with helper-objectives originally.","PeriodicalId":157399,"journal":{"name":"2012 11th International Conference on Machine Learning and Applications","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115475869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prediction of O-linked glycosylation sites in proteins is a challenging problem. In this paper, we introduced a new method to predict glycosylation sites in proteins. First, we built a Markov random field (MRF) to represent the sequence position relationship and model the underlying distribution of glycosylation sites. We then considered glycosylation site prediction as a class imbalance problem and employed the AdaBoost algorithm to improve the predictive performance of the classifier. We applied our method to two types of proteins: the transmembrane (TM) proteins and the non-transmembrane (non-TM) proteins. We showed that for both datasets, our methods outperform existing methods. We also showed that the performance of the system was improved significantly with the help of AdaBoost.
{"title":"O-linked Glycosylation Site Prediction Using Ensemble of Graphical Models","authors":"A. Sriram, Feng Luo","doi":"10.1109/ICMLA.2012.210","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.210","url":null,"abstract":"Prediction of O-linked glycosylation sites in proteins is a challenging problem. In this paper, we introduced a new method to predict glycosylation sites in proteins. First, we built a Markov random field (MRF) to represent the sequence position relationship and model the underlying distribution of glycosylation sites. We then considered glycosylation site prediction as a class imbalance problem and employed the AdaBoost algorithm to improve the predictive performance of the classifier. We applied our method to two types of proteins: the transmembrane (TM) proteins and the non-transmembrane (non-TM) proteins. We showed that for both datasets, our methods outperform existing methods. We also showed that the performance of the system was improved significantly with the help of AdaBoost.","PeriodicalId":157399,"journal":{"name":"2012 11th International Conference on Machine Learning and Applications","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114620230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The goal of our data-mining multi-agent system is to facilitate data-mining experiments without requiring knowledge of the machine learning method and parameters most suitable for the data. In order to replace the expert's knowledge, meta-learning subsystems are proposed, including parameter-space search and method recommendation based on previous experiments. In this paper, we show the results of the parameter-space search with several search algorithms: tabulation, random search, simulated annealing, and genetic algorithm.
{"title":"Combining Parameter Space Search and Meta-learning for Data-Dependent Computational Agent Recommendation","authors":"O. Kazík, K. Pesková, M. Pilát, Roman Neruda","doi":"10.1109/ICMLA.2012.137","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.137","url":null,"abstract":"The goal of our data-mining multi-agent system is to facilitate data-mining experiments without the necessary knowledge of the most suitable machine learning method and its parameters to the data. In order to replace the experts knowledge, the meta-learning subsystems are proposed including the parameter-space search and method recommendation based on previous experiments. In this paper we show the results of the parameter-space search with several search algorithms - tabulation, random search, simmulated annealing, and genetic algorithm.","PeriodicalId":157399,"journal":{"name":"2012 11th International Conference on Machine Learning and Applications","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116145254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
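Two of the search algorithms listed in the parameter-space search abstract above, random search and simulated annealing, can be sketched generically. This is an illustration under stated assumptions, not the system from the paper: the objective is a made-up stand-in for the validation error of a learning method, the parameters are continuous within box bounds, and the cooling schedule, step size and function names are arbitrary choices.

```python
import math
import random


def random_search(objective, bounds, n_iter, rng):
    """Sample parameter vectors uniformly and keep the best one."""
    best_x, best_val = None, math.inf
    for _ in range(n_iter):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        val = objective(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


def simulated_annealing(objective, bounds, n_iter, rng,
                        t0=1.0, cooling=0.95, step=0.1):
    """Perturb the current point; accept worse points with decaying probability."""
    x = [rng.uniform(lo, hi) for lo, hi in bounds]
    val = objective(x)
    best_x, best_val = x, val
    temp = t0
    for _ in range(n_iter):
        # Gaussian move scaled to each parameter's range, clipped to the bounds.
        cand = [min(hi, max(lo, xi + rng.gauss(0.0, step * (hi - lo))))
                for xi, (lo, hi) in zip(x, bounds)]
        cand_val = objective(cand)
        if cand_val <= val or rng.random() < math.exp((val - cand_val) / temp):
            x, val = cand, cand_val
            if val < best_val:
                best_x, best_val = x, val
        temp = max(temp * cooling, 1e-12)  # floor avoids division by zero
    return best_x, best_val


def toy_error(params):
    """Hypothetical validation error with its minimum at (0.3, 4.0)."""
    return (params[0] - 0.3) ** 2 + (params[1] - 4.0) ** 2


rng = random.Random(0)
bounds = [(0.0, 1.0), (1.0, 10.0)]
rs_params, rs_error = random_search(toy_error, bounds, 200, rng)
sa_params, sa_error = simulated_annealing(toy_error, bounds, 200, rng)
```

Both functions return the best parameter vector seen and its error, so results from different searchers can be compared under the same evaluation budget (n_iter objective calls, plus one initial call for simulated annealing).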