Pub Date: 2018-07-01 | DOI: 10.1109/CEC.2018.8477990
I. Strumberger, Eva Tuba, N. Bačanin, M. Beko, M. Tuba
In this paper we present the bare bones fireworks algorithm, implemented and adjusted for solving the radio frequency identification (RFID) network planning problem. The bare bones fireworks algorithm is a new and simplified version of the fireworks metaheuristic. According to our literature survey, this approach has not previously been applied to the RFID network planning problem. The RFID network planning problem is a well-known hard optimization problem and poses one of the most fundamental challenges in the deployment of an RFID network. We tested the bare bones fireworks algorithm on one problem model from the literature and performed a comparative analysis with approaches tested on the same problem formulation. We also performed an additional set of experiments in which the number of readers is treated as an algorithm parameter. The empirical results demonstrate the robustness and efficiency of the bare bones fireworks metaheuristic for tackling the RFID network planning problem and establish this new version of the fireworks algorithm as a state-of-the-art method for dealing with NP-hard tasks.
Title: Bare Bones Fireworks Algorithm for the RFID Network Planning Problem
Venue: 2018 IEEE Congress on Evolutionary Computation (CEC)
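The bare bones fireworks algorithm is simple enough to sketch in a few lines. The following is a minimal illustration rather than the paper's RFID-specific implementation: a single firework explodes into uniformly sampled sparks, and the explosion amplitude widens after an improving iteration and shrinks otherwise. The parameter names `ca` and `cf` follow the usual description of the algorithm, and the sphere objective is only a placeholder.

```python
import numpy as np

def bbfwa(f, lb, ub, n_sparks=20, ca=1.2, cf=0.9, iters=200, seed=0):
    """Minimize f over the box [lb, ub] with a bare bones fireworks search.

    A single firework x explodes into n_sparks points sampled uniformly in
    [x - a, x + a] (clipped to the box); the amplitude a is multiplied by
    ca after an improving iteration and by cf otherwise.
    """
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    x = rng.uniform(lb, ub)          # the single firework
    a = ub - lb                      # per-dimension explosion amplitude
    fx = f(x)
    for _ in range(iters):
        sparks = rng.uniform(np.maximum(x - a, lb), np.minimum(x + a, ub),
                             size=(n_sparks, len(lb)))
        vals = np.apply_along_axis(f, 1, sparks)
        best = vals.argmin()
        if vals[best] < fx:          # improvement: move and widen the search
            x, fx = sparks[best], vals[best]
            a = a * ca
        else:                        # stagnation: shrink the amplitude
            a = a * cf
    return x, fx

# Usage: minimize the sphere function in 5 dimensions.
x, fx = bbfwa(lambda v: float(np.sum(v * v)), lb=[-5] * 5, ub=[5] * 5)
```

The amplitude adaptation is what distinguishes this from plain random restarts: success widens exploration, failure focuses it around the incumbent.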
Pub Date: 2018-07-01 | DOI: 10.1109/CEC.2018.8477724
S. A. Fernandez, D. Fantinato, J. Filho, R. Attux, Daniel G. Silva
Blind inversion of nonlinear systems is a complex task that requires some prior information about the source, e.g., whether it is composed of independent samples or, as in this work, exhibits a dependence “signature” assumed to be known via the autocorrentropy function. Furthermore, it involves solving a nonlinear, multimodal optimization problem to determine the parameters of the inverse model. We therefore propose a blind method for Wiener system inversion that combines a correntropy-based criterion with the well-known CLONALG immune-inspired optimization metaheuristic. The empirical results validate the methodology for continuous and discrete signals.
Title: Immune-Inspired Optimization with Autocorrentropy Function for Blind Inversion of Wiener Systems
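As a rough illustration of the dependence “signature” mentioned above, the empirical autocorrentropy of a signal can be estimated as the mean Gaussian-kernel similarity between samples at each lag. This is a minimal sketch under a common definition; the kernel width and normalization used in the paper may differ.

```python
import numpy as np

def autocorrentropy(x, max_lag, sigma=1.0):
    """Empirical autocorrentropy V[m] = mean of exp(-(x_n - x_{n-m})^2 / (2*sigma^2))
    over all valid n, for lags m = 0..max_lag."""
    x = np.asarray(x, float)
    v = np.empty(max_lag + 1)
    for m in range(max_lag + 1):
        d = x[m:] - x[:len(x) - m]           # lag-m sample differences
        v[m] = np.mean(np.exp(-d ** 2 / (2.0 * sigma ** 2)))
    return v

# The lag profile of a slowly varying signal stays close to 1 at small lags.
sig = autocorrentropy(np.sin(np.linspace(0.0, 10.0, 200)), max_lag=5)
```

A blind criterion can then compare this lag profile of the inverse model's output against the known source signature.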
Pub Date: 2018-07-01 | DOI: 10.1109/CEC.2018.8477871
David Lynch, David Fagan, S. Kucera, H. Claussen, M. O’Neill
Small cells are being deployed alongside pre-existing macro cells in order to satisfy demand during the current era of exponential growth in mobile traffic. Heterogeneous networks (HetNets) are economical because both cell tiers share the same scarce and expensive spectrum. However, customers at cell edges experience severe cross-tier interference in channel-sharing HetNets, resulting in poor service quality. Techniques for improving fairness globally have been developed in previous works. In this paper, a novel method for service differentiation at the level of individual customers is proposed. The proposed algorithm redistributes spectrum on a millisecond timescale so that premium customers experience minimum downlink rates exceeding a target threshold. System-level simulations indicate that downlink rate targets of at least 1 Mbps are always satisfied under the proposed scheme. By contrast, naive scheduling achieves the 1 Mbps target only 83% of the time. Quality of service can thus be improved for premium customers without significantly impacting global fairness metrics. Flexible service differentiation will be key to effectively monetizing the next generation of 5G wireless communications networks.
Title: Managing Quality of Service Through Intelligent Scheduling in Heterogeneous Wireless Communications Networks
Pub Date: 2018-07-01 | DOI: 10.1109/CEC.2018.8477799
H. Shishido, J. C. Estrella, C. Toledo
Cloud computing provides infrastructure for executing workflows that require high processing and storage capacity. Although there are several algorithms for scheduling workflows, few consider a security criterion. Algorithms that address security usually optimize either cost or makespan. However, there are cases in which the user would like to choose among, or evaluate, different solutions that present a trade-off between the monetary cost and the execution time (makespan) of the workflow. The selection of tasks that involve confidential or sensitive data has to prioritize the safe execution of the workflow. In this paper, we propose a multi-objective optimization approach for scheduling workflow tasks in cloud environments that considers cost and makespan under different task selection policies. Extensive experiments on real-world workflows with different policies show that our approach returns several solutions on the Pareto frontier for both cost and makespan. The results reveal a reasonable ability to find Pareto frontiers during the optimization process.
Title: Multi-Objective Optimization for Workflow Scheduling Under Task Selection Policies in Clouds
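The cost/makespan trade-off described above comes down to extracting the non-dominated schedules from a set of candidates. A minimal sketch, independent of the paper's scheduler and selection policies, treating both objectives as values to minimize:

```python
def pareto_front(solutions):
    """Keep the non-dominated (cost, makespan) pairs: a solution is dropped
    if some other solution is no worse in both objectives and differs from it."""
    front = []
    for s in solutions:
        dominated = any(o[0] <= s[0] and o[1] <= s[1] and o != s
                        for o in solutions)
        if not dominated:
            front.append(s)
    return front

# Five candidate schedules as (cost, makespan); three survive.
cands = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]
front = pareto_front(cands)
```

The surviving pairs are exactly the trade-off options a user would choose among: cheaper but slower, or faster but more expensive.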
Pub Date: 2018-07-01 | DOI: 10.1109/CEC.2018.8477819
Victor Cunha, Luciana S. Pessoa, M. Vellasco, R. Tanscheit, M. Pacheco
A disaster brings about damage, destruction, ecological disruption, loss of human life, human suffering, and deterioration of health and health services of sufficient magnitude to require external assistance, demanding the mobilization and deployment of emergency rescue units within the affected area in order to reduce casualties and economic losses. The scheduling of those units is one of the key issues in the emergency response phase and can be seen as a generalization of the unrelated parallel machine scheduling problem with sequence- and machine-dependent setup. The objective is to minimize the total weighted completion time of the incidents to be attended, where each weight corresponds to the incident's severity level. We propose a biased random-key genetic algorithm to tackle this problem, considering fuzzy required processing times for the incidents, and compare the solutions with those generated by a constructive heuristic from the literature developed for this problem. Our results show that the genetic algorithm's solutions are 2.17% better than those obtained with the constructive heuristic when applied to instances with up to 40 incidents and 40 rescue units.
Title: A Biased Random-Key Genetic Algorithm for the Rescue Unit Allocation and Scheduling Problem
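The random-key encoding at the heart of a biased random-key GA can be illustrated with a deliberately simplified decoder (hypothetical: crisp processing times, identical units, and no setup times, unlike the paper's model). A chromosome is a vector of keys in [0, 1); decoding sorts incidents by their keys and greedily assigns each to the earliest-available rescue unit.

```python
import random

def decode(keys, processing_time, weights, n_units):
    """Decode a random-key chromosome into a schedule and return the total
    weighted completion time (the quantity to minimize)."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    free_at = [0.0] * n_units          # time at which each unit becomes free
    total = 0.0
    for i in order:
        u = min(range(n_units), key=lambda k: free_at[k])
        free_at[u] += processing_time[i]
        total += weights[i] * free_at[u]
    return total

# One random chromosome and its fitness on a toy 6-incident, 2-unit instance.
random.seed(1)
n = 6
keys = [random.random() for _ in range(n)]
cost = decode(keys,
              processing_time=[3, 1, 4, 2, 5, 2],
              weights=[2, 1, 3, 1, 2, 1],
              n_units=2)
```

The GA then evolves the key vectors; any key vector decodes to a feasible schedule, which is what makes the encoding convenient.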
Pub Date: 2018-07-01 | DOI: 10.1109/CEC.2018.8477697
Junkun Zhong, Yuping Wang, Hui Du, Wuning Tong
In view of the shortcomings that many clustering algorithms, such as K-means, are not suitable for non-convex datasets and that the Affinity Propagation (AP) algorithm may merge two adjacent but different classes into one, we propose a new clustering algorithm that uses boundary information. The idea of the proposed algorithm is as follows. First, use the number of points contained in each point's neighborhood as its density, and consider points whose density is below the average density to be boundary points. Then, count the number of boundary points. If the number of boundary points is larger than a given threshold, clustering is carried out directly by the transfer idea; otherwise, the boundary points are regarded as a cluster boundary wall. When a boundary point is encountered in the transitive clustering process, the transfer stops and an unprocessed non-boundary point is selected to restart the clustering process, until all non-boundary points are processed; this effectively prevents two adjacent but different classes from being merged into one. Because of the transfer-based clustering, the proposed algorithm is applicable to non-convex datasets, and different clustering schemes are adopted according to the number of boundary points, which increases the applicability of the algorithm. Experimental results on synthetic and standard datasets show that the proposed algorithm is efficient.
Title: A New Clustering Algorithm by Using Boundary Information
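The density-and-boundary step the abstract describes can be sketched directly. This toy illustration uses a Euclidean ball of radius `eps` as the neighborhood (an assumption; the paper's neighborhood definition may differ) and omits the subsequent transfer-based clustering.

```python
import numpy as np

def boundary_points(X, eps):
    """Density of a point = number of points within distance eps (itself
    included); points with below-average density are flagged as boundary."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    density = (d <= eps).sum(axis=1)
    return density < density.mean(), density

# A tight blob plus two far-away points: the far points are flagged.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               [[3.0, 3.0], [-3.0, 3.0]]])
mask, density = boundary_points(X, eps=0.5)
```

During clustering, transitive expansion would stop whenever it reaches a flagged point, so dense regions separated by sparse boundaries stay in different clusters.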
Pub Date: 2018-07-01 | DOI: 10.1109/CEC.2018.8477934
A. G. Bari, Alessio Gaspar, R. P. Wiegand, Anthony Bucci
The Population-based Pareto Hill Climber (P-PHC) algorithm exemplifies coevolutionary computation approaches that manage a single group of candidate solutions serving both as a population for exploring the underlying search space and as an archive preserving solutions that satisfy the adopted solution concept. When parsimonious evaluation is desired, using the same group of candidates for both purposes can be inefficient. The reliance of such algorithms on the otherwise beneficial Pareto dominance concept can create bottlenecks in search progress, because most newly generated solutions are non-dominated and thus appear equally qualified to selection when compared with the current solutions they should eventually replace. We propose new selection conditions that consider both Pareto-dominated and Pareto non-dominated solutions, as well as other factors that help provide distinctions for selection. The potential benefit of also considering Pareto non-dominated solutions is illustrated by a visualization of the underlying interaction space in terms of levels. In addition, we define new performance metrics that allow our selection methods to be compared in terms of the ideal evaluation of coevolution. Fewer duplicate solutions are retained in the final generation, allowing more efficient use of the fixed population size.
Title: Selection Methods to Relax Strict Acceptance Condition in Test-Based Coevolution
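The selection bottleneck described above stems from the strict dominance test itself: two outcome vectors are often better in different coordinates, fail the test in both directions, and are therefore tied. A minimal predicate (maximization convention, independent of P-PHC's internals) makes this concrete:

```python
def dominates(a, b):
    """Pareto dominance for maximization: a dominates b iff a is at least as
    good in every objective and strictly better in at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

# Mutually non-dominated outcomes look identical to strict Pareto selection.
tied = (not dominates((1, 0), (0, 1))) and (not dominates((0, 1), (1, 0)))
```

Relaxed acceptance conditions add tie-breaking signals so that selection can still discriminate among such mutually non-dominated candidates.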
Pub Date: 2018-07-01 | DOI: 10.1109/CEC.2018.8477730
Ye Tian, Xiaoshu Xiang, Xing-yi Zhang, Ran Cheng, Yaochu Jin
The effectiveness of evolutionary algorithms has been verified on multi-objective optimization, and a large number of multi-objective evolutionary algorithms have been proposed during the last two decades. To quantitatively compare the performance of different algorithms, a set of uniformly distributed reference points sampled on the Pareto fronts of benchmark problems is needed when calculating many performance metrics. However, little work has investigated methods for sampling reference points on Pareto fronts, even though this is not an easy task for the many Pareto fronts with irregular shapes. Recently, we proposed an evolutionary multi-objective optimization platform called PlatEMO, which can automatically generate reference points on each Pareto front and use them to calculate performance metric values. In this paper, we report the reference point sampling methods used in PlatEMO for different types of Pareto fronts. Experimental results show that the reference points generated by the proposed sampling methods evaluate the performance of algorithms more accurately than randomly sampled reference points.
Title: Sampling Reference Points on the Pareto Fronts of Benchmark Multi-Objective Optimization Problems
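For the simplest front type, a linear (simplex-shaped) Pareto front, uniformly spaced reference points are commonly generated with the Das-Dennis construction; a sketch is below. Whether PlatEMO uses exactly this scheme for the linear case is an assumption here, and irregular fronts, which the paper targets, need different sampling.

```python
from itertools import combinations

def simplex_reference_points(m, h):
    """Uniformly spaced points on the unit simplex in m objectives via the
    Das-Dennis (stars-and-bars) construction with h divisions: every point
    has coordinates that are multiples of 1/h and sum to 1."""
    points = []
    for bars in combinations(range(h + m - 1), m - 1):
        prev, coords = -1, []
        for b in bars:                       # gap before each bar = one coord
            coords.append((b - prev - 1) / h)
            prev = b
        coords.append((h + m - 2 - prev) / h)  # gap after the last bar
        points.append(coords)
    return points

# 3 objectives, 12 divisions: C(14, 2) = 91 points on the simplex.
pts = simplex_reference_points(3, 12)
```

Fronts of other shapes (spherical, discontinuous, degenerate) can then be handled by projecting or filtering such a base set, which is where problem-specific sampling comes in.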
Pub Date: 2018-07-01 | DOI: 10.1109/CEC.2018.8477810
Samaneh Azari, Mengjie Zhang, Bing Xue, Lifeng Peng
Tandem mass spectrometry (MS/MS) is currently the most commonly used technology in proteomics for identifying proteins in complex biological samples. Mass spectrometers can produce a large number of MS/MS spectra, each of which has hundreds of peaks. These peaks normally contain background noise, so a preprocessing step that filters the noise peaks can improve the accuracy and reliability of peptide identification. This paper proposes to preprocess the data by classifying peaks as noise peaks or signal peaks, i.e., a highly imbalanced binary classification task, and uses genetic programming (GP) to address this task, with the aim of increasing peptide identification reliability. Six types of classification algorithms in addition to GP are applied to various imbalance ratios and evaluated in terms of average accuracy and recall. The GP method retains the most signal peaks, as examined on a benchmark dataset containing 1,674 MS/MS spectra. To further evaluate the effectiveness of the GP method, the preprocessed spectral data is submitted to a benchmark de novo sequencing tool, PEAKS, to identify the peptides. The results show that the proposed method improves the reliability of peptide identification compared to the original unpreprocessed data and intensity-based thresholding methods.
Title: Genetic Programming for Preprocessing Tandem Mass Spectra to Improve the Reliability of Peptide Identification
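The intensity-based thresholding baseline mentioned above is straightforward to sketch. The cutoff fraction here is a hypothetical choice; the paper's exact baselines may be defined differently.

```python
def threshold_filter(peaks, frac=0.05):
    """Keep (m/z, intensity) peaks whose intensity is at least `frac` of the
    spectrum's maximum intensity; everything below is treated as noise."""
    top = max(intensity for _, intensity in peaks)
    return [(mz, i) for mz, i in peaks if i >= frac * top]

# Toy spectrum: two weak peaks fall below 5% of the base peak and are dropped.
spectrum = [(100.1, 3.0), (150.2, 80.0), (200.3, 1.0), (250.4, 400.0)]
kept = threshold_filter(spectrum, frac=0.05)
```

A learned classifier, such as the GP approach, replaces this single global cutoff with a per-peak decision using richer features, which is why it can retain low-intensity signal peaks that a threshold discards.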
Pub Date: 2018-07-01 | DOI: 10.1109/CEC.2018.8477959
Henry E. L. Cagnini, M. Basgalupp, Rodrigo C. Barros
Ensemble learning is the machine learning paradigm that aims at integrating several base learners into a single system, under the assumption that the collective consensus outperforms a single strong learner, be it regarding effectiveness, efficiency, or any other problem-specific metric. Ensemble learning comprises three main phases: generation, selection, and integration, and there are several possible (deterministic or stochastic) strategies for executing one or more of those phases. In this paper, our focus is on improving the predictive accuracy of the well-known AdaBoost algorithm. By using its original voting weights as the starting point of a global search carried out by an Estimation of Distribution Algorithm, we improve AdaBoost's predictive accuracy by up to approximately 11% in a thorough experimental analysis with multiple public datasets.
Title: Increasing Boosting Effectiveness with Estimation of Distribution Algorithms
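A univariate Gaussian EDA seeded at the AdaBoost voting weights can be sketched as follows. This is a stand-in illustration, not the paper's exact estimator or operators: base-learner predictions are assumed precomputed as a {-1, +1} matrix, and fitness is plain 0/1 accuracy of the weighted vote.

```python
import numpy as np

def eda_refine_weights(preds, y, w0, pop=40, elite=10, gens=30, seed=0):
    """Refine ensemble voting weights with a univariate Gaussian EDA:
    sample weight vectors around the current mean, keep the elite by
    weighted-vote accuracy, and refit the Gaussian to the elite.
    preds[j, i] is base learner j's {-1,+1} prediction for sample i."""
    rng = np.random.default_rng(seed)
    acc = lambda w: float(np.mean(np.sign(w @ preds) == y))
    mu, sigma = np.asarray(w0, float), np.full(len(w0), 0.5)
    best_w, best_a = mu.copy(), acc(mu)      # start from the AdaBoost weights
    for _ in range(gens):
        P = rng.normal(mu, sigma, size=(pop, len(mu)))
        fit = np.array([acc(w) for w in P])
        if fit.max() > best_a:
            best_w, best_a = P[fit.argmax()].copy(), float(fit.max())
        E = P[np.argsort(fit)[-elite:]]      # elite set
        mu, sigma = E.mean(axis=0), E.std(axis=0) + 1e-6
    return best_w, best_a

# Toy problem: three noisy weak learners obtained by flipping true labels.
rng = np.random.default_rng(1)
y = np.where(rng.random(200) < 0.5, -1.0, 1.0)
flip = lambda p: np.where(rng.random(200) < p, -y, y)
preds = np.stack([flip(0.30), flip(0.35), flip(0.40)])
w0 = np.array([1.0, 0.8, 0.6])               # hypothetical AdaBoost weights
base = float(np.mean(np.sign(w0 @ preds) == y))
w, a = eda_refine_weights(preds, y, w0)
```

Because the search starts at the incumbent weights and tracks the best vector seen, the refined accuracy can never fall below the starting point on the data it is fitted to.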