The operational effectiveness of wind and hydrokinetic turbines depends on the performance of the airfoils chosen. Standard airfoils historically used for wind and hydrokinetic turbines have maximum lift coefficients of about 1.3 at the stall angle of attack, which is about 12°. Under these conditions, the minimum flow velocities needed to generate electric power are about 7 m/s for wind turbines and 3 m/s for hydrokinetic turbines. With a leading-edge slat, high-momentum fluid is injected through the slat over the main airfoil, which eliminates the separation bubble; this flow control delays stall up to an angle of attack of 20°, with a maximum lift coefficient of 2.2. In this study, NACA 2415 was chosen as a representative hydrofoil, while NACA 22 and NACA 97 were chosen as slat profiles. The flow was simulated numerically in FLUENT using the Realizable k-ε turbulence model. In the design of wind and hydrokinetic turbines, the airfoil performance characteristics CL = f(α, δ), CD = f(α, δ) and CL/CD = f(α, δ) are the basic parameters. In this paper, the optimum values of the angle of attack, the slat angle and the clearance between the slat and the main airfoil, which lead to maximum lift and minimum drag and consequently to maximum CL/CD, have been determined numerically. Hence, with leading-edge slats on airfoils and hydrofoils, the minimum flow velocities needed to produce meaningful and practical mechanical power fall to 3-4 m/s for wind turbines and 1-1.5 m/s or less for hydrokinetic turbines. Consequently, hydrofoils with leading-edge slats may positively redefine the wind and hydrokinetic power potential of countries.
{"title":"Performance Analysis of a Hydrofoil with and without Leading Edge Slat","authors":"T. Yavuz, B. Kilkis, Hursit Akpinar, Özgür Erol","doi":"10.1109/ICMLA.2011.113","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.113","url":null,"abstract":"Operational effectiveness of the wind and hydrokinetic turbines depend on the performance of the airfoils chosen. Standard airfoils historically used for wind and hydrokinetic turbines had and have the maximum lift coefficients of about 1.3 at the stall angle of attack, which is about 12o. At these conditions, the minimum flow velocities to generate electric power are about 7 m/s and 3 m/s for wind turbine and hydrokinetic turbine, respectively. Using leading edge slat, the fluid dynamics governing the flow field eliminates the separation bubble by the injection of the high momentum fluid through the slat over the main airfoil-by meaning of the flow control delays the stall up to an angle of attack of 20o, with a maximum lift coefficient of 2.2. In this study, NACA 2415 was chosen as a representative of hydrofoils while NACA 22 and NACA 97, were chosen as slat profiles, respectively. This flow has been numerically simulated by FLUENT, employing the Realizable k-e turbulence model. In the design of the wind and hydrokinetic turbines, the performance of the airfoils presented by aerodynamics CL = f (a,d), CD = f (a,d) and CL/CD = f (a,d) are the basic parameters. In this paper, optimum values of the angle of attack, slat angle and clearance space between slat and main airfoil leading to maximum lift and minimum drag, and consequently to maximum CL/CD have been numerically determined. Hence, using airfoil and hydrofoil with leading edge slat in the wind and hydrokinetic turbines, minimum wind and hydrokinetic flow velocities to produce meaningful and practical mechanical power reduces to 3-4 m /s for wind turbines and 1-1.5 m/s or less for hydrokinetic turbines. 
Consequently, using hydrofoil with leading edge slat may re-define the potentials of wind power and hydrokinetic power potential of the countries in the positive manner.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128794014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many algorithms for clustering data streams based on the widely used k-Means have been proposed in the literature. Most of them assume that the number of clusters, k, is known and fixed a priori by the user. To relax this assumption, which is often unrealistic in practical applications, we describe an algorithmic framework that estimates k automatically from data. We illustrate the potential of the proposed framework by using three state-of-the-art algorithms for clustering data streams - Stream LSearch, CluStream, and StreamKM++ - combined with two well-known algorithms for estimating the number of clusters, namely Ordered Multiple Runs of k-Means (OMRk) and Bisecting k-Means (BkM). As an additional contribution, we experimentally compare the resulting algorithmic instantiations on both synthetic and real-world data streams. Analyses of statistical significance suggest that OMRk yields the best data partitions, while BkM is more computationally efficient. Also, the combination of StreamKM++ with OMRk leads to the best trade-off between accuracy and efficiency.
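The idea of estimating k by running k-means repeatedly over a range of candidate values can be sketched as follows. This is a minimal illustration in the spirit of multiple-run approaches such as OMRk, not the authors' implementation: the use of the plain silhouette width as the validity index, the function names, the number of restarts, and the toy data are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != l
        )
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


def estimate_k(points, k_max, runs=20, seed=0):
    """Try k = 2..k_max with several restarts; keep the best-scoring partition."""
    rng = random.Random(seed)
    best_score, best_k, best_labels = -2.0, None, None
    for k in range(2, k_max + 1):
        for _ in range(runs):
            labels = kmeans(points, k, rng)
            score = silhouette(points, labels)
            if score > best_score:
                best_score, best_k, best_labels = score, len(set(labels)), labels
    return best_k, best_labels


# Toy data (hypothetical): three well-separated groups of five points each.
offsets = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
centers = [(0, 0), (10, 0), (0, 10)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
k, labels = estimate_k(points, k_max=5)
```

In a streaming setting, a procedure like this would be applied to the summary structures (for example, micro-clusters or coresets) maintained by the stream algorithm rather than to raw points.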
{"title":"Extending k-Means-Based Algorithms for Evolving Data Streams with Variable Number of Clusters","authors":"J. Silva, Eduardo R. Hruschka","doi":"10.1109/ICMLA.2011.67","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.67","url":null,"abstract":"Many algorithms for clustering data streams based on the widely used k-Means have been proposed in the literature. Most of them assume that the number of clusters, k, is known and fixed a priori by the user. Aimed at relaxing this assumption, which is often unrealistic in practical applications, we describe an algorithmic framework that allows estimating k automatically from data. We illustrate the potential of the proposed framework by using three state-of-the-art algorithms for clustering data streams - Stream LSearch, CluStream, and Stream KM++ - combined with two well-known algorithms for estimating the number of clusters, namely: Ordered Multiple Runs of k-Means (OMRk) and Bisecting k-Means (BkM). As an additional contribution, we experimentally compare the resulting algorithmic instantiations in both synthetic and real-world data streams. Analyses of statistical significance suggest that OMRk yields to the best data partitions, while BkM is more computationally efficient. Also, the combination of Stream KM++ with OMRk leads to the best trade-off between accuracy and efficiency.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126004490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
O. Azzam, Loai Al Nimer, Charith D. Chitraranjan, A. Denton, Ajay Kumar, F. Bassi, M. Iqbal, S. Kianian
Genome mapping, or the experimental determination of the ordering of DNA markers on a chromosome, is an important step in genome sequencing and the ultimate assembly of sequenced genomes. The presented research addresses the problem of identifying markers that cannot be placed reliably. If such markers are included in standard mapping procedures, they can result in a poor overall mapping. Traditional techniques for identifying markers that cannot be placed consistently are based on resampling, which requires an already computationally expensive process to be repeated for a large ensemble of resampled populations. We propose a network-based approach that uses pairwise similarities between markers and demonstrate that the results from this approach largely match those of the more computationally expensive conventional approaches. The proposed approach is evaluated on data from the radiation hybrid mapping of the wheat genome.
{"title":"Network-Based Filtering of Unreliable Markers in Genome Mapping","authors":"O. Azzam, Loai Al Nimer, Charith D. Chitraranjan, A. Denton, Ajay Kumar, F. Bassi, M. Iqbal, S. Kianian","doi":"10.1109/ICMLA.2011.103","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.103","url":null,"abstract":"Genome mapping, or the experimental determination of the ordering of DNA markers on a chromosome, is an important step in genome sequencing and ultimate assembly of sequenced genomes. The presented research addresses the problem of identifying markers that cannot be placed reliably. If such markers are included in standard mapping procedures they can result in an overall poor mapping. Traditional techniques for identifying markers that cannot be placed consistently are based on resampling, which requires an already computationally expensive process to be done for a large ensemble of resampled populations. We propose a network-based approach that uses pair wise similarities between markers and demonstrate that the results from this approach largely match the more computationally expensive conventional approaches. The evaluation of the proposed approach is done on data from the radiation hybrid mapping of the wheat genome.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"12 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126114561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we describe an extended finite-state machine (EFSM) induction method that uses a SAT solver. The input data for the induction algorithm is a set of test scenarios. The algorithm consists of several steps: scenario tree construction, compatibility graph construction, Boolean formula construction, SAT solver invocation, and finite-state machine construction from the satisfying assignment. These extended finite-state machines can be used in automata-based programming, where programs are designed as automated controlled objects. Each automated controlled object contains a finite-state machine and a controlled object. The method has been tested on randomly generated scenario sets of size 250 to 2000 and on the alarm clock controlling EFSM induction problem, where it greatly outperformed a genetic algorithm.
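The first step listed above, scenario tree construction, can be sketched as a prefix tree that merges scenarios sharing a common beginning. This is a simplified illustration only: each scenario step is assumed to be an (event, output) pair, guard conditions are omitted, and the alarm clock events and action names are hypothetical.

```python
def build_scenario_tree(scenarios):
    """Merge scenarios into a prefix tree keyed by (event, output) steps.

    Each node is a dict mapping a step to the id of its child node.
    Returns the list of nodes; node 0 is the root.
    """
    nodes = [{}]
    for scenario in scenarios:
        current = 0
        for step in scenario:
            if step not in nodes[current]:
                nodes.append({})
                nodes[current][step] = len(nodes) - 1
            current = nodes[current][step]
    return nodes


# Hypothetical alarm clock scenarios: (event, output action) pairs.
scenarios = [
    [("H", "inc_hours"), ("M", "inc_minutes")],
    [("H", "inc_hours"), ("A", "toggle_alarm")],
    [("A", "toggle_alarm")],
]
tree = build_scenario_tree(scenarios)
```

The two scenarios that start with the same step share a node, so the tree has fewer nodes than the total number of steps. The later steps (compatibility graph, Boolean formula, SAT solver call) would operate on the nodes of such a tree.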
{"title":"Extended Finite-State Machine Induction Using SAT-Solver","authors":"V. Ulyantsev, F. Tsarev","doi":"10.1109/ICMLA.2011.166","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.166","url":null,"abstract":"In the paper we describe the extended finite-state machine (EFSM) induction method that uses SAT-solver. Input data for the induction algorithm is a set of test scenarios. The algorithm consists of several steps: scenarios tree construction, compatibility graph construction, Boolean formula construction, SAT-solver invocation and finite-state machine construction from satisfying assignment. These extended finite-state machines can be used in automata-based programming, where programs are designed as automated controlled objects. Each automated controlled object contains a finite-state machine and a controlled object. The method described has been tested on randomly generated scenario sets of size from 250 to 2000 and on the alarm clock controlling EFSM induction problem where it has greatly outperformed genetic algorithm.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128408891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gustavo E. A. P. A. Batista, Yuan Hao, Eamonn J. Keogh, A. Mafra‐Neto
Insects are intimately connected to human life and well-being, in both positive and negative senses. While it is estimated that insects pollinate at least two-thirds of all food consumed by humans, malaria, a disease transmitted by the female mosquito of the Anopheles genus, kills approximately one million people per year. Due to the importance of insects to humans, researchers have developed an arsenal of mechanical, chemical, biological and educational tools to help mitigate insects' harmful effects and to enhance their beneficial effects. However, the efficiency of such tools depends on knowing the time and location of migrations, infestations and populations as early as possible. Insect detection and counting is typically performed by means of traps, usually "sticky traps", which are regularly collected and manually analyzed. The main problem is that this procedure is expensive in terms of materials and human time, and creates a lag between the time the trap is placed and the time it is inspected. This lag may be only a week, but in the case of, say, mosquitoes or sand flies, this can be more than half their adult life span. We are developing an inexpensive optical sensor that uses a laser beam to detect, count and ultimately classify flying insects from a distance. Our objective is to use classification techniques to provide accurate real-time counts of disease vectors down to the species/sex level. This information can be used by public health workers and by government and non-government organizations to plan optimal intervention strategies in the face of limited resources. In this work, we present some preliminary results of our research, conducted with three insect species. We show that with our simple sensor we can accurately classify these species using their wing-beat frequency as a feature. We further discuss how we can augment the sensor with other sources of information in order to scale our ideas to classify a larger number of species.
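Classification from wing-beat frequency can be illustrated with a small sketch: estimate the dominant frequency of a sensor signal and assign the species whose typical frequency is closest. Everything specific here is an assumption for illustration: the abstract does not state which classifier is used, the species names and mean frequencies are hypothetical, and the signal is a synthetic sine wave rather than sensor data.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin, ignoring DC."""
    n = len(signal)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        # Naive DFT coefficient for bin k.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def classify(freq, class_means):
    """Nearest-mean classifier on a single feature."""
    return min(class_means, key=lambda name: abs(freq - class_means[name]))


# Synthetic 400 Hz "wing-beat" sampled at 4 kHz (hypothetical values).
sample_rate, n = 4000, 400
signal = [math.sin(2 * math.pi * 400 * t / sample_rate) for t in range(n)]
freq = dominant_frequency(signal, sample_rate)

# Hypothetical per-species mean wing-beat frequencies in Hz.
class_means = {"species_a": 350.0, "species_b": 600.0}
label = classify(freq, class_means)
```

A single frequency feature separates species only when their frequency ranges do not overlap much, which is consistent with the abstract's point about adding other sources of information to handle more species.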
{"title":"Towards Automatic Classification on Flying Insects Using Inexpensive Sensors","authors":"Gustavo E. A. P. A. Batista, Yuan Hao, Eamonn J. Keogh, A. Mafra‐Neto","doi":"10.1109/ICMLA.2011.145","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.145","url":null,"abstract":"Insects are intimately connected to human life and well being, in both positive and negative senses. While it is estimated that insects pollinate at least two-thirds of the all food consumed by humans, malaria, a disease transmitted by the female mosquito of the Anopheles genus, kills approximately one million people per year. Due to the importance of insects to humans, researchers have developed an arsenal of mechanical, chemical, biological and educational tools to help mitigate insects' harmful effects, and to enhance their beneficial effects. However, the efficiency of such tools depends on knowing the time and location of migrations/infestations/population as early as possible. Insect detection and counting is typically performed by means of traps, usually \"sticky traps\", which are regularly collected and manually analyzed. The main problem is that this procedure is expensive in terms of materials and human time, and creates a lag between the time the trap is placed and inspected. This lag may only be a week, but in the case of say, mosquitoes or sand flies, this can be more than half their adult life span. We are developing an inexpensive optical sensor that uses a laser beam to detect, count and ultimately classify flying insects from distance. Our objective is to use classification techniques to provide accurate real-time counts of disease vectors down to the species/sex level. This information can be used by public health workers, government and non-government organizations to plan the optimal intervention strategies in the face of limited resources. In this work, we present some preliminary results of our research, conducted with three insect species. 
We show that using our simple sensor we can accurately classify these species using their wing-beat frequency as feature. We further discuss how we can augment the sensor with other sources of information in order to scale our ideas to classify a larger number of species.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128415777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a novel prediction-based digital control method for dc-dc converters. In this method, neural network control is used in coordination with conventional P-I-D control to improve the transient response. The prediction-based control term consists of predicted data obtained from repetitive training of the neural network. This term improves the transient response very effectively when the load changes quickly. As a result, the undershoot of the output voltage and the overshoot of the reactor current under a step change of load resistance are suppressed effectively compared with the conventional method. Because the proposed method is based on neural network learning, it is expected to be widely applicable and to simplify circuit system design, since the algorithm does not need to be changed. This applicability is also confirmed by an experiment in which the P-I-D control parameters of the circuit are set to non-optimal values and the proposed method is applied in the same manner.
{"title":"A New Control Method for dc-dc Converter by Neural Network Predictor with Repetitive Training","authors":"F. Kurokawa, K. Ueno, H. Maruta, H. Osuga","doi":"10.1109/ICMLA.2011.17","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.17","url":null,"abstract":"This paper proposes a novel prediction based digital control dc-dc converter. In this method, a neural network control is adopted to improve the transient response in coordination with a conventional P-I-D control. The prediction based control term is consists of predicted data which are obtained from repetitive training of the neural network. This works to improve the transient response very effectively when the load is changed quickly. As a result, the undershoot of the output voltage and the overshoot of the reactor current are suppressed effectively as compared with the conventional one in the step change of load resistance. The proposed method is based on the neural network learning, it is expected that the proposed approach has high availability in providing the easy way for the design of circuit system since there is no need to change the algorithm. The adequate availability of the proposed method is also confirmed by the experiment in which P-I-D control parameters of the circuit are set to non-optimal ones and the proposed method is used in the same manner.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127448775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We have previously proposed the simultaneous use of population subdivision and training data set subdivision, which leads to a significant decrease in the computation time of genetics-based machine learning (GBML) for large data sets. In this approach, a population is subdivided into multiple sub-populations as in island models, and the subdivided training data are rotated over the sub-populations. In this paper, we focus on the effect of training data rotation on the generalization ability and the computation time of our hybrid fuzzy GBML algorithm. First, we show a parallel distributed implementation of our hybrid fuzzy GBML algorithm. Then we examine the effect of training data rotation through computational experiments in which both single-population (i.e., non-parallel) and multi-population (i.e., parallel) versions of our GBML algorithm are applied to a multi-class high-dimensional problem with a large number of training patterns. Experimental results show that training data rotation improves the generalization ability of our GBML algorithm. It is also shown that the population size is more directly related to the computation time than the training data set size.
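The rotation of training data subsets over sub-populations can be sketched as a simple schedule. This is an illustrative assumption about the bookkeeping only: the function names, the round-robin split, and the ring-shaped rotation order are hypothetical, and the fuzzy GBML evolution inside each island is not shown.

```python
from collections import deque


def split_training_data(patterns, num_islands):
    """Divide the training patterns into one subset per island (round-robin)."""
    return [patterns[i::num_islands] for i in range(num_islands)]


def rotation_schedule(num_islands, num_rotations):
    """Yield, per rotation step, the subset index each island trains on."""
    assignment = deque(range(num_islands))  # island i starts with subset i
    for _ in range(num_rotations):
        yield list(assignment)
        assignment.rotate(1)  # pass every subset on to the next island


subsets = split_training_data(list(range(10)), 3)
schedule = list(rotation_schedule(3, 3))
```

After as many rotations as there are islands, every sub-population has been evaluated on every subset, while each individual fitness evaluation only touches a fraction of the training patterns.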
{"title":"Training Data Subdivision and Periodical Rotation in Hybrid Fuzzy Genetics-Based Machine Learning","authors":"H. Ishibuchi, S. Mihara, Y. Nojima","doi":"10.1109/ICMLA.2011.147","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.147","url":null,"abstract":"We have already proposed an idea of simultaneous implementation of population subdivision and training data set subdivision, which leads to significant decrease in computation time of genetics-based machine learning (GBML) for large data sets. In our idea, a population is subdivided into multiple sub-populations as in island models where subdivided training data are rotated over the sub-populations. In this paper, we focus on the effect of training data rotation on the generalization ability and the computation time of our hybrid fuzzy GBML algorithm. First we show parallel distributed implementation of our hybrid fuzzy GBML algorithm. Then we examine the effect of training data rotation through computational experiments where both single-population (i.e., non-parallel) and multi-population (i.e., parallel) versions of our GBML algorithm are applied to a multi-class high-dimensional problem with a large number of training patterns. Experimental results show that training data rotation improves the generalization ability of our GBML algorithm. 
It is also shown that the population size is more directly related to the computation time than the training data set size.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124150873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One of the key pieces of information that biomedical text mining systems are expected to extract from the literature is the set of interactions among different types of biomedical entities (proteins, genes, diseases, drugs, etc.). Different types of entities might be considered; for example, protein-protein interactions have been extensively studied as part of the BioCreative competitive evaluations. However, more complex interactions, such as those among genes, drugs, and diseases, are increasingly of interest. Different databases have been used as references for the evaluation of extraction and ranking techniques. The aim of this paper is to describe a machine-learning-based reranking approach for candidate interactions extracted from the literature. The results are evaluated using data derived from the PharmGKB database. The importance of a good ranking is particularly evident when the results are applied to support human curators.
{"title":"Ranking Interactions for a Curation Task","authors":"S. Clematide, Fabio Rinaldi","doi":"10.1109/ICMLA.2011.119","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.119","url":null,"abstract":"One of the key pieces of information which biomedical text mining systems are expected to extract from the literature are interactions among different types of biomedical entities (proteins, genes, diseases, drugs, etc.). Different types of entities might be considered, for example protein-protein interactions have been extensively studied as part of the Bio Creative competitive evaluations. However, more complex interactions such as those among genes, drugs, and diseases are increasingly of interest. Different databases have been used as reference for the evaluation of extraction and ranking techniques. The aim of this paper is to describe a machine-learning based reranking approach for candidate interactions extracted from the literature. The results are evaluated using data derived from the Pharm GKB database. The importance of a good ranking is particularly evident when the results are applied to support human curators.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127627697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Following the basic principles of Information-Theoretic Learning (ITL), in this paper we propose Minimum Entropy Encoders (MEEs), a novel approach to data clustering. We consider a set of functions that project each input point onto a minimum entropy configuration (code). The encoding functions are modeled by kernel machines, and the resulting code collects the cluster membership probabilities. Two regularizers are included to balance the distribution of the output features and to favor smooth solutions, respectively, thus leading to an unconstrained optimization problem that can be efficiently solved by conjugate gradient or concave-convex procedures. The relationships with Maximum Margin Clustering algorithms are investigated, showing that MEEs overcome some of their critical issues, such as the lack of a multi-class extension and the need to handle a large number of constraints. An extensive evaluation of the proposed approach on several benchmarks shows improvements over state-of-the-art techniques, in terms of both accuracy and computational complexity.
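To make the notion of a minimum entropy code concrete, the sketch below computes the Shannon entropy of a vector of cluster membership probabilities. It only illustrates why a confident, nearly one-hot code has low entropy; the kernel machines, the two regularizers, and the optimization procedure described in the abstract are not shown, and the example probability vectors are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical membership codes over four clusters.
confident = [0.97, 0.01, 0.01, 0.01]  # point clearly assigned to cluster 0
ambiguous = [0.25, 0.25, 0.25, 0.25]  # point equally likely in every cluster

h_confident = entropy(confident)
h_ambiguous = entropy(ambiguous)
```

Minimizing this quantity alone would allow the trivial solution of assigning all points to one cluster, which is one reason a term balancing the distribution of the output features is needed.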
{"title":"Kernel Methods for Minimum Entropy Encoding","authors":"S. Melacci, M. Gori","doi":"10.1109/ICMLA.2011.83","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.83","url":null,"abstract":"Following the basic principles of Information-Theoretic Learning (ITL), in this paper we propose Minimum Entropy Encoders (MEEs), a novel approach to data clustering. We consider a set of functions that project each input point onto a minimum entropy configuration (code). The encoding functions are modeled by kernel machines and the resulting code collects the cluster membership probabilities. Two regularizers are included to balance the distribution of the output features and favor smooth solutions, respectively, thus leading to an unconstrained optimization problem that can be efficiently solved by conjugate gradient or concave-convex procedures. The relationships with Maximum Margin Clustering algorithms are investigated, which show that MEEs overcomes some of the critical issues, such as the lack of a multi-class extension and the need to face problems with a large number of constraints. A massive evaluation on several benchmarks of the proposed approach shows improvements over state-of-the-art techniques, both in terms of accuracy and computational complexity.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114244683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The work presented in this paper describes how sub-space grids can be employed to extract rules for microarray classification. The paper first describes the use of the principal component analysis (PCA) algorithm to obtain sub-space grids from the projected low-dimensional space. A recursive procedure is then used to obtain rules in which sub-space grids form the premises. The extracted set of rules is evaluated on both training and testing data sets. The sub-space grids from the PCA algorithm are characterized by overlapping data from different classes, and even using more than two premises in a rule does not fully address this overlap. As a result, the rules obtained do not discriminate between classes accurately. To increase the effectiveness of the set of rules, the multiple discriminant analysis (MDA) algorithm is employed instead of PCA to obtain sub-space grids from the projected low-dimensional space. These sub-space grids from the MDA algorithm improve the classification accuracy of the system. However, the extracted set of rules is large, and the rules are sensitive to local variations in the data. To address these issues, the paper explores using the PCA and MDA algorithms together to project the data into a low-dimensional space from which sub-space grids are obtained. The resulting set of rules produces better classification accuracy. The paper discusses a comprehensive evaluation of this rule-based system. The system is tested on a dataset of 62 samples (40 colon tumor and 22 normal colon tissue). The results show that sub-space grids obtained from a low-dimensional space projected by the combined PCA and MDA algorithms increase the classification accuracy on microarray data.
{"title":"Microarray Classification Using Sub-space Grids","authors":"M. Wani","doi":"10.1109/ICMLA.2011.125","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.125","url":null,"abstract":"The work presented in this paper describes how sub-space grids can be employed to extract rules for micro array classification. The paper first describes principal component analysis (PCA) algorithm for obtaining sub-space grids from the projected low dimensional space. A recursive procedure is then used to obtain rules where sub-space grids form premises of rules. The extracted set of rules is evaluated on both training and testing data sets. The sub-space grids from PCA algorithm are characterized by overlapped data from different classes and use of even more than two premises in a rule does not fully address the problem of overlapped data. As such the rules obtained do not discriminate different classes accurately. To increase the effectiveness of the set of rules, multiple discriminant analysis (MDA) algorithm instead of PCA algorithm is employed to obtain sub-space grids from the projected low dimensional space. These sub-space grids from MDA algorithm improve the classification accuracy of the system. However, the size of set of rules extracted is large and these rules are sensitive to local variations associated with the data. To address these issues, the paper explores using both the PCA and MDA algorithms simultaneously fo projected low dimensional space for obtaining sub-space grids. The resulting set of rules produce better classification accuracy results. The paper discusses a comprehensive evaluation of this rule based system. The system is tested on a dataset of 62 samples (40 colon tumor and 22 normal colon tissue). 
The results show that the use of sub-space grids that are obtained from a projected low dimensional space of combined PCA and MDA algorithms increase the accuracy of classification results of micro array data.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126825127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}