
Latest papers from the 2011 10th International Conference on Machine Learning and Applications and Workshops

Performance Analysis of a Hydrofoil with and without Leading Edge Slat
T. Yavuz, B. Kilkis, Hursit Akpinar, Özgür Erol
The operational effectiveness of wind and hydrokinetic turbines depends on the performance of the airfoils chosen. Standard airfoils historically used for wind and hydrokinetic turbines have maximum lift coefficients of about 1.3 at the stall angle of attack, which is about 12°. Under these conditions, the minimum flow velocities needed to generate electric power are about 7 m/s for wind turbines and 3 m/s for hydrokinetic turbines. With a leading edge slat, high-momentum fluid is injected through the slat over the main airfoil, which eliminates the separation bubble; this flow control delays stall up to an angle of attack of 20°, with a maximum lift coefficient of 2.2. In this study, NACA 2415 was chosen as a representative hydrofoil, while NACA 22 and NACA 97 were chosen as slat profiles. The flow was simulated numerically with FLUENT, employing the Realizable k-ε turbulence model. In the design of wind and hydrokinetic turbines, the basic parameters are the airfoil performance characteristics CL = f(α, δ), CD = f(α, δ) and CL/CD = f(α, δ). In this paper, the optimum values of the angle of attack, the slat angle, and the clearance between the slat and the main airfoil that lead to maximum lift and minimum drag, and consequently to maximum CL/CD, have been determined numerically. Hence, using airfoils and hydrofoils with a leading edge slat in wind and hydrokinetic turbines reduces the minimum flow velocity needed to produce meaningful and practical mechanical power to 3-4 m/s for wind turbines and 1-1.5 m/s or less for hydrokinetic turbines. Consequently, hydrofoils with a leading edge slat may positively redefine the wind and hydrokinetic power potential of countries.
DOI: 10.1109/ICMLA.2011.113 (https://doi.org/10.1109/ICMLA.2011.113). Published: 2011-12-18.
Citations: 7
Extending k-Means-Based Algorithms for Evolving Data Streams with Variable Number of Clusters
J. Silva, Eduardo R. Hruschka
Many algorithms for clustering data streams based on the widely used k-Means have been proposed in the literature. Most of them assume that the number of clusters, k, is known and fixed a priori by the user. To relax this assumption, which is often unrealistic in practical applications, we describe an algorithmic framework that estimates k automatically from data. We illustrate the potential of the proposed framework by using three state-of-the-art algorithms for clustering data streams (Stream LSearch, CluStream, and StreamKM++) combined with two well-known algorithms for estimating the number of clusters: Ordered Multiple Runs of k-Means (OMRk) and Bisecting k-Means (BkM). As an additional contribution, we experimentally compare the resulting algorithmic instantiations on both synthetic and real-world data streams. Analyses of statistical significance suggest that OMRk yields the best data partitions, while BkM is more computationally efficient. Also, the combination of StreamKM++ with OMRk leads to the best trade-off between accuracy and efficiency.
DOI: 10.1109/ICMLA.2011.67 (https://doi.org/10.1109/ICMLA.2011.67). Published: 2011-12-18.
Citations: 20
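The idea of estimating k by ordered multiple runs of k-Means, as named in the entry "Extending k-Means-Based Algorithms for Evolving Data Streams with Variable Number of Clusters", can be sketched roughly as follows. This is a minimal batch illustration under stated assumptions, not the authors' stream framework: the function names, the farthest-first initialization, and the simplified silhouette used as the quality score are all choices made here for the example.

```python
import math
import random


def farthest_first(points, k, rng):
    """Assumed initialization: random first centroid, then farthest points."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, rng, iters=100):
    """Plain Lloyd iterations on a list of equal-length tuples."""
    centroids = farthest_first(points, k, rng)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def simplified_silhouette(points, centroids, labels):
    """Average of (b - a) / max(a, b), using distances to centroids only."""
    total = 0.0
    for p, lab in zip(points, labels):
        a = math.dist(p, centroids[lab])
        b = min(math.dist(p, c) for i, c in enumerate(centroids) if i != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def estimate_k(points, k_max, runs=3, seed=0):
    """Run k-means for k = 2..k_max, several times each; keep the best score."""
    rng = random.Random(seed)
    best_score, best_k, best_labels = -math.inf, None, None
    for k in range(2, k_max + 1):
        for _ in range(runs):
            centroids, labels = kmeans(points, k, rng)
            score = simplified_silhouette(points, centroids, labels)
            if score > best_score:
                best_score, best_k, best_labels = score, k, labels
    return best_k, best_labels
```

In a stream setting, a routine like `estimate_k` would presumably be applied to the summary points kept by the stream algorithm rather than to the raw data; that coupling is not shown here.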
Network-Based Filtering of Unreliable Markers in Genome Mapping
O. Azzam, Loai Al Nimer, Charith D. Chitraranjan, A. Denton, Ajay Kumar, F. Bassi, M. Iqbal, S. Kianian
Genome mapping, or the experimental determination of the ordering of DNA markers on a chromosome, is an important step in genome sequencing and the ultimate assembly of sequenced genomes. The presented research addresses the problem of identifying markers that cannot be placed reliably. If such markers are included in standard mapping procedures, they can result in a poor overall mapping. Traditional techniques for identifying markers that cannot be placed consistently are based on resampling, which requires an already computationally expensive process to be repeated for a large ensemble of resampled populations. We propose a network-based approach that uses pairwise similarities between markers and demonstrate that the results from this approach largely match those of the more computationally expensive conventional approaches. The proposed approach is evaluated on data from the radiation hybrid mapping of the wheat genome.
DOI: 10.1109/ICMLA.2011.103 (https://doi.org/10.1109/ICMLA.2011.103). Published: 2011-12-18.
Citations: 7
Extended Finite-State Machine Induction Using SAT-Solver
V. Ulyantsev, F. Tsarev
In this paper we describe an extended finite-state machine (EFSM) induction method that uses a SAT solver. The input data for the induction algorithm is a set of test scenarios. The algorithm consists of several steps: scenario tree construction, compatibility graph construction, Boolean formula construction, SAT solver invocation, and finite-state machine construction from the satisfying assignment. These extended finite-state machines can be used in automata-based programming, where programs are designed as automated controlled objects. Each automated controlled object contains a finite-state machine and a controlled object. The method has been tested on randomly generated scenario sets of sizes from 250 to 2000 and on the induction of an alarm-clock-controlling EFSM, where it greatly outperformed a genetic algorithm.
DOI: 10.1109/ICMLA.2011.166 (https://doi.org/10.1109/ICMLA.2011.166). Published: 2011-12-18.
Citations: 48
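The first step listed in the entry "Extended Finite-State Machine Induction Using SAT-Solver", scenario tree construction, might look roughly like the sketch below. It rests on simplifying assumptions: each scenario is taken to be a list of (event, output actions) pairs with no guard conditions, and the function name and the event names in the usage note are hypothetical. The later steps (compatibility graph, Boolean formula, SAT solver call) are not shown.

```python
def build_scenario_tree(scenarios):
    """Merge scenarios into a prefix tree.

    Nodes are integers and the root is node 0.
    children[node][event] = (actions, child_node).
    Raises ValueError if two scenarios share a prefix but disagree
    on the output actions for the same event.
    """
    children = [{}]
    for scenario in scenarios:
        node = 0
        for event, actions in scenario:
            if event in children[node]:
                known_actions, child = children[node][event]
                if known_actions != actions:
                    raise ValueError(f"inconsistent scenarios at event {event!r}")
            else:
                child = len(children)  # allocate a fresh node
                children.append({})
                children[node][event] = (actions, child)
            node = child
    return children
```

For example, the two scenarios `[("T", ("on",)), ("H", ("inc_h",))]` and `[("T", ("on",)), ("M", ("inc_m",))]` share the edge for event `"T"` and branch afterwards, so the resulting tree has four nodes.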
Towards Automatic Classification on Flying Insects Using Inexpensive Sensors
Gustavo E. A. P. A. Batista, Yuan Hao, Eamonn J. Keogh, A. Mafra‐Neto
Insects are intimately connected to human life and well-being, in both positive and negative senses. While it is estimated that insects pollinate at least two-thirds of all food consumed by humans, malaria, a disease transmitted by the female mosquito of the Anopheles genus, kills approximately one million people per year. Due to the importance of insects to humans, researchers have developed an arsenal of mechanical, chemical, biological and educational tools to help mitigate insects' harmful effects and to enhance their beneficial effects. However, the efficiency of such tools depends on knowing the time and location of migrations/infestations/population as early as possible. Insect detection and counting is typically performed by means of traps, usually "sticky traps", which are regularly collected and manually analyzed. The main problem is that this procedure is expensive in terms of materials and human time, and creates a lag between the time the trap is placed and the time it is inspected. This lag may only be a week, but in the case of, say, mosquitoes or sand flies, this can be more than half their adult life span. We are developing an inexpensive optical sensor that uses a laser beam to detect, count and ultimately classify flying insects from a distance. Our objective is to use classification techniques to provide accurate real-time counts of disease vectors down to the species/sex level. This information can be used by public health workers and by government and non-government organizations to plan optimal intervention strategies in the face of limited resources. In this work, we present some preliminary results of our research, conducted with three insect species. We show that with our simple sensor we can accurately classify these species using their wing-beat frequency as a feature. We further discuss how we can augment the sensor with other sources of information in order to scale our ideas to classify a larger number of species.
DOI: 10.1109/ICMLA.2011.145 (https://doi.org/10.1109/ICMLA.2011.145). Published: 2011-12-18.
Citations: 55
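Using wing-beat frequency as a classification feature, as described in the entry "Towards Automatic Classification on Flying Insects Using Inexpensive Sensors", could be illustrated along these lines. This is only a sketch under assumptions: the sensor signal is treated as a plain list of samples, the dominant frequency is read off a naive discrete Fourier transform, and the classifier is a nearest-reference lookup. The class names and reference frequencies are hypothetical placeholders, not measurements from the paper.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin, ignoring the DC bin."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def classify_by_wingbeat(signal, sample_rate, references):
    """Pick the class whose reference frequency is closest to the estimate."""
    freq = dominant_frequency(signal, sample_rate)
    return min(references, key=lambda name: abs(references[name] - freq))


# Hypothetical reference wing-beat frequencies in Hz (placeholders only).
REFERENCES = {"species_a": 400.0, "species_b": 600.0, "species_c": 250.0}
```

The naive transform costs O(n²) per signal, which is acceptable for short recordings; a real pipeline would more likely use an FFT routine and a classifier trained on labeled recordings.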
A New Control Method for dc-dc Converter by Neural Network Predictor with Repetitive Training
F. Kurokawa, K. Ueno, H. Maruta, H. Osuga
This paper proposes a novel prediction-based digitally controlled dc-dc converter. In this method, neural network control is used in coordination with conventional P-I-D control to improve the transient response. The prediction-based control term consists of predicted data obtained from repetitive training of the neural network. It improves the transient response very effectively when the load changes quickly. As a result, for a step change in load resistance, the undershoot of the output voltage and the overshoot of the reactor current are suppressed effectively compared with the conventional control. Because the proposed method is based on neural network learning, the algorithm does not need to be changed, so the approach is expected to be widely applicable and to simplify circuit system design. Its applicability is also confirmed by an experiment in which the P-I-D control parameters of the circuit are set to non-optimal values and the proposed method is applied in the same manner.
DOI: 10.1109/ICMLA.2011.17 (https://doi.org/10.1109/ICMLA.2011.17). Published: 2011-12-18.
Citations: 12
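The combination of a conventional P-I-D term with a prediction-based term, as outlined in the entry "A New Control Method for dc-dc Converter by Neural Network Predictor with Repetitive Training", can be pictured with the toy controller below. The abstract does not give the control law, so everything here is an assumption: the additive form, the gain `k_pred`, and the idea that the predictor supplies the next output voltage as `predicted_next` are illustrative only, and the neural network itself is left out.

```python
class PIDWithPrediction:
    """Discrete PID controller plus an assumed additive prediction term."""

    def __init__(self, kp, ki, kd, k_pred, dt):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.k_pred = k_pred  # hypothetical gain on the predicted error
        self.dt = dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, reference, measured, predicted_next):
        """Return the control signal for one sampling period."""
        error = reference - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        pid_term = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Assumed form: act on the error the predictor expects one step ahead.
        prediction_term = self.k_pred * (reference - predicted_next)
        return pid_term + prediction_term
```

Setting `k_pred` to zero reduces this sketch to the plain PID controller, which makes it easy to compare the two responses on the same simulated load step.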
Training Data Subdivision and Periodical Rotation in Hybrid Fuzzy Genetics-Based Machine Learning
H. Ishibuchi, S. Mihara, Y. Nojima
We previously proposed simultaneously subdividing the population and the training data set, which significantly decreases the computation time of genetics-based machine learning (GBML) for large data sets. In this approach, a population is subdivided into multiple sub-populations as in island models, and the subdivided training data are rotated over the sub-populations. In this paper, we focus on the effect of training data rotation on the generalization ability and the computation time of our hybrid fuzzy GBML algorithm. First, we show a parallel distributed implementation of our hybrid fuzzy GBML algorithm. Then we examine the effect of training data rotation through computational experiments in which both single-population (i.e., non-parallel) and multi-population (i.e., parallel) versions of our GBML algorithm are applied to a multi-class high-dimensional problem with a large number of training patterns. Experimental results show that training data rotation improves the generalization ability of our GBML algorithm. They also show that the population size is more directly related to the computation time than the training data set size.
DOI: 10.1109/ICMLA.2011.147 (https://doi.org/10.1109/ICMLA.2011.147). Published: 2011-12-18.
Citations: 7
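The rotation of subdivided training data over sub-populations, described in the entry "Training Data Subdivision and Periodical Rotation in Hybrid Fuzzy Genetics-Based Machine Learning", might be scheduled as in the sketch below. The function names, the round-robin split, the rotation direction, and the fixed rotation interval are assumptions made for illustration; the GBML algorithm itself is not shown.

```python
def split_round_robin(items, parts):
    """Split a list into `parts` interleaved subsets of near-equal size."""
    return [items[i::parts] for i in range(parts)]


def rotation_schedule(num_islands, num_generations, interval):
    """For each generation, list the data subset index used by each island.

    Every `interval` generations, each island moves on to the next subset,
    so over time every sub-population sees every part of the training data.
    """
    schedule = []
    for generation in range(num_generations):
        shift = generation // interval
        schedule.append(
            [(island + shift) % num_islands for island in range(num_islands)]
        )
    return schedule
```

With three islands and an interval of two generations, island 0 trains on subset 0 for generations 0 and 1, then on subset 1 for generations 2 and 3, while the other islands shift in step so that no two islands share a subset at the same time.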
Ranking Interactions for a Curation Task
S. Clematide, Fabio Rinaldi
Interactions among different types of biomedical entities (proteins, genes, diseases, drugs, etc.) are among the key pieces of information that biomedical text mining systems are expected to extract from the literature. Different types of entities might be considered; for example, protein-protein interactions have been extensively studied as part of the BioCreative competitive evaluations. However, more complex interactions, such as those among genes, drugs, and diseases, are increasingly of interest. Different databases have been used as references for the evaluation of extraction and ranking techniques. The aim of this paper is to describe a machine-learning-based reranking approach for candidate interactions extracted from the literature. The results are evaluated using data derived from the PharmGKB database. The importance of a good ranking is particularly evident when the results are applied to support human curators.
DOI: 10.1109/ICMLA.2011.119 (https://doi.org/10.1109/ICMLA.2011.119). Published: 2011-12-18.
Citations: 5
Kernel Methods for Minimum Entropy Encoding
S. Melacci, M. Gori
Following the basic principles of Information-Theoretic Learning (ITL), in this paper we propose Minimum Entropy Encoders (MEEs), a novel approach to data clustering. We consider a set of functions that project each input point onto a minimum entropy configuration (code). The encoding functions are modeled by kernel machines, and the resulting code collects the cluster membership probabilities. Two regularizers are included, one to balance the distribution of the output features and one to favor smooth solutions, leading to an unconstrained optimization problem that can be solved efficiently by conjugate gradient or concave-convex procedures. The relationships with Maximum Margin Clustering algorithms are investigated, showing that MEEs overcome some of their critical issues, such as the lack of a multi-class extension and the need to handle a large number of constraints. An extensive evaluation of the proposed approach on several benchmarks shows improvements over state-of-the-art techniques in terms of both accuracy and computational complexity.
DOI: https://doi.org/10.1109/ICMLA.2011.83 · Published: 2011-12-18
Citations: 1
Microarray Classification Using Sub-space Grids
M. Wani
The work presented in this paper describes how sub-space grids can be employed to extract rules for microarray classification. The paper first describes the principal component analysis (PCA) algorithm for obtaining sub-space grids from the projected low-dimensional space. A recursive procedure is then used to obtain rules in which sub-space grids form the premises. The extracted set of rules is evaluated on both training and testing data sets. The sub-space grids from the PCA algorithm are characterized by overlapping data from different classes, and even using more than two premises in a rule does not fully address this overlap. As a result, the rules obtained do not discriminate between classes accurately. To increase the effectiveness of the rule set, the multiple discriminant analysis (MDA) algorithm is employed instead of PCA to obtain sub-space grids from the projected low-dimensional space. The sub-space grids from the MDA algorithm improve the classification accuracy of the system. However, the extracted rule set is large, and the rules are sensitive to local variations in the data. To address these issues, the paper explores using the PCA and MDA algorithms together to form the projected low-dimensional space from which sub-space grids are obtained. The resulting set of rules produces better classification accuracy. The paper presents a comprehensive evaluation of this rule-based system. The system is tested on a dataset of 62 samples (40 colon tumor and 22 normal colon tissue). The results show that sub-space grids obtained from a low-dimensional space projected by the combined PCA and MDA algorithms increase the classification accuracy for microarray data.
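The grid-as-premise idea can be sketched as follows, assuming the samples have already been projected into a low-dimensional space (by PCA, MDA, or both). Each axis is cut into equal bins, every occupied cell becomes a rule of the form "if the point falls in this cell, predict the cell's majority class", and cells containing mixed classes correspond to the overlap problem the abstract mentions. The function names, the equal-width binning, and the toy data are assumptions made for illustration; the paper's recursive rule-extraction procedure is not reproduced.

```python
from collections import Counter, defaultdict


def cell_of(point, lows, widths, bins):
    """Map a projected point to its grid cell (one bin index per axis)."""
    cell = []
    for value, low, width in zip(point, lows, widths):
        index = int((value - low) / width) if width > 0 else 0
        cell.append(min(max(index, 0), bins - 1))  # clip to the grid
    return tuple(cell)


def extract_grid_rules(points, labels, bins=2):
    """Build one rule per occupied cell: cell -> majority class label."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    widths = [(hi - lo) / bins for lo, hi in zip(lows, highs)]
    votes = defaultdict(Counter)
    for point, label in zip(points, labels):
        votes[cell_of(point, lows, widths, bins)][label] += 1
    rules = {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}
    return rules, lows, widths


def classify(point, rules, lows, widths, bins=2, default=None):
    """Apply the rule whose premise (grid cell) contains the point, if any."""
    return rules.get(cell_of(point, lows, widths, bins), default)


# Toy 2-D projections (made up), labelled like the colon dataset classes.
points = [(0.1, 0.2), (0.3, 0.1), (0.9, 0.8), (0.7, 0.9)]
labels = ["normal", "normal", "tumor", "tumor"]
rules, lows, widths = extract_grid_rules(points, labels, bins=2)
print(rules)
print(classify((0.2, 0.2), rules, lows, widths))
```

A point that lands in a cell with no training samples matches no rule and falls back to `default`, which is one simple way to expose regions the rule set does not cover.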
DOI: https://doi.org/10.1109/ICMLA.2011.125 · Published: 2011-12-18
Citations: 14