2010 Ninth International Conference on Machine Learning and Applications: Latest Papers

Modelling Turkey's Energy Consumption Based on Artificial Neural Network
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.105
H. S. Kuyuk, O. Ozkan, R. Kayikci, S. Bayraktaroglu
Energy plays a fundamental role in an economy. Turkey has the world's 15th largest GDP by purchasing power parity and the 17th largest nominal GDP. Economists and political scientists classify Turkey as a newly industrialized country. In this study, an alternative model of Turkey's energy consumption is proposed for the period between 1980 and 2004. An artificial neural network (ANN) based model is used as the forecasting tool. Gross domestic product (GDP) based on purchasing power parity, the industrial production index, and total population are used in the model. Energy consumption is found to be directly related to the industrial production index. Moreover, population and GDP have causal effects.
Citations: 1
A Novel Application of Principal Surfaces to Segmentation in 4D-CT for Radiation Treatment Planning
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.116
S. You, E. Cansizoglu, Deniz Erdoğmuş, J. Tanyi, Jayashree Kalpathy-Cramer
Radiation therapy is one of the most effective options used in the treatment of about half of all people with cancer. A critical goal in radiation therapy is to deliver optimal radiation doses to the observed tumor while sparing the surrounding healthy tissues. Radiation oncologists typically delineate normal and diseased structures manually on three-dimensional computed tomography (3D-CT) scans. Manual delineation is a labor-intensive, tedious and time-consuming task. In recent years, concerns about respiration-induced motion have led to the popularity of four-dimensional computed tomography (4D-CT) for tracking tumors and the deformation of organs. However, as manual contouring in all phases would be prohibitively expensive, the development of fast, robust, and automatic segmentation tools has been an active area of research in 4D radiotherapy. In this paper, we describe a novel application of principal surfaces for the propagation of contours in 4D-CT studies. Regions of interest (ROIs) are manually delineated slice-by-slice in the reference 3D-CT scans. Edges are detected on all of the slices of the target 3D-CT phase. A kernel density estimate (KDE) based on the detected edges is then calculated. The principal surface algorithm is applied to find the ridges of the edge KDE, which provide the object contours. Manually drawn contours from the reference phase are used as an initialization. Contours of ROIs are propagated recursively through all consecutive phases to complete a respiration cycle. Results are provided for a phantom data set of simulated tumor motion as well as for a de-identified data set of the lung of a patient. Evaluation of the efficacy of automatic segmentation of organs and tumors is based on the comparison between manually drawn contours and automatically delineated contours. The Dice coefficients are approximately 0.97 for the lung tumor on the phantom data sets and 0.95 for the patient data sets. The centroid distances between manually delineated lung volume and automatically segmented lung volume in each CT direction are
Citations: 3
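The two evaluation measures named in the abstract above, the Dice coefficient and the per-axis centroid distance between a manual and an automatic segmentation, can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: segmentations are represented as sets of integer voxel coordinates, and the function names and toy volumes are hypothetical, not taken from the paper.

```python
def dice_coefficient(manual, automatic):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two voxel sets."""
    manual, automatic = set(manual), set(automatic)
    if not manual and not automatic:
        return 1.0  # two empty segmentations agree trivially
    return 2.0 * len(manual & automatic) / (len(manual) + len(automatic))


def centroid(voxels):
    """Mean (x, y, z) position of a non-empty collection of voxels."""
    voxels = list(voxels)
    n = len(voxels)
    return tuple(sum(v[i] for v in voxels) / n for i in range(3))


def centroid_distance_per_axis(manual, automatic):
    """Absolute centroid offset along each of the three axes."""
    cm, ca = centroid(manual), centroid(automatic)
    return tuple(abs(a - b) for a, b in zip(cm, ca))


# Toy volumes (made up): a 4x4 patch and the same patch shifted one voxel in x.
manual = {(x, y, 0) for x in range(4) for y in range(4)}
automatic = {(x, y, 0) for x in range(1, 5) for y in range(4)}

print(dice_coefficient(manual, automatic))            # 0.75
print(centroid_distance_per_axis(manual, automatic))  # (1.0, 0.0, 0.0)
```

A Dice value of 1.0 means the two contours enclose exactly the same voxels, so the values of about 0.97 and 0.95 quoted in the abstract indicate close agreement with the manual contours.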
Using Randomised Vectors in Transcription Factor Binding Site Predictions
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.82
F. Rezwan, Yi Sun, N. Davey, R. Adams, A. Rust, M. Robinson
Finding the location of binding sites in DNA is a difficult problem. Although the locations of some binding sites have been experimentally identified, other parts of the genome may or may not contain binding sites. This poses problems with negative data when training a classifier. Here we show that using randomized negative data gives a large boost in classifier performance compared to the original labeled data.
Citations: 1
Wind Speed Forecasting Based on Second Order Blind Identification and Autoregressive Model
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.106
U. Fırat, Ş. Engin, M. Saraçlar, Aysin Ertüzün
Wind power may present undesirable discontinuities and fluctuations due to considerable variations in wind speed, which may adversely affect the smooth operation of the grid. Effective wind forecasting is essential for reporting the amount of energy supply with high accuracy, which is crucial for power system operators when planning energy resources. Variations in wind power cannot be sufficiently estimated by basic persistence-type forecasting methods, particularly in the medium and long term. Therefore, a new statistical method based on independent component analysis (ICA) and an autoregressive (AR) model is presented in this paper. ICA is used to exploit the hidden factors that may exist in the wind speed time series. It is found that ICA, especially ICA methods that exploit time structure such as second order blind identification (SOBI), can be used as a preliminary step in wind speed forecasting.
Citations: 39
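To make the contrast between a persistence forecast and an autoregressive forecast concrete, here is a minimal sketch. It assumes a first-order AR model fitted by least squares on the mean-removed series; the SOBI/ICA preprocessing step described in the abstract is not reproduced, and the function names and wind speed values are made up for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] - mean = phi * (x[t-1] - mean) + noise."""
    mean = sum(series) / len(series)
    centered = [x - mean for x in series]
    num = sum(cur * prev for cur, prev in zip(centered[1:], centered[:-1]))
    den = sum(prev * prev for prev in centered[:-1])
    phi = num / den if den else 0.0
    return mean, phi


def forecast_ar1(last_value, mean, phi, steps):
    """Multi-step AR(1) forecast: the deviation from the mean shrinks by phi each step."""
    forecasts = []
    deviation = last_value - mean
    for _ in range(steps):
        deviation *= phi
        forecasts.append(mean + deviation)
    return forecasts


def forecast_persistence(last_value, steps):
    """Persistence baseline: every future value equals the last observation."""
    return [last_value] * steps


# Hypothetical hourly wind speeds in m/s (illustrative values only).
speeds = [5.1, 5.6, 6.0, 5.8, 5.2, 4.9, 5.3, 5.9]
mean, phi = fit_ar1(speeds)
print("AR(1):      ", forecast_ar1(speeds[-1], mean, phi, steps=3))
print("persistence:", forecast_persistence(speeds[-1], steps=3))
```

When |phi| < 1 the AR(1) forecast decays toward the series mean as the horizon grows, whereas the persistence forecast stays flat, which is one reason persistence tends to degrade at longer horizons.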
Comparative Analysis of DNA Microarray Data through the Use of Feature Selection Techniques
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.29
D. Dittman, T. Khoshgoftaar, Randall Wald, J. V. Hulse
One of today's most important scientific research topics is discovering the genetic links between cancers. This paper presents a comparison of three different cancers (breast, colon, and lung) based on the results of feature selection techniques applied to a data set created from DNA microarray data consisting of samples from all three cancers. The data was run through a set of eighteen feature rankers, which ordered the genes by importance with respect to a targeted cancer. This process was repeated three times, each time with a different target cancer. The rankings were then compared, keeping each feature ranker fixed while varying the cancers being compared. The cancers were evaluated both in pairs and all together for matching genes. The results of the comparison show a strong correlation between the two known hereditary cancers, breast and colon, and little correlation between lung cancer and the other cancers. This is the first study to apply eighteen different feature rankers in a bioinformatics case study, eleven of which were recently proposed and implemented by our research team.
Citations: 49
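The comparison step described above (one ranker held fixed, top-ranked genes compared across cancers in pairs and all together) might look roughly like the following. This is a hedged sketch, not the authors' procedure: the top-k cutoff, the function names, the gene names and the scores are all hypothetical toy values.

```python
from itertools import combinations


def top_k(scores, k):
    """Names of the k highest-scoring genes for one ranker and one target cancer."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def shared_top_genes(scores_by_cancer, k):
    """Genes shared by the top-k lists of each cancer pair, and by all cancers."""
    tops = {cancer: top_k(s, k) for cancer, s in scores_by_cancer.items()}
    pairwise = {
        (a, b): tops[a] & tops[b] for a, b in combinations(sorted(tops), 2)
    }
    common = set.intersection(*tops.values())
    return pairwise, common


# Made-up scores from a single hypothetical ranker (not data from the paper).
scores_by_cancer = {
    "breast": {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.2},
    "colon": {"g1": 0.7, "g2": 0.9, "g3": 0.3, "g4": 0.1},
    "lung": {"g1": 0.1, "g2": 0.2, "g3": 0.9, "g4": 0.8},
}

pairwise, common = shared_top_genes(scores_by_cancer, k=2)
for pair, genes in pairwise.items():
    print(pair, sorted(genes))
print("shared by all:", sorted(common))
```

Repeating this for each ranker would give one overlap count per ranker and cancer pair, which is the kind of quantity the abstract's pairwise and all-together comparisons refer to.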
Nonlinear Dynamical Multi-Scale Model of Associative Memory
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.135
A. Duda, S. Levinson
How can we get such reliable behavior from the mind when the brain is made up of such unreliable elements as neurons? We propose that the answer is related to the emergence of stable brain states, and we offer a model that illustrates how such states could arise. We discuss a new ab initio nonlinear dynamical multi-scale model that will serve as the foundation for an associative memory. Scale 0 consists of spiking Hodgkin-Huxley (HH) neurons. Scale 1 consists of components made up of large populations of HH neurons whose topological structure evolves according to a Hebbian plasticity rule based on synchronous firing. A component's state is captured by the variance of phase synchrony for the population. Many such components are sparsely connected to form a large network, whose state can be captured by the n-tuple consisting of the individual states of each member component. Scale 2 takes the state of the overall network and, upon examining the particular interrelationships of each component (determining how the state of one component affects the state of others), is able to generate a class of trajectories that is multistationary and stable periodic. We consider such a class a memory; the encoding of many such memories leads to the creation of a robust associative memory. The details of the different scales are examined.
Citations: 4
Feature Transformation and Model Design Using Minimum Classification Error
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.122
M. Ratnagiri, L. Rabiner, B. Juang
A Minimum Classification Error (MCE) based recognition system that also estimates a global feature transformation matrix has been implemented. Unlike earlier studies, we make the explicit assumption that the covariance matrix of the Gaussian mixtures is diagonal when estimating the transformation matrix. This is necessary for mathematical consistency between the model and the transformation matrix estimates. Experimental results show a reduction of up to 50% in the word error rate as compared to Maximum Likelihood estimation.
Citations: 2
Effective Virtual Machine Monitor Intrusion Detection Using Feature Selection on Highly Imbalanced Data
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.127
Malak Alshawabkeh, Micha Moffie, Fatemeh Azmandian, J. Aslam, Jennifer G. Dy, D. Kaeli
Virtualization is becoming an increasingly popular service hosting platform. Recently, intrusion detection systems (IDSs) that utilize virtualization have been introduced. This paper considers one particular challenge present in current virtualization-based IDSs: they are commonly faced with high-dimensional, imbalanced data. Improved feature selection methods are needed to achieve more accurate detection when presented with imbalanced data. These methods must select the right set of features, leading to fewer false alarms and higher correct detection rates. In this paper we propose a new Boosting-based feature selection method that evaluates the relative importance of individual features using the fractional absolute confidence that Boosting produces. Our approach accounts for the sample distributions by optimizing for the area under the Receiver Operating Characteristic (ROC) curve (i.e., Area Under the Curve (AUC)). Empirical results on different commercial virtual appliances and malware indicate that proper input feature selection is key to a virtualization-based IDS that is lightweight, efficient and effective.
Citations: 11
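The AUC criterion mentioned in the abstract above is insensitive to class imbalance because it only compares scores of positive samples against scores of negative samples. The sketch below computes it with the pairwise (Mann-Whitney) formulation; it illustrates the metric only, not the Boosting-based feature selection itself, and the labels and scores are invented.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented, imbalanced toy data: 1 = intrusion, 0 = normal behavior.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.8, 0.7, 0.9]
print(roc_auc(labels, scores))  # 0.875
```

The double loop is quadratic in the number of samples; for large data sets a rank-based computation after sorting gives the same value more cheaply.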
Classification of Live Moths Combining Texture, Color and Shape Primitives
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.142
Gustavo E. A. P. A. Batista, Bilson J. L. Campana, Eamonn J. Keogh
Each year, insect-borne diseases kill more than one million people, and harmful insects destroy tens of billions of dollars worth of crops and livestock. At the same time, beneficial insects pollinate three-quarters of all food consumed by humans. Given the extraordinary impact of insects on human life, it is somewhat surprising that machine learning has made very little impact on understanding (and hence, controlling) insects. In this work we discuss why this is the case, and argue that a confluence of facts makes the time ripe for machine learning research to reach out to the entomological community and help solve some important problems. As a concrete example, we show how an important classification problem in commercial entomology can be solved by leveraging recent progress in shape, color and texture measures.
Citations: 13
Unsupervised Speaker Clustering in a Linear Discriminant Subspace
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.159
Theodoros Giannakopoulos, Sergios Petridis
We present an approach for grouping single-speaker speech segments into speaker-specific clusters. Our approach is based on applying the K-means clustering algorithm in a suitable discriminant subspace, where the Euclidean distance reflects speaker differences. A core feature of our approach is approximating speaker-conditional statistics, which are not available, with single-speaker segment statistics, which can be evaluated. This makes it possible to apply the LDA algorithm to find the optimal discriminative subspace using unlabeled data. To illustrate our method, we present examples of clusters generated by our approach when applied to the ICMLA 2010 Speaker Clustering Challenge datasets.
Citations: 6