2012 11th International Conference on Machine Learning and Applications: Latest Papers

Improving Image Segmentation Using Genetic Algorithm

2012 11th International Conference on Machine Learning and Applications

Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.134

Huynh Thi Thanh Binh, M. Loi, N. T. Thuy

This paper presents a new approach to semantic segmentation of digital images, aiming to improve on the performance of several state-of-the-art approaches. We exploit a new version of the texton feature [28], which encodes image texture and object layout, to learn a robust classifier. We propose using a genetic algorithm to learn the parameters of the weak classifiers in a boosting setup. We conducted extensive experiments on benchmark image datasets and compared the segmentation results with those of recently proposed systems. The experimental results show that our system performs comparably to, or even outperforms, those state-of-the-art algorithms. The approach is promising because this empirical study used only texture-layout filter responses as features and a basic genetic algorithm configuration. The framework is simple and can be extended and improved for many learning problems.

Citations: 6

Differentiable Kernels in Generalized Matrix Learning Vector Quantization

2012 11th International Conference on Machine Learning and Applications

Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.231

M. Kaden, D. Nebel, M. Riedel, Michael Biehl, T. Villmann

In this paper we investigate the application of differentiable kernels in generalized matrix learning vector quantization as an alternative kernel-based classifier that additionally provides classification-dependent data visualization. We show that the concept of differentiable kernels allows prototypes to be described in the data space while equipping that space with the kernel metric. Moreover, using the visualization properties of the original matrix learning vector quantization, we can optimize the class visualization through inherent learning of the visualization mapping in this new kernel-metric data space as well.

Citations: 14

Error-Driven Adaptive, Virtual Machine Model-Based Control with High Availability Platform

2012 11th International Conference on Machine Learning and Applications

Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.133

Aman H. Bura, Bo Chen, Li Yu

An error-driven, adaptive, model-based control system is proposed for optimizing machine or assembly plant performance and operation under normal and fault conditions. In such a complex system it is imperative to differentiate between a system failure and a sensor failure, and between process noise and measurement noise. In this paper, we present a comprehensive approach based on hierarchical, multilevel control techniques. The approach is designed to validate sensor measurements, associate a degree of integrity with each measurement, identify faulty sensors, and estimate the actual system states and sensor values in spite of faulty measurements. Using the Virtual Machine Model concept, the method proceeds in three steps: state prediction, fault detection and sensor measurement, and online system update or correction. A combination of a flexible least squares algorithm and an adaptive Kalman filtering method is implemented to learn and predict system behavior. The experimental results show that the proposed model and algorithms can efficiently identify faulty components and reduce noise errors injected by sensors or the system, thus providing self-healing. The Virtual Machine Model (VMM) architecture described in this paper has proved to have several advantages over traditional models: it allows easy application provisioning, upgrades, and maintenance, and it provides fault tolerance, speedy disaster recovery, and a high-availability platform.
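
The predict, validate, and correct cycle described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it uses a plain scalar Kalman filter with a random-walk state model and a simple innovation gate, rather than the flexible least squares and adaptive Kalman filtering the paper combines. The function name `filter_with_validation`, the noise variances `q` and `r`, the gate width, and the sample readings are all assumptions made for illustration, as is treating the first reading as valid.

```python
def filter_with_validation(measurements, q=1e-4, r=0.04, gate=3.0):
    """Scalar Kalman filter that skips measurements failing an innovation gate.

    Assumptions (not from the paper): the state follows a random walk with
    process variance q, the sensor has noise variance r, and the first
    measurement is trusted to initialise the estimate.
    """
    x, p = measurements[0], r          # initial state estimate and its variance
    estimates, flagged = [x], []
    for k in range(1, len(measurements)):
        z = measurements[k]
        # Step 1: state prediction (random walk: mean unchanged, variance grows).
        p = p + q
        # Step 2: fault detection on the innovation (measurement minus prediction).
        innovation = z - x
        s = p + r                      # predicted innovation variance
        if innovation ** 2 > gate ** 2 * s:
            flagged.append(k)          # treat as faulty; keep the prediction
        else:
            # Step 3: online correction with the validated measurement.
            gain = p / s
            x = x + gain * innovation
            p = (1.0 - gain) * p
        estimates.append(x)
    return estimates, flagged


# Hypothetical sensor readings with one gross outlier at index 4.
readings = [5.0, 5.1, 4.9, 5.05, 9.0, 5.0, 4.95]
estimates, flagged = filter_with_validation(readings)
print(flagged, round(estimates[-1], 3))
```

A rejected reading leaves the estimate at its predicted value, which is one simple way to keep estimating the state "in spite of faulty measurements"; a real system would also need to distinguish a sensor fault from a genuine change in the plant.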

Citations: 0

Estimation of Susceptibility to Landslides Using Neural Networks Based on the FALCON-ART Model

2012 11th International Conference on Machine Learning and Applications

Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.122

Álvaro Viloria, C. Chang, M. C. P. Socorro, J. Viloria

Landslides are catastrophic erosion processes that alter the morphology of the landscape and affect people, productive land, and infrastructure. Recently, there have been several attempts to apply neural networks to predict landscape susceptibility to landslides. However, the knowledge of a neural network is expressed in a mathematical model that does not allow relationships between the factors causing landslides to be established intuitively. This makes it difficult for experts to interpret the output of the network and to support their results with a set of inference rules. This limitation could be overcome by a model based on the FALCON neural network, which not only classifies data by clustering with fuzzy logic but also generates a set of fuzzy rules from the training data. For this reason, the FALCON-ART neural network has been implemented in this study to create a set of models of susceptibility to landslides on the watershed of the Caramacate River in north-central. The input data of the model included a landslide scar map from 1992 and variables derived from a digital elevation model and a SPOT satellite image. Cross-validation determined that the best result achieved a 74% success rate in predicting areas susceptible to landslides.

Citations: 3

Distributed Privacy Preserving Decision Support System for Predicting Hospitalization Risk in Hospitals with Insufficient Data

2012 11th International Conference on Machine Learning and Applications

Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.180

George Mathew, Zoran Obradovic

Dynamically building prediction models for suggestive knowledge from multiple sources is of great interest from a clinical decision support point of view. This is valuable in situations where the local clinical data repository does not have a sufficient number of records to draw conclusions from. However, due to privacy concerns, hospitals are reluctant to divulge patient records. Consequently, a distributed model-building mechanism that can use just the statistics from multiple hospitals' databases is valuable. Our DIDT algorithm builds a model in that fashion. In this study, using National Inpatient Sample (NIS) data for 2009, we demonstrate that the DIDT algorithm can be used to collaboratively build a better decision-making model in situations where hospitals have too few records to make good local models. Based on 262 attributes used for model building, we showed that 9 collaborating hospitals, each with fewer than 100 cases of diabetes-related hospitalization, were able to achieve a 9.9% improvement in hospitalization prediction accuracy collectively using a distributed model, compared with relying on local models developed on their own. When relying on local risk prediction models for diabetes at these 9 hospitals, 159 of 357 patients were misclassified and prediction was impossible for another 16 patients. Our integrated model reduced the misclassifications to 138, effectively providing accurate early diagnostics to 37 additional patients. We also introduce the concept of banding to improve the DIDT algorithm, logically combining multiple hospitals when a large number of hospitals is involved in order to reduce the number of cross-validation folds.
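
The figure of 37 additional patients can be reproduced from the other counts in this abstract. The short check below is one reading of the numbers, assuming that the 16 patients for whom local prediction was impossible count as not correctly predicted and that the integrated model produced a prediction for all 357 patients; the variable names are illustrative only.

```python
# Counts quoted in the abstract.
total_patients = 357
local_misclassified = 159
local_unpredictable = 16
integrated_misclassified = 138

# Assumption: "unpredictable" patients count as not correctly predicted locally.
local_without_correct_prediction = local_misclassified + local_unpredictable

# Patients who gain a correct prediction when the integrated model is used.
additional_correct = local_without_correct_prediction - integrated_misclassified

print(local_without_correct_prediction, additional_correct)
```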

Citations: 11

CPG and Reflexes Combined Adaptive Walking Control for AIBO

2012 11th International Conference on Machine Learning and Applications

Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.81

Xianchao Zhao, Jiaqi Zhang, Chenkun Qi

Based on basic neurophysiological evidence, it is now generally accepted that animals' walking control results from the combined action of a central pattern generator (CPG) located in the spinal cord and reflexes triggered by peripheral stimuli. Because phase oscillators are mathematically tractable, the phase relationships between them are convenient to adjust. In this paper, coupled phase oscillators were designed to simulate the CPG's behavior and to establish a vestibular reflex with feedback from accelerator sensors. The synchronization condition of the proposed CPG model was then studied. Forward and backward walking, as well as gait transitions between trot and walk, were realized. With feedback, AIBO detected uphill and downhill terrain and changed its posture automatically to suit the new environment. Simulations were run in Webots to verify the method.
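
As a rough illustration of how coupled phase oscillators can lock into a gait, the sketch below integrates four identical oscillators with all-to-all sinusoidal coupling and a set of desired phase offsets. It is a generic Kuramoto-style model, not the CPG model from the paper, and it omits the reflex feedback entirely. The leg ordering, the trot offsets, the coupling gain, the frequency, and the initial phases are all assumptions chosen for illustration.

```python
import math

# Assumed leg order: left fore, right fore, left hind, right hind.
# In a trot, diagonal legs move together, so they share a phase offset.
TROT_OFFSETS = [0.0, math.pi, math.pi, 0.0]


def simulate_cpg(offsets, omega=2 * math.pi, coupling=2.0,
                 dt=0.001, steps=5000, initial=(0.0, 2.5, 3.5, 0.8)):
    """Euler-integrate d(theta_i)/dt = omega + K * sum_j sin(theta_j - theta_i - (o_j - o_i))."""
    n = len(offsets)
    theta = list(initial)
    for _ in range(steps):
        rates = []
        for i in range(n):
            pull = sum(
                math.sin(theta[j] - theta[i] - (offsets[j] - offsets[i]))
                for j in range(n) if j != i
            )
            rates.append(omega + coupling * pull)
        theta = [t + dt * rate for t, rate in zip(theta, rates)]
    return theta


def phase_error(theta, offsets):
    """Largest wrapped deviation of each leg's phase lag from its desired offset."""
    errors = []
    for t, o in zip(theta, offsets):
        d = (t - theta[0]) - (o - offsets[0])
        errors.append(abs(math.atan2(math.sin(d), math.cos(d))))
    return max(errors)


final_phases = simulate_cpg(TROT_OFFSETS)
print(phase_error(final_phases, TROT_OFFSETS))
```

Switching gaits in a model of this kind amounts to swapping the offset table while the oscillators keep running, which is one reason phase oscillators are convenient for gait transitions.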

Citations: 5

Writing with Style: Venue Classification

2012 11th International Conference on Machine Learning and Applications

Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.50

Zaihan Yang, Brian D. Davison

As early as the late nineteenth century, scientists began research in author attribution, mostly by identifying the writing styles of authors. Subsequent research has repeatedly demonstrated that people tend to have distinguishable writing styles. Today we not only have more authors, but we also have many different kinds of publications (journals, conferences, workshops, etc.) covering different topics and requiring different writing formats. In spite of successful research in author attribution, no work has been carried out to find out whether publication venues are similarly distinguishable by their writing styles. Our work takes the first step in exploring this problem. Approaching the problem with a traditional classification method, we extract three types of writing-style-based features and carry out detailed experiments examining the different impacts of features and classification techniques, as well as the influence of venue content, topics, and genres. Experiments on real data from the ACM and CiteSeer digital libraries demonstrate that our approach is an effective method for distinguishing venues in terms of their writing styles.

Citations: 5

Studying the Effect of Class Imbalance in Ocean Turbine Fault Data on Reliable State Detection

2012 11th International Conference on Machine Learning and Applications

Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.53

Janell Duhaney, T. Khoshgoftaar, Amri Napolitano

Class imbalance is prevalent in many real-world datasets. It occurs when one or more classes in a dataset have significantly fewer examples than the remaining classes. When trained on highly imbalanced datasets, traditional machine learning techniques can often simply ignore the minority class(es) and label all instances as being of the majority class to maximize accuracy. This problem has been studied in many domains, but there is little or no research on the effect of class imbalance in fault data for condition monitoring of an ocean turbine. This study makes a first effort to bridge that gap by providing insight into how class imbalance in vibration data can impact a learner's ability to reliably identify changes in the ocean turbine's operational state. To do so, we empirically evaluate the performance of three popular, but very different, machine learning algorithms when trained on four datasets with varying class distributions (one balanced and three imbalanced) to distinguish between a normal and an abnormal state. All data used in this study were collected from the testbed for an ocean turbine and were undersampled to simulate the different levels of imbalance. We find here, as in other domains, that the three learners seemed to suffer overall when trained on data with a highly skewed class distribution (with 0.1% of examples in a faulty/abnormal state while the remaining 99.9% were captured in a normal operational state). It was noted, however, that the Logistic Regression and Decision Tree classifiers performed better when only 5% of the examples represented an abnormal state (the remaining 95% therefore indicating normal operation) than they did when no imbalance was present.
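
Undersampling to a chosen class distribution, as mentioned in this abstract, can be sketched as follows. The abstract does not say exactly how the sampling was done, so this sketch assumes that all normal examples are kept and abnormal examples are randomly dropped until they form the requested fraction of the dataset. The function name `undersample_minority` and the toy data are hypothetical.

```python
import random


def undersample_minority(normal, abnormal, minority_fraction, seed=0):
    """Randomly keep just enough abnormal examples to reach minority_fraction.

    Assumption: every normal example is kept and only the abnormal class is
    thinned out. Returns (normal, kept_abnormal).
    """
    if not 0.0 < minority_fraction < 1.0:
        raise ValueError("minority_fraction must be strictly between 0 and 1")
    # Solve k / (k + len(normal)) = minority_fraction for k.
    target = round(minority_fraction * len(normal) / (1.0 - minority_fraction))
    target = min(target, len(abnormal))   # cannot keep more than we have
    rng = random.Random(seed)             # fixed seed for reproducibility
    kept = rng.sample(abnormal, target)
    return normal, kept


# Toy data: 950 normal and 200 abnormal example identifiers.
normal_examples = [("normal", i) for i in range(950)]
abnormal_examples = [("abnormal", i) for i in range(200)]

_, kept_abnormal = undersample_minority(normal_examples, abnormal_examples, 0.05)
share = len(kept_abnormal) / (len(kept_abnormal) + len(normal_examples))
print(len(kept_abnormal), share)
```

With a 0.1% minority share the same function would keep only about one abnormal example per thousand normal ones, which gives a sense of how little the learners in the study had to work with at the most skewed setting.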

Citations: 6

Convolutional Neural Networks Applied to Human Face Classification

2012 11th International Conference on Machine Learning and Applications

Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.177

Brian Cheung

Convolutional neural network models have covered a broad scope of computer vision applications, achieving competitive performance with minimal domain knowledge. In this work, we apply such a model to a task designed to deter automated systems. We trained a convolutional neural network to distinguish images of human faces from computer-generated avatars as part of the ICMLA 2012 Face Recognition Challenge. The network achieved a classification accuracy of 99% on the Avatar CAPTCHA dataset. Furthermore, we demonstrated the potential of support vector machines on the same problem, achieving equally competitive performance.

Citations: 34

Classification of Urban Scenes from Geo-referenced Images in Urban Street-View Context

2012 11th International Conference on Machine Learning and Applications

Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.171

C. Iovan, David Picard, Nicolas Thome, M. Cord

This paper addresses the challenging problem of scene classification in street-view georeferenced images of urban environments. More precisely, the goal of this task is semantic image classification, which consists of predicting the presence or absence of a predefined class (e.g. shops, vegetation, etc.) in a given image. The approach is based on the BOSSA representation, which enriches the Bag of Words (BoW) model, in conjunction with the Spatial Pyramid Matching scheme and kernel-based machine learning techniques. The proposed method handles problems that arise in large-scale urban environments and give rise to high intra-class variability: acquisition conditions (static and dynamic objects/pedestrians) combined with the continuous acquisition of data along the vehicle's direction, varying light conditions, and strong occlusions (due to the presence of trees, traffic signs, cars, etc.). Experiments were conducted on a large dataset of high-resolution images collected from two main avenues in the 12th district of Paris, and the approach shows promising results.

Citations: 6
