2011 10th International Conference on Machine Learning and Applications and Workshops: Latest Publications


Simple Reinforcement Learning for Small-Memory Agent

2011 10th International Conference on Machine Learning and Applications and Workshops

Pub Date : 2011-12-18 DOI: 10.1109/ICMLA.2011.127

A. Notsu, Katsuhiro Honda, H. Ichihashi, Yuki Komori

In this paper, we propose Simple Reinforcement Learning for a reinforcement learning agent with small memory. In the real world, learning is difficult because there are infinitely many states and actions, which demand large amounts of memory and long learning times. To address this problem, estimated values are categorized as "GOOD" or "NO GOOD" during the reinforcement learning process. Additionally, the alignment sequence of the estimated values is changed, because the sequence itself is regarded as important. We conducted simulations to observe the influence of our methods. Several simulation results show no adverse effect on learning speed.


Citations: 7

Comparison of Two Methods for Finding Biomedical Categories in Medline

2011 10th International Conference on Machine Learning and Applications and Workshops

Pub Date : 2011-12-18 DOI: 10.1109/ICMLA.2011.50

Lana Yeganova, Won Kim, Donald C. Comeau, W. Wilbur

In this paper we describe and compare two methods for automatically learning meaningful biomedical categories in Medline®. The first approach is a simple statistical method that uses part-of-speech and frequency information to extract a list of frequent headwords from noun phrases in Medline. The second method implements an alignment-based technique to learn frequent generic patterns that indicate a hyponymy/hypernymy relationship between a pair of noun phrases. We then apply these patterns to Medline to collect frequent hypernyms, potential biomedical categories. We study and compare these two alternative sets of terms to identify semantic categories in Medline. Our method is completely data-driven.


Citations: 0

Combining Corpus-Based Features for Selecting Best Natural Language Sentences

2011 10th International Conference on Machine Learning and Applications and Workshops

Pub Date : 2011-12-18 DOI: 10.1109/ICMLA.2011.170

Foaad Khosmood, R. Levinson

Automated paraphrasing of natural language text has many interesting applications, from aiding better translation to generating language in a better and more appropriate style. In this paper, we are concerned with the problem of picking the best English sentence out of a set of machine-generated paraphrase sentences, each designed to express the same content as a human-generated original. We present a system for scoring sentences based on examples in large corpora. Specifically, we use the Microsoft Web N-Gram service and the text of the Brown Corpus to extract features from all candidate sentences and compare them against each other. We consider three feature combination methods: a handcrafted decision tree, linear regression, and linear powerset regression. We find that while each method has particular strengths, linear powerset regression performs best against our human-evaluated test data.


Citations: 1

Energy Efficiency for Unmanned Aerial Vehicles

2011 10th International Conference on Machine Learning and Applications and Workshops

Pub Date : 2011-12-18 DOI: 10.1109/ICMLA.2011.159

Balemir Uragun

This paper emphasizes the energy efficiency issue for unmanned aerial vehicles (UAVs). The power requirement of a UAV system was modeled from the energy required by all possible subsystems. In this model, a single UAV system is broken down into six power-consumption components. The scientific research areas and emerging technologies that assist the UAV design stages relate mainly to these six load components: (1) control, (2) data processing, (3) communication, (4) payloads, including sensors with actuators, (5) external loads as system perturbation, and (6) system dynamicity with a performance criterion.


Citations: 50

The Combination of Clinical, Dose-Related and Imaging Features Helps Predict Radiation-Induced Normal-Tissue Toxicity in Lung-cancer Patients -- An in-silico Trial Using Machine Learning Techniques

2011 10th International Conference on Machine Learning and Applications and Workshops

Pub Date : 2011-12-18 DOI: 10.1109/ICMLA.2011.139

G. Nalbantov, A. Dekker, D. Ruysscher, P. Lambin, E. Smirnov

The radiation dose that can be delivered to the tumor in non-small cell lung cancer (NSCLC) patients is limited by the negative side effects on normal tissues. The most dose-limiting factor in radiotherapy is radiation-induced lung toxicity (RILT). RILT is generally measured semi-quantitatively, by a dyspnea (shortness-of-breath) score. In general, about 20-30% of patients develop RILT several months after treatment, and in about 70% of patients the delivered dose is insufficient to control tumor growth. Ideally, if the RILT score were known in advance, the dose treatment plan for low-toxicity-risk patients could be adjusted so that a higher dose is delivered to the tumor to better control it. A number of possible predictors of RILT have been proposed in the literature, including dose-related and clinical/demographic patient characteristics available prior to radiotherapy. In addition, the use of imaging features, which are noninvasive in nature, has been gaining momentum. Thus, anatomic and functional/metabolic information from CT and PET scanner images, respectively, is used in daily clinical practice and provides further information about the status of a patient. In this study we assessed whether machine learning techniques can successfully be applied to predict post-radiation lung damage, proxied by dyspnea score, based on clinical, dose-related (dosimetric), and image features. Our dataset included 78 NSCLC patients. The patients were divided into two groups: no-deterioration-of-dyspnea and deterioration-of-dyspnea patients. Several machine-learning binary classifiers were applied to discriminate between the two groups. The results, evaluated using the area under the ROC curve in a cross-validation procedure, are highly promising. This outcome could open the possibility of delivering better, individualized dose-treatment plans for lung cancer patients and could help the overall clinical decision-making (treatment) process.
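The evaluation metric named in this abstract, the area under the ROC curve, can be computed directly from labels and classifier scores. The following is a minimal sketch, not the authors' code: the function name `roc_auc`, the 0/1 label encoding, and the example scores are assumptions made purely for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: iterable of 0/1 (1 = deterioration of dyspnea; an assumed encoding)
    scores: classifier scores, where higher means "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out fold: four patients with made-up scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In a cross-validation procedure, a value like this would be computed on each held-out fold and then averaged across folds.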



Citations: 2

An Intelligent Decision Support Tool for a Travelling Wave Ultrasonic Motor Based on k-Nearest Neighbor Algorithm

2011 10th International Conference on Machine Learning and Applications and Workshops

Pub Date : 2011-12-18 DOI: 10.1109/ICMLA.2011.33

Ş. Sağiroğlu, H. Kahraman, M. Yesilbudak, I. Colak

The driving frequency, amplitude, and phase difference of the two-phase sinusoidal voltages are the input parameters that influence the speed stability of travelling wave ultrasonic motors (TWUSMs). These parameters are also time-varying because of variations in operating temperature. In addition, a complete mathematical model of the TWUSM has not yet been derived. For these reasons, a machine learning approach is required to determine the compatibility of the operating parameters related to the speed stability of TWUSMs. For this purpose, an intelligent decision support tool for TWUSMs has been designed in this study. The input parameters (driving frequency, amplitude, and phase difference of the two-phase sinusoidal voltages, and operating temperature) were evaluated by the k-nearest neighbor algorithm in the decision support tool. The results show that the proposed tool is effective in determining the compatibility of operating parameters related to the speed stability of TWUSMs.
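The k-nearest neighbor step mentioned in this abstract can be sketched as a majority vote among the closest known operating points. This is an illustrative sketch only: the function name `knn_predict`, the labels "compatible" and "incompatible", the Euclidean distance, and the normalized feature values are all assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (features, label) pairs; features are equal-length tuples
    query: feature tuple, assumed to be scaled the same way as the training data
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-normalized operating points:
# (driving frequency, amplitude, phase difference, operating temperature)
train = [
    ((0.10, 0.20, 0.50, 0.30), "compatible"),
    ((0.15, 0.25, 0.55, 0.35), "compatible"),
    ((0.20, 0.15, 0.45, 0.25), "compatible"),
    ((0.80, 0.90, 0.10, 0.85), "incompatible"),
    ((0.85, 0.80, 0.15, 0.90), "incompatible"),
    ((0.90, 0.85, 0.05, 0.80), "incompatible"),
]
print(knn_predict(train, (0.12, 0.22, 0.50, 0.30)))
```

Because the four inputs have different physical units, some form of feature scaling would normally be needed before distances are meaningful; the sketch assumes that has already been done.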


Citations: 0

Predicting "Essential" Genes across Microbial Genomes: A Machine Learning Approach

2011 10th International Conference on Machine Learning and Applications and Workshops

Pub Date : 2011-12-18 DOI: 10.1109/ICMLA.2011.114

Krishna Palaniappan, Sumitra Mukherjee

Essential genes constitute the minimal set of genes an organism needs for its survival. Identification of essential genes is of theoretical interest to genome biologists and has practical applications in medicine and biotechnology. This paper presents and evaluates machine learning approaches to the problem of predicting essential genes in microbial genomes using solely sequence-derived input features. We investigate three different supervised classification methods, Support Vector Machine (SVM), Artificial Neural Network (ANN), and Decision Tree (DT), for this binary classification task. The classifiers are trained and evaluated using 37830 examples obtained from 14 experimentally validated, taxonomically diverse microbial genomes whose essential genes are known. A set of 52 relevant genomic sequence-derived features is used as input for the classifiers. The models were evaluated using the novel blind testing schemes Leave-One-Genome-Out (LOGO) and Leave-One-Taxon-group-Out (LOTO), and a 10-fold stratified cross-validation (10-f-cv) strategy, on both the full multi-genome datasets and their class-imbalance-reduced variants. Experimental results (10 X 10-f-cv) indicate that SVM and ANN perform better than DT, with area under the receiver operating characteristic curve (AU-ROC) scores of 0.80, 0.79, and 0.68 respectively. This study demonstrates that supervised machine learning methods can be used to predict essential genes in microbial genomes by using only the gene sequence and features derived from it. LOGO and LOTO blind test results suggest that the trained classifiers generalize across genomes and taxonomic boundaries.
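The Leave-One-Genome-Out idea described in this abstract amounts to a grouped split: every example from one genome is held out for testing while the rest are used for training. The sketch below shows one way such splits could be generated; the function name `leave_one_group_out` and the toy genome identifiers are hypothetical, and the same function would apply to taxon groups for a LOTO-style split.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) once per distinct group.

    groups: list giving the genome (or taxon group) identifier of each example
    """
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical genome identifiers for six genes (the study itself used 14 genomes).
genome_of_gene = ["A", "A", "B", "C", "C", "C"]
for genome, train_idx, test_idx in leave_one_group_out(genome_of_gene):
    print(genome, train_idx, test_idx)
```

Splitting by genome rather than by individual gene is what makes the test "blind": no gene from the held-out genome is seen during training.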



Citations: 14

An Intelligent Power Factor Correction Approach Based on Linear Regression and Ridge Regression Methods

2011 10th International Conference on Machine Learning and Applications and Workshops

Pub Date : 2011-12-18 DOI: 10.1109/ICMLA.2011.34

R. Bayindir, Murat Gök, E. Kabalci, O. Kaplan

This study introduces an intelligent power factor correction approach based on Linear Regression (LR) and Ridge Regression (RR) methods. The 10-fold Cross Validation (CV) test protocol was used to evaluate performance. LR achieved better test performance than RR. The empirical results indicate that the intelligent compensators developed in this work might overcome the problems reported in the literature, providing an accurate, simple, and low-cost solution for compensation.
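The comparison of linear regression and ridge regression under 10-fold cross-validation can be illustrated with a single-feature model. This is a sketch under stated assumptions, not the authors' setup: the abstract does not say which inputs or targets were used, so the function names `fit_ridge` and `cross_val_mse`, the interleaved fold assignment, and the synthetic data are all hypothetical.

```python
def fit_ridge(xs, ys, lam=0.0):
    """Fit y = slope * x + intercept with an L2 penalty `lam` on the slope.

    lam = 0 reduces to ordinary least-squares linear regression.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return slope, y_mean - slope * x_mean


def cross_val_mse(xs, ys, lam, k=10):
    """Mean squared error averaged over k folds (fold f tests indices f, f+k, ...)."""
    fold_errors = []
    for fold in range(k):
        test = [i for i in range(len(xs)) if i % k == fold]
        train = [i for i in range(len(xs)) if i % k != fold]
        slope, intercept = fit_ridge([xs[i] for i in train], [ys[i] for i in train], lam)
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k


# Synthetic, noise-free data; real measurements would be noisy.
xs = [i / 10 for i in range(30)]
ys = [2 * x + 1 for x in xs]
print("LR CV MSE:", cross_val_mse(xs, ys, lam=0.0))
print("RR CV MSE:", cross_val_mse(xs, ys, lam=1.0))
```

On noise-free linear data the unpenalized fit is exact, so the ridge penalty can only add bias; on noisy or collinear data the ranking of the two methods can differ.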


Citations: 7

Document Clustering for Forensic Computing: An Approach for Improving Computer Inspection

2011 10th International Conference on Machine Learning and Applications and Workshops

Pub Date : 2011-12-18 DOI: 10.1109/ICMLA.2011.59

Luís Filipe da Cruz Nassif, Eduardo R. Hruschka

In computer forensic analysis, hundreds of thousands of files are usually examined. Many of those files consist of unstructured text, which is difficult for computer examiners to analyze. In this context, automated methods of analysis are of great interest. In particular, algorithms for clustering documents can facilitate the discovery of new and useful knowledge from the documents under analysis. We present an approach that applies clustering algorithms to forensic analysis of computers seized in police investigations. We illustrate the proposed approach by carrying out experiments with five clustering algorithms (K-means, K-medoids, Single Link, Complete Link, and Average Link) applied to five datasets obtained from computers seized in real-world investigations. In addition, two relative validity indexes were used to automatically estimate the number of clusters. Related studies in the literature are significantly more limited than our study. Our experiments show that the Average Link and Complete Link algorithms provide the best results for our application domain. If suitably initialized, partitional algorithms (K-means and K-medoids) can also yield very good results. Finally, we also present and discuss practical results that can be useful for researchers and practitioners of forensic computing.
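Average Link, one of the two algorithms this abstract reports as performing best, repeatedly merges the two clusters with the smallest mean pairwise distance between their members. The sketch below is a small illustration of that idea, not the authors' implementation: the function name `average_link`, the Euclidean distance, the fixed target number of clusters, and the toy 2-D vectors standing in for document representations are all assumptions.

```python
import math


def average_link(points, n_clusters):
    """Agglomerative clustering with average linkage; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise distance between members of clusters a and b.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical 2-D stand-ins for document feature vectors.
docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(average_link(docs, 2))
```

This naive version recomputes every linkage on each pass and is only practical for small inputs. The abstract notes that the number of clusters was estimated with relative validity indexes rather than fixed in advance as it is here.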



Citations: 21

Predictive Subspace Clustering

2011 10th International Conference on Machine Learning and Applications and Workshops

Pub Date : 2011-12-18 DOI: 10.1109/ICMLA.2011.117

B. McWilliams, G. Montana

The problem of detecting clusters in high-dimensional data is increasingly common in machine learning applications, for instance in computer vision and bioinformatics. Recently, a number of approaches in the field of subspace clustering have been proposed which search for clusters in subspaces of unknown dimensions. Learning the number of clusters, the dimension of each subspace, and the correct assignments is a challenging task, and many existing algorithms either perform poorly in the presence of subspaces that have different dimensions and possibly overlap, or are computationally expensive. In this work we present a novel approach to subspace clustering that learns the number of clusters and the dimensionality of each subspace in an efficient way. We assume that the data points in each cluster are well represented in low dimensions by a PCA model. We propose a measure of the predictive influence of data points modelled by PCA, which we minimise to drive the clustering process. The proposed predictive subspace clustering algorithm is assessed on both simulated data and the popular Yale faces database, where state-of-the-art performance and speed are obtained.



Citations: 4
