2010 Ninth International Conference on Machine Learning and Applications: Latest Papers

Extreme Volume Detection for Managed Print Services
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.95
J. Handley, Marie-Luise Schneider, Victor Ciriza, J. Earl
A managed print service (MPS) manages the printing, scanning and facsimile devices in an enterprise to control cost and improve availability. Services include supplies replenishment, maintenance, repair, and use reporting. Customers are billed per page printed. Data are collected from a network of devices to facilitate management. The number of pages printed per device must be accurately counted to fairly bill the customer. Software errors, hardware changes, repairs, and human error all contribute to “meter reads” that are exceptionally high and are apt to be challenged by the customer were they to be billed. Account managers periodically review data for each device in an account. This process is tedious and time-consuming, and an automated solution is desired. Exceptional print volumes are not always salient, and detecting them statistically is prone to errors owing to nonstationarity of the data. Mean levels and variances change over time and usage is highly autocorrelated, which precludes simple detection methods based on deviations from an average background. A solution must also be computationally inexpensive and require little auxiliary storage because hundreds of thousands of streams of device data must be processed. We present an algorithm and system for online detection of extreme print volumes that uses dynamic linear models (DLM) with variance learning. A DLM is a state-space time series model comprising a random mean level system process and a random observation process. Both components are updated using Bayesian statistics. After each update, a forecasted value and its estimated variance are calculated. A read is flagged as exceptionally high if its value is highly unlikely with respect to a forecasted value and its standard deviation. We provide implementation details and results of a field test in which error rate was decreased from 26.4% to 0.5% on 728 observed meter reads.
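The forecast, flag and update cycle described in this abstract can be sketched for the simplest case, a local-level DLM with a discount factor and a learned observation variance. This is a minimal sketch based on the standard textbook recursions, not the authors' implementation: the class name `LocalLevelDLM`, the discount factor `delta`, the starting values, the threshold `z_max`, and the policy of skipping the update for a flagged read are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class LocalLevelDLM:
    m: float            # posterior mean of the level
    C: float            # posterior variance of the level
    S: float            # current estimate of the observation variance
    n: float = 1.0      # degrees of freedom behind S
    delta: float = 0.9  # discount factor (assumed value)

    def forecast(self):
        """One-step forecast mean and variance."""
        R = self.C / self.delta      # prior level variance after discounting
        return self.m, R + self.S

    def step(self, y, z_max=4.0):
        """Return (flagged, forecast_mean, forecast_variance) for read y."""
        f, Q = self.forecast()
        e = y - f                    # forecast error
        if e / math.sqrt(Q) > z_max:
            # Exceptionally high read: flag it and leave the state untouched
            # (an assumed policy for this sketch).
            return True, f, Q
        A = (Q - self.S) / Q         # adaptive coefficient R / Q
        n_new = self.n + 1
        S_new = self.S + (self.S / n_new) * (e * e / Q - 1.0)  # variance learning
        self.m += A * e
        self.C = A * S_new
        self.n, self.S = n_new, S_new
        return False, f, Q


# Hypothetical stream of per-period page counts with one extreme read.
reads = [100, 110, 95, 105, 102, 98, 5000, 101]
model = LocalLevelDLM(m=100.0, C=100.0, S=100.0)
flags = [model.step(y)[0] for y in reads]
```

Each stream needs only four numbers of state (`m`, `C`, `S`, `n`), which is in keeping with the abstract's requirement of little auxiliary storage per device.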
Citations: 1
Speeding Up Greedy Forward Selection for Regularized Least-Squares
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.55
T. Pahikkala, A. Airola, T. Salakoski
We propose a novel algorithm for greedy forward feature selection for regularized least-squares (RLS) regression and classification, also known as the least-squares support vector machine or ridge regression. The algorithm, which we call greedy RLS, starts from the empty feature set, and on each iteration adds the feature whose addition provides the best leave-one-out cross-validation performance. Our method is considerably faster than the previously proposed ones, since its time complexity is linear in the number of training examples, the number of features in the original data set, and the desired size of the set of selected features. Therefore, as a side effect we obtain a new training algorithm for learning sparse linear RLS predictors which can be used for large scale learning. This speed is possible due to matrix calculus based short-cuts for leave-one-out and feature addition. We experimentally demonstrate the scalability of our algorithm compared to previously proposed implementations.
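The selection loop in this abstract (start from the empty set, add the feature with the best leave-one-out performance) can be sketched as follows. This is a deliberately naive illustration and not the linear-time greedy RLS algorithm of the paper: it refits ridge regression for every candidate feature and uses the well-known closed-form leave-one-out residual `(y_i - prediction_i) / (1 - h_ii)`. The function names and the default `lam` are assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def loo_error(X, y, feats, lam):
    """Sum of squared leave-one-out residuals of ridge regression on feats."""
    cols = [[row[j] for j in feats] for row in X]
    k = len(feats)
    # Regularized Gram matrix A = X^T X + lam * I on the chosen features.
    A = [[sum(r[a] * r[b] for r in cols) + (lam if a == b else 0.0)
          for b in range(k)] for a in range(k)]
    w = solve(A, [sum(r[a] * yi for r, yi in zip(cols, y)) for a in range(k)])
    err = 0.0
    for r, yi in zip(cols, y):
        h = sum(ri * vi for ri, vi in zip(r, solve(A, r)))  # leverage h_ii
        pred = sum(ri * wi for ri, wi in zip(r, w))
        err += ((yi - pred) / (1.0 - h)) ** 2
    return err


def greedy_forward(X, y, k, lam=1.0):
    """Greedily pick k feature indices by leave-one-out squared error."""
    selected, remaining = [], set(range(len(X[0])))
    while len(selected) < k:
        best = min(sorted(remaining),
                   key=lambda j: loo_error(X, y, selected + [j], lam))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The cost of this sketch grows much faster than linearly in the number of examples and features; avoiding that repeated refitting is precisely what the abstract credits to its matrix-calculus shortcuts.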
Citations: 24
A Parallel Algorithm for Predicting the Secondary Structure of Polycistronic MicroRNAs
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.80
Dianwei Han, G. Tang, Jun Zhang
MicroRNAs (miRNAs) are newly discovered endogenous small non-coding RNAs (21-25 nt) that target their complementary gene transcripts for degradation or translational repression. The biogenesis of a functional miRNA is largely dependent on the secondary structure of the miRNA precursor (pre-miRNA). Recently, it has been shown that miRNAs are present in the genome in the form of polycistronic transcriptional units in plants and animals. It will be important to design methods to predict such structures for miRNA discovery and its applications in gene silencing. In this paper, we propose a parallel algorithm based on the master-slave architecture to predict the secondary structure from an input sequence. First, the master processor partitions the input sequence into subsequences and distributes them to the slave processors. The slave processors then predict the secondary structure for their individual tasks. Afterward, the slave processors return their results to the master processor. Finally, the master processor merges the partial structures from the slave processors into a whole candidate secondary structure. The optimal structure is obtained by sorting the candidate structures according to their scores. Our experimental results indicate that the actual speed-ups match the trend of the theoretical values.
Citations: 1
A Probabilistic Graphical Model of Quantum Systems
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.30
Chen-Hsiang Yeang
Quantum systems are promising candidates for future computing and information processing devices. In a large system, information about the quantum states and processes may be incomplete and scattered. To integrate the distributed information, we propose a quantum version of probabilistic graphical models. Variables in the model (quantum states and measurement outcomes) are linked by several types of operators (unitary, measurement, and merge/split operators). We propose algorithms for three machine learning tasks in quantum probabilistic graphical models: a belief propagation algorithm for inference of unknown states, an iterative algorithm for simultaneous estimation of parameter values and hidden states, and an active learning algorithm to select measurement operators based on observed evidence. We validate these algorithms on simulated data and point out future extensions toward a more comprehensive theory of quantum probabilistic graphical models.
Citations: 5
A New Prediction Based Digital Control DC-DC Converter
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.110
F. Kurokawa, H. Maruta, J. Sakemi, Akihiro Nakamura, H. Osuga
This paper presents a novel prediction-based digital control dc-dc converter. In this method, in addition to the P-I-D control used as the feedback control, prediction-based control is used as the feedforward control. The feedforward control adopts a neural-network-based method. This improves the transient response very effectively when the load changes quickly. As a result, the undershoot of the output voltage and the overshoot of the reactor current are suppressed effectively compared with the conventional method under a step change of load resistance. It is confirmed that the prediction-based control technique is useful for realizing a high-performance digital control method for the dc-dc converter.
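The control structure described here, P-I-D feedback plus a feedforward correction, can be sketched as a discrete-time controller. This is only a structural illustration under stated assumptions: the neural-network predictor of the paper is not reproduced, so `feedforward` is a hypothetical value the caller would obtain from such a predictor, and the class name, gains and sampling period are invented for the example.

```python
class PIDWithFeedforward:
    """Discrete P-I-D feedback with an additive feedforward term."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, reference, measured, feedforward=0.0):
        error = reference - measured
        self.integral += error * self.dt                  # integral term state
        derivative = (error - self.prev_error) / self.dt  # backward difference
        self.prev_error = error
        feedback = (self.kp * error
                    + self.ki * self.integral
                    + self.kd * derivative)
        # The feedforward term would come from the predictor (hypothetical here).
        return feedback + feedforward


controller = PIDWithFeedforward(kp=2.0, ki=1.0, kd=0.1, dt=0.5)
command = controller.update(reference=5.0, measured=3.0, feedforward=0.5)
```

Keeping the feedforward term additive means the feedback loop behaves exactly as a plain P-I-D controller whenever the predictor contributes zero.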
Citations: 12
Multi-Class Classification Using a New Sigmoid Loss Function for Minimum Classification Error (MCE)
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.20
M. Ratnagiri, L. Rabiner, B. Juang
A new loss function has been introduced for Minimum Classification Error that approaches the optimal Bayes risk and also improves performance over standard MCE systems when evaluated on the Aurora connected digits database.
Citations: 10
Unsupervised and Online Update of Boosted Temporal Models: The UAL2Boost
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.143
P. Ribeiro, Plinio Moreno, J. Santos-Victor
The application of learning-based vision techniques to real scenarios usually requires a tuning procedure, which involves the acquisition and labeling of new data and in situ experiments in order to adapt the learning algorithm to each scenario. We address an automatic update procedure of the L2boost algorithm that is able to adapt the initial models learned off-line. Our method is named UAL2Boost and presents three new contributions: (i) an on-line and continuous procedure that recursively updates the current classifier, reducing the storage constraints, (ii) a probabilistic unsupervised update that eliminates the necessity of labeled data in order to adapt the classifier and (iii) a multi-class adaptation method. We show the applicability of the on-line unsupervised adaptation to human action recognition and demonstrate that the system is able to automatically update the parameters of the L2boost with linear temporal models, thus improving the output of the models learned off-line on new video sequences, in a recursive and continuous way. The automatic adaptation of UAL2Boost follows the idea of adapting the classifier incrementally: from simple to complex.
Citations: 1
Hybridization of Base Classifiers of Random Subsample Ensembles for Enhanced Performance in High Dimensional Feature Spaces
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.118
Santhosh Pathical, G. Serpen
This paper presents a simulation-based empirical study of the performance profile of random subsample ensembles with a hybrid mix of base learner composition in high dimensional feature spaces. The performance of a hybrid random subsample ensemble that uses a combination of C4.5, k-nearest neighbor (kNN) and naïve Bayes base learners is assessed through statistical testing in comparison to those of homogeneous random subsample ensembles that employ only one type of base learner. The simulation study employs five datasets with up to 20K features from the UCI Machine Learning Repository. Random subsampling without replacement is used to map the original high dimensional feature space of the five datasets to a multiplicity of lower dimensional feature subspaces. The simulation study explores the effect of certain design parameters, including the count of base classifiers and the subsampling rate, on the performance of the hybrid random subspace ensemble. The ensemble architecture utilizes the voting combiner in all cases. Simulation results indicate that hybridization of base learners for random subsample ensembles improves the prediction accuracy rates and projects a more robust performance.
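The ensemble construction described in this abstract (feature subsampling without replacement, mixed base learners, majority voting) can be sketched in a few lines. This is a toy illustration under loud assumptions: the paper's base learners are C4.5, kNN and naïve Bayes, whereas this sketch substitutes two simple stand-ins (1-nearest-neighbor and nearest-centroid) to stay self-contained, and all function names, the round-robin assignment of learners, and the parameters `n_members` and `rate` are hypothetical.

```python
import random
from collections import Counter


def nearest_neighbor(train_X, train_y):
    """Stand-in base learner: 1-nearest-neighbor classifier."""
    def predict(x):
        i = min(range(len(train_X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
        return train_y[i]
    return predict


def nearest_centroid(train_X, train_y):
    """Stand-in base learner: assign the class with the closest mean."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(centroids,
                   key=lambda l: sum((a - b) ** 2 for a, b in zip(centroids[l], x)))
    return predict


def fit_hybrid_subspace_ensemble(X, y, learners, n_members, rate, seed=0):
    """Train n_members models, each on its own random feature subset."""
    rng = random.Random(seed)
    d = len(X[0])
    k = max(1, int(rate * d))
    members = []
    for i in range(n_members):
        feats = rng.sample(range(d), k)  # subsampling without replacement
        projected = [[row[j] for j in feats] for row in X]
        # Cycle through the learner types to obtain a hybrid mix.
        members.append((feats, learners[i % len(learners)](projected, y)))
    return members


def predict(members, x):
    """Majority vote over all ensemble members."""
    votes = Counter(model([x[j] for j in feats]) for feats, model in members)
    return votes.most_common(1)[0][0]
```

A homogeneous ensemble, the baseline in the abstract's comparison, corresponds to passing a single learner type in `learners`.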
Citations: 1
Using a Bayesian Feature-selection Algorithm to Identify Dose-response Models Based on the Shape of the 3D Dose-distribution: An Example from a Head-and-neck Cancer Trial
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.113
F. Buettner, S. Gulliford, S. Webb, M. Partridge, A. Miah, K. Harrington, C. Nutting
A reduction in salivary flow and xerostomia are common side-effects after radiotherapy of head and neck tumours. Xerostomia can be modeled based on the dose to the parotid glands. To date, all spatial information has been discarded and dose-response models are usually reduced to the mean dose. We present novel morphological dose-response models and use multivariate Bayesian logistic regression to model xerostomia. We use 3D invariant statistical moments as morphometric descriptors to quantify the shape of the 3D dose distribution. As this results in a very high number of potential predictors, we apply a Bayesian variable-selection algorithm to find the best model based on any subset of all potential predictors. To do this, we determine the posterior probabilities of being the best model for all potential models and calculate the marginal probabilities that a variable should be included in a model. This was done using a Reversible Jump Markov Chain Monte Carlo algorithm. The performance of the best model was quantified using the deviance information criterion and a leave-one-out cross-validation (LOOCV). This methodology was applied to 64 head and neck cancer patients treated with either intensity-modulated radiotherapy (IMRT) or conventional radiotherapy. Results show a substantial increase in both model-fit and area under the curve (AUC) when including morphological information compared to conventional mean-dose models. The best mean-dose model for IMRT patients only resulted in an AUC of 0.63 after LOOCV while the best morphological model had an AUC of 0.90. For conventional patients the mean-dose model and the morphological model had AUC of 0.55 and 0.86 respectively. For a joint model with all patients pooled together, the mean-dose model had an AUC of 0.75 and the morphological model an AUC of 0.88. We have shown that invariant statistical moments are a good morphometric descriptor and by using Bayesian variable selection we were able to identify models with a substantially higher predictive power than conventional mean-dose models.
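As an illustration of what a moment-based shape descriptor of a 3D dose distribution looks like, the sketch below computes central moments of a voxel grid, which are invariant to translation of the whole distribution. It is a minimal sketch and an assumption about the general technique: the abstract does not state which invariances or which moment orders the authors used, and the function name and the sparse dictionary representation of the dose grid are hypothetical.

```python
def central_moment(dose, p, q, r):
    """Central moment of order (p, q, r) of a dose grid.

    `dose` maps voxel coordinates (x, y, z) to the dose in that voxel.
    Coordinates are taken relative to the dose-weighted centroid, so the
    result does not change when the whole distribution is shifted.
    """
    total = sum(dose.values())
    cx = sum(x * d for (x, y, z), d in dose.items()) / total
    cy = sum(y * d for (x, y, z), d in dose.items()) / total
    cz = sum(z * d for (x, y, z), d in dose.items()) / total
    return sum((x - cx) ** p * (y - cy) ** q * (z - cz) ** r * d
               for (x, y, z), d in dose.items())


# Hypothetical four-voxel dose grid and a translated copy of it.
dose = {(0, 0, 0): 1.0, (1, 0, 0): 3.0, (0, 2, 0): 2.0, (0, 0, 1): 2.0}
shifted = {(x + 5, y - 3, z + 2): d for (x, y, z), d in dose.items()}
spread_x = central_moment(dose, 2, 0, 0)
spread_x_shifted = central_moment(shifted, 2, 0, 0)
```

A set of such moments for many orders `(p, q, r)` yields the large pool of candidate predictors that the abstract's variable-selection step then has to prune.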
Citations: 4
Bayesian Inferences and Forecasting in Spatial Time Series Models
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.170
Sung Duck Lee, Duck-Ki Kim
Spatial time series data can be viewed as a set of time series collected simultaneously at a number of spatial locations. For example, mumps data show infection spreading to adjacent, broader regions according to spatial location and time. Spatial time series models therefore have many space and time parameters. In this paper, we propose a method of Bayesian inference and prediction for spatial time series models using a Gibbs sampler, in order to overcome the convergence problems of numerical methods. Our results are illustrated using monthly mumps cases reported by the Korea Center for Disease Control and Prevention over the years 2001-2009, as well as a simulation study.
Citations: 1