2012 11th International Conference on Machine Learning and Applications: Latest Papers

An Interactive Scatter Plot Metrics Visualization for Decision Trend Analysis
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.164
Tze-Haw Huang, M. Huang, Kang Zhang
This paper presents a new interactive scatter plot visualization for multi-dimensional data analysis. We apply RST to reduce visual complexity through dimensionality reduction. We use an innovative point-to-region mouse click concept to enable direct interactions with scatter points that would otherwise be theoretically impossible. To show the decision trend, we use a virtual Z dimension to display a set of linear flows that approximate it. We conducted a case study to demonstrate the effectiveness and usefulness of the technique for identifying the sources of impact on wine quality through visual analytics of a wine dataset consisting of 12 attributes and 4898 samples.
Citations: 11
Spatial Feature Extraction for Classification of Nonstationary Myoelectric Signals
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.222
David Hofmann
We compare classifiers for the classification of myoelectric signals and show that performance can be improved by using spatial features extracted by independent component analysis. The obtained filters can be interpreted as reflecting the spatial structure of the data source. We find that performance improves for several preprocessing algorithms, but the improvement affects the relative performance of the various classifiers in different ways. A critical performance difference is seen especially when non-stationary signal regimes during the onset of static contractions are included. Although a practically usable performance appears to be reached for the present data set by a certain combination of classification and preprocessing algorithms, it remains to be further optimized in order to maintain this level on more realistic data sets.
Citations: 1
Randomized Sampling for Large Data Applications of SVM
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.65
Erik M. Ferragut, J. Laska
A trend in machine learning is the application of existing algorithms to ever-larger datasets. Support Vector Machines (SVM) have been shown to be very effective, but have been difficult to scale to large-data problems. Some approaches have sought to scale SVM training by approximating and parallelizing the underlying quadratic optimization problem. This paper pursues a different approach. Our algorithm, which we call Sampled SVM, uses an existing SVM training algorithm to create a new SVM training algorithm. It uses randomized data sampling to better extend SVMs to large data applications. Experiments on several datasets show that our method is faster than and comparably accurate to both the original SVM algorithm it is based on and the Cascade SVM, the leading data organization approach for SVMs in the literature. Further, we show that our approach is more amenable to parallelization than Cascade SVM.
Citations: 17
A Multi-label and Adaptive Genre Classification of Web Pages
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.106
Chaker Jebari
This paper proposes a new centroid-based approach to classify web pages by genre using character n-grams extracted from different information sources such as the URL, title, headings and anchors. To deal with the complexity of web pages and the rapid evolution of web genres, our approach implements a multi-label and adaptive classification scheme in which web pages are classified one by one and each web page can be assigned to more than one genre. According to the similarity between the new page and each genre centroid, our approach either adapts the genre centroid under consideration or treats the new page as a noise page and discards it. The experimental results show that our approach is very fast and achieves better results than existing multi-label classifiers.
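The centroid scheme described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: the trigram length, the cosine similarity measure, the `threshold` value, and the function names `char_ngrams`, `cosine` and `classify_and_adapt` are all hypothetical choices made here.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count the character n-grams of a lower-cased string (n=3 is an assumption)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b.get(gram, 0) for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_and_adapt(page_text, centroids, threshold=0.3):
    """Return every genre whose centroid is similar enough to the page.

    Matching centroids absorb the page's n-gram counts (the adaptive step).
    An empty result means the page is treated as noise and no centroid changes.
    """
    profile = char_ngrams(page_text)
    matched = [genre for genre, centroid in centroids.items()
               if cosine(profile, centroid) >= threshold]
    for genre in matched:
        centroids[genre].update(profile)  # Counter.update adds the counts
    return matched


centroids = {
    "shop": char_ngrams("cart cart cart"),
    "blog": char_ngrams("blog blog blog"),
}
print(classify_and_adapt("cart", centroids))  # page joins the "shop" genre
print(classify_and_adapt("zzzz", centroids))  # no genre is close: noise page
```

Because a page may pass the threshold for several centroids at once, the returned list naturally holds more than one genre, which is the multi-label behaviour the abstract refers to.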
Citations: 13
Evolving Neural Fuzzy Network with Adaptive Feature Selection
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.184
Alisson Marques da Silva, W. Caminhas, A. Lemos, F. Gomide
This paper introduces a neural fuzzy network approach for evolving system modeling. The approach uses neofuzzy neurons and a neural fuzzy structure equipped with an incremental learning algorithm that includes adaptive feature selection. The feature selection mechanism starts by considering one or more input variables from a given set of variables, and decides whether a new variable should be added, or whether an existing variable should be excluded or kept as an input. The decision process uses statistical tests and information about the current model performance. The incremental learning scheme simultaneously selects the input variables and updates the neural network weights. The weights are adjusted using a gradient-based scheme with an optimal learning rate. The performance of the models obtained with the neural fuzzy modeling approach is evaluated on weather temperature forecasting problems. Computational results show that the approach is competitive with alternatives reported in the literature, especially in on-line modeling situations where processing time and learning are critical.
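For readers unfamiliar with neofuzzy neurons, the weight update mentioned in this abstract can be sketched as below. This is a minimal illustration under assumptions, not the authors' algorithm: it uses evenly spaced complementary triangular membership functions, a step size of 1 / (sum of squared memberships) as one closed-form choice that zeroes the error on the current sample, and it omits the feature selection step entirely. The names `memberships` and `NeoFuzzyNeuron` are hypothetical.

```python
def memberships(x, lo, hi, m):
    """Degrees of m evenly spaced, complementary triangular functions on [lo, hi]."""
    x = min(max(x, lo), hi)  # clamp so the degrees always sum to 1
    step = (hi - lo) / (m - 1)
    centers = [lo + k * step for k in range(m)]
    return [max(0.0, 1.0 - abs(x - c) / step) for c in centers]


class NeoFuzzyNeuron:
    """Output is the sum over inputs of membership-weighted weights."""

    def __init__(self, n_inputs, lo=0.0, hi=1.0, m=5):
        self.lo, self.hi, self.m = lo, hi, m
        self.weights = [[0.0] * m for _ in range(n_inputs)]

    def _activations(self, xs):
        return [memberships(x, self.lo, self.hi, self.m) for x in xs]

    def predict(self, xs):
        acts = self._activations(xs)
        return sum(mu * w
                   for row_mu, row_w in zip(acts, self.weights)
                   for mu, w in zip(row_mu, row_w))

    def update(self, xs, target):
        """One incremental gradient step; returns the error before the step."""
        acts = self._activations(xs)
        error = self.predict(xs) - target
        # Assumed step size: makes the post-update error on this sample zero.
        rate = 1.0 / sum(mu * mu for row in acts for mu in row)
        for row_mu, row_w in zip(acts, self.weights):
            for j, mu in enumerate(row_mu):
                row_w[j] -= rate * error * mu
        return error


neuron = NeoFuzzyNeuron(n_inputs=2)
neuron.update([0.2, 0.7], target=1.5)
print(neuron.predict([0.2, 0.7]))
```

Only the weights of membership functions that are active for the current sample change, which is why each update is cheap enough for on-line use.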
Citations: 8
Assessing Encoding Techniques through Correlation-Based Metrics
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.118
G. Armano, E. Tamponi
The performance of a classification system depends on various aspects, including encoding techniques. In fact, encoding techniques play a primary role in the process of tuning a classifier/predictor, as choosing the most appropriate encoder may greatly affect its performance. As of now, evaluating the impact of an encoding technique on a classification system typically requires training the system and testing it by means of a performance metric deemed relevant (e.g., precision, recall, or the Matthews correlation coefficient). For this reason, assessing a single encoding technique is a time-consuming activity, which introduces some additional degrees of freedom (e.g., parameters of the training algorithm) that may be uncorrelated with the encoding technique to be assessed. In this paper, we propose a family of methods to measure the performance of encoding techniques used in classification tasks, based on the correlation between encoded input data and the corresponding output. The proposed approach provides correlation-based metrics, devised with the primary goal of focusing on the encoding technique while leaving other unrelated aspects aside. Notably, the proposed technique saves computational time to a great extent, as it needs only a tiny fraction of the time required by standard methods.
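The general idea of scoring an encoding without training a classifier can be illustrated with a simple stand-in metric. The sketch below is an assumption made for illustration, not one of the metrics proposed in the paper: it averages the absolute Pearson correlation between each encoded feature and a numeric label. The function names `pearson` and `encoding_score` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def encoding_score(encoded_rows, labels):
    """Mean absolute correlation between each encoded feature and the label."""
    columns = list(zip(*encoded_rows))
    return sum(abs(pearson(col, labels)) for col in columns) / len(columns)


labels = [0, 0, 1, 1]
informative = [[0.0], [0.1], [0.9], [1.0]]    # feature tracks the label
uninformative = [[0.0], [1.0], [0.0], [1.0]]  # feature ignores the label
print(encoding_score(informative, labels))
print(encoding_score(uninformative, labels))
```

A score like this costs one pass over the data per feature, which gives a sense of why correlation-based assessment can be much cheaper than a full train-and-test cycle.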
Citations: 2
Muscle Categorization Using Quantitative Needle Electromyography: A 2-Stage Gaussian Mixture Model Based Approach
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.100
M. Abdelmaseeh, P. Poupart, Benn Smith, D. Stashuk
Needle electromyography, in combination with nerve conduction studies, is the gold standard methodology for assessing the neurophysiologic effects of neuromuscular diseases. Muscle categorization is typically based on visual and auditory assessment of the morphology and activation patterns of its constituent motor units, a procedure that is highly dependent on the skills and level of experience of the examiner. This motivates the development of automated or semi-automated categorization techniques. This paper describes a 2-stage Gaussian mixture model based approach. In the first stage, a muscle is classified as neurogenic or myopathic. The second stage uses a classifier specific to each disease category to confirm or refute the disease involvement. A total of 2556 motor unit potentials sampled from 48 normal, 30 neurogenic and 20 myopathic tibialis anterior muscles were used in this study. The proposed approach showed an average accuracy of 91.25%, which is higher than that of the linear and non-linear multi-class schemas it was compared with. In addition to improved accuracy, the 2-stage approach is more suitable for muscle categorization because it has a hierarchical decision structure similar to current clinical practice, and its output can be interpreted as a measure of confidence.
Citations: 3
Fast Time Series Classification Based on Infrequent Shapelets
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.44
Qing He, Zhi Dong, Fuzhen Zhuang, Tianfeng Shang, Zhongzhi Shi
Time series shapelets are small, local time series subsequences which are in some sense maximally representative of a class. E. Keogh uses the distance to the shapelet to classify objects. Even though shapelet classification can be interpretable and more accurate than many state-of-the-art classifiers, shapelets have one main limitation: the training process is offline and, even with subsequence early abandoning and admissible entropy pruning, the computation time is still significant. In this work, we address the latter problem by introducing a novel algorithm that finds time series shapelets in significantly less time than current methods by extracting infrequent time series shapelet candidates. Subsequences that are distinguishable are usually infrequent compared to other subsequences. The algorithm, called ISDT (Infrequent Shapelet Decision Tree), extracts infrequent shapelet candidates to find shapelets. Experiments demonstrate the efficiency of the ISDT algorithm on several benchmark time series datasets. The results show that ISDT significantly outperforms the current shapelet algorithm.
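The distance-based classification and the early-abandon pruning mentioned in this abstract can be illustrated with the standard subsequence distance. The sketch below is not the ISDT algorithm and does not cover infrequent candidate extraction; it only shows how a single shapelet, once found, splits series by distance. The `threshold` and the function names `subsequence_distance` and `classify` are hypothetical, and normalization of the windows is omitted for brevity.

```python
import math


def subsequence_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any same-length window."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best_sq = math.inf
    for start in range(len(series) - m + 1):
        acc = 0.0
        for offset in range(m):
            diff = series[start + offset] - shapelet[offset]
            acc += diff * diff
            if acc >= best_sq:
                break  # early abandon: this window cannot beat the best so far
        else:
            best_sq = acc  # loop finished without abandoning, so acc < best_sq
    return math.sqrt(best_sq)


def classify(series, shapelet, threshold, near_label, far_label):
    """One decision-tree node: split on the distance to a single shapelet."""
    if subsequence_distance(series, shapelet) <= threshold:
        return near_label
    return far_label


bump = [1.0, 2.0, 1.0]
print(subsequence_distance([0.0, 0.0, 1.0, 2.0, 1.0, 0.0], bump))
print(classify([0.0, 0.0, 0.0, 0.0, 0.0], bump, 0.5, "bump", "flat"))
```

Searching for the best shapelet repeats this distance computation for every candidate against every training series, which is where the training cost described in the abstract comes from.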
Citations: 49
Regularized Probabilistic Latent Semantic Analysis with Continuous Observations
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.102
Hao Zhang, Richard E. Edwards, L. Parker
Probabilistic latent semantic analysis (PLSA) has been widely used in the machine learning community. However, the original PLSAs are not capable of modeling real-valued observations and usually have severe problems with overfitting. To address both issues, we propose a novel, regularized Gaussian PLSA (RG-PLSA) model that combines Gaussian PLSAs and hierarchical Gaussian mixture models (HGMM). We evaluate our model on supervised human action recognition tasks, using two publicly available datasets. Average classification accuracies of 97.69% and 93.72% are achieved on the Weizmann and KTH Action Datasets, respectively, which demonstrates that the RG-PLSA model outperforms Gaussian PLSAs and HGMMs, and is comparable to the state of the art.
Citations: 8
Cross-Domain Facial Expression Recognition Using Supervised Kernel Mean Matching
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.178
Yun-Qian Miao, Rodrigo Araujo, M. Kamel
Even though facial expressions have universal meaning in communication, their appearance varies widely due to many factors, such as different image acquisition setups, ages, genders, and cultural backgrounds. Because collecting a sufficient number of annotated samples for each target domain is impractical, this paper investigates the problem of facial expression recognition in the more challenging situation where the training and testing samples are taken from different domains. To address this problem, after observing the unsatisfactory performance of the Kernel Mean Matching (KMM) algorithm, we propose a supervised extension that matches the distributions in a class-to-class manner, called Supervised Kernel Mean Matching (SKMM). The new approach stands out by both matching the distributions and preserving the discriminative information between classes at the same time. Extensive experimental studies on four cross-dataset facial expression recognition tasks show promising improvements from the proposed method, in which a small number of labeled samples guide the matching process.
Citations: 37