
Sixth International Conference on Data Mining (ICDM'06): Latest Papers

Manifold Clustering of Shapes
Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.101
Dragomir Yankov, Eamonn J. Keogh
Shape clustering can significantly facilitate the automatic labeling of objects present in image collections. For example, it could outline the existing groups of pathological cells in a bank of cyto-images; the groups of species on photographs collected from certain aerials; or the groups of objects observed on surveillance scenes from an office building. Here we demonstrate that a nonlinear projection algorithm such as Isomap can attract together shapes of similar objects, suggesting the existence of isometry between the shape space and a low dimensional nonlinear embedding. Whenever there is a relatively small amount of noise in the data, the projection forms compact, convex clusters that can easily be learned by a subsequent partitioning scheme. We further propose a modification of the Isomap projection based on the concept of degree-bounded minimum spanning trees. The proposed approach is demonstrated to move apart bridged clusters and to alleviate the effect of noise in the data.
Citations: 44
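The Manifold Clustering of Shapes abstract above relies on Isomap, whose first stage replaces straight-line distances with shortest-path distances through a nearest-neighbour graph. The following is a minimal sketch of that stage only, under stated assumptions: the function name `geodesic_distances`, the half-circle toy data, and the choice `k=2` are illustrative and not taken from the paper, and the paper's degree-bounded minimum spanning tree modification is not reproduced here.

```python
import math


def geodesic_distances(points, k):
    """Approximate geodesic distances via shortest paths in a k-NN graph."""
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]
    inf = float("inf")
    graph = [[inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        # Connect point i to its k nearest neighbours (symmetric edges).
        others = sorted((j for j in range(n) if j != i), key=lambda j: euclid[i][j])
        for j in others[:k]:
            graph[i][j] = graph[j][i] = euclid[i][j]
    # Floyd-Warshall: all-pairs shortest paths through the neighbour graph.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if graph[i][m] + graph[m][j] < graph[i][j]:
                    graph[i][j] = graph[i][m] + graph[m][j]
    return graph


# Toy data (hypothetical): 11 points on a half circle of radius 1.
points = [(math.cos(math.pi * t / 10), math.sin(math.pi * t / 10)) for t in range(11)]
geo = geodesic_distances(points, k=2)

straight = math.dist(points[0], points[-1])  # cuts across the half circle
along_curve = geo[0][-1]                     # follows the curve through neighbours
print(round(straight, 3), round(along_curve, 3))
```

The two end points are 2 apart in a straight line, but their graph distance is close to the arc length of the half circle, which is the property that lets a subsequent embedding unfold a curved manifold. Full Isomap would then apply classical multidimensional scaling to `geo`.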
Identifying Follow-Correlation Itemset-Pairs
Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.84
Shichao Zhang, Jilian Zhang, Xiaofeng Zhu, Zifang Huang
An association rule A → B is useful to predict that B will likely occur when A occurs. This is a classical association rule. In real-world applications, such as bioinformatics and medical research, there are many follow-correlations between itemsets A and B: B likely occurs n times after A has occurred m times, written as <Am, Bn>. We refer to this follow-correlation as P3.1 itemset-pairs because <A3, B1> like that in the example (Example 2) should be uninterested in association analysis. This paper designs an efficient algorithm for identifying P3.1 itemset-pairs in sequential data. We experimentally evaluate our approach, and demonstrate that the proposed approach is efficient and promising.
Citations: 10
Integrating Features from Different Sources for Music Information Retrieval
Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.89
Tao Li, M. Ogihara, Shenghuo Zhu
Efficient and intelligent music information retrieval is a very important topic of the 21st century. With the ultimate goal of building personal music information retrieval systems, this paper studies the problem of identifying "similar" artists using both lyrics and acoustic data. In this paper, we present a clustering algorithm that integrates features from both sources to perform bimodal learning. The algorithm is tested on a data set consisting of 570 songs from 53 albums of 41 artists using artist similarity provided by All Music Guide. Experimental results show that the accuracy of artist similarity classifiers can be significantly improved and that artist similarity can be efficiently identified.
Citations: 21
Mining Latent Associations of Objects Using a Typed Mixture Model--A Case Study on Expert/Expertise Mining
Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.109
Shenghua Bao, Yunbo Cao, B. Liu, Yong Yu, Hang Li
This paper studies the problem of discovering latent associations among objects in text documents. Specifically, given two sets of objects and various types of co-occurrence data concerning the objects existing in texts, we aim to discover the hidden or latent associative relationships between the two sets of objects. Existing methods are not directly applicable as they are unable to consider all this information. For example, the probabilistic mixture model called Separable Mixture Model (SMM) proposed by Hofmann can use only one type of co-occurrences to mine latent associations. This paper proposes a more general probabilistic mixture model called the Typed Separable Mixture Model (TSMM), which is able to use all types of co-occurrences within a single framework. Experimental results based on the expert/expertise mining task show that TSMM outperforms SMM significantly.
Citations: 1
Meta Clustering
Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.103
R. Caruana, M. Elhawary, Nam Nguyen, Casey Smith
Clustering is ill-defined. Unlike supervised learning where labels lead to crisp performance criteria such as accuracy and squared error, clustering quality depends on how the clusters will be used. Devising clustering criteria that capture what users need is difficult. Most clustering algorithms search for optimal clusterings based on a pre-specified clustering criterion. Our approach differs. We search for many alternate clusterings of the data, and then allow users to select the clustering(s) that best fit their needs. Meta clustering first finds a variety of clusterings and then clusters this diverse set of clusterings so that users must only examine a small number of qualitatively different clusterings. We present methods for automatically generating a diverse set of alternate clusterings, as well as methods for grouping clusterings into meta clusters. We evaluate meta clustering on four test problems and two case studies. Surprisingly, clusterings that would be of most interest to users often are not very compact clusterings.
Citations: 187
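The Meta Clustering abstract above describes clustering a set of clusterings, which requires some distance between two clusterings of the same data. One common choice, used here as an assumption rather than as the paper's stated measure, is the fraction of point pairs on which two clusterings disagree (one minus the Rand index). The function name `clustering_distance` and the toy label lists are hypothetical.

```python
from itertools import combinations


def clustering_distance(labels_a, labels_b):
    """Fraction of point pairs grouped together in one clustering but not the other."""
    pairs = list(combinations(range(len(labels_a)), 2))
    disagreements = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return disagreements / len(pairs)


# Three hypothetical clusterings of the same six points.
by_shape = [0, 0, 1, 1, 2, 2]
by_shape_relabelled = [1, 1, 0, 0, 2, 2]  # same partition, different label names
by_colour = [0, 1, 0, 1, 0, 1]

same = clustering_distance(by_shape, by_shape_relabelled)
different = clustering_distance(by_shape, by_colour)
print(same, different)
```

Because the measure only looks at whether pairs are co-clustered, renaming cluster labels leaves the distance at zero, while a qualitatively different partition scores high. A matrix of such distances could then be fed to any clustering routine to form the meta clusters.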
STAGGER: Periodicity Mining of Data Streams Using Expanding Sliding Windows
Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.153
Mohamed G. Elfeky, Walid G. Aref, A. Elmagarmid
Sensor devices are becoming ubiquitous, especially in measurement and monitoring applications. Because of the real-time, append-only and semi-infinite natures of the generated sensor data streams, an online incremental approach is a necessity for mining stream data types. In this paper, we propose STAGGER: a one-pass, online and incremental algorithm for mining periodic patterns in data streams. STAGGER does not require that the user pre-specify the periodicity rate of the data. Instead, STAGGER discovers the potential periodicity rates. STAGGER maintains multiple expanding sliding windows staggered over the stream, where computations are shared among the multiple overlapping windows. Small-length sliding windows are imperative for early and real-time output, yet are limited to discover short periodicity rates. As streamed data arrives continuously, the sliding windows expand in length in order to cover the whole stream. Larger-length sliding windows are able to discover longer periodicity rates. STAGGER incrementally maintains a tree-like data structure for the frequent periodic patterns of each discovered potential periodicity rate. In contrast to the Fourier/Wavelet-based approaches used for discovering periodicity rates, STAGGER not only discovers a wider, more accurate set of periodicities, but also discovers the periodic patterns themselves. In fact, experimental results with real and synthetic data sets show that STAGGER outperforms Fourier/Wavelet-based approaches by an order of magnitude in terms of the accuracy of the discovered periodicity rates. Moreover, real-data experiments demonstrate the practicality of the discovered periodic patterns.
Citations: 32
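To make the notion of a periodicity rate in the STAGGER abstract above concrete, here is a naive, non-incremental check over a single fixed window. It is not the STAGGER algorithm: there are no expanding or staggered windows and no shared computation. The function names, the 0.9 threshold, and the toy stream are assumptions for illustration.

```python
def period_confidence(symbols, period):
    """Fraction of positions i where symbols[i] equals symbols[i + period]."""
    comparisons = len(symbols) - period
    if comparisons <= 0:
        return 0.0
    matches = sum(symbols[i] == symbols[i + period] for i in range(comparisons))
    return matches / comparisons


def candidate_periods(symbols, max_period, threshold=0.9):
    """Periods whose confidence within this window reaches the threshold."""
    return [
        p for p in range(1, max_period + 1)
        if period_confidence(symbols, p) >= threshold
    ]


# Hypothetical window over a symbol stream that repeats every 3 symbols.
window = list("abcabcabcabcabc")
periods = candidate_periods(window, max_period=7)
print(periods)
```

A window of length w can only test periods shorter than w, which mirrors the abstract's point that short windows give early output but only find short periodicity rates.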
A Balanced Ensemble Approach to Weighting Classifiers for Text Classification
Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.2
G. Fung, J. Yu, Haixun Wang, D. Cheung, Huan Liu
This paper studies the problem of constructing an effective heterogeneous ensemble classifier for text classification. One major challenge of this problem is to formulate a good combination function, which combines the decisions of the individual classifiers in the ensemble. We show that the classification performance is affected by three weight components and they should be included in deriving an effective combination function. They are: (1) Global effectiveness, which measures the effectiveness of a member classifier in classifying a set of unseen documents; (2) Local effectiveness, which measures the effectiveness of a member classifier in classifying the particular domain of an unseen document; and (3) Decision confidence, which describes how confident a classifier is when making a decision when classifying a specific unseen document. We propose a new balanced combination function, called dynamic classifier weighting (DCW), that incorporates the aforementioned three components. The empirical study demonstrates that the new combination function is highly effective for text classification.
Citations: 24
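The abstract above names three weight components (global effectiveness, local effectiveness, decision confidence) but does not give the DCW formula. The sketch below shows one possible way to fold three such weights into a vote, assuming they are simply multiplied; that product, the function name `combine_votes`, and all numbers are hypothetical and should not be read as the paper's combination function.

```python
def combine_votes(votes):
    """Sum per-label scores, weighting each vote by the product of its three weights.

    Each vote is (label, global_effectiveness, local_effectiveness, confidence).
    """
    scores = {}
    for label, global_eff, local_eff, confidence in votes:
        weight = global_eff * local_eff * confidence  # assumed combination rule
        scores[label] = scores.get(label, 0.0) + weight
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical decisions of three member classifiers on one unseen document.
votes = [
    ("sports", 0.9, 0.5, 0.6),
    ("politics", 0.7, 0.9, 0.8),
    ("sports", 0.6, 0.4, 0.9),
]
label, scores = combine_votes(votes)
print(label, scores)
```

In this toy case two classifiers vote for one label, yet the single classifier that is strong in the document's domain and confident in its decision outweighs them, which is the kind of behaviour a per-document weighting scheme is meant to allow.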
An Interactive Semantic Video Mining and Retrieval Platform--Application in Transportation Surveillance Video for Incident Detection
Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.20
Xin Chen, Chengcui Zhang
Understanding and retrieving videos based on their semantic contents is an important research topic in multimedia data mining and has found various real-world applications. Most existing video analysis techniques focus on the low level visual features of video data. However, there is a "semantic gap" between the machine-readable features and the high level human concepts, i.e. human understanding of the video content. In this paper, an interactive platform for semantic video mining and retrieval is proposed using relevance feedback (RF), a popular technique in the area of content-based image retrieval (CBIR). By tracking semantic objects in a video and then modeling spatio-temporal events based on object trajectories and object interactions, the proposed interactive learning algorithm in the platform is able to mine the spatio-temporal data extracted from the video. An iterative learning process is involved in the proposed platform, which is guided by the user's response to the retrieved results. Although the proposed video retrieval platform is intended for general use and can be tailored to many applications, we focus on its application in traffic surveillance video database retrieval to demonstrate the design details. The effectiveness of the algorithm is demonstrated by our experiments on real-life traffic surveillance videos.
Citations: 37
Direct Marketing When There Are Voluntary Buyers
Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.54
Yi-Ting Lai, Ke Wang, Daymond Ling, Hua Shi, Jason J. Zhang
In traditional direct marketing, the implicit assumption is that customers will only purchase the product if they are contacted. In real business environments, however, there are "voluntary buyers," who will still make the purchase in the absence of a contact. While no direct promotion is needed for voluntary buyers, the traditional response-driven paradigm tends to target such customers. This paper presents "influential marketing," targeting only those whose purchase decisions can be positively influenced, i.e. buyers who are non-voluntary. Our novel, practical solution to this problem gives promising results.
Citations: 8
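The distinction the abstract above draws between voluntary buyers and buyers who can be influenced is often expressed as the difference between the purchase probability with and without a contact. The sketch below illustrates that difference only; it is not the paper's solution, and the function name `uplift`, the customer profiles, the probabilities, and the 0.10 cut-off are all hypothetical.

```python
def uplift(p_buy_if_contacted, p_buy_if_not_contacted):
    """Increase in purchase probability attributable to the contact."""
    return p_buy_if_contacted - p_buy_if_not_contacted


# Hypothetical (contacted, not contacted) purchase probabilities per profile.
customers = {
    "voluntary_buyer": (0.90, 0.88),  # buys anyway, contact adds little
    "persuadable": (0.40, 0.05),      # contact changes the decision
    "uninterested": (0.02, 0.01),     # rarely buys either way
}

targets = [
    name
    for name, (p_contacted, p_not_contacted) in customers.items()
    if uplift(p_contacted, p_not_contacted) > 0.10
]
print(targets)
```

A response-driven model would rank the voluntary buyer first because of the 0.90 response rate, whereas ranking by the difference selects only the profile whose decision the contact actually changes.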
Exploratory Under-Sampling for Class-Imbalance Learning
Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.68
Xu-Ying Liu, Jianxin Wu, Zhi-Hua Zhou
Under-sampling is a class-imbalance learning method which uses only a subset of major class examples and thus is very efficient. The main deficiency is that many major class examples are ignored. We propose two algorithms to overcome the deficiency. EasyEnsemble samples several subsets from the major class, trains a learner using each of them, and combines the outputs of those learners. BalanceCascade is similar to EasyEnsemble except that it removes correctly classified major class examples of trained learners from further consideration. Experiments show that both of the proposed algorithms have better AUC scores than many existing class-imbalance learning methods. Moreover, they have approximately the same training time as that of under-sampling, which trains significantly faster than other methods.
Citations: 1475
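The EasyEnsemble procedure summarized in the abstract above (sample several subsets of the major class, train a learner on each, combine the outputs) can be sketched as follows. This is a minimal illustration under stated assumptions: a toy one-dimensional nearest-centroid scorer stands in for whatever base learner the paper uses, outputs are combined by plain averaging, and the names `train_centroid_scorer` and `easy_ensemble` as well as the data are hypothetical. BalanceCascade's removal of correctly classified examples is not shown.

```python
import random


def train_centroid_scorer(minority, majority_subset):
    """Toy base learner: positive score means closer to the minority centroid."""
    c_min = sum(minority) / len(minority)
    c_maj = sum(majority_subset) / len(majority_subset)
    return lambda x: abs(x - c_maj) - abs(x - c_min)


def easy_ensemble(minority, majority, n_subsets, rng):
    """Train one learner per balanced majority subset and average their scores."""
    learners = []
    for _ in range(n_subsets):
        # Under-sample the major class down to the size of the minor class.
        subset = rng.sample(majority, len(minority))
        learners.append(train_centroid_scorer(minority, subset))
    return lambda x: sum(h(x) for h in learners) / len(learners)


rng = random.Random(0)
minority = [4.0, 4.5, 5.0, 5.5]                        # rare class, hypothetical
majority = [rng.gauss(0.0, 1.0) for _ in range(200)]   # common class, hypothetical
model = easy_ensemble(minority, majority, n_subsets=5, rng=rng)

print(model(5.0) > 0, model(0.0) > 0)
```

Each learner sees only as many major class examples as there are minor class examples, so training stays as cheap as plain under-sampling, while the different random subsets let the ensemble as a whole use more of the major class.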