2008 IEEE International Conference on Data Mining Workshops: Latest Papers

Enriching Spatial OLAP with Map Generalization: a Conceptual Multidimensional Model
Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.80
S. Bimonte, J. Gensel, M. Bertolotto
Map generalization is used to derive maps for secondary scales and/or specific goals. This operation greatly benefits spatial decision support systems, as it can provide a global and simplified representation of a phenomenon while discarding irrelevant information. The recent popularity of OLAP systems in various application domains has generated much interest in the development of spatial OLAP (SOLAP) models that integrate spatial data into data warehouse and OLAP systems. Although powerful in some respects, current SOLAP models cannot support map generalization capabilities. In this paper, we present a conceptual multidimensional model integrating map generalization. The model extends SOLAP spatial hierarchies by introducing multi-association relationships, and supports imprecise measures.
Citations: 9
Standards-Based Coastal Sensor Web
Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.131
S. Durbha, R. King, N. Younan, Santosh A. Rajender, Shruthi Bheemireddy
Coastal buoys and stations provide frequent, high-quality marine observations for oceanographic study, weather service, atmospheric and public safety. Sharing the generated data sets requires tremendous effort and coordination among the different sensor network agencies to come to a shared understanding and to disseminate the data in a uniform way. Syntactic standardization provides data description models that are agreed upon by all the stakeholders. In addition, there is a need for semantic enrichment of the information sources, which would help to understand the context of the data and to resolve the meaning, interpretation or usage of the same or related data. The standardized data models facilitate improved information retrieval on a variety of spatiotemporal scales. In this paper we describe the mining of these information sources through a Web services based framework. The sensor observation service component of this framework allows operations such as spatial and temporal subsetting, filtering, etc. Further, the terminology involved in the coastal domain is being conceptualized in the form of an ontology. The knowledge base being developed using this ontological model is amenable to querying using SPARQL, which is a standardized RDF query language. The knowledge-enabled client being developed will allow processing of queries on the coastal sensor networks that go beyond the prevalent keyword-based searches.
Citations: 2
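The spatial and temporal subsetting mentioned in the Standards-Based Coastal Sensor Web abstract can be illustrated with a minimal in-memory sketch. It is not the sensor observation service interface itself: the `Observation` record, its field names, and the `subset` function are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """Hypothetical buoy reading; the field names are assumptions."""
    station: str
    lat: float
    lon: float
    time: datetime
    water_temp_c: float


def subset(observations, bbox, start, end):
    """Keep readings inside bbox and inside the half-open interval [start, end).

    bbox is (min_lat, min_lon, max_lat, max_lon). Boxes that cross the
    antimeridian are not handled in this sketch.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    return [
        o for o in observations
        if min_lat <= o.lat <= max_lat
        and min_lon <= o.lon <= max_lon
        and start <= o.time < end
    ]
```

A real service would apply the same two filters on the server side, so that clients only download the slice they asked for.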
TransRank: A Novel Algorithm for Transfer of Rank Learning
Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.42
Depin Chen, Jun Yan, Gang Wang, Yan Xiong, Weiguo Fan, Zheng Chen
Recently, the learning-to-rank technique has attracted much attention. However, the lack of labeled training data seriously limits its application in real-world tasks. In this paper, we propose to break this bottleneck by considering the cross-domain "transfer of rank learning" problem. Simultaneously, we propose a novel algorithm called TransRank, which can effectively utilize the labeled data from a source domain to enhance the learning of the ranking function in the target domain. The proposed algorithm consists of three key steps. Firstly, we introduce a utility function to select the k-best queries from the source domain labeled data. Secondly, feature augmentation is performed on both source and target domain data, which can directly adapt the ranking information from the source domain to the target domain. Finally, we utilize the classical ranking SVM to learn the enhanced ranking function on the augmented features. Experimental results on benchmark datasets validate our proposed TransRank algorithm.
Citations: 27
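The three steps named in the TransRank abstract can be sketched roughly as follows. The abstract does not define the utility function or the exact augmentation, so everything here is an assumption: `utility` is a caller-supplied placeholder, `augment` uses one well-known augmentation scheme (shared copy plus a domain-specific copy), and `pairwise_examples` shows the usual reduction of ranking to pairwise classification that a ranking SVM is trained on. It is a sketch of the general idea, not the authors' implementation.

```python
def select_k_best(queries, utility, k):
    """Step 1: keep the k source-domain queries with the highest utility."""
    return sorted(queries, key=utility, reverse=True)[:k]


def augment(features, domain):
    """Step 2 (assumed scheme): triple the feature vector.

    source -> [x, x, 0]   target -> [x, 0, x]
    The first block is shared across domains; the other two are
    domain specific.
    """
    x = list(features)
    zeros = [0.0] * len(x)
    if domain == "source":
        return x + x + zeros
    if domain == "target":
        return x + zeros + x
    raise ValueError("domain must be 'source' or 'target'")


def pairwise_examples(docs):
    """Step 3 input: difference vectors for one query.

    docs is a list of (features, relevance). For every pair where the
    first document is more relevant, emit (x_i - x_j, +1).
    """
    pairs = []
    for xi, yi in docs:
        for xj, yj in docs:
            if yi > yj:
                pairs.append(([a - b for a, b in zip(xi, xj)], 1))
    return pairs
```

The difference vectors would then be passed to any linear SVM trainer; that final training call is left out because the abstract does not name a specific solver.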
Full-Reference Quality Assessment for Video Summary
Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.55
Tongwei Ren, Yan Liu, Gangshan Wu
As video summarization techniques have attracted more and more attention for efficient multimedia data management, quality assessment of video summaries is required. To address the lack of automatic evaluation techniques, this paper proposes a novel framework including several new algorithms to assess the quality of a video summary against a given reference. First, we partition the reference video summary and the candidate video summary into sequences of summary units (SUs). Then, we utilize an alignment-based algorithm to match the SUs in the candidate summary with the SUs in the corresponding reference summary. Third, we propose a novel similarity-based 4C-assessment algorithm to evaluate the candidate video summary from the perspectives of coverage, conciseness, coherence, and context, respectively. Finally, the individual assessment results are integrated according to the user's requirement by a learning-based weight adaptation method. The proposed framework and techniques are evaluated on the standard TRECVID 2007 dataset and show good performance in automatic video summary assessment.
Citations: 11
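The last step of the video summary framework, integrating the four individual scores, might look like the weighted average below. This is only an illustration: in the paper the weights come from a learning-based adaptation method, whereas here they are supplied by hand, and the function name `integrate_scores` and the score keys are hypothetical.

```python
def integrate_scores(scores, weights):
    """Weighted average of per-criterion scores, each assumed to lie in [0, 1].

    scores and weights are dicts keyed by criterion name, for example
    'coverage', 'conciseness', 'coherence', 'context'.
    """
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total
```

Raising the weight of one criterion, such as coverage, shifts the overall score toward what a particular user cares about.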
Harmonic Blind Sound Source Isolation Enhanced by Spectrum Clustering
Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.67
Cynthia Xin Zhang, Wenxin Jiang, Z. Ras
Automatic indexing of music by instruments and their types is a challenging problem, especially when multiple instruments are playing at the same time. We have built a database containing more than one million musical instrument sounds, each described by a large number of features including standard MPEG7 audio descriptors, features for speech recognition, and many new audio features developed by our team. Our previous research results show that all these features only lead to classifiers which successfully identify musical instruments in monophonic music (only one instrument playing at a time). Their confidence for polyphonic music is much lower. This brought the need for blind sound source separation algorithms. In this paper, we present a new spectrum clustering enhanced method which improves the estimation of fundamental frequency as well as the balance of the categorization tree of training datasets, and therefore enhances the precision of automatic indexing. The system recursively detects the pitch of the predominant sound source, calculates the features based on the estimated pitch, predicts the most similar spectrum using the corresponding classification tree, and finally subtracts the estimated predominant spectrum, repeating until silence is detected.
Citations: 0
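The detect, subtract, repeat loop described in the sound source isolation abstract can be sketched on a toy magnitude spectrum. This is a heavy simplification under stated assumptions: pitch is estimated with a plain harmonic sum over the first few harmonics, and "subtraction" simply zeroes those harmonic bins. The paper instead predicts the most similar spectrum with a classification tree before subtracting, and that step is not reproduced here. All function names are hypothetical.

```python
def harmonic_bins(f0, n_bins, n_harmonics):
    """Bin indices of the first n_harmonics multiples of f0 that fit."""
    return [h * f0 for h in range(1, n_harmonics + 1) if h * f0 < n_bins]


def predominant_f0(spectrum, n_harmonics=4):
    """Pick the candidate bin whose harmonics carry the most magnitude."""
    n = len(spectrum)
    return max(
        range(1, n),
        key=lambda f0: sum(spectrum[b] for b in harmonic_bins(f0, n, n_harmonics)),
    )


def peel_sources(spectrum, silence=1e-6, n_harmonics=4, max_sources=8):
    """Repeatedly find the predominant pitch and remove its harmonics.

    Stops when the residual magnitude drops to the silence threshold
    or when max_sources pitches have been found.
    """
    residual = list(spectrum)
    found = []
    while sum(residual) > silence and len(found) < max_sources:
        f0 = predominant_f0(residual, n_harmonics)
        found.append(f0)
        for b in harmonic_bins(f0, len(residual), n_harmonics):
            residual[b] = 0.0  # crude stand-in for spectrum subtraction
    return found
```

Limiting the sum to a fixed number of harmonics keeps sub-harmonic candidates (half the true pitch, for instance) from outscoring the true fundamental on clean inputs; real recordings need a more robust estimator.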
If Constraint-Based Mining is the Answer: What is the Constraint? (Invited Talk)
Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.96
Jean-François Boulicaut
Constraint-based mining has been proven to be extremely useful. It has been applied not only to many pattern discovery settings (e.g., for sequential pattern mining) but also, recently, to classification and clustering tasks (see, e.g., ). It appears as a key technology for an inductive database perspective on knowledge discovery in databases (KDD), and constraint-based mining is indeed an answer to important data mining issues (e.g., for supporting a priori relevancy and subjective interestingness but also to achieve computational feasibility). However, few authors study the nature of constraints and their semantics. Considering several examples of non-trivial KDD processes, we discuss the Hows, Whys, and Whens of constraints in a broader context than. Our thesis is that most of the typical data mining methods are constraint-based techniques and that it is worth studying and designing them as such. In many cases, we exploit constraints that are not really explicit (e.g., the objective function optimization of a clustering for a given similarity measure) and/or constraints whose operational semantics are relaxed w.r.t. their declarative counterparts (e.g., the optimization constraint is not enforced because of some local optimization heuristics). We think that it is important to make explicit every primitive constraint and the operators that combine them, because this constitutes the declarative semantics of the constraints and thus of the mining queries. Then, a well-studied challenge is to design some operational semantics, like correct and complete solvers and/or relaxation schemes, for more or less complex constraints. Designing complete solvers has been extensively studied in useful yet limited settings (see, e.g., the algorithms for exploiting combinations of monotonic and anti-monotonic primitives). It is however clear that many relevant constraints lack such nice properties. On the other hand, understanding constraint relaxation strategies remains fairly open, certainly because of its intrinsically heuristic nature. Interestingly, the recent approaches that suggest global pattern or model construction based on local patterns make it possible to revisit the relaxation issue thanks to constraint back-propagation possibilities. This can be discussed within a case study on constrained co-clustering.
Citations: 5
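As a concrete instance of the anti-monotonic primitives mentioned in the invited talk abstract, the sketch below runs a levelwise (Apriori-style) search with minimum support as the constraint. Minimum support is anti-monotone: if an itemset fails it, every superset fails too, so whole branches can be pruned. This is a textbook complete solver for that one constraint class, not something taken from the talk, and the function names are hypothetical.

```python
from itertools import combinations


def min_support(transactions, threshold):
    """Build an anti-monotone constraint: support(itemset) >= threshold."""
    def constraint(itemset):
        return sum(1 for t in transactions if itemset <= t) >= threshold
    return constraint


def levelwise(items, anti_monotone):
    """Return every itemset over items that satisfies the constraint.

    Candidates of size k+1 are built only from surviving sets of size k,
    and dropped early if any size-k subset was already rejected.
    """
    level = [frozenset([i]) for i in sorted(items) if anti_monotone(frozenset([i]))]
    result = []
    while level:
        result.extend(level)
        kept = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        level = [
            c for c in candidates
            if all(frozenset(s) in kept for s in combinations(c, len(c) - 1))
            and anti_monotone(c)
        ]
    return result
```

A monotone constraint (for example, "the itemset contains item a") cannot prune in the same direction, which is one reason combining the two kinds of primitives needs dedicated algorithms.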
Exploiting Graphic Card Processor Technology to Accelerate Data Mining Queries in SAP NetWeaver BIA
Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.61
Christoph Weyerhaeuser, Tobias Mindnich, Franz Färber, Wolfgang Lehner
Within business intelligence contexts, the importance of data mining algorithms is continuously increasing, particularly from the perspective of applications and users that demand novel algorithms on the one hand and an efficient implementation exploiting novel system architectures on the other hand. In this paper, we focus on the latter issue and report our experience with the exploitation of graphic card processor technology within the SAP NetWeaver business intelligence accelerator (BIA). The BIA represents a highly distributed analytical engine that supports OLAP and data mining processing primitives. The system organizes data entities in column-wise fashion and its operation is completely main-memory-based. Since case studies have shown that classic data mining queries spend a large portion of their runtime on scanning and filtering the data as a necessary prerequisite to the actual mining step, our main goal was to speed up this expensive scanning and filtering process. In a first step, the paper outlines the basic data mining processing techniques within SAP NetWeaver BIA and illustrates the implementation of scans and filters. In a second step, we give insight into the main features of a hybrid system architecture design exploiting graphic card processor technology. Finally, we sketch the implementation and give details of our extensive evaluations.
Citations: 13
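The scan-and-filter step that the SAP NetWeaver BIA paper targets can be pictured with a small column-store sketch. It runs on the CPU in plain Python and says nothing about how BIA or the graphics card implementation actually work; the table layout and the names `scan`, `filter_rows`, and `gather` are assumptions made for illustration.

```python
def scan(column, predicate):
    """Full scan of one column; returns the row ids whose value passes."""
    return [row for row, value in enumerate(column) if predicate(value)]


def filter_rows(table, predicates):
    """Conjunction of per-column predicates over a column-wise table.

    table maps column name -> list of values (all the same length).
    predicates maps column name -> function(value) -> bool.
    """
    row_ids = None
    for name, predicate in predicates.items():
        hits = set(scan(table[name], predicate))
        row_ids = hits if row_ids is None else row_ids & hits
    if row_ids is None:  # no predicates: every row qualifies
        return list(range(len(next(iter(table.values())))))
    return sorted(row_ids)


def gather(column, row_ids):
    """Materialize the selected rows of one column."""
    return [column[r] for r in row_ids]
```

Each value in a column is tested independently of the others, which is the kind of data-parallel work that maps naturally onto many small processor cores.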
One-Class Classification of Text Streams with Concept Drift
Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.54
Yang Zhang, Xue Li, M. Orlowska
Research on streaming data classification has been mostly based on the assumption that data can be fully labelled. However, this is impractical. Firstly, it is impossible to make a complete labelling before all data has arrived. Secondly, it is generally very expensive to obtain fully labelled data by using manpower. Thirdly, user interests may change with time, so the labels issued earlier may be inconsistent with the labels issued later; this represents concept drift. In this paper, we consider the problem of one-class classification on text streams with respect to concept drift, where a large volume of documents arrives at high speed and with changes in user interests and data distribution. In this case, only a small number of positively labelled documents is available for training. We propose a stacking-style ensemble-based approach and have compared it to all other window-based approaches, such as single window, fixed window, and full memory approaches. Our experimental results demonstrate that the proposed ensemble approach outperforms all other approaches.
Citations: 37
Human Action Recognition by Radon Transform
Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.26
Yan Chen, Qiang Wu, Xiangjian He
A new feature description is used for human behaviour representation and recognition. The feature is based on Radon transforms of extracted silhouettes. Key postures are selected based on the Radon transform and combined to construct an action template for each sequence. Linear discriminant analysis (LDA) is applied to the set of key postures to obtain low-dimensional feature vectors. Different classification methods are used to classify each sequence. Experiments are carried out on a publicly available human behaviour database and the results are exciting.
Citations: 19
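As a rough illustration of the silhouette feature named in the abstract of "Human Action Recognition by Radon Transform", the following sketch computes a discrete Radon projection of a binary silhouette. It is not the authors' implementation: the abstract does not give a discretization, so the nearest-bin rounding, the centring on the image midpoint, and the function name `radon_projection` are all assumptions made for this example.

```python
import math
from typing import List


def radon_projection(silhouette: List[List[int]], theta_deg: float) -> List[int]:
    """Count foreground pixels per line offset rho = x*cos(theta) + y*sin(theta).

    Assumption: nearest-bin rounding around the image centre; the paper's
    exact discretization is not stated in the abstract.
    """
    height, width = len(silhouette), len(silhouette[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    theta = math.radians(theta_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Largest possible |rho| is the distance from the centre to a corner.
    radius = int(math.ceil(math.hypot(cx, cy)))
    bins = [0] * (2 * radius + 1)
    for y, row in enumerate(silhouette):
        for x, value in enumerate(row):
            if value:
                rho = (x - cx) * cos_t + (y - cy) * sin_t
                bins[int(round(rho)) + radius] += 1
    return bins


# Hypothetical 3x3 silhouette: a bar along the top row.
bar = [
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 0],
]
# One projection per angle; stacking them over many angles gives a sinogram.
sinogram = [radon_projection(bar, angle) for angle in (0, 45, 90)]
```

At 0 degrees the projection reduces to column sums and at 90 degrees to row sums, which is a quick sanity check on the orientation convention chosen here.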
Chi-Square Test Based Decision Trees Induction in Distributed Environment
Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.37
Jie Ouyang, Nilesh V. Patel, I. Sethi
Decision tree-based classification is a popular approach for pattern recognition and data mining. Most decision tree induction methods assume that the training data reside at one central location. Given the growth of distributed databases at geographically dispersed locations, methods for decision tree induction in distributed settings are gaining importance. This paper describes a distributed learning algorithm that extends the original (centralized) CHAID algorithm to a distributed version. The distributed algorithm generates exactly the same results as its centralized counterpart. For completeness, a distributed quantization method is proposed so that continuous data can be processed by our algorithm. Experimental results for several well-known data sets are presented and compared with decision trees generated using CHAID with centrally stored data.
Citations: 10
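The abstract of "Chi-Square Test Based Decision Trees Induction in Distributed Environment" states that the distributed algorithm matches the centralized one exactly but does not say how. One plausible reason, offered here only as an assumption, is that the chi-square statistic depends solely on contingency-table counts, and counts from separate sites can be summed. The sketch below illustrates that property; the function names and the two-site counts are hypothetical and are not taken from the paper.

```python
from typing import List

# Rows: categories of one predictor; columns: class labels.
Table = List[List[int]]


def merge_tables(local_tables: List[Table]) -> Table:
    """Element-wise sum of per-site contingency tables of equal shape."""
    rows, cols = len(local_tables[0]), len(local_tables[0][0])
    merged = [[0] * cols for _ in range(rows)]
    for table in local_tables:
        for i in range(rows):
            for j in range(cols):
                merged[i][j] += table[i][j]
    return merged


def chi_square(table: Table) -> float:
    """Pearson chi-square statistic of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty rows or columns
                stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts held at two sites, and the same data pooled centrally.
site_a = [[10, 2], [3, 9]]
site_b = [[6, 4], [1, 5]]
central = [[16, 6], [4, 14]]

distributed_stat = chi_square(merge_tables([site_a, site_b]))
central_stat = chi_square(central)
```

Under this assumption each site only needs to send its small count tables rather than its raw records, and the split chosen from the merged counts is the one a centralized run would choose.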