
Latest publications: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Mining toxicity structural alerts from SMILES: A new way to derive Structure Activity Relationships
Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949444
Thomas Ferrari, G. Gini, N. G. Bakhtyari, E. Benfenati
Encouraged by recent legislation around the world aimed at protecting human health and the environment, in silico techniques have proved their ability to assess the toxicity of chemicals. However, they often act like a black box, contributing little to scientific insight; such over-optimized methods may be beyond understanding, behaving more like competitors to human experts' knowledge than like assistants. In this work, a new Structure-Activity Relationship (SAR) approach is proposed to mine molecular fragments that act as structural alerts for biological activity. The entire process is designed to fit human reasoning, not only to make its predictions more reliable but also to give the user clear control, so that customized requirements can be met. The approach has been implemented and tested on the mutagenicity endpoint, showing marked predictive skill and, more interestingly, rediscovering much of the knowledge already collected in the literature as well as new evidence. The resulting tool is a powerful instrument both for SAR knowledge discovery and for activity prediction on untested compounds.
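The fragment-mining idea behind structural alerts can be sketched in a few lines. The sketch below is illustrative only: it uses character n-grams of SMILES strings as a crude stand-in for real molecular fragments, and a smoothed frequency ratio between active and inactive molecules as the alert score; neither is the paper's actual procedure.

```python
from collections import Counter

def fragments(smiles, n=3):
    """All character n-grams of a SMILES string (toy stand-in for molecular fragments)."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}

def structural_alerts(dataset, n=3, min_ratio=2.0):
    """Score each fragment by how much more often it occurs in active
    (e.g. mutagenic) molecules than in inactive ones; fragments scoring
    above min_ratio are flagged as candidate structural alerts."""
    pos, neg = Counter(), Counter()
    n_pos = sum(1 for _, active in dataset if active)
    n_neg = len(dataset) - n_pos
    for smiles, active in dataset:
        for frag in fragments(smiles, n):
            (pos if active else neg)[frag] += 1
    alerts = {}
    for frag, count in pos.items():
        # Laplace smoothing keeps fragments unseen in negatives finite.
        ratio = (count / n_pos) / ((neg[frag] + 1) / (n_neg + 1))
        if ratio >= min_ratio:
            alerts[frag] = ratio
    return alerts

# Hypothetical toy data: (SMILES, is_mutagenic) pairs.
data = [("c1ccccc1N", True), ("c1ccccc1NN", True), ("CCO", False), ("CCC", False)]
alerts = structural_alerts(data, n=2)
```

The score is readable by a human expert (a fragment and its enrichment ratio), which matches the paper's goal of keeping the process open to user control.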
Citations: 22
A robust F-measure for evaluating discovered process models
Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949428
Jochen De Weerdt, M. D. Backer, J. Vanthienen, B. Baesens
Within process mining research, one of the most important fields of study is process discovery, which can be defined as the extraction of control-flow models from audit trails or information system event logs. The evaluation of discovered process models is an essential but difficult task for any process discovery analysis. In this paper, we propose a novel approach for evaluating discovered process models based on artificially generated negative events. This approach allows the definition of a behavioral F-measure for discovered process models, which is the main contribution of this paper.
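The core of an F-measure built on artificial negative events can be sketched as follows. This is a minimal reading of the abstract, not the paper's exact definition: recall is the fraction of observed (positive) events the model can replay, precision penalizes the model for also allowing the artificially generated negative events, and F is their harmonic mean.

```python
def behavioral_f_measure(model_allows, positive_events, negative_events):
    """F-measure from observed events and artificial negative events.
    `model_allows` is any predicate standing in for replay on the model."""
    tp = sum(1 for e in positive_events if model_allows(e))
    fp = sum(1 for e in negative_events if model_allows(e))
    fn = len(positive_events) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy model: allows an (activity, position) event if the
# activity is one of "a", "b", "c".
allows = lambda e: e[0] in {"a", "b", "c"}
pos = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]   # "d" not replayable -> false negative
neg = [("b", 9), ("x", 9)]                        # "b" wrongly allowed -> false positive
f = behavioral_f_measure(allows, pos, neg)
```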
Citations: 117
Active learning using the data distribution for interactive image classification and retrieval
Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949446
P. Blanchart, Marin Ferecatu, M. Datcu
In the context of image search and classification, we describe an active learning strategy that relies on the intrinsic data distribution, modeled as a mixture of Gaussians, to speed up the learning of the target class through an interactive relevance feedback process. The contributions of our work are twofold. First, we introduce a new form of semi-supervised C-SVM algorithm that exploits the intrinsic data distribution by working directly on equiprobable envelopes of Gaussian mixture components. Second, we introduce an active learning strategy that lets the user interactively adjust the equiprobable envelopes in a small number of feedback steps. The proposed method exploits the information contained in the unlabeled data without suffering from the drawbacks inherent to semi-supervised methods, e.g. computation time and memory requirements. Tests performed on a database of high-resolution satellite images and on a database of color images show that our system compares favorably, in terms of learning speed and ability to manage large volumes of data, with the classic approach using SVM active learning.
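The "equiprobable envelope" idea can be illustrated with a one-dimensional sketch: given one Gaussian component per class, the most informative points to label are those where the two class densities are nearly equal (posterior close to 0.5). The components, values, and selection rule below are illustrative assumptions, not the paper's C-SVM formulation.

```python
import math

def gauss(x, mu, sigma):
    """Gaussian density at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def query_most_uncertain(unlabeled, comp_pos, comp_neg, k=1):
    """Rank unlabeled points by closeness of the two-component posterior to 0.5
    (i.e. near the equiprobable envelope) and return the k most uncertain."""
    def uncertainty(x):
        p = gauss(x, *comp_pos)
        n = gauss(x, *comp_neg)
        return abs(p / (p + n) - 0.5)
    return sorted(unlabeled, key=uncertainty)[:k]

# Hypothetical components: positive class ~ N(0, 1), negative class ~ N(5, 1).
picked = query_most_uncertain([0.0, 1.0, 2.5, 5.0],
                              comp_pos=(0.0, 1.0), comp_neg=(5.0, 1.0), k=1)
```

The point midway between the two components is selected, since that is where labeling feedback reduces uncertainty the most.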
Citations: 2
Distinguishing defined concepts from prerequisite concepts in learning resources
Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949296
S. Changuel, Nicolas Labroche
The objective of any tutoring system is to provide meaningful learning to the learner; hence it is important to know whether a concept mentioned in a document is a prerequisite for studying that document or can be learned from it. In this paper, we study the problem of identifying defined concepts and prerequisite concepts in learning resources available on the web. Statistical and machine learning tools are exploited to predict the class of each concept. Two groups of features are constructed to categorise the concepts: contextual features, which capture linguistic information, and local features, which capture concept properties such as font size and font weight. An aggregation method is proposed to handle the multiple occurrences of a defined concept in a document. Experiments show that the best results are obtained with the SVM classifier.
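The need for an aggregation step can be shown with a tiny sketch: each occurrence of a concept is classified separately, and the per-occurrence predictions must be merged into one document-level label. Majority vote with a tie-break is one plausible rule; the paper's actual aggregation method is not specified in the abstract, so both the rule and the tie-break here are assumptions.

```python
from collections import Counter

def aggregate_concept_label(occurrence_predictions):
    """Merge per-occurrence predictions ("defined" vs "prerequisite") into a
    single label by majority vote. Ties fall back to "prerequisite", on the
    assumption that over-claiming a definition is the costlier error in a
    tutoring setting (an illustrative choice, not the paper's rule)."""
    counts = Counter(occurrence_predictions)
    if counts["defined"] > counts["prerequisite"]:
        return "defined"
    return "prerequisite"

label = aggregate_concept_label(["defined", "prerequisite", "defined"])
```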
Citations: 3
A multi-Biclustering Combinatorial Based algorithm
Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949454
E. Nosova, G. Raiconi, R. Tagliaferri
In recent years, a large amount of genomic information has been collected, increasing the complexity of its analysis, so the most advanced techniques and algorithms are required. Researchers often use unsupervised clustering, but its inability to solve a number of tasks calls for new algorithms, and scientists have recently turned their attention to biclustering techniques. In this paper we propose a novel biclustering technique, which we call the Combinatorial Biclustering Algorithm (BCA). It addresses the following problems: 1) classifying data with respect to rows and columns together; 2) discovering overlapping biclusters; 3) letting the user define the minimal number of rows and columns in a bicluster; 4) finding all biclusters together. We apply our model to two synthetic data sets and one real biological data set and report the results.
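Two of the listed requirements, user-set minimal dimensions and support for overlap, can be illustrated with a small validity check on candidate biclusters. The constant-value criterion below is only one of several bicluster models and is an assumption; the abstract does not say which model BCA uses.

```python
def is_constant_bicluster(matrix, rows, cols, min_rows=2, min_cols=2, tol=1e-9):
    """Check that the submatrix selected by `rows` x `cols` is (near-)constant
    and meets the user-set minimum size. Overlap is possible because the
    row/column sets of different biclusters may intersect freely."""
    if len(rows) < min_rows or len(cols) < min_cols:
        return False
    ref = matrix[rows[0]][cols[0]]
    return all(abs(matrix[r][c] - ref) <= tol for r in rows for c in cols)

M = [
    [1, 1, 5],
    [1, 1, 5],
    [9, 9, 5],
]
ok = is_constant_bicluster(M, rows=[0, 1], cols=[0, 1])
# A second bicluster sharing rows 0 and 1 with the first (overlap).
overlap_ok = is_constant_bicluster(M, rows=[0, 1, 2], cols=[2], min_cols=1)
bad = is_constant_bicluster(M, rows=[0, 2], cols=[0, 1])
```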
Citations: 1
Dimensionality reduction mappings
Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949443
K. Bunte, Michael Biehl, B. Hammer
A wealth of powerful dimensionality reduction methods has been established for data visualization and preprocessing. These are accompanied by formal evaluation schemes, which allow a quantitative evaluation along general principles and even lead to further visualization schemes based on these objectives. Most methods, however, provide a mapping of a previously given finite set of points only, requiring additional steps for out-of-sample extensions. We propose a general view of dimensionality reduction based on the concept of cost functions and, from this general principle, extend dimensionality reduction to explicit mappings of the data manifold. This offers simple out-of-sample extensions. Further, it opens a way towards a theory of data visualization from the perspective of its generalization ability to new data points. We demonstrate the approach with a simple global linear mapping as well as prototype-based local linear mappings.
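The benefit of an explicit mapping is easy to demonstrate in the simplest case the abstract mentions, a global linear map: once the map is fitted, new points are embedded by just applying it, with no extra out-of-sample machinery. The least-squares fit below (2-D input, 1-D embedding, normal equations) is a minimal sketch, not the paper's cost-function framework.

```python
def fit_linear_map(X, y):
    """Fit an explicit linear mapping f(x) = w1*x1 + w2*x2 from 2-D points
    to 1-D embedding coordinates by ordinary least squares (2x2 normal
    equations solved by hand)."""
    s11 = sum(x[0] * x[0] for x in X)
    s12 = sum(x[0] * x[1] for x in X)
    s22 = sum(x[1] * x[1] for x in X)
    t1 = sum(x[0] * yi for x, yi in zip(X, y))
    t2 = sum(x[1] * yi for x, yi in zip(X, y))
    det = s11 * s22 - s12 * s12
    w1 = (s22 * t1 - s12 * t2) / det
    w2 = (s11 * t2 - s12 * t1) / det
    return lambda x: w1 * x[0] + w2 * x[1]

# Hypothetical embedding that happens to equal x1 + 2*x2; the fit recovers it,
# so an unseen point is embedded by simply evaluating the map.
X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
y = [1.0, 2.0, 3.0]
f = fit_linear_map(X, y)
new_point_embedding = f((2.0, 2.0))
```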
Citations: 12
Efficient accelerometer-based swimming exercise tracking
Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949430
Pekka Siirtola, P. Laurinen, J. Röning, H. Kinnunen
The study concentrates on tracking swimming exercises based on 3D accelerometer data and shows that human activities can be tracked accurately using low sampling rates. The tracking of a swimming exercise is done in three phases: first, the swimming style and turns are recognized; second, the strokes are counted; and third, the intensity of the swimming is estimated. Tracking uses efficient methods because the methods presented in the study are designed for lightweight applications that do not allow heavy computing. To keep tracking as light as possible, we study the lowest sampling frequency that can be used while still obtaining accurate results. Moreover, two sensor placements (wrist and upper back) are compared. The results show that tracking can be done with high accuracy using simple methods that are fast to calculate, at a really low sampling frequency. An upper-back-worn sensor is more accurate than a wrist-worn one when the swimming style is recognized, but when the number of strokes is counted and the intensity estimated, the two placements give approximately equally accurate results.
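A stroke counter in the spirit of the paper's "simple, fast-to-calculate" methods can be sketched as thresholded peak detection on one accelerometer axis. The abstract does not state the actual algorithm or threshold, so both are illustrative assumptions.

```python
def count_strokes(signal, threshold=1.0):
    """Count arm strokes as local maxima of one accelerometer axis that
    exceed `threshold`. One pass, O(n), constant memory -- cheap enough
    for a low-power device sampling at a low rate."""
    strokes = 0
    for prev, cur, nxt in zip(signal, signal[1:], signal[2:]):
        if cur > threshold and cur >= prev and cur > nxt:
            strokes += 1
    return strokes

# Hypothetical low-rate acceleration trace with two clear stroke peaks.
trace = [0.1, 0.5, 1.6, 0.4, 0.2, 1.8, 0.3, 0.1]
n = count_strokes(trace)
```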
Citations: 71
OLAP navigation in the Granular Linguistic Model of a Phenomenon
Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949458
Carlos Menendez-Gonzalez, G. Triviño
The amount of data provided by computers about objects in our environment keeps increasing. Nevertheless, the ability to extract relevant and understandable knowledge from this information remains limited. On-Line Analytical Processing (OLAP) is a well-known paradigm that helps users navigate the information stored in databases. In the research line of the Computational Theory of Perceptions, we have created the Granular Linguistic Model of a Phenomenon, a data structure that allows computational systems to generate linguistic descriptions of input data. In this paper, we explore the possibilities of using OLAP to navigate this structure. Inspired by the way humans use natural language, we adapt the typical OLAP operations, namely drilling, rolling and slicing/dicing, to navigate hierarchical granular structures of fuzzy perceptions. The long-term aim is to create a new type of human-computer interface that will assist users in analyzing the likely huge amount of available information about relevant phenomena. Obtained results show the viability of this approach, including a practical demonstration of the concept.
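Drilling and rolling over a hierarchical granular structure can be sketched with a nested dictionary standing in for the model. The node layout, labels, validity values, and mean-based roll-up below are all illustrative assumptions; the actual model aggregates fuzzy perceptions in a richer way.

```python
def drill(glmp_node, child):
    """Drill-down: move from a granule to one of its finer child granules."""
    return glmp_node["children"][child]

def roll_up(glmp_node):
    """Roll-up: aggregate the children's linguistic validity degrees
    (here simply their mean, an illustrative choice)."""
    kids = glmp_node["children"].values()
    return sum(k["validity"] for k in kids) / len(kids)

# Hypothetical two-level model of the perception "temperature is high".
model = {
    "label": "temperature is high",
    "validity": 0.7,
    "children": {
        "morning": {"label": "warm morning", "validity": 0.6, "children": {}},
        "evening": {"label": "hot evening", "validity": 0.8, "children": {}},
    },
}
fine = drill(model, "evening")
agg = roll_up(model)
```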
Citations: 6
Link Pattern Prediction with tensor decomposition in multi-relational networks
Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949306
Sheng Gao, Ludovic Denoyer, P. Gallinari
We address the problem of link prediction in collections of objects connected by multiple relation types, where each type may play a distinct role. While traditional link prediction models are limited to single-type links, here we jointly model and predict the multiple relation types, which we refer to as the Link Pattern Prediction (LPP) problem. To this end, we propose a tensor decomposition model for the LPP problem, which captures the correlations among different relation types and reveals the impact of various relations on prediction performance. The model is efficiently learned with a conjugate-gradient-based optimization method. Extensive experiments on real-world datasets demonstrate that it outperforms the traditional mono-relational model and achieves better prediction quality.
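The decomposition idea can be sketched in its simplest, rank-1 form: approximate a 3-way tensor T[i][j][k] (entity x entity x relation type) by an outer product a[i]*b[j]*c[k], fitted by alternating least squares. For an exactly rank-1 tensor one sweep already recovers it. The paper's model is richer (multi-rank, learned with conjugate gradients), so this is an illustration of the principle only.

```python
def norm2(v):
    """Squared Euclidean norm of a vector."""
    return sum(x * x for x in v)

def rank1_cp(T, sweeps=3):
    """Rank-1 CP decomposition T[i][j][k] ~ a[i]*b[j]*c[k] by alternating
    least squares: each factor update is a closed-form least-squares solve
    with the other two factors held fixed."""
    I, J, K = len(T), len(T[0]), len(T[0][0])
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K
    for _ in range(sweeps):
        d = norm2(b) * norm2(c)
        a = [sum(T[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K)) / d
             for i in range(I)]
        d = norm2(a) * norm2(c)
        b = [sum(T[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K)) / d
             for j in range(J)]
        d = norm2(a) * norm2(b)
        c = [sum(T[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J)) / d
             for k in range(K)]
    return a, b, c

# Hypothetical 2x2x2 interaction tensor that is exactly rank 1.
T = [[[(i + 1) * (j + 1) * (k + 1) for k in range(2)] for j in range(2)]
     for i in range(2)]
a, b, c = rank1_cp(T)
max_err = max(abs(a[i] * b[j] * c[k] - T[i][j][k])
              for i in range(2) for j in range(2) for k in range(2))
# A missing link strength would be predicted as the fitted product a[i]*b[j]*c[k].
predicted = a[1] * b[0] * c[1]
```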
Citations: 20
Trend cluster based compression of geographically distributed data streams
Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949298
A. Ciampi, A. Appice, D. Malerba, P. Guccione
In many real-time applications, such as wireless sensor network monitoring, traffic control or health monitoring systems, it is necessary to analyze continuous and unbounded, geographically distributed streams of data (e.g. temperature or humidity measurements transmitted by the sensors of weather stations). Storing and querying geo-referenced stream data poses specific challenges both in time (real-time processing) and in space (limited storage capacity). Summarization algorithms can reduce the amount of data to be permanently stored in a data warehouse without losing information needed for subsequent analysis. In this paper we present a framework in which data streams are seen as time-varying realizations of stochastic processes. Signal compression techniques based on transformed domains are applied and compared with a geometrical segmentation in terms of compression efficiency and accuracy of the subsequent reconstruction.
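The storage-saving idea behind summarization-before-storage can be sketched with the crudest geometrical segmentation: replace each run of readings that stays within a tolerance of the run's first value by a single (start index, value) pair. The abstract's actual techniques (transformed-domain compression and a richer segmentation) go well beyond this sketch.

```python
def compress(stream, tol=0.5):
    """Piecewise-constant summarization: keep a (start_index, value) pair
    whenever a reading drifts more than `tol` from the current segment value."""
    segments = []
    for i, x in enumerate(stream):
        if not segments or abs(x - segments[-1][1]) > tol:
            segments.append((i, x))
    return segments

def decompress(segments, length):
    """Reconstruct the stream: each segment value is repeated up to the
    start of the next segment (reconstruction error stays within `tol`)."""
    out = []
    for (start, value), nxt in zip(segments, segments[1:] + [(length, None)]):
        out.extend([value] * (nxt[0] - start))
    return out

# Hypothetical temperature readings: 6 values compress to 2 segment pairs.
readings = [20.0, 20.2, 20.4, 23.0, 23.1, 23.2]
seg = compress(readings)
restored = decompress(seg, len(readings))
```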
Citations: 7