
2015 International Workshop on Data Mining with Industrial Applications (DMIA): Latest Papers

A Mining Approach to Evaluate Geoportals Usability
Esther Hochsztain
A geoportal is a basic component of a spatial data infrastructure, used for searching, viewing, and downloading spatial data and services. It can be considered a web application that acts as an access point to shared geographic information, the place where distributed geographic data and services can be discovered. A geoportal gives different types of organizations the opportunity to make their data and services accessible to the whole community of internet users, and geoportal usability is considered a key concept. From the organizations' point of view, usability relates to how the geoportal supports people in performing their tasks effectively and efficiently. From the end users' point of view, usability concerns how satisfactorily a geoportal is perceived to support users' tasks. In this paper we present a mining approach to discover patterns related to geoportal usability evaluation. A framework based on several data sources is proposed in order to identify the geoportal strengths and weaknesses that affect usability. The emerging requirements and their implications are analyzed. A geoportal evaluation case study is presented, in which web server logs are analyzed and experimental System Usability Scale (SUS) questionnaire data are analyzed using factor analysis and association rules.
DOI: https://doi.org/10.1109/DMIA.2015.22 | Published: 2015-09-01
Citations: 3
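The System Usability Scale named in the geoportal abstract above has a widely used scoring rule: ten items answered on a 1 to 5 scale, rescaled to a 0 to 100 score. The following is a minimal Python sketch of that standard rule. It is not taken from the paper, and the function name `sus_score` is an assumption made for illustration.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one respondent's ten 1-5 answers."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for item_number, answer in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd-numbered items are positively worded: contribution is answer - 1.
            total += answer - 1
        else:
            # Even-numbered items are negatively worded: contribution is 5 - answer.
            total += 5 - answer
    # The summed contributions range from 0 to 40; scale to 0-100.
    return total * 2.5


# A respondent who answers "3" (neutral) everywhere lands in the middle.
print(sus_score([3] * 10))
```

Per-respondent scores computed this way could then feed the kind of factor analysis the abstract mentions, although how the paper preprocesses its questionnaire data is not stated here.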
Feature Grouping and Selection on High-Dimensional Microarray Data
M. García-Torres, Francisco Gómez-Vela, D. Becerra-Alonso, B. Melián-Batista, Marcos Moreno-Vega
In classification tasks, as the dimensionality increases, the performance of the classifier improves until an optimal number of features is reached. Further increases in dimensionality without an increase in the number of training samples result in a degradation of classifier performance. This fact, called the curse of dimensionality, has become more relevant with the advent of larger datasets and the demands of Knowledge Discovery from Big Data. In this context, feature grouping has become an effective approach for providing additional information about relationships between features. In this work, we propose a greedy strategy, called GreedyPGG, that groups features based on the concept of Markov blankets. To this end, we introduce the idea of a predominant group of features. We also present an adaptation of Variable Neighborhood Search (VNS) to high-dimensional feature selection that uses GreedyPGG to reduce the search space. We test the effectiveness of GreedyPGG on synthetic datasets and of VNS on microarray datasets. We compare VNS with popular and competitive strategies. Results show that GreedyPGG groups correlated features efficiently and that VNS is a competitive strategy, capable of finding a small number of features with high predictive power.
DOI: https://doi.org/10.1109/DMIA.2015.18 | Published: 2015-09-01
Citations: 4
Application of Business Intelligence Techniques to Analyze IT Project Management Data
A. Tasistro
IT project management must face many challenges, including the identification of the main criteria that lead to success or failure. Project managers generate a lot of data, which is stored in different formats, but in most organizations its use is not systematized with the aim of "learning from data" and generating reusable knowledge. The objective of this article is to present a framework based on Business Intelligence techniques that contributes to improving the management of IT projects.
DOI: https://doi.org/10.1109/DMIA.2015.15 | Published: 2015-09-01
Citations: 1
Investigating the Role of Individual Neurons as Outlier Detectors
C. López-Vázquez
The main body of the literature states that Artificial Neural Networks (ANNs) must be regarded as a "black box" without further interpretation, due to the inherent difficulty of analyzing the weights and bias terms. Some authors claim that an ANN trained as a regression device tends to organize itself by specializing some neurons to learn the main relationships embedded in the training set, while other neurons are more concerned with the noise. We suggest here a rule to identify the "noise-related" neurons in a multilayer perceptron ANN, and we assume that those neurons are activated only when some unusual values (or combinations of values) are present. We consider those events as candidates for containing an outlier. Using the ANN as an outlier detector does not require further training and can be easily applied.
DOI: https://doi.org/10.1109/DMIA.2015.11 | Published: 2015-09-01
Citations: 0
Feature Selection via Approximated Markov Blankets Using the CFS Method
Rafael Arias-Michel, M. García-Torres, C. Schaerer, F. Divina
Feature selection has become an important research area in machine learning due to rapid advances in technology. In high-dimensional spaces, the difficulty of classification is intrinsically caused by the existence of irrelevant and redundant features that, in general, degrade the performance of a classifier. Moreover, finding the optimal subset of features becomes intractable even for low-dimensional datasets. In this context, Markov blanket discovery can be used to identify such a subset. The approximated Markov blanket (AMb) is an efficient and effective approach to inducing Markov blankets from data. However, this approach only considers pairwise comparisons of features. In this paper, we redefine the AMb to consider the interaction among the features of a given subset. We use the Correlation-based Feature Selection (CFS) function to measure such interactions and the Fast Correlation-Based Filter (FCBF) as the search strategy. The proposal, denoted FCBFCFS, is compared with FCBF and tested on synthetic and real-world datasets from the microarray domain. Results show that including interactions among features in a subset may lead to smaller subsets of features without degrading the classification task.
DOI: https://doi.org/10.1109/DMIA.2015.17 | Published: 2015-09-01
Citations: 9
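The CFS function mentioned in the abstract above scores a whole feature subset rather than feature pairs. A minimal Python sketch of the commonly cited CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), is given below, where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation. How the paper plugs this function into the AMb definition is not described here; the function name `cfs_merit` and the list-based inputs are assumptions made for illustration.

```python
import math


def cfs_merit(feature_class_corr, feature_feature_corr):
    """CFS merit of a subset of k features.

    feature_class_corr: k correlations, one per feature, with the class.
    feature_feature_corr: k*(k-1)/2 pairwise correlations among the features.
    """
    k = len(feature_class_corr)
    if k == 0:
        raise ValueError("the subset must contain at least one feature")
    if len(feature_feature_corr) != k * (k - 1) // 2:
        raise ValueError("expected one correlation per feature pair")
    mean_cf = sum(feature_class_corr) / k
    # With a single feature there are no pairs, so redundancy is zero.
    mean_ff = sum(feature_feature_corr) / len(feature_feature_corr) if k > 1 else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


# Two equally relevant features: redundancy between them lowers the merit.
independent = cfs_merit([0.6, 0.6], [0.0])
redundant = cfs_merit([0.6, 0.6], [1.0])
print(independent > redundant)
```

The comparison illustrates why a subset-level measure can favor smaller subsets: adding a feature that is fully correlated with one already selected yields no gain in merit over the single feature alone.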
Towards a Data Processing Architecture for the Weather Radar of the INTA Anguil
M. Diván, Yanina Bellini Saibene, Maria de los Ángeles Martín, María Laura Belmonte, Guillermo Lafuente, J. Caldera
The Weather Radar (WR) of the Experimental Agricultural Station (EAS) INTA Anguil produces 17 GB of data daily, which amounts to about 6.2 TB annually. Using such data as they are generated, as well as their subsequent management, use, and the possibility of providing services to the public, represents a challenge in terms of volume and complexity. The Strategy for Data Stream Processing based on Measurement Metadata (SDSPbMM) is a data stream manager built on a measurement and evaluation framework, which incorporates detective and predictive behavior through the use of measurements and associated metadata. This paper proposes a processing architecture that extends the SDSPbMM to incorporate the processing of big data. This would give the WR detective and predictive behavior over online data, and would include a layer of public services that encourages the consumption of data generated by the WR of INTA Anguil.
DOI: https://doi.org/10.1109/DMIA.2015.12 | Published: 2015-09-01
Citations: 7
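The annual volume quoted in the weather radar abstract above follows directly from the daily figure. The short Python check below assumes a 365-day year, continuous daily operation, and decimal units (1 TB = 1000 GB); none of these assumptions is stated in the abstract.

```python
GB_PER_DAY = 17        # daily radar output quoted in the abstract
DAYS_PER_YEAR = 365    # assumption: non-leap year, radar runs every day
GB_PER_TB = 1000       # assumption: decimal (SI) terabytes

annual_gb = GB_PER_DAY * DAYS_PER_YEAR
annual_tb = annual_gb / GB_PER_TB

print(f"{annual_gb} GB per year, about {annual_tb:.1f} TB")
```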
Teaching and Learning Business Intelligence: Business Evaluation Last but Not Least
Esther Hochsztain, A. Tasistro
Most university courses on data mining, business intelligence, and data warehousing focus on techniques and modeling. But they fail to teach business understanding, prototyping, and how to involve business users from the first stages of a business intelligence project. In this paper we review research related to business intelligence teaching, identifying strengths and weaknesses of the most common ways of teaching and learning. We then describe our experience teaching business intelligence in different areas. Finally, we propose a method to test the project's business applications and the return on investment study.
DOI: https://doi.org/10.1109/DMIA.2015.19 | Published: 2015-09-01
Citations: 0
Predictive Models of Economic Systems Based on Data Mining
J. Cazal
Selecting data to build a representative model able to explain socio-economic phenomena is a challenge within the model construction stage itself. Knowing which data to include in a study and which to discard is difficult, and at the same time a large number of possible factors affecting the behavior of each variable must be found. In complex phenomena, the number of factors affecting a variable is enormous, and isolating a variable can become a hopeless effort. In addition, factors that are difficult to observe or inherently unobservable must be considered; these are known as errors or perturbations in a relation and influence the outputs of the constructed model. Data mining techniques can support such studies when socio-economic phenomena are analyzed and can demonstrate the results obtained in a scientific and reliable way. Data mining is proposed as a valid option for the study of indicators, in contrast to the traditional methodology (econometrics). An experiment was conducted to contrast two cultures in the use of statistical modeling. One assumes that the data are generated by a given stochastic data model (Data Modeling Culture). The other uses algorithmic models and treats the data mechanism as unknown (Algorithmic Modeling Culture).
DOI: https://doi.org/10.1109/DMIA.2015.20 | Published: 2015-09-01
Citations: 2
Data Mining Applications in Entrepreneurship Analysis
Esther Hochsztain, A. Tasistro, M. Messina
Creative entrepreneurship is considered an important factor in achieving economic development, especially in the knowledge-based society. Universities play a fundamental role in the process of entrepreneurial development and in the entrepreneurship ecosystem. CCEEmprende is a program to support entrepreneurs developed by Facultad de Ciencias Económicas y de Administración - Universidad de la República, Uruguay. In this paper we present the use of data mining to improve decision making in entrepreneurship management, based on CCEEmprende project data. A case study using several data mining and statistical techniques (association rules, decision trees, logistic regression) is developed with two goals: anticipating project success and identifying the most important factors related to project success or failure.
DOI: https://doi.org/10.1109/DMIA.2015.21 | Published: 2015-09-01
Citations: 4
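Association rules, one of the techniques listed in the entrepreneurship abstract above, are usually judged by support, confidence, and lift. The Python sketch below computes these three standard measures for a single rule. The project records and attribute names ("mentor", "funded", "success") are invented for illustration and are not CCEEmprende data; the function name `rule_metrics` is likewise an assumption.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_consequent = sum(1 for t in transactions if consequent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (n_consequent / n) if n_consequent else 0.0
    return support, confidence, lift


# Hypothetical project records: each set lists the attributes a project has.
projects = [
    {"mentor", "funded", "success"},
    {"mentor", "success"},
    {"funded"},
    {"mentor"},
]
support, confidence, lift = rule_metrics(projects, {"mentor"}, {"success"})
print(support, confidence, lift)
```

A lift above 1 for a rule ending in "success" would suggest the antecedent is associated with success more often than chance, which is the kind of factor identification the abstract names as a goal.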
An Integrated Strategy Based in Processes, Requirements, Measurement and Evaluation for the Formalization of Necessities in Data Warehouse Projects
Avalos Veronica Nathali, Diván Mario José
In this work we propose an Integrated Strategy based on Processes, Requirements, Measurement and Evaluation, whose objective is to identify requirements and maintain their traceability at early stages of data warehouse projects. Our strategy starts with process formalization using SPEM to improve communicability and extensibility. From the process formalization, we continue with the definition of the measurement and evaluation (M&E) project to quantify the behavior of each process and its necessities. This enables progress in project scoping and the early identification of risks, and at the same time establishes a traceability mechanism between the decisions and the artifacts that may be generated throughout the project life cycle. This represents an important complement to life cycles such as the one proposed by Kimball, in which the requirement phase is not formalized and there is no strategy to clearly define the aspects to quantify and/or analyze in order to support a decision-making process. Finally, an example of applying the strategy to one process of the Ministry of Education of Ecuador is shown.
DOI: https://doi.org/10.1109/DMIA.2015.13 | Published: 2015-09-01
Citations: 3