
2011 3rd Conference on Data Mining and Optimization (DMO): Latest Publications

Comparison of various Wiener model identification approach in modelling nonlinear process
Pub Date : 2011-08-04 DOI: 10.1109/DMO.2011.5976517
Imam Mujahidin Iqbal, N. Aziz
An accurate and simple model is essential for implementing a model-based controller. The Wiener model is one of the simplest nonlinear models that can represent any nonlinear process. However, several identification approaches are available for developing a Wiener model, and one must be selected to produce the most accurate model. In this work, the nonlinear-linear approach, the linear-nonlinear approach, and the simultaneous approach are compared for identifying a Wiener model of a nonlinear pH neutralization process. The parameters of the linear block and of the inverse of the nonlinear block were obtained from several generated data sets. The approaches are then compared in terms of model accuracy, calculation time, data requirements, and flexibility.
Citations: 9
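A Wiener model is a linear dynamic block followed by a static output nonlinearity. The sketch below is only an illustration of that structure and of the simplest form of the nonlinear-linear idea, not the paper's procedure: it assumes a hypothetical first-order linear block `v[k] = a*v[k-1] + b*u[k-1]` and uses `tanh` as a stand-in for the pH titration nonlinearity, treated as known and invertible. The output is mapped back through `atanh`, and `a`, `b` are then fitted by ordinary least squares.

```python
import math
import random

def simulate_wiener(u, a=0.8, b=0.5, f=math.tanh):
    """Wiener model: linear dynamic block v[k] = a*v[k-1] + b*u[k-1],
    followed by a static output nonlinearity y[k] = f(v[k])."""
    v = [0.0] * len(u)
    for k in range(1, len(u)):
        v[k] = a * v[k - 1] + b * u[k - 1]
    return [f(vk) for vk in v]

def fit_linear_block(u, y, f_inv=math.atanh):
    """Nonlinear-linear idea: map y back through the (assumed known) inverse
    nonlinearity, then fit a and b by ordinary least squares."""
    v = [f_inv(yk) for yk in y]
    s_vv = s_vu = s_uu = r_v = r_u = 0.0
    for k in range(1, len(v)):
        x1, x2 = v[k - 1], u[k - 1]  # regressors
        s_vv += x1 * x1
        s_vu += x1 * x2
        s_uu += x2 * x2
        r_v += x1 * v[k]
        r_u += x2 * v[k]
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s_vv * s_uu - s_vu * s_vu
    a = (r_v * s_uu - r_u * s_vu) / det
    b = (r_u * s_vv - r_v * s_vu) / det
    return a, b

# Hypothetical excitation signal; the data are noise-free, so the fit
# should recover a = 0.8 and b = 0.5 up to floating-point error.
random.seed(0)
u = [random.uniform(-1.0, 1.0) for _ in range(200)]
y = simulate_wiener(u)
a_hat, b_hat = fit_linear_block(u, y)
print(round(a_hat, 6), round(b_hat, 6))
```

With measurement noise or an unknown nonlinearity, this inversion step is exactly where the compared approaches differ, so the sketch should not be read as a substitute for them.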
Frequent pattern using Multiple Attribute Value for itemset generation
Pub Date : 2011-06-28 DOI: 10.1109/DMO.2011.5976503
Zalizah Awang Long, A. Bakar, Abdul Razak Hamdan
Data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Association Rule Mining (ARM) algorithms, especially the Apriori algorithm, have been an active research area in recent years. Proposed improvements vary in whether they produce more frequent items or generate longer k-length itemsets. The goal is to produce better patterns and more interesting rules. In this paper, we propose a new approach for ARM based on Multiple Attribute Values (MAV) within non-binary search spaces. The proposed algorithm improves existing frequent pattern mining by generating the most frequent values (items) within each attribute and generating candidates based on the frequent attribute values. The main idea of our work is to discover more meaningful frequent items and maximum k-length items. The experimental results show that the proposed MAV frequent pattern mining generates more frequent items and longer maximum-length itemsets.
Citations: 2
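Apriori grows frequent itemsets level by level: frequent k-itemsets are joined into (k+1)-candidates, candidates with an infrequent k-subset are pruned, and the survivors are counted against the data. The sketch below is the textbook Apriori, not the authors' MAV algorithm. Following the abstract's framing, it assumes each item is an (attribute, value) pair taken from a non-binary relational record, and it skips candidates that contain two values of the same attribute. The records and `min_support` are hypothetical.

```python
from itertools import combinations

def apriori(records, min_support):
    """Return {frozenset(itemset): support_count} for every itemset whose
    count is >= min_support. Each record is a dict attribute -> value, and
    each item is an (attribute, value) pair."""
    transactions = [frozenset(r.items()) for r in records]
    # Level 1: count single attribute-value items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 1
    while frequent:
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]  # join step
                if len(union) != k + 1:
                    continue
                # A record holds one value per attribute, so skip candidates
                # with two values of the same attribute (they cannot occur).
                if len({attr for attr, _ in union}) != k + 1:
                    continue
                # Prune step: every k-subset must itself be frequent.
                if all(frozenset(s) in frequent for s in combinations(union, k)):
                    candidates.add(union)
        # Count step: one pass over the transactions per level.
        frequent = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                frequent[c] = n
        result.update(frequent)
        k += 1
    return result

records = [  # hypothetical relational rows
    {"outlook": "sunny", "wind": "weak", "play": "no"},
    {"outlook": "sunny", "wind": "strong", "play": "no"},
    {"outlook": "rain", "wind": "weak", "play": "yes"},
    {"outlook": "sunny", "wind": "weak", "play": "no"},
]
freq = apriori(records, min_support=2)
for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), -kv[1])):
    print(sorted(itemset), count)
```

Treating `(attribute, value)` pairs as items is what lets a binary-item algorithm run over non-binary attributes; how MAV chooses the most frequent values per attribute is described only at a high level in the abstract.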
Modeling forest fires risk using spatial decision tree
Pub Date : 2011-06-28 DOI: 10.1109/DMO.2011.5976512
R. Yaakob, N. Mustapha, A. Nuruddin, I. S. Sitanggang
Forest fires have long been annual events in many parts of Sumatra, Indonesia, during the dry season. Riau Province is one of the regions in Sumatra where serious forest fires occur every year, mostly because of human factors, both intentional and accidental. Forest fire models have been developed for certain areas using weights and criteria for variables, which involve subjective and qualitative judgments of those variables. The weight of each criterion is determined from expert knowledge or the developers' previous experience, which may result in overly subjective models. In addition, criteria evaluation and weighting methods are mostly applied to small problems containing few criteria. This paper presents our initial work on developing a spatial decision tree using the spatial ID3 algorithm and the Spatial Join Index applied in the SCART (Spatial Classification and Regression Trees) algorithm. The algorithm is applied to historical forest fire data for Rokan Hilir, a district in Riau, to develop a forest fire risk model. The model includes physical as well as social and economic variables. The result is a spatial decision tree containing 138 leaves, with distance to the nearest river as the first test attribute.
Citations: 1
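ID3 selects each test attribute by information gain, which is how an attribute such as distance to the nearest river ends up at the root of the tree. The sketch below shows only the classic, non-spatial information-gain computation; the spatial extension and the Spatial Join Index used in the paper are not modelled. The attribute names (`river_dist`, `land_cover`) and the four rows are hypothetical and say nothing about the real relationship between rivers and fire risk.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Entropy of `target` minus the weighted entropy after splitting on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_attribute(rows, attrs, target):
    """ID3's choice of test attribute: the one with the highest gain."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))

rows = [  # hypothetical, illustrative records only
    {"river_dist": "bin_a", "land_cover": "peat", "fire": "yes"},
    {"river_dist": "bin_a", "land_cover": "forest", "fire": "yes"},
    {"river_dist": "bin_b", "land_cover": "peat", "fire": "no"},
    {"river_dist": "bin_b", "land_cover": "forest", "fire": "no"},
]
print(best_attribute(rows, ["river_dist", "land_cover"], "fire"))
```

ID3 then recurses on each value's subset until the labels are pure or no attributes remain; a tree with 138 leaves is the result of many such recursive splits.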
Evaluating Integrated Weight Linear method to class imbalanced learning in video data
Pub Date : 2011-06-28 DOI: 10.1109/DMO.2011.5976535
Zainal Apandi, N. Mustapha, L. S. Affendey
Given the enormous amount of video data, and especially the presence of noisy and irrelevant information, it is difficult for a typical detection process to capture the small portion of targeted instances because of the class imbalance problem. In this paper, class imbalance refers to a very small percentage of positive instances relative to negative instances, where the negative instances dominate the detection model and degrade detection performance. This paper proposes an Integrated Weight Linear (IWL) method that integrates a weight linear (WL) algorithm with principal component analysis (PCA) to address imbalanced datasets in soccer video data. PCA is adopted in the first phase to alleviate the imbalance in the data and to prepare the reduced instances for the next phase. In the second phase, the reduced instances are refined using the weight linear algorithm. Experimental results on 9 soccer videos demonstrate that integrating PCA and WL can alleviate the imbalance problem and improve classification performance on video data.
Citations: 2
A greedy constructive approach for Nurse Rostering Problem
Pub Date : 2011-06-28 DOI: 10.1109/DMO.2011.5976532
Mouna Jamom, M. Ayob, Mohammed Hadwan
The Nurse Rostering Problem (NRP) concerns producing a high-quality, workable duty roster for the available staff nurses. The aim of this work is to present a greedy constructive heuristic algorithm that generates a feasible initial solution by satisfying the hard constraints. Constructing the initial solution involves three steps: first, we design a group of shift patterns based on the hard and soft constraints; then, those patterns are rotated to predefined positions and allocated to each nurse; finally, if the solution is not feasible, we apply a repair mechanism. A real-world problem from Universiti Kebangsaan Malaysia Medical Centre (UKMMC) is used to test the proposed algorithm. The resulting roster demonstrates that, for our case study, the proposed algorithm generates a good-quality duty roster in reasonable computational time.
Citations: 5
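The first two construction steps (design a shift pattern, then rotate it to a different position for each nurse) and a feasibility check can be sketched as follows. Everything concrete here is a hypothetical assumption, not taken from the paper: the 7-day base pattern, the one-day rotation offset per nurse, the shift codes `M`/`E`/`N`/`-` (morning, evening, night, off), and the single hard constraint of a minimum number of nurses per shift per day. The repair step is omitted; the sketch only reports the violations a repair mechanism would have to fix.

```python
def rotate(pattern, offset):
    """Cyclically shift a pattern to the left by `offset` days."""
    offset %= len(pattern)
    return pattern[offset:] + pattern[:offset]

def build_roster(pattern, n_nurses):
    """Greedy construction: nurse i receives the base pattern rotated by i days."""
    return [rotate(pattern, i) for i in range(n_nurses)]

def coverage_violations(roster, demand):
    """Return (day, shift, assigned, required) for every shift whose coverage
    falls below the required minimum (the hard constraint assumed here)."""
    violations = []
    for day in range(len(roster[0])):
        for shift, required in demand.items():
            assigned = sum(1 for nurse in roster if nurse[day] == shift)
            if assigned < required:
                violations.append((day, shift, assigned, required))
    return violations

base = ["M", "M", "E", "E", "N", "-", "-"]  # hypothetical weekly pattern
roster = build_roster(base, n_nurses=7)
print(coverage_violations(roster, {"M": 2, "E": 2, "N": 1}))  # prints []
```

Because seven nurses with seven distinct rotations cover every position of the pattern on every day, coverage here equals the pattern's shift counts; with fewer nurses or higher demand, the returned violations are what the repair step would address.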
A semi-cyclic shift patterns approach for nurse rostering problems
Pub Date : 2011-06-28 DOI: 10.1109/DMO.2011.5976525
Mohammed Hadwan, M. Ayob
This paper introduces a semi-cyclic shift patterns approach (SCSPA) that solves the nurse rostering problem (NRP) at the Medical Centre, Universiti Kebangsaan Malaysia (UKMMC). Because the night shift is the most problematic shift to assign, owing to its extra constraints, the paper proposes a semi-cyclic approach, which first allocates predesigned night shift patterns cyclically and then allocates combined morning and evening shift patterns in a non-cyclic manner until the hard constraints are fulfilled. This differs from our previous work, which adopted a non-cyclic shift pattern approach (NCSPA) to construct all possible valid shift patterns, combining morning, evening, and night shifts into one-week shift patterns; two one-week shift patterns were then allocated to each nurse until the initial roster was constructed. This paper compares the proposed semi-cyclic approach with the previous non-cyclic approach. Besides the minimum violation penalty, we count the number of good patterns that each algorithm produces in order to measure the quality of the constructed duty roster. The approach then applies a simulated annealing algorithm to improve the initial rosters produced by both algorithms. The semi-cyclic approach offers two benefits over our previous work: (i) the number of constructed shift patterns decreases remarkably, which reduces construction time; and (ii) allocating night shift patterns fairly to all nurses becomes more manageable.
Based on the results obtained, the semi-cyclic approach yields a better duty roster, as it produces more good patterns than our previous non-cyclic approach.
Citations: 4
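The improvement phase uses simulated annealing: a worse neighbour is accepted with probability exp(-delta/T), and the temperature T cools geometrically. The sketch below is generic simulated annealing, not the paper's configuration. The penalty (total deviation of each nurse's night-shift count from a fair target), the neighbourhood move (swap two nurses' shifts on one day, which preserves daily coverage), the tiny roster, and all parameters (`t0`, `cooling`, `iters`) are hypothetical.

```python
import math
import random

def penalty(roster, target_nights):
    """Hypothetical soft-constraint penalty: total deviation of each nurse's
    night-shift count from a fair target."""
    return sum(abs(row.count("N") - target_nights) for row in roster)

def simulated_annealing(roster, target_nights, t0=2.0, cooling=0.995,
                        iters=5000, seed=0):
    rng = random.Random(seed)
    current = [row[:] for row in roster]
    cur_pen = penalty(current, target_nights)
    best, best_pen = [row[:] for row in current], cur_pen
    t = t0
    for _ in range(iters):
        # Neighbour: swap two nurses' shifts on one day, which keeps the
        # number of nurses on each shift per day unchanged.
        day = rng.randrange(len(current[0]))
        a, b = rng.sample(range(len(current)), 2)
        current[a][day], current[b][day] = current[b][day], current[a][day]
        new_pen = penalty(current, target_nights)
        delta = new_pen - cur_pen
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur_pen = new_pen
            if cur_pen < best_pen:
                best, best_pen = [row[:] for row in current], cur_pen
        else:
            # Rejected: undo the swap.
            current[a][day], current[b][day] = current[b][day], current[a][day]
        t *= cooling  # geometric cooling schedule
    return best, best_pen

# Deliberately unfair initial roster: nurse 0 works every night.
initial = [["N"] * 8, ["M"] * 8, ["E"] * 8, ["-"] * 8]
best, best_pen = simulated_annealing(initial, target_nights=2)
print(penalty(initial, 2), "->", best_pen)
```

Keeping the best roster seen, rather than the final one, guards against the search drifting to a worse state while the temperature is still high.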
A genetic based wrapper feature selection approach using Nearest Neighbour Distance Matrix
Pub Date : 2011-06-28 DOI: 10.1109/DMO.2011.5976534
M. Sainin, R. Alfred
Feature selection for data mining optimization is in high demand, especially for data with high-dimensional feature vectors. Feature selection is a method for selecting the best feature (or combination of features) for the data in order to achieve a similar or better classification rate. Currently, there are three types of feature selection methods: filter, wrapper, and embedded. This paper describes a genetic-based wrapper approach that optimizes the feature selection process embedded in a classification technique called the supervised Nearest Neighbour Distance Matrix (NNDM). The method is implemented and tested on several datasets from the UCI Machine Learning Repository and other sources. The results show that combining feature selection with the supervised NNDM has a significant impact on predictive accuracy when classifying new instances. Therefore, the approach can be used in other applications that require feature dimension reduction, such as image and bioinformatics classification.
Citations: 19
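In a wrapper method, the classifier itself scores each candidate feature subset. The sketch below uses a genetic algorithm over bit masks (elitism, tournament selection, one-point crossover, bit-flip mutation) whose fitness is the leave-one-out accuracy of a plain 1-nearest-neighbour classifier. That classifier is a stand-in assumption, not the supervised NNDM from the paper, and the population size, rates, and toy data (one informative feature plus two noise features) are hypothetical.

```python
import random

def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of 1-NN using only features whose mask bit is 1."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_d, best_label = None, None
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in idx)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)

def ga_select(X, y, pop_size=12, generations=20, p_mut=0.1, seed=0):
    """Genetic wrapper search over feature masks (needs >= 2 features)."""
    rng = random.Random(seed)
    n = len(X[0])
    fitness = lambda m: loo_1nn_accuracy(X, y, m)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(m), m) for m in pop),
                        key=lambda fm: fm[0], reverse=True)
        next_pop = [m for _, m in scored[:2]]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (size-3 tournaments).
            p1 = max(rng.sample(scored, 3), key=lambda fm: fm[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda fm: fm[0])[1]
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)

# Hypothetical data: feature 0 separates the classes, features 1-2 are noise.
rng = random.Random(1)
X = [[2.0 * label + rng.gauss(0, 0.1), rng.random(), rng.random()]
     for label in (0, 1) for _ in range(10)]
y = [0] * 10 + [1] * 10
best_mask, best_fit = ga_select(X, y)
print(best_mask, best_fit)
```

Every fitness evaluation retrains and re-scores the classifier, which is why wrapper methods tend to be more accurate but costlier than filter methods.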
Soft skills recommendation systems for IT jobs: A Bayesian network approach
Pub Date : 2011-06-28 DOI: 10.1109/DMO.2011.5976509
Azuraini Abu Bakar, Choo-Yee Ting
Today, soft skills are crucial factors in the success of a project. For certain jobs, soft skills are often considered more crucial than hard or technical skills for performing the job effectively. However, identifying the appropriate soft skills for each job is not a trivial task. This study therefore proposes a solution that assists employers in preparing job advertisements by identifying suitable soft skills together with their relevance to a particular job title. A Bayesian network is employed because it is suitable for reasoning and decision making under uncertainty. The proposed Bayesian network is trained on a dataset collected by extracting information from advertisements and through interview sessions with a few identified experts.
Citations: 19
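Training a Bayesian network from advertisement data includes estimating its conditional probability tables. A minimal sketch for a hypothetical two-node fragment, `JobTitle -> SoftSkillRequired`, estimates P(skill required | job title) from counts with Laplace smoothing and then ranks skills for a given title. The job titles, skill names, and records are invented for illustration; the paper's actual network structure and its expert-derived knowledge are not reproduced.

```python
from collections import Counter, defaultdict

def estimate_cpt(ads, all_skills, alpha=1.0):
    """P(skill required | job title) for each skill, Laplace-smoothed for a
    binary (required / not required) variable.
    Each ad is (job_title, set_of_soft_skills_mentioned)."""
    n_ads = Counter(title for title, _ in ads)
    n_skill = defaultdict(Counter)
    for title, skills in ads:
        n_skill[title].update(skills)
    return {
        title: {s: (n_skill[title][s] + alpha) / (n_ads[title] + 2 * alpha)
                for s in all_skills}
        for title in n_ads
    }

def recommend(cpt, title, top_k=2):
    """Soft skills ranked by P(required | title)."""
    return sorted(cpt[title], key=cpt[title].get, reverse=True)[:top_k]

ads = [  # hypothetical extracted advertisements
    ("system analyst", {"communication", "problem solving"}),
    ("system analyst", {"communication", "teamwork"}),
    ("programmer", {"teamwork"}),
    ("programmer", {"teamwork", "problem solving"}),
]
skills = ["communication", "teamwork", "problem solving"]
cpt = estimate_cpt(ads, skills)
print(recommend(cpt, "system analyst"))
```

Smoothing keeps a skill never seen for a title from getting probability zero, which matters when the advertisement sample per title is small.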
Intelligent Web caching using Adaptive Regression Trees, Splines, Random Forests and Tree Net
Pub Date : 2011-06-28 DOI: 10.1109/DMO.2011.5976513
Sarina Sulaiman, Siti Mariyam Hj. Shamsuddin, A. Abraham
Web caching is a technology for improving network traffic on the Internet. It temporarily stores Web objects (such as HTML documents) for later retrieval. Web caching has three significant advantages: reduced bandwidth consumption, reduced server load, and reduced latency. These benefits make the Web less expensive and give it better performance. The aim of this research is to introduce advanced machine learning approaches that decide whether or not to cache an object on the cache server, which can be modelled as a classification problem. The challenges include ranking the attributes and significantly improving classification accuracy. Four methods are employed in this research: Classification and Regression Trees (CART), Multivariate Adaptive Regression Splines (MARS), Random Forest (RF), and TreeNet (TN) are used for classification in Web caching.
The experimental results reveal that CART performed extremely well in classifying Web objects from the existing log data and is an excellent candidate for enhancing Web cache performance.
Citations: 6
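CART, the best performer in this study, grows a binary tree by choosing at each node the split that most reduces Gini impurity. The sketch below finds the best threshold on a single numeric attribute; the attribute (`hits`, a hypothetical per-object request count from the log) and the cache/no-cache labels are made up, and full tree growth, pruning, and the other three methods are omitted.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the split `value <= threshold`
    that minimises the weighted Gini impurity of the two children."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best

hits = [1, 2, 3, 10, 12, 15]   # hypothetical request counts per object
cache = [0, 0, 0, 1, 1, 1]     # 1 = worth caching (hypothetical labels)
print(best_threshold(hits, cache))  # prints (6.5, 0.0)
```

A full CART learner repeats this search over every attribute at every node, which also yields the kind of attribute ranking the abstract mentions as a challenge.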
Data mining technique for expertise search in a special interest group knowledge portal
Pub Date : 2011-06-28 DOI: 10.1109/DMO.2011.5976499
Wan Muhammad Zulhafizsyam Wan Ahmad, S. Sulaiman, U. K. Yusof
The Internet contributes to the development of electronic community (e-community) portals. Such portals have become an indispensable platform for members, especially members of Special Interest Groups (SIGs), to share knowledge and expertise in their respective fields. Finding expertise through an e-community portal helps interested people and researchers identify other experts working in the same area. However, searching for such expertise in the portal is quite a cumbersome task. Expertise data mining could ease the search for experts, and effective data mining techniques help to analyze and measure expertise levels accurately in a SIG portal. This paper proposes a method called Expertise Data Mining (EDM) that comprises several techniques for expertise search in a SIG portal. It is expected to improve the discovery of experts among the members of a SIG e-community.
Citations: 0
Journal: 2011 3rd Conference on Data Mining and Optimization (DMO)