2012 4th Conference on Data Mining and Optimization (DMO): Latest Articles


Meaningless to meaningful Web log data for generation of Web pre-caching decision rules using Rough Set

2012 4th Conference on Data Mining and Optimization (DMO)

Pub Date : 2012-10-15 DOI: 10.1109/DMO.2012.6329804

Sarina Sulaiman, Siti Mariyam Hj. Shamsuddin, Nor Bahiah Hj. Ahmad, A. Abraham

Web caching and pre-fetching are vital technologies that can speed up Web loading. Since speed and memory are crucial to the performance of mobile applications and websites, a better technique for the Web loading process should be investigated. The weaknesses of conventional Web caching policies include meaningless information and uncertain knowledge representation in the Web log data passed from the proxy cache to the mobile client. Organising and learning from Web log data requires an explicit representation to deal with these uncertainties, because the number of rules needed to find a suitable knowledge representation from the proxy cache to the mobile client grows exponentially. Consequently, Rough Set is chosen in this research to generate Web pre-caching decision rules, so that meaningless Web log data can be turned into meaningful information.

Citations: 6

A Differential Evolution Algorithm for the University course timetabling problem

2012 4th Conference on Data Mining and Optimization (DMO)

Pub Date : 2012-10-15 DOI: 10.1109/DMO.2012.6329805

Khalid Shaker, S. Abdullah, A. Hatem

The university course timetabling problem is known to be NP-hard. It is a complex problem whose size can become huge because of limited resources (e.g. the number of rooms, their capacities and the availability of lecturers) and the requirements placed on these resources. The problem involves assigning a given number of events to a limited number of timeslots and rooms under a given set of constraints; the objective is to satisfy the hard constraints and minimize the violation of soft constraints. In this paper, a Differential Evolution (DE) algorithm is proposed. The DE algorithm relies on the mutation operation to reduce the convergence time while reducing the penalty cost of the solution. The proposed algorithm is tested on eleven benchmark datasets (representing one large, five medium and five small problems). Experimental results show that the approach generates competitive results when compared with previously available approaches. Possible extensions of this simple approach are also discussed.
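
To make the role of the mutation operation concrete, the following is a minimal sketch of the classic continuous DE scheme (often written DE/rand/1/bin), not the paper's timetabling variant, whose discrete encoding and operators are not given in the abstract. The function names, the parameter defaults and the `sphere` toy objective are all assumptions chosen for illustration.

```python
import random


def differential_evolution(fitness, dim, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=100, seed=0):
    """Minimise `fitness` with a textbook DE/rand/1/bin loop (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    costs = [fitness(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    # Mutation: base vector plus a scaled difference vector.
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    v = min(hi, max(lo, v))  # clamp to the search bounds
                else:
                    v = pop[i][d]  # inherit the gene from the target
                trial.append(v)
            cost = fitness(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if cost <= costs[i]:
                pop[i], costs[i] = trial, cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


def sphere(x):
    """Toy penalty function standing in for a timetable's constraint cost."""
    return sum(v * v for v in x)


best, cost = differential_evolution(sphere, dim=3, bounds=(-5.0, 5.0))
print(best, cost)
```

For a timetabling problem, the vector would have to encode event-to-timeslot and event-to-room assignments, and the penalty function would count soft-constraint violations; both are left out here.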

Citations: 5

Topic detections in Arabic Dark websites using improved Vector Space Model

2012 4th Conference on Data Mining and Optimization (DMO)

Pub Date : 2012-10-15 DOI: 10.1109/DMO.2012.6329790

H. Alghamdi, Ali Selamat

Terrorist groups' forums remain a threat to all Web users, and algorithms are still needed to detect their informative content. In this paper, we investigate the most discussed topics on Arabic Dark Web forums. Arabic textual content is extracted from selected Arabic Dark Web forums. The Vector Space Model (VSM) is used as the text representation with two different term weighting schemes, Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF). The pre-processing phase, which consists of filtering, tokenization and stemming, plays a significant role in processing the extracted terms. The stemming step is based on a proposed stemmer that works without a root dictionary. The well-known k-means clustering algorithm is used to cluster the terms. The experimental results show the terms most shared between the selected forums.
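
The two term weighting schemes named in the abstract can be illustrated with a short sketch. It assumes raw counts for TF and the unsmoothed `log(N / df)` form for IDF; the abstract does not say which variants the authors used, and the tiny English token lists below are placeholders for the pre-processed Arabic terms.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Term Frequency: raw count of each term per document."""
    return [Counter(doc) for doc in docs]


def tfidf_vectors(docs):
    """TF-IDF: raw count scaled by log(N / document frequency)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


# Hypothetical, already tokenized and stemmed forum posts.
docs = [
    ["forum", "post", "news"],
    ["forum", "video"],
    ["forum", "news", "news"],
]
tf = tf_vectors(docs)
tfidf = tfidf_vectors(docs)
print(tf[2]["news"], tfidf[2]["news"])
```

A term that occurs in every document, such as `forum` here, gets a TF-IDF weight of zero, which is why the two schemes can produce different clusters when the vectors are passed to k-means.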

Citations: 28

WebSum: Enhanced SumBasic algorithm for Web site summarization

2012 4th Conference on Data Mining and Optimization (DMO)

Pub Date : 2012-10-15 DOI: 10.1109/DMO.2012.6329812

Jason Yong-Jin Tee, Lay-Ki Soon, Choo-Yee Ting

Due to the rapid growth of information on the World Wide Web, the common Web user may be overwhelmed. The Web user may find it quicker or more efficient to browse the Web by reading summaries of Web sites. This paper proposes WebSum to compress Web site content into a summary. WebSum is an enhancement of the SumBasic algorithm, which was mainly used for multi-document summarization. In the case of Web sites, we find that several Web characteristics such as title and keywords can be used to extract sentences that may represent the overall topic of the Web site. Initial results show that WebSum is able to reveal sentences related to the concept of the Web site. WebSum is then evaluated against the original SumBasic algorithm.
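
For readers unfamiliar with the baseline, here is a minimal sketch of SumBasic as it is commonly described: sentences are scored by the average probability of their words, the best sentence containing the currently most probable word is selected, and the probabilities of the selected words are squared to discourage repetition. It is not WebSum; the title and keyword enhancements are not specified in the abstract and are omitted. The whitespace tokenizer and the example sentences are assumptions.

```python
from collections import Counter


def sumbasic(sentences, max_sentences=2):
    """Select summary sentences with a plain SumBasic loop (illustrative only)."""
    tokenized = [s.lower().split() for s in sentences]
    counts = Counter(w for sent in tokenized for w in sent)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    def score(i):
        # Average word probability of sentence i.
        return sum(prob[w] for w in tokenized[i]) / len(tokenized[i])

    remaining = [i for i, sent in enumerate(tokenized) if sent]
    chosen = []
    while remaining and len(chosen) < max_sentences:
        top_word = max(prob, key=prob.get)
        # Prefer sentences containing the most probable word, if any remain.
        candidates = [i for i in remaining if top_word in tokenized[i]] or remaining
        best = max(candidates, key=score)
        chosen.append(best)
        remaining.remove(best)
        # Down-weight words already covered by the summary.
        for w in set(tokenized[best]):
            prob[w] = prob[w] ** 2
    return [sentences[i] for i in chosen]


sentences = [
    "the cache stores web pages",
    "web cache speeds web loading",
    "users like cats",
]
summary = sumbasic(sentences, max_sentences=2)
print(summary)
```

An enhancement in the spirit of the abstract would presumably raise the initial probability of words that also appear in the page title or keywords, but that is a guess at the design, not a description of WebSum.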

Citations: 0

Discovering frequent serial episodes in symbolic sequences for rainfall dataset

2012 4th Conference on Data Mining and Optimization (DMO)

Pub Date : 2012-10-15 DOI: 10.1109/DMO.2012.6329809

A. Ahmed, A. Bakar, A. Hamdan, Sharifah Mastura Syed Abdullah, O. Jaafar

A serial episode is a type of temporal frequent pattern in time series. Many algorithms have been proposed to discover different types of episodes for different applications. In this paper we propose an algorithm for discovering frequent episodes from processed rainfall data. The algorithm has three main steps. (1) The rainfall data is first converted to a symbolic representation. (2) Events are then detected by applying a sliding window for segmentation and CBR for classification. (3) Finally, the processed rainfall data is passed through a mining phase, in which the Frequent algorithm is used to discover frequent episodes of fixed width. The experiment shows that many frequent episodes with different structures are extracted for different years.
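
The idea of mining fixed-width episodes from a symbolic sequence can be sketched as follows. This is a simplified stand-in, assuming that an episode is a contiguous run of symbols and that frequency is a plain occurrence count; the paper's own segmentation, CBR classification and mining algorithm are not described in enough detail in the abstract to reproduce. The symbols `L`, `H` and `M` are hypothetical rainfall levels.

```python
from collections import Counter


def frequent_episodes(sequence, width, min_count):
    """Count every contiguous window of `width` symbols and keep the frequent ones."""
    windows = Counter(
        tuple(sequence[i:i + width]) for i in range(len(sequence) - width + 1)
    )
    return {episode: n for episode, n in windows.items() if n >= min_count}


# Hypothetical symbolic rainfall series: L = low, H = high, M = medium.
series = list("LLHLLHLLM")
episodes = frequent_episodes(series, width=2, min_count=2)
print(episodes)
```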

Citations: 3

Multi-parent insertion crossover for vehicle routing problem with time windows

2012 4th Conference on Data Mining and Optimization (DMO)

Pub Date : 2012-10-15 DOI: 10.1109/DMO.2012.6329806

E. T. Yassen, M. Ayob, M. Nazri, Nasser R. Sabar

Multi-parent crossover has been successfully applied to many combinatorial optimization problems, such as the unconstrained binary quadratic programming problem (UBQP). This is because using more than two parents strengthens intensification by exploiting the information shared by multiple parents. However, not all types of crossover are suitable for the vehicle routing problem (VRP). Therefore, this work introduces a multi-parent insertion crossover for the vehicle routing problem with time windows (VRPTW) by enhancing the two-parent insertion crossover. This crossover exchanges information among three parents instead of two. Results on the Solomon VRPTW benchmarks demonstrate that multi-parent crossover outperforms two-parent crossover on the same instances. This shows that using more parents for crossover can help the search find better-quality solutions.

Citations: 4

A Direct Ensemble Classifier for Imbalanced Multiclass Learning

2012 4th Conference on Data Mining and Optimization (DMO)

Pub Date : 2012-10-15 DOI: 10.1109/DMO.2012.6329799

M. Sainin, R. Alfred

Researchers have shown that although traditional direct classifier algorithms can easily be applied to multiclass classification, the performance of a single classifier decreases when the data in a multiclass classification task is imbalanced. Thus, ensembles of classifiers have emerged as one of the hot topics for imbalanced multiclass classification in the data mining and machine learning domains. Ensemble learning is an effective technique that has increasingly been adopted to combine multiple learning algorithms to improve overall prediction accuracy, and it may outperform any single sophisticated classifier. In this paper, an ensemble learner called the Direct Ensemble Classifier for Imbalanced Multiclass Learning (DECIML), which combines simple nearest neighbour and Naive Bayes algorithms, is proposed. A combiner method called OR-tree is used to combine the decisions obtained from the ensemble classifiers. The DECIML framework has been tested on several benchmark datasets and shows promising results.

Citations: 8

Solving flexible manufacturing system distributed scheduling problem subject to maintenance using harmony search algorithm

2012 4th Conference on Data Mining and Optimization (DMO)

Pub Date : 2012-10-15 DOI: 10.1109/DMO.2012.6329801

M. Khalid, U. K. Yusof, Maziani Sabudin

Flexible manufacturing systems are one of the industrial branches that are highly competitive and rapidly expanding. Globalization of the industrial system has encouraged the development of distributed manufacturing, including flexible manufacturing systems. The complexity of the problems faced in this new environment has prompted researchers to develop various approaches to optimizing production scheduling. Approaches such as Petri nets, ant colony, genetic algorithms, intelligent agents, particle swarm optimization and tabu search have been used to tackle optimization issues. In reality, maintenance is a core part of manufacturing scheduling, because machine breakdowns greatly affect the schedule. Unfortunately, most approaches disregard preventive maintenance in the production scheduling problem. In this paper, a harmony search algorithm is introduced to address the problem with maintenance included. The problem description is successfully represented, and the algorithm's performance is studied under several parameter tunings.
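
Since the abstract studies the algorithm under several parameter tunings, a minimal sketch of the basic harmony search loop may help show what those parameters control. It assumes the standard continuous formulation with a harmony memory size (`hms`), a harmony memory considering rate (`hmcr`), a pitch adjusting rate (`par`) and a bandwidth (`bw`); the paper's scheduling encoding and maintenance constraints are not reproduced, and the `sphere` objective is a placeholder.

```python
import random


def harmony_search(cost, dim, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.1,
                   iterations=2000, seed=0):
    """Minimise `cost` with a basic harmony search loop (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    memory = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(hms)]
    costs = [cost(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d in range(dim):
            if rng.random() < hmcr:
                # Memory consideration: reuse a value from a stored harmony.
                v = memory[rng.randrange(hms)][d]
                if rng.random() < par:
                    # Pitch adjustment: small perturbation within the bandwidth.
                    v += rng.uniform(-bw, bw)
            else:
                # Random selection from the whole range.
                v = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, v)))
        c = cost(new)
        worst = max(range(hms), key=costs.__getitem__)
        # Replace the worst stored harmony if the new one is better.
        if c < costs[worst]:
            memory[worst], costs[worst] = new, c
    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]


def sphere(x):
    """Toy objective standing in for a schedule's cost."""
    return sum(v * v for v in x)


harmony, harmony_cost = harmony_search(sphere, dim=2, bounds=(-5.0, 5.0))
print(harmony, harmony_cost)
```

Raising `hmcr` makes the search rely more on stored harmonies, while `par` and `bw` control how much local variation is applied to them.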


Citations: 5

A feature selection model for binary classification of imbalanced data based on preference for target instances

2012 4th Conference on Data Mining and Optimization (DMO)

Pub Date : 2012-10-15 DOI: 10.1109/DMO.2012.6329795

D. Tan, S. Liew, T. Tan, W. Yeoh

Telemarketers at online job advertising firms face significant challenges in understanding the advertising demands of small enterprises. Effective use of data mining can offer e-recruitment companies an improved understanding of customers' patterns and greater insight into purchasing trends. However, prior studies of classifiers built with data mining approaches provide limited insight into the customer targeting problem of job advertising companies. In this paper we develop a single feature evaluator and propose an approach that selects a desired feature subset by setting a threshold. The proposed feature evaluator demonstrates its stability and outstanding performance in empirical experiments that use real-world customer data from an e-recruitment firm. Practically, the findings and the model may help telemarketers better understand their customers. Theoretically, this paper extends existing research on feature selection for binary classification of imbalanced data.

Citations: 4

K-means clustering pre-analysis for fault diagnosis in an aluminium smelting process

2012 4th Conference on Data Mining and Optimization (DMO)

Pub Date : 2012-10-15 DOI: 10.1109/DMO.2012.6329796

NA Abd Majid, B. Young, M. Taylor, John J. J. Chen

Developing a fault detection and diagnosis system for complex processes usually involves large volumes of highly correlated data. In the complex aluminium smelting process, it is difficult to separate historical data into different classes of faults in order to develop a fault diagnostic model. This paper presents a new application of a data mining tool, k-means clustering, to determine precisely how data corresponds to different classes of faults in the aluminium smelting process. The results of applying the clustering technique to real data sets show that the boundary of each class of faults can be identified. This means the faulty data can be isolated accurately, enabling the development of a fault diagnostic model that can diagnose faults effectively.

Citations: 6
