2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA): Latest Publications


A Formal Design for the Lexical and Syntax Analyzer of a Pedagogically Effective Subset of C++

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)

Pub Date : 2016-12-01 DOI: 10.1109/ICMLA.2016.0074

M. Farooq, A. Abid, R. Fox

In this article, we argue that a programming language can be improved for both teaching and learning by extracting a simpler subset of it and enforcing some useful constraints. We choose C++, a well-known first programming language, and define a pedagogically effective subset of it, named Eazy, for teaching a first course in computer programming, generally known as CS1. To enforce the use of the defined subset and to apply the constraints, the preprocessor of the language needs to be modified. To this end, we present a formal design for the lexical analyzer and the syntax analyzer for Eazy.


Citations: 1

Applying the Meta-heuristic Prediction Algorithm for Modeling Power Density in Wind Power Plant

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)

Pub Date : 2016-12-01 DOI: 10.1109/ICMLA.2016.0079

H. Kahraman, M. Ayaz, I. Colak, R. Bayindir

In this paper, a robust artificial intelligence (AI) algorithm is applied to overcome the challenges of power density prediction, especially during the installation of a wind power plant. The algorithm also explores the relationships between meteorological parameters and power density. The importance of each parameter for power density is converted into a numerical weight, independently of the other parameters. Thus, the effects of wind speed, wind direction, temperature, humidity, and pressure on power density can be modelled. In addition, the experimental study shows that the prediction accuracy and stability of the applied method are superior to those of traditional AI-based techniques.


Citations: 2

Consensus Clustering: A Resampling-Based Method for Building Radiation Hybrid Maps

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)

Pub Date : 2016-12-01 DOI: 10.1109/ICMLA.2016.0047

Raed I. Seetan, J. Bible, Michael Karavias, Wael Seitan, S. Thangiah

Building Radiation Hybrid (RH) maps is a challenging process. Traditional RH mapping techniques are very time-consuming and do not work well on noisy datasets. In this research, we propose a new approach that combines a resampling technique with consensus clustering to filter out unreliable markers and build robust RH maps in a short time. The proposed approach has two main aims: first, to reduce the computational complexity of mapping, thus speeding up the mapping process; and second, to filter out unreliable markers and map the remaining reliable markers to build robust maps. The proposed approach maps RH datasets in four steps: 1) it uses the Jackknife resampling technique to resample the RH dataset and groups all resampled datasets into clusters; 2) it builds consensus clusters and filters out unreliable markers; 3) it maps the consensus clusters; 4) it connects the consensus clusters' maps to form the final map. To demonstrate the performance of the proposed approach, we compare the accuracy of the constructed maps with the corresponding physical maps. We also compare the running time of our approach with that of the Carthagene tool. The results show that the proposed approach can construct robust maps in a comparatively short time.
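Steps 1 and 2 of the abstract above (jackknife resampling, clustering each resample, and building consensus clusters) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: markers are represented as 0/1 retention vectors over hybrid cell lines, each resample is clustered by single linkage on the Hamming fraction, and a marker is treated as unreliable when it ends up alone in the consensus. The names `consensus_clusters`, `threshold`, and `agreement` are hypothetical.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def components(names, linked):
    """Connected components of the graph whose edges are pairs with linked(a, b) true."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if linked(a, b):
            parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return [frozenset(g) for g in groups.values()]


def consensus_clusters(markers, threshold=0.3, agreement=0.8):
    """Jackknife over hybrids, cluster each resample, keep stable co-clusterings.

    Assumed (hypothetical) rule: two markers belong together if they share a
    cluster in at least `agreement` of the leave-one-out resamples; a marker
    that ends up alone is reported as unreliable.
    """
    names = list(markers)
    n_hybrids = len(markers[names[0]])
    together = {pair: 0 for pair in combinations(names, 2)}
    for drop in range(n_hybrids):
        # Jackknife resample: leave out one hybrid (one column) at a time.
        resampled = {n: v[:drop] + v[drop + 1:] for n, v in markers.items()}
        groups = components(
            names,
            lambda a, b: hamming_fraction(resampled[a], resampled[b]) < threshold,
        )
        label = {n: i for i, g in enumerate(groups) for n in g}
        for a, b in together:
            if label[a] == label[b]:
                together[(a, b)] += 1
    consensus = components(
        names, lambda a, b: together[(a, b)] / n_hybrids >= agreement
    )
    clusters = [g for g in consensus if len(g) > 1]
    unreliable = {n for g in consensus if len(g) == 1 for n in g}
    return clusters, unreliable


# Toy data (invented): m5 does not consistently resemble either group.
markers = {
    "m1": [1, 1, 0, 0, 1, 0],
    "m2": [1, 1, 0, 0, 1, 0],
    "m3": [0, 0, 1, 1, 0, 1],
    "m4": [0, 0, 1, 1, 0, 1],
    "m5": [1, 0, 1, 0, 0, 0],
}
clusters, unreliable = consensus_clusters(markers)
```

Steps 3 and 4 (ordering the markers within each consensus cluster and joining the cluster maps) are not sketched, since the abstract does not say how they are carried out.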


Citations: 1

Deep Learning Anomaly Detection as Support Fraud Investigation in Brazilian Exports and Anti-Money Laundering

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)

Pub Date : 2016-12-01 DOI: 10.1109/ICMLA.2016.0172

Ebberth L. Paula, M. Ladeira, Rommel N. Carvalho, Thiago Marzagão

Exports of goods and products are normally transactions that governments encourage. Typically, these incentives take the form of tax exemptions or lower tax collections. However, export fraud may occur with objectives unrelated to tax evasion, for example money laundering. This article presents the results obtained by implementing an unsupervised deep learning model to classify Brazilian exporters regarding the possibility of committing export fraud. Assuming that the vast majority of exporters have explanatory features of their export volume that interrelate in a standard way, we used an autoencoder to detect anomalous situations with respect to the data pattern. The databases used in this work come from exports of goods and products that occurred in Brazil in 2014, provided by the Secretariat of Federal Revenue of Brazil. From attributes that characterize export companies, the model was able to detect anomalies in at least twenty exporters.


Citations: 97

Hourly Solar Irradiance Forecasting Based on Machine Learning Models

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)

Pub Date : 2016-12-01 DOI: 10.1109/ICMLA.2016.0078

F. Melzi, Taieb Touati, A. Samé, L. Oukhellou

In recent years, many research studies have been conducted into the use of smart meter data for developing decision-making tools for analytical, forecasting, and display purposes. Forecasting energy generation and forecasting energy consumption demand are central problems for urban stakeholders (electricity companies and urban planners). Solving them helps these stakeholders ensure efficient planning and optimization of energy resources. This paper investigates the problem of forecasting hourly solar irradiance within a Machine Learning (ML) framework using the Similarity method (SIM), Support Vector Machine (SVM), and Neural Network (NN). These approaches rely on a methodology that takes into account the previous hours of the day being predicted as well as the days in the history with the same number of sunshine hours. The study is conducted on a real data set collected in the Paris suburb of Alfortville. A comparison with two time series approaches, namely the Naive method and the Autoregressive Moving Average (ARMA) model, is performed. This study is a first step towards the development of hybrid models for hourly solar irradiance forecasting.
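The methodology described above (use the hours already observed today, restricted to past days with the same number of sunshine hours) can be sketched as a nearest-neighbour forecast next to a persistence baseline. This is only an illustration under stated assumptions: the abstract does not define the Similarity or Naive methods in detail, so the squared-difference distance, the averaging of the `k` closest days, and the function names are hypothetical choices.

```python
def naive_forecast(observed):
    """Persistence baseline (assumed form): next hour equals the last observed hour.

    Requires at least one observed value.
    """
    return observed[-1]


def similarity_forecast(past_days, today_so_far, sunshine_hours, k=2):
    """Predict the next hour of today from similar past days.

    past_days: list of (sunshine_hours, hourly_irradiance_list) tuples.
    Only days with the same number of sunshine hours are considered; among
    them, the k days whose earlier hours are closest to today's are averaged.
    """
    h = len(today_so_far)
    candidates = [
        values
        for hours, values in past_days
        if hours == sunshine_hours and len(values) > h
    ]

    def distance(values):
        # Squared difference over the hours already observed today.
        return sum((a - b) ** 2 for a, b in zip(values[:h], today_so_far))

    nearest = sorted(candidates, key=distance)[:k]
    if not nearest:
        # No comparable day in the history: fall back to persistence.
        return naive_forecast(today_so_far)
    return sum(values[h] for values in nearest) / len(nearest)


# Invented hourly values, for illustration only.
past_days = [
    (8, [0, 100, 200, 300]),
    (8, [0, 120, 220, 320]),
    (8, [0, 10, 20, 30]),
    (6, [0, 100, 200, 250]),
]
prediction = similarity_forecast(past_days, [0, 110, 210], sunshine_hours=8)
baseline = naive_forecast([0, 110, 210])
```

With these toy values, the two closest same-sunshine days are the first two, so the similarity forecast is their average for the next hour, while the baseline simply repeats the last observation.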


Citations: 24

Automatic Algorithm Selection in Computational Software Using Machine Learning

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)

Pub Date : 2016-12-01 DOI: 10.1109/ICMLA.2016.0064

M. Simpson, Qing Yi, J. Kalita

Computational software programs, such as Maple and Mathematica, rely heavily on superfunctions and meta-algorithms to select the optimal algorithm for a given task. These meta-algorithms may require intensive mathematical proof to formulate, incur large computational overhead, or fail to consistently select the best algorithm. Machine learning offers a promising alternative for automatic algorithm selection, easing the design process and reducing overhead while also attaining high selection accuracy. In a case study on the resultant superfunction, a trained neural network selects the best of the four available algorithms 86% of the time in Maple and 78% of the time in Mathematica. When used as a replacement for pre-existing meta-algorithms, the neural network brings about a 68% runtime improvement in Maple and a 49% improvement in Mathematica. Random forests, k-nearest neighbors, and both linear and RBF kernel SVMs are also compared to the neural network model, which offers the best performance of the tested machine learning methods.


Citations: 3

Bag of Bags: Nested Multi Instance Classification for Prostate Cancer Detection

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)

Pub Date : 2016-12-01 DOI: 10.1109/ICMLA.2016.0032

F. Khalvati, Junjie Zhang, A. Wong, M. Haider

Computer-aided detection (CAD) algorithms have been proposed for auto-detection of different types of cancer. CAD algorithms rely on machine learning methods to classify regions of interest in images into cancerous and healthy regions. In cancer screening, the foremost problem to solve is whether a patient has cancer, regardless of the location of cancerous regions in the organ. Answering it allows early detection of the disease, so that the right course of treatment can be taken. In machine learning, this problem has been formulated as multi-instance learning (MIL), where bags of instances are classified rather than the individual instances. In this paper, we propose a bag of bags (BoB) nested MIL algorithm in which each high-level bag (or parent bag) contains multiple smaller bags of instances. We applied the proposed BoB MIL algorithm to the prostate cancer detection problem using magnetic resonance imaging data, first to detect which patients have cancer and then to detect which slices in the 3D volume imaging data of the detected patients contain cancerous regions. Experimental results obtained from the imaging data of 30 patients, with ground-truth data based on biopsy results, show that the proposed algorithm is capable not only of detecting prostate cancer at the patient level but also of detecting the cancerous regions at the slice level of the imaging data with high accuracy.
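The nested bag structure described above can be illustrated with a small sketch. It assumes the standard MIL rule that a bag is positive when at least one of its members is positive, applied at two levels: regions within a slice, then slices within a patient. The region scores, the 0.5 threshold, and the function name `classify_patient` are hypothetical; the abstract does not specify the classifier that the authors actually train.

```python
def classify_patient(patient, threshold=0.5):
    """Two-level max aggregation over a bag of bags.

    patient: dict mapping slice id -> list of region scores in [0, 1]
             (scores are assumed to come from some instance-level classifier).
    Returns (patient_is_positive, list_of_positive_slice_ids).
    """
    # Inner bags: a slice is as suspicious as its most suspicious region.
    slice_scores = {s: max(regions) for s, regions in patient.items()}
    # Parent bag: a patient is as suspicious as the most suspicious slice.
    patient_score = max(slice_scores.values())
    positive_slices = sorted(
        s for s, score in slice_scores.items() if score >= threshold
    )
    return patient_score >= threshold, positive_slices


# Invented scores, for illustration only.
patient_a = {"slice1": [0.1, 0.2], "slice2": [0.3, 0.9], "slice3": [0.05]}
patient_b = {"slice1": [0.1], "slice2": [0.2, 0.4]}
result_a = classify_patient(patient_a)
result_b = classify_patient(patient_b)
```

The same pass answers both questions posed in the abstract: the parent-bag decision says whether the patient is flagged, and the inner-bag decisions say which slices to inspect.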


Citations: 5

Spatial Dependency and Hedonic Housing Regression Model

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)

Pub Date : 2016-12-01 DOI: 10.1109/ICMLA.2016.0097

T. Oladunni, Sharad Sharma

The location of a real estate property has a considerable impact on its appraised value. Accounting for geographical information eliminates some reducible errors in the accuracy of a hedonic housing regression model. An improved performance will benefit home buyers, sellers, government, and real estate professionals. This paper investigates the spatial dependency and substitutability of submarket and geospatial attributes in a hedonic housing regression model using mutual information (MI) and variance inflation factor (VIF). Best subset linear regression and regression tree predictive models were built as learning algorithms. Bayesian Information Criterion (BIC) and Residual Mean Deviance (RDM) measured the performance of the linear regression and the regression trees, respectively. The BIC of the linear regression model indicated a best fit at 14 and 11 variables for the submarket and geospatial models, respectively. The submarket tree was optimized with 9 parameters and 15 terminal nodes, while the geospatial tree was optimized with 7 parameters and 13 terminal nodes. While the geospatial models have a slight edge over the submarket model, the experiment suggested the substitutability of the models. The dataset consisted of single-family homes in 8 counties between January and December 2006, extracted from the Multiple Listing Service repository.
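Two of the quantities named above have standard textbook forms that are easy to state in code. The sketch below uses the common definitions: for a Gaussian linear model, BIC is n·ln(RSS/n) + k·ln(n) up to an additive constant, and the VIF of a predictor is 1/(1 − R²), where R² comes from regressing that predictor on the others. Whether the paper uses exactly these forms is an assumption, and the RSS values in the usage lines are invented.

```python
import math


def bic(rss, n, k):
    """BIC of a Gaussian linear model, up to an additive constant.

    rss: residual sum of squares, n: number of observations,
    k: number of fitted parameters. Lower is better.
    """
    return n * math.log(rss / n) + k * math.log(n)


def vif(r_squared):
    """Variance inflation factor of one predictor.

    r_squared: R^2 from regressing that predictor on all other predictors.
    """
    return 1.0 / (1.0 - r_squared)


# Best-subset selection by BIC over hypothetical (model size -> RSS) results.
rss_by_size = {1: 900.0, 2: 500.0, 3: 495.0}
n_obs = 200
best_size = min(rss_by_size, key=lambda k: bic(rss_by_size[k], n_obs, k))
```

In the invented example the third variable barely lowers the RSS, so the ln(n) penalty makes the two-variable model the BIC minimum, which is the kind of trade-off behind a "best fit at 14 variables" statement.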


Citations: 4

Learning Fairness under Constraints: A Decentralized Resource Allocation Game

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)

Pub Date : 2016-12-01 DOI: 10.1109/ICMLA.2016.0043

Qinyun Zhu, J. Oh

We study multi-type resource allocation in multi-agent systems, where constraints are enforced upon resource providers and users. These constraints are limitations on resource types and connection availability, which may make collaboration between agents infeasible. We discuss the notion of distributed resource fairness under these constraints. We then propose a solution for collaborative resource allocation based on game theory and reinforcement learning, so that resources are assigned to users fairly and tasks are assigned to resource agents efficiently. We use data from a Google data center as the input to our simulations. Results show that our learning approach outperforms greedy and random exploration in terms of resource utilization and fairness.


Citations: 4

Relational Synthesis of Text and Numeric Data for Anomaly Detection on Computing System Logs

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)

Pub Date : 2016-12-01 DOI: 10.1109/ICMLA.2016.0158

Elisabeth Baseman, S. Blanchard, Zongze Li, Song Fu

Monitoring high performance computing systems has become increasingly difficult as researchers and system analysts face the challenge of synthesizing a wide range of monitoring information in order to detect system problems on ever larger machines. We present a method for anomaly detection on syslog data, one of the most important data streams for determining system health. Syslog messages pose a difficult question for analysis because they include a mix of structured natural language text as well as numeric values. We present an anomaly detection framework that combines graph analysis, relational learning, and kernel density estimation to detect unusual syslog messages. We design an event block detector, which finds groups of related syslog messages, to retrieve the entire section of syslog messages associated with a single anomalous line. Our novel approach successfully retrieves anomalous behaviors inserted into syslog files from a virtual machine, including messages indicating serious system problems. We also test our approach on syslog messages from the Trinity supercomputer and find that our methods do not generate significant false positives.
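Of the three components named above, kernel density estimation is the simplest to show in isolation. The sketch below scores numeric values extracted from log messages with a one-dimensional Gaussian KDE and flags those that fall in low-density regions. It covers only this one ingredient; the graph analysis and relational learning parts are not sketched, and the bandwidth, the density cut-off, and the function names are hypothetical choices rather than the authors' settings.

```python
import math


def kde_density(x, samples, bandwidth):
    """Gaussian kernel density estimate at x from previously seen samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


def low_density_values(values, samples, bandwidth=1.0, min_density=0.01):
    """Return the values whose estimated density falls below min_density."""
    return [
        v for v in values if kde_density(v, samples, bandwidth) < min_density
    ]


# Invented numeric field values (e.g. a temperature reading in a log line).
seen = [10, 11, 9, 10, 12, 10]
flagged = low_density_values([10, 50], seen)
```

A value close to what has been seen before gets a high density and passes, while a value far from every earlier sample gets a density near zero and is flagged.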


Citations: 25
