
2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI): Latest Papers

A Graph Resilience Metric Based On Paths: Higher Order Analytics With GPU
G. Drakopoulos, Xenophon Liapakis, Giannis Tzimas, Phivos Mylonas
Structural resilience is an inherent and paramount property of massive, real-world, scale-free graphs such as those typically encountered in brain networks, protein-protein interaction diagrams, logistics and supply chains, and social media, among others. It means that if a small fraction of edges, or even of vertices together with their incident edges, is deleted, alternative (although possibly longer) paths can still be found, so that the overall connectivity of the graph remains intact. This durability, which is constantly exhibited in nature, can be attributed to three main reasons. First, almost by construction, scale-free graphs have a relatively high density. Second, they have a short diameter, or at least a short effective diameter. Finally, scale-free graphs are recursively built on communities. As a consequence, the effect of a few edge or even vertex deletions inside a community typically remains isolated there, and the effects of the deletions are thus neutralized. Ultimately, these properties stem from the degree distribution. This conference paper proposes a new, generic, and scalable graph resilience metric that relies on a weighted sum of the number of paths crossing certain vertices of high communication and structural value. Finally, a CUDA implementation is discussed and compared with a serial MEX implementation. The performance of the metric is assessed in terms of total computational time and parallelism.
DOI: https://doi.org/10.1109/ICTAI.2018.00138 | Published: 2018-11-01
Citations: 9
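The abstract defines the metric only as a weighted sum of the number of paths crossing selected vertices of high communication and structural value. The Python sketch below is one possible reading, not the authors' metric: it assumes that "paths" means shortest paths between unordered vertex pairs, and the key vertices and their weights (`key_weights`) are hypothetical inputs supplied by the caller. Each per-source BFS is independent of the others, which makes that step a natural candidate for parallelization; the paper's CUDA and MEX code is not reproduced here.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, src):
    """Hop distances and shortest-path counts from src in an unweighted, undirected graph."""
    dist = {src: 0}
    sigma = {src: 1}  # sigma[v] = number of shortest src-v paths
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u lies on a shortest path to v
                sigma[v] += sigma[u]
    return dist, sigma

def path_resilience_score(adj, key_weights):
    """Hypothetical score: sum over key vertices w of
    key_weights[w] * (number of shortest s-t paths, with s, t != w, passing through w)."""
    info = {s: bfs_counts(adj, s) for s in adj}  # one independent BFS per source
    score = 0.0
    for w, weight in key_weights.items():
        dist_w, sigma_w = info[w]
        through = 0
        for s, t in combinations(adj, 2):
            if w in (s, t):
                continue
            dist_s, sigma_s = info[s]
            # w is on a shortest s-t path iff d(s, w) + d(w, t) == d(s, t)
            if t in dist_s and w in dist_s and dist_s[w] + dist_w[t] == dist_s[t]:
                through += sigma_s[w] * sigma_w[t]
        score += weight * through
    return score

# Toy example: on the 4-cycle a-b-c-d-a, one of the two shortest b-d paths runs through a.
square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(path_resilience_score(square, {"a": 1.0}))  # 1.0
```

Removing a key vertex from `adj` and recomputing gives a crude before/after comparison; how the paper turns path counts into its final resilience value is not stated in the abstract.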
Interpreting Social Media-Based Substance Use Prediction Models with Knowledge Distillation
Tao Ding, Fatema Hasan, W. Bickel, Shimei Pan
People nowadays spend a significant amount of time on social media such as Twitter, Facebook, and Instagram. As a result, social media data capture rich evidence of human behavior that can help us understand people's thoughts, behavior, and decision-making processes. Social media data, however, are mostly unstructured (e.g., text and images) and may involve a large number of raw features (e.g., millions of raw text and image features). Moreover, ground-truth data about human behavior and decision making can be difficult to obtain at a large scale. As a result, most state-of-the-art social media-based models of human behavior employ sophisticated unsupervised feature learning to leverage large amounts of unlabeled data. Unfortunately, these advanced models often rely on latent features that are hard to explain. Since understanding the knowledge captured in these models is important for behavioral scientists, public health providers, and policymakers, this research focuses on employing a knowledge distillation framework to build machine learning models that deliver not only state-of-the-art predictive performance but also interpretable results. We evaluate the effectiveness of the proposed framework in explaining Substance Use Disorder (SUD) prediction models. Our best models achieved an ROC AUC of 87% for predicting tobacco use, 84% for alcohol use, and 93% for drug use, which is comparable to existing state-of-the-art SUD prediction models. Since these models are also interpretable (e.g., a logistic regression model and a gradient boosting tree model), we combine their results to gain insight into the relationship between a user's social media behavior (e.g., social media likes and word usage) and substance use.
DOI: https://doi.org/10.1109/ICTAI.2018.00100 | Published: 2018-11-01
Citations: 15
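The abstract names logistic regression and gradient boosting trees as the interpretable models but does not describe the distillation step itself. A minimal sketch of the general idea, assuming a teacher model that has already produced a soft probability for each user: a logistic-regression student is fit by batch gradient descent to those soft targets instead of to hard labels, so its coefficients can then be read as feature effects. The features, teacher probabilities, learning rate, and epoch count are hypothetical; this is not the authors' pipeline.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def distill_logistic(X, teacher_probs, lr=0.5, epochs=2000):
    """Fit a logistic-regression student to a teacher's soft probabilities.

    Minimizes the cross-entropy between the student's output and teacher_probs
    with full-batch gradient descent; returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, q in zip(X, teacher_probs):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            err = p - q  # derivative of soft-target cross-entropy w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Student probability for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: two user features and a teacher's soft probabilities.
X = [[0, 1], [1, 0], [2, 1], [3, 0]]
teacher = [0.1, 0.3, 0.7, 0.9]
w, b = distill_logistic(X, teacher)
print([round(wj, 2) for wj in w], round(b, 2))
```

The learned `w` is the interpretable artifact: a positive coefficient means the feature pushes the student, and therefore approximately the teacher, toward a higher predicted probability.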
SocialFan: Integrating Social Networks Into Recommender Systems
B. Díaz-Agudo, Guillermo Jiménez-Díaz, J. A. Recio-García
By definition, social systems encourage interaction between users and online content, as well as among users themselves, thereby generating new sources of knowledge that are valuable for recommender systems. In this paper, we address the situation of a recommender system whose users are not explicitly connected through a social network, even though a social structure implicitly exists. We describe SocialFan, a domain-independent tool for defining a social network infrastructure and integrating it into an existing recommender system so that social knowledge can be captured and used there.
DOI: https://doi.org/10.1109/ICTAI.2018.00035 | Published: 2018-11-01
Citations: 2
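The abstract does not say how SocialFan turns an implicit social structure into explicit connections. As a purely hypothetical illustration of that idea, the sketch below links two users when the Jaccard similarity of the item sets they interacted with reaches a threshold; the function name, the threshold, and the data are assumptions and do not represent SocialFan's interface.

```python
from itertools import combinations

def implicit_social_graph(interactions, threshold=0.5):
    """Return {(u, v): similarity} for user pairs whose item sets have
    Jaccard similarity >= threshold (hypothetical linking rule)."""
    edges = {}
    for u, v in combinations(sorted(interactions), 2):
        a, b = interactions[u], interactions[v]
        union = a | b
        if not union:  # neither user has interactions: nothing to compare
            continue
        sim = len(a & b) / len(union)
        if sim >= threshold:
            edges[(u, v)] = sim
    return edges

# Hypothetical interaction log: user -> set of items rated or liked.
interactions = {
    "ann": {"m1", "m2", "m3"},
    "bob": {"m1", "m2"},
    "cat": {"m9"},
}
print(implicit_social_graph(interactions))
```

The resulting weighted edge list is the kind of explicit social layer a recommender could then consume, for example to weight neighbors' ratings.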
Neural Network Specialists for Inverse Spiral Inductor Design
N. Dervenis, Georgios Alexandridis, A. Stafylopatis
Integrated spiral inductors are a fundamental part of radio-frequency (RF) circuits. In certain scenarios, a solution to the inverse spiral inductor design problem is required: given the desired properties of an inductor, find the most suitable geometric characteristics. This problem does not have a unique solution, and current approaches approximate it through a number of differential equations, followed by optimization techniques that narrow down the set of feasible solutions. This work outlines the Neural Network Specialists model, a preliminary approach to solving this problem with fully connected neural networks. The results of a first round of experiments are encouraging, especially in terms of reduced time complexity.
DOI: https://doi.org/10.1109/ICTAI.2018.00020 | Published: 2018-11-01
Citations: 2
Bike Usage Forecasting for Optimal Rebalancing Operations in Bike-Sharing Systems
Simon Ruffieux, E. Mugellini, Omar Abou Khaled
This article presents the first step of a project focused on improving the management of bike-sharing systems. The project's objective is to use machine-learning algorithms and constraint programming to optimize the daily rebalancing operations that bike-sharing operators must perform. This study evaluates machine learning algorithms developed to forecast bike availability on three Swiss bike-sharing networks. The results show that the Multi-Layer Perceptron outperforms the other evaluated algorithms in forecasting the number of available bikes at the station level across different prediction horizons, and that it is applicable to real-time prediction.
DOI: https://doi.org/10.1109/ICTAI.2018.00133 | Published: 2018-11-01
Citations: 12
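The abstract does not list the Multi-Layer Perceptron's input features. One common way to frame station-level forecasting at several horizons, assumed here only for illustration, is to slide a window over each station's availability series so that the last `n_lags` observations predict the value `horizon` steps ahead; the resulting pairs can be fed to any regressor, including an MLP. The function name, parameters, and data are hypothetical.

```python
def make_windows(series, n_lags, horizon):
    """Build (features, target) pairs from one station's availability series.

    Each feature vector holds n_lags consecutive observations; its target is the
    observation `horizon` steps after the last of them.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    X, y = [], []
    for end in range(n_lags, len(series) - horizon + 1):
        X.append(series[end - n_lags:end])
        y.append(series[end + horizon - 1])
    return X, y

# Hypothetical hourly counts of available bikes at one station.
bikes = [5, 4, 6, 7, 3]
print(make_windows(bikes, n_lags=2, horizon=1))  # ([[5, 4], [4, 6], [6, 7]], [6, 7, 3])
print(make_windows(bikes, n_lags=2, horizon=2))  # ([[5, 4], [4, 6]], [7, 3])
```

Building one dataset per horizon in this way lets the same model family be compared across short and long prediction horizons.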
Supervised Data Synthesizing and Evolving – A Framework for Real-World Traffic Crash Severity Classification
Yi He, Di Wu, Ege Beyazit, Xiaoduan Sun, Xindong Wu
Traffic crashes have threatened property and lives for more than thirty years. Thanks to the recent proliferation of traffic data, machine learning techniques are broadly expected to contribute to the traffic safety community, given their successes in many other domains. Among these contributions, the most cited is the classification of traffic crashes by severity, since crashes of different severities differ significantly in frequency and cost. However, because transportation systems are complex, traffic data are usually highly imbalanced and lowly separable (HILS), and few existing works report satisfactory results. In this paper, we propose a novel framework for handling HILS traffic crash data. The framework comprises two parts. Part I proposes a novel Supervised Data Synthesizing and Evolving algorithm, which re-represents HILS data in a more balanced and separable form without altering the original data distribution. Part II presents the details of a customized Multi-Layer Perceptron (MLP) designed to learn from the re-represented data with fast convergence and high accuracy. A real-world traffic crash dataset is used as a benchmark to evaluate the classification performance of our framework and of three state-of-the-art imbalanced learning algorithms. The experimental results show that our framework significantly outperforms the other algorithms. Moreover, the impact of various parameter settings is studied and discussed.
DOI: https://doi.org/10.1109/ICTAI.2018.00034 | Published: 2018-11-01
Citations: 6
Data Sampling Approaches with Severely Imbalanced Big Data for Medicare Fraud Detection
Richard A. Bauder, T. Khoshgoftaar, Tawfiq Hasanin
Class imbalance is an important problem in machine learning. As available information grows and Big Data sources are increasingly used to extract meaning from data, the challenges associated with class imbalance continue to influence research and shape business value. In this paper, we focus on using highly imbalanced Big Data from Medicare to detect provider claims fraud. We combine data from three Medicare parts and generate fraud labels from real-world excluded providers. The number of known fraudulent providers is very small: only 0.062% of the combined dataset is labeled as fraud, indicating severe class imbalance. To address this imbalance, we report experimental results for six different data sampling methods (undersampling and oversampling) used to create datasets at five class ratios (from imbalanced to balanced), as well as for the full dataset (with no sampling). Three state-of-the-art machine learning models built with Apache Spark are used to assess Medicare fraud detection performance across data sampling methods and class ratios. We demonstrate that data sampling, in particular random undersampling, yields good results across all learners, whereas oversampling provides no benefit over models built using the full dataset.
DOI: https://doi.org/10.1109/ICTAI.2018.00030 | Published: 2018-11-01
Citations: 27
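Random undersampling, the method the abstract reports as working well across all learners, keeps every fraud-labeled (minority) row and randomly discards non-fraud (majority) rows until a target class ratio is reached. A minimal single-machine sketch follows; the paper runs on Apache Spark, where the same idea is typically expressed with per-class sampling fractions (for example PySpark's `DataFrame.sampleBy`). The function name, the label encoding (1 = fraud), and the target shares are assumptions. Note the scale involved: with 0.062% positives, a fully balanced 50:50 sample retains only about 2 × 0.062% ≈ 0.124% of the original rows.

```python
import random

def random_undersample(rows, labels, minority_share, seed=0):
    """Keep all minority rows (label 1) and a random subset of majority rows
    (label 0) so that the minority makes up `minority_share` of the output."""
    if not 0 < minority_share < 1:
        raise ValueError("minority_share must be in (0, 1)")
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Solve m / (m + k) = share for k, the number of majority rows to keep.
    k = round(len(minority) * (1 - minority_share) / minority_share)
    k = min(k, len(majority))  # cannot keep more majority rows than exist
    keep = sorted(minority + random.Random(seed).sample(majority, k))
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Hypothetical data: 10 positives among 1,000 rows.
labels = [1] * 10 + [0] * 990
rows = list(range(1000))
for share in (0.1, 0.5):
    _, y = random_undersample(rows, labels, share)
    print(share, len(y), sum(y))
```

In an evaluation setting, sampling is normally applied to the training folds only, so that the test data keep the real class distribution.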
GAMBAD: A Method for Developing Systems of Systems
Gregory Moro Puppi Wanderley, Marie-Hélène Abel, E. Paraiso, J. Barthès
Despite the large number of Systems of Systems (SoS) being developed, building them remains difficult. Currently, there is a lack of methods that support architects in building an actual SoS. In this paper, we introduce an original method called GAMBAD for developing an SoS from a practical point of view. Our method guides the development of an SoS on top of a multi-agent layer supported by ontologies. We tested GAMBAD by building an SoS in the health care domain. Early results show that, using our method, architects can develop an SoS faster and more accurately.
DOI: https://doi.org/10.1109/ICTAI.2018.00127 | Published: 2018-11-01
Citations: 0
Detection of Shilling Attack Based on Bayesian Model and User Embedding
Fan Yang, Min Gao, Junliang Yu, Yuqi Song, Xinyi Wang
Recommendation systems have been widely employed because they effectively mitigate the problem of information overload. Although recommendation systems have made great progress, their open nature leaves them under threat of shilling attacks, through which attackers can manipulate recommendation results and cause great harm to the system. Existing shilling attack detection models mainly rely on statistical measures to extract features such as rating deviation, and these features are generally vulnerable to changes in attack strategy: once the attacker changes strategy, a detection model based on statistical methods may fail. Some researchers have found that implicit features hidden in user-user and user-item interactions can be used to solve this problem; their work learns latent relationships between users in order to update features. However, it overlooks the value of label information for feature learning. To address this, this paper proposes a novel detection model, BayesDetector, which takes into account not only user-user and user-item interactions but also label information when learning implicit user features. Furthermore, to take full advantage of user labels, a Bayesian model is incorporated into feature learning. Experiments on two datasets, Amazon and MovieLens, show that BayesDetector significantly outperforms state-of-the-art methods.
DOI: https://doi.org/10.1109/ICTAI.2018.00102 | Published: 2018-11-01
Citations: 13
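The BayesDetector abstract contrasts its approach with detectors built on statistical features such as rating deviation, which it describes as vulnerable to changes in attack strategy. The following minimal Python sketch computes Rating Deviation from Mean Agreement (RDMA), a classic hand-crafted shilling-detection feature. It shows the kind of statistical baseline the abstract criticizes and does not implement BayesDetector itself. The function name, the `(user, item) -> rating` dictionary input and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def rdma(ratings):
    """Rating Deviation from Mean Agreement (RDMA) for every user.

    ratings: dict mapping (user, item) -> numeric rating (hypothetical input format).
    RDMA_u = (1 / N_u) * sum over items i rated by u of |r_ui - avg_i| / NR_i,
    where avg_i is item i's mean rating and NR_i is the number of ratings item i has.
    """
    item_sum = defaultdict(float)
    item_count = defaultdict(int)
    for (_, item), r in ratings.items():
        item_sum[item] += r
        item_count[item] += 1
    item_avg = {i: item_sum[i] / item_count[i] for i in item_sum}

    dev_sum = defaultdict(float)
    user_count = defaultdict(int)
    for (user, item), r in ratings.items():
        # Deviation from the item mean, down-weighted for heavily rated items.
        dev_sum[user] += abs(r - item_avg[item]) / item_count[item]
        user_count[user] += 1
    return {u: dev_sum[u] / user_count[u] for u in dev_sum}


# Toy data (hypothetical): "atk" pushes item i1, which genuine users rate low.
ratings = {
    ("u1", "i1"): 2, ("u2", "i1"): 2, ("atk", "i1"): 5,
    ("u1", "i2"): 4, ("u2", "i2"): 4, ("atk", "i2"): 4,
}
scores = rdma(ratings)
print(scores)  # u1 and u2 score 1/6, atk scores 1/3
```

In the toy data the pushing profile stands out with twice the genuine users' score. An attacker who instead rates close to each item's mean would drive this score down. That is the brittleness to changing attack strategies the abstract points out.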
Implementing Fuzzy Cognitive Maps with Neural Networks for Natural Gas Prediction
Katarzyna Poczeta, E. Papageorgiou
The goal of this study is to test the robustness of a novel hybrid computational intelligence model for day-ahead natural gas demand prediction. The proposed model combines an evolutionarily learned fuzzy cognitive map (FCM) with a standard artificial neural network (ANN) in a cascade that achieves high prediction accuracy at most distribution points. The FCM provides a model whose concepts serve as the input nodes of a second-stage ANN, which produces the forecast for each gas time series. The FCM is learned by a structure-optimization genetic algorithm, and its outputs are fed into the ANN, which refines the initial forecast and improves overall forecasting accuracy. The model is applied to the five distribution points that make up the natural gas grid of Thessaly, a region of Greece. This setup allows the hybrid model's performance to be compared across different FCM and ANN structures and across consumption patterns. It also provides insights into the characteristics of large urban centers and small towns.
DOI: https://doi.org/10.1109/ICTAI.2018.00158 (published 2018-11-01)
Citations: 9
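The abstract describes a two-stage cascade in which FCM concept activations feed a second-stage ANN. The following minimal Python sketch shows that data flow under stated assumptions. It uses the common modified-Kosko FCM update A_i(t+1) = f(A_i(t) + Σ_{j≠i} w_ji A_j(t)) with a sigmoid f, and a one-hidden-layer tanh network. The weights are random placeholders standing in for the GA-learned FCM weights and a trained ANN. The four input concepts and all names are hypothetical. This is not the authors' implementation.

```python
import math
import random


def sigmoid(x, lam=1.0):
    # Squashing function commonly used for FCM activations; lam sets the slope.
    return 1.0 / (1.0 + math.exp(-lam * x))


def fcm_run(a0, W, steps=20, lam=1.0):
    """Iterate A_i(t+1) = f(A_i(t) + sum_{j != i} W[j][i] * A_j(t)).

    W[j][i] is the (assumed) influence of concept j on concept i.
    """
    a = list(a0)
    n = len(a)
    for _ in range(steps):
        a = [sigmoid(a[i] + sum(W[j][i] * a[j] for j in range(n) if j != i), lam)
             for i in range(n)]
    return a


def mlp_forward(x, W1, b1, W2, b2):
    # One hidden tanh layer followed by a linear output unit (the forecast).
    hidden = [math.tanh(sum(x[k] * W1[k][h] for k in range(len(x))) + b1[h])
              for h in range(len(b1))]
    return sum(hidden[h] * W2[h] for h in range(len(hidden))) + b2


rng = random.Random(0)
N_CONCEPTS, N_HIDDEN = 4, 8

# Placeholder for GA-learned FCM weights (zero diagonal: no self-influence).
W_fcm = [[0.0 if i == j else rng.uniform(-1, 1) for j in range(N_CONCEPTS)]
         for i in range(N_CONCEPTS)]
a0 = [0.6, 0.4, 0.5, 0.7]  # hypothetical normalized inputs, e.g. lagged demand
concepts = fcm_run(a0, W_fcm)  # stage 1: FCM concept activations

# Placeholder ANN weights; in practice these would be trained on historical demand.
W1 = [[rng.gauss(0, 0.5) for _ in range(N_HIDDEN)] for _ in range(N_CONCEPTS)]
b1 = [0.0] * N_HIDDEN
W2 = [rng.gauss(0, 0.5) for _ in range(N_HIDDEN)]
b2 = 0.0
forecast = mlp_forward(concepts, W1, b1, W2, b2)  # stage 2: refined forecast
print(concepts, forecast)
```

Training is deliberately omitted. In the setting the abstract describes, the FCM weights would come from the structure-optimization genetic algorithm and the ANN weights from supervised training, and this sketch reproduces neither.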