
2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA): Latest Papers

A Hybrid Machine Learning Approach for Planning Safe Trajectories in Complex Traffic-Scenarios
Amit Chaulwar, M. Botsch, W. Utschick
Planning safe trajectories with interventions in both the lateral and longitudinal dynamics of vehicles has huge potential for increasing road traffic safety. The main challenges in developing such algorithms are accounting for the vehicle's nonholonomic constraints and implementing them efficiently enough to run in real time in a vehicle. The recently introduced Augmented CL-RRT algorithm uses analytical models for trajectory planning, relying on brute-force evaluation of many longitudinal acceleration profiles to find collision-free trajectories. The algorithm considers the nonholonomic constraints of the vehicle in complex road traffic scenarios with multiple static and dynamic objects, but it requires a lot of computation time. This work proposes a hybrid machine learning approach for predicting suitable acceleration profiles in critical traffic scenarios, so that only a few acceleration profiles are used with the Augmented CL-RRT to find a safe trajectory, reducing the computation time. This is realized using a convolutional neural network variant, introduced as 3D-ConvNet, which learns spatiotemporal features from a sequence of predicted occupancy grids generated from predictions of other road traffic participants. These learned features, together with hand-designed features of the EGO vehicle, are used to predict acceleration profiles. Simulations are performed to compare the brute-force approach with the proposed approach in terms of efficiency and safety. The results show a vast improvement in efficiency without harming safety. Additionally, an extension to the Augmented CL-RRT algorithm is introduced for finding a trajectory with low severity of injury if a collision is already unavoidable.
DOI: https://doi.org/10.1109/ICMLA.2016.0095 (published 2016-12-01)
Citations: 11
Automatic Species Recognition Based on Improved Birdsong Analysis
Joshua Knapp, Guangzhi Qu, Feng Zhang
This work seeks to improve the accuracy of species recognition based on birdsong analysis. We intend to accomplish this by creating a more effective bird syllable segmentation algorithm (MIRS). Support vector machine based classifiers are trained on the features of IRS and MIRS. The experimental results show the effectiveness of the proposed algorithm.
DOI: https://doi.org/10.1109/ICMLA.2016.0037 (published 2016-12-01)
Citations: 4
Preference Aware Recommendation Based on Categorical Information
Zhiwei Rao, Jiangchao Yao, Ya Zhang, Rui Zhang
Context-aware matrix factorization has been widely used in recommender systems to learn latent feature vectors of users and items along with contextual information. While most such methods add an identical bias for each type of side information to represent systematic tendencies in users' rating behaviors, they cannot capture preferences unique to users or items. In this paper, we propose a probabilistic generative model that allows the bias to vary among different types of users or items. We first use Gaussian mixture components to cluster the users (or items) based on their corresponding latent feature vectors. Biases are then distributed over these clusters along with categorical side information. Finally, they are combined with the latent feature vectors of the users and items to affect the generation of observed ratings. Experiments on the MovieLens-100K and MovieLens-1M data sets show promising results compared with state-of-the-art context-aware recommendation approaches. We also qualitatively analyze the preferences of users and items and demonstrate differences in preference among both users and items.
DOI: https://doi.org/10.1109/ICMLA.2016.0155 (published 2016-12-01)
Citations: 3
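As a reading aid for the recommendation abstract above, the idea of letting the bias depend on the user's cluster rather than being shared by everyone can be sketched as a point prediction. This is only an illustrative simplification, not the authors' probabilistic generative model: the cluster assignment is assumed to be given already (the paper obtains it from Gaussian mixture components), and the function name `predict_rating`, the `cluster_bias` table, and all numeric values are hypothetical.

```python
from typing import Dict, List, Tuple


def predict_rating(
    user_vec: List[float],
    item_vec: List[float],
    user_cluster: int,
    item_category: str,
    cluster_bias: Dict[Tuple[int, str], float],
    global_mean: float,
) -> float:
    """Global mean + cluster-specific category bias + latent interaction."""
    # Standard matrix-factorization term: dot product of latent vectors.
    interaction = sum(p * q for p, q in zip(user_vec, item_vec))
    # The bias is looked up per (user cluster, item category) pair,
    # so two clusters can react differently to the same category.
    bias = cluster_bias.get((user_cluster, item_category), 0.0)
    return global_mean + bias + interaction


# Hypothetical toy values, for illustration only.
cluster_bias = {(0, "comedy"): 0.5, (1, "comedy"): -0.25}
rating_a = predict_rating([1.0, 0.5], [0.5, 1.0], 0, "comedy", cluster_bias, 3.0)
rating_b = predict_rating([1.0, 0.5], [0.5, 1.0], 1, "comedy", cluster_bias, 3.0)
```

With identical latent vectors, the two predictions differ only through the cluster-dependent bias, which is the effect the abstract describes.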
A Big Data Analytics Framework for Supporting Multidimensional Mining over Big Healthcare Data
Mario Alessandro Bochicchio, A. Cuzzocrea, L. Vaira
Nowadays, a great deal of attention is being devoted to big data analytics in complex healthcare environments. Fetal growth curves, which are a classical case of big healthcare data, are used in prenatal medicine to detect potential fetal growth problems early, estimate the perinatal outcome, and promptly treat possible complications. However, the currently adopted curves and the related diagnostic techniques have been criticized because of their poor precision. New techniques, based on the idea of customized growth curves, have been proposed in the literature. From this perspective, this paper discusses the problem of building customized or personalized fetal growth curves by means of big data techniques. The proposed framework introduces the idea of summarizing the massive amounts of (input) big data via multidimensional views, on top of which well-known data mining methods such as clustering and classification are applied. Overall, this defines a multidimensional mining approach targeted at complex healthcare environments. A preliminary analysis of the effectiveness of the framework is also provided.
DOI: https://doi.org/10.1109/ICMLA.2016.0090 (published 2016-12-01)
Citations: 18
A Nonnegative Tensor Factorization Approach for Three-Dimensional Binary Wafer-Test Data
T. Siegert, R. Schachtner, G. Pöppel, E. Lang
We introduce a new Blind Source Separation Approach called binNTF which operates on tensor-valued binary datasets. Assuming that several simultaneously acting sources or elementary causes are generating the observed data, the objective of our approach is to uncover the underlying sources as well as their individual contribution to each observation with a minimum number of assumptions in an unsupervised fashion. We motivate, develop and demonstrate our method in the context of binary wafer test data which evolve during microchip fabrication. In this application, we also have to deal with incomplete datasets which can occur due to the commonly used stop-on-first-fail testing procedure or result from the aggregation of several distinct tests into BIN categories.
DOI: https://doi.org/10.1109/ICMLA.2016.0151 (published 2016-12-01)
Citations: 2
Validation of a Quantifier-Based Fuzzy Classification System for Breast Cancer Patients on External Independent Cohorts
D. Soria, J. Garibaldi
Recent studies in the breast cancer domain have identified seven distinct clinical phenotypes (groups) using immunohistochemical analysis and a variety of unsupervised learning techniques. Consensus among the clustering algorithms has been used to categorise patients into these specific groups, but often at the expense of leaving some patients unclassified. Fuzzy methodologies are known to provide linguistic classification rules that can simplify those obtained from consensus clustering. The objective of this study is to validate a recently developed extension of a fuzzy quantification subsethood-based algorithm on three sets of newly available breast cancer data. Results show that our algorithm is able to reproduce the seven biological classes previously identified, preserving their characterisation in terms of marker distributions and therefore their clinical meaning. Moreover, because our algorithm constitutes the fundamental basis of the newly developed Nottingham Prognostic Index Plus (NPI+), our findings demonstrate that this new medical decision-making tool can help move towards more tailored care in breast cancer.
DOI: https://doi.org/10.1109/ICMLA.2016.0101 (published 2016-12-01)
Citations: 0
Machine Learning for Plant Disease Incidence and Severity Measurements from Leaf Images
Ernest Mwebaze, Godliver Owomugisha
In many fields, superior gains have been obtained by leveraging the computational power of machine learning techniques to solve expert tasks. In this paper we present an application of machine learning to agriculture, solving the particular problem of diagnosing crop disease from plant images taken with a smartphone. Two pieces of information are important here: the disease incidence and the disease severity. We present a system that trains a 5-class classifier to determine the disease state of a plant. The 5 classes represent one healthy class and 4 disease classes. We further extend the classification system to classify different severity levels for any of the 4 diseases. Severity levels are assigned classes 1 to 5, with 1 being a healthy plant and 5 being a severely diseased plant. We present ways of extracting different features from leaf images and show how different extraction methods result in different classifier performance. Finally, we present the smartphone-based system that uses the learnt classification model to predict the state of health of a farmer's garden in real time. The farmer uploads an image of a plant in their garden and obtains a disease score from a remote server.
DOI: https://doi.org/10.1109/ICMLA.2016.0034 (published 2016-12-01)
Citations: 96
Uncovering the Landscape of Fraud and Spam in the Telephony Channel
A. Marzuoli, H. Kingravi, David Dewey, Robert S. Pienta
Robocalling, voice phishing, and caller ID spoofing are common cybercrime techniques used to launch scam campaigns through the telephony channel, which unsuspecting users have long trusted. More reliable than online complaints, a telephony honeypot provides complete, accurate and timely information about unwanted phone calls across the United States. Our first goal is to provide a large-scale data-driven analysis of the telephony spam and fraud ecosystem. Our second goal is to uniquely identify bad actors potentially operating several phone numbers. We collected about 40,000 unsolicited calls. Our results show that only a few bad actors, robocallers or telemarketers, are responsible for the majority of the spam and scam calls, and that they can be uniquely identified based on audio features from their calls. This discovery has major implications for law enforcement and businesses that are presently engaged in combatting the rise of telephony fraud. In particular, since our system allows end users to detect fraudulent behavior and tie it back to existing fraud and spam campaigns, it can be used as the first step towards designing and deploying intelligent defense strategies.
DOI: https://doi.org/10.1109/ICMLA.2016.0153 (published 2016-12-01)
Citations: 12
System-Level Test Case Prioritization Using Machine Learning
R. Lachmann, Sandro Schulze, Manuel Nieke, C. Seidl, Ina Schaefer
Regression testing is the common task of retesting software that has been changed or extended (e.g., by new features) during software evolution. As retesting the whole program is not feasible within reasonable time and cost, usually only a subset of all test cases is executed for regression testing, e.g., by executing test cases according to test case prioritization. Although a vast number of methods for test case prioritization exist, they mostly require access to source code (i.e., white-box). However, in industrial practice, system-level testing is an important task that usually grants no access to source code (i.e., black-box). Hence, for an effective regression testing process, other information has to be employed. In this paper, we introduce a novel technique for test case prioritization for manual system-level regression testing based on supervised machine learning. Our approach considers black-box meta-data, such as test case history, as well as natural language test case descriptions for prioritization. We use the machine learning algorithm SVM Rank to evaluate our approach by means of two subject systems and measure the prioritization quality. Our results imply that our technique improves the failure detection rate significantly compared to a random order. In addition, we are able to outperform a test case order given by a test expert. Moreover, using natural language descriptions improves the failure finding rate.
DOI: https://doi.org/10.1109/ICMLA.2016.0065 (published 2016-12-01)
Citations: 47
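The abstract above reports failure detection rates without naming a metric. As an illustration only, the sketch below computes APFD (Average Percentage of Faults Detected), a commonly used measure for comparing test case orderings. That the authors use APFD is an assumption, and the function name, test identifiers, and fault identifiers are hypothetical.

```python
def apfd(order, faults):
    """Average Percentage of Faults Detected for one test ordering.

    order:  list of test ids in execution order (each id appears once).
    faults: dict mapping a fault id to the set of test ids that reveal it.
    Assumes every fault is revealed by at least one test in `order`.
    """
    n = len(order)   # number of test cases
    m = len(faults)  # number of faults
    # 1-based position of each test in the ordering
    position = {test: i + 1 for i, test in enumerate(order)}
    # Position of the first test that reveals each fault
    first_hits = [min(position[t] for t in tests) for tests in faults.values()]
    return 1 - sum(first_hits) / (n * m) + 1 / (2 * n)


# Hypothetical data: two faults, both revealed by test t3 (F2 also by t4).
faults = {"F1": {"t3"}, "F2": {"t3", "t4"}}
prioritized = ["t3", "t4", "t1", "t2"]    # failing tests scheduled first
unprioritized = ["t1", "t2", "t3", "t4"]  # failing tests scheduled last

print(apfd(prioritized, faults))    # 0.875
print(apfd(unprioritized, faults))  # 0.375
```

A higher value means faults are revealed earlier in the run, which is the sense in which a learned ordering can be compared against a random or expert-given one.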
Toward Parametric Security Analysis of Machine Learning Based Cyber Forensic Biometric Systems
Koosha Sadeghi, Ayan Banerjee, Javad Sohankar, S. Gupta
Machine learning algorithms are widely used in cyber forensic biometric systems to analyze a subject's truthfulness in an interrogation. An analytical (rather than experimental) method to evaluate the security strength of these systems under potential cyber attacks is essential. In this paper, we formalize a theoretical method for analyzing the immunity of a machine learning based cyber forensic system against evidence tampering attacks. We apply our theory to brain signal based forensic systems that use neural networks to classify responses from a subject. An attack simulation is run to validate the results of our theoretical analysis.
DOI: https://doi.org/10.1109/ICMLA.2016.0110. Published 2016-12-01.
Citations: 11