2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA): Latest Papers

Detecting SSH and FTP Brute Force Attacks in Big Data
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00126
John T. Hancock, T. Khoshgoftaar, Joffrey L. Leevy
We present a simple approach for detecting brute force attacks in the CSE-CIC-IDS2018 Big Data dataset. We show that our approach is preferable to more complex approaches because it is simpler and yields stronger classification performance. Our contribution is to show that simple Decision Tree models with two independent variables can be trained and tested to classify CSE-CIC-IDS2018 data with better results than reported in previous research that employs more complex Deep Learning models. Moreover, we show that Decision Tree models trained on data with two independent variables perform similarly to Decision Tree models trained on a larger number of independent variables. Our experiments reveal that simple models, with AUC and AUPRC scores greater than 0.99, are capable of detecting brute force attacks in CSE-CIC-IDS2018. To the best of our knowledge, these are the strongest performance metrics published for the machine learning task of detecting these types of attacks. Furthermore, the simplicity of our approach, combined with its strong performance, makes it an appealing technique.
Pages: 760-765
Citations: 0
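The brute-force detection abstract above reports AUC and AUPRC scores above 0.99. As a rough illustration of what those two metrics measure, here is a minimal pure-Python sketch. It is not the authors' evaluation code: the function names and the toy labels and scores are made up for illustration, and AUPRC is estimated here by average precision, which is only one common estimator (the abstract does not say which one the paper uses).

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Assumes scores are not tied; tied scores are ordered arbitrarily here.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = attack, 0 = normal traffic.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

On this toy data the AUC is 0.75 and the average precision is about 0.83. AUPRC is usually the more informative of the two when attack traffic is a small fraction of all flows, which is presumably why the abstract reports both.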
Data Driven football scouting assistance with simulated player performance extrapolation
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00189
Shantanu Ghar, Sayali Patil, Venkhatesh Arunachalam
In club football, scouting is a crucial aspect of player recruitment, and elite clubs invest millions of dollars every year in scouting and signing the best players for their teams. Scouting demands strong analytical and observational skills to find the best player for any position in the team. A scout needs to analyze a player by observing in-game actions and physical attributes, and then judge how the player might fit into the team. Every team has a formation and a style of play, and each position requires a specific player profile that depends on those factors. But scouts watch a player in person for only a few matches and prepare their scouting report based on the player’s performance in those matches. This process is flawed because the scout is expected to estimate the player’s performance in a new team from just a few games. Player statistics can help the scout make better data-driven decisions. A player’s career statistics can show how the player performs individually, but they fail to predict the player’s chemistry with a team. Misjudgement in scouting can cost a club millions of dollars. We propose to solve this problem by utilising vast amounts of quantitative and qualitative player statistics (from 3+ sources) and by incorporating data science and machine learning algorithms to simulate the real-world performance of the team after the addition of the newly scouted player. We take into account specific player requirements, classify each player into one of our 15 specific player types, and use the team’s formation and style of play to predict the players that will have the best chemistry with any given lineup, thereby helping scouts make better decisions.
Pages: 1160-1167
Citations: 1
Batch and Online Variational Learning of Hierarchical Pitman-Yor Mixtures of Multivariate Beta Distributions
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00053
Narges Manouchehri, N. Bouguila, Wentao Fan
In this paper, we propose hierarchical Pitman-Yor process mixtures of multivariate Beta distributions and learn this novel clustering method by online variational inference. The flexibility of this mixture model and its non-parametric hierarchical structure help in fitting our data. In addition, the model complexity and model parameters are estimated simultaneously. We apply our proposed model to real medical applications. Our motivation is that labelling healthcare data is sensitive and expensive. Moreover, interpretability and evidence-based decision-making are basic needs of medicine. These conditions led us to focus on clustering, as it does not need labelling. Another driving reason is that the amount of publicly available data in medicine is smaller than in other fields because of confidentiality regulations. To evaluate our proposed model, we compare its performance with that of similar alternatives. The experimental results indicate the potential of our proposed model.
Pages: 298-303
Citations: 4
Outperforming Clinical Practices in Breast Cancer Detection: A Superior Dense Neural Network in Classification and False Negative Reduction
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00098
Patrick Bujok, Maria Jensen, Steffen M. Larsen, R. A. Alphinas
Machine Learning applications provide a promising method to support clinical practitioners in Breast Cancer (BC) detection. Currently, Fine Needle Aspiration (FNA) is a commonly applied diagnostic method for BC tumors; however, it is associated with serious false negative misclassifications. To address this, the present study explores Artificial Neural Networks (ANNs) with the aim of outperforming clinical practice via FNA in classifying BC cases as benign or malignant, with improved accuracy and a reduced False Negative Rate (FNR), using the Breast Cancer Wisconsin (Diagnostic) Dataset (WDBC). The findings reveal that a dense ANN with a single hidden layer of 15 neurons can reach a testing accuracy of 98.60% and an FNR of 0% on a scaled dataset. In combination with several improvement measures introduced in the study, the model shows a high degree of generalizability given the relatively small dataset. As a result, this model outperforms not only clinical practitioners but also 72 classifiers from the recent literature.
Pages: 589-594
Citations: 0
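The breast cancer abstract above reports a testing accuracy of 98.60% together with an FNR of 0%. The two numbers answer different questions, which the following minimal sketch tries to make concrete. The helper names and the toy predictions are hypothetical and are not taken from the paper; the only assumption about the task is that the malignant class is treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # a positive (malignant) case that was missed
        elif pred == positive:
            fp += 1  # a negative (benign) case flagged as positive
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy_and_fnr(y_true, y_pred, positive=1):
    """Return (accuracy, false negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    fnr = fn / (fn + tp)  # share of actual positives that were missed
    return accuracy, fnr


# Hypothetical toy predictions: 1 = malignant, 0 = benign.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 1]
print(accuracy_and_fnr(truth, preds))
```

In this toy case every malignant sample is caught, so the FNR is 0, while one benign sample is flagged incorrectly and the accuracy drops to 0.875. An FNR of 0% with accuracy below 100% therefore means that the remaining errors are all false positives.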
PROV-GEM: Automated Provenance Analysis Framework using Graph Embeddings
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00273
Maya Kapoor, Joshua Melton, Michael Ridenhour, S. Krishnan, Thomas Moyer
Data provenance graphs, detailed traces of system behavior, are a popular construct to analyze and forecast malicious cyber activity like advanced persistent threats (APT). A critical limitation of existing analysis techniques is the lack of an automated analytic framework to predict APTs. In this work, we address that limitation by augmenting efficient capture and storage mechanisms to include automated analysis. Specifically, we propose PROV-GEM, a deep graph learning framework to identify malicious anomalous behavior from provenance data. Since data provenance graphs are complex datasets often expressed as heterogeneous attributed multiplex networks, we use a unified relation-aware embedding framework to capture the necessary contexts and associated interactions between the various entities manifest in the data. Furthermore, provenance graphs are by nature rich, detailed structures that are heavily attributed compared to other complex systems traditionally used in graph machine learning applications. To that end, our framework uniquely captures “multi-embeddings” that can represent varied contexts of nodes and their multi-faceted nature. We demonstrate the efficacy of our embeddings by applying PROV-GEM to two publicly available APT provenance graph datasets from StreamSpot and Unicorn. PROV-GEM achieves strong performance on both datasets, with 99% accuracy and a 97% F1-score on the StreamSpot dataset and 97% accuracy and an 89% F1-score on the Unicorn dataset, equaling or outperforming comparable state-of-the-art APT detection models. Unlike other frameworks, PROV-GEM utilizes an efficient graph convolutional approach coupled with relational self-attention to generate rich graph embeddings that capture the complex topology of data provenance graphs, providing an effective automated analytic framework for APT detection.
Pages: 1720-1727
Citations: 6
How Dense Autoencoders can still Achieve the State-of-the-art in Time-Series Anomaly Detection
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00207
Louis Jensen, Jayme Fosa, Ben Teitelbaum, Peter Chin
Time series data has become ubiquitous in the modern era of data collection. With the increase of these time series data streams, the demand for automatic time series anomaly detection has also increased. Automatic monitoring of data allows engineers to investigate only unusual behavior in their data streams. Despite this increase in demand for automatic time series anomaly detection, many popular methods fail to offer a general-purpose solution. Some demand expensive labelling of anomalies, others require the data to follow certain assumed patterns, some have long and unstable training, and many suffer from high rates of false alarms. In this paper, we demonstrate that simpler is often better, showing that a fully unsupervised multilayer perceptron autoencoder is able to outperform much more complicated models with only a few critical improvements. We offer improvements that help distinguish anomalous subsequences close to each other, and that distinguish anomalies even in the midst of changing distributions of data. We compare our model with state-of-the-art competitors on benchmark datasets sourced from NASA, Yahoo, and Numenta, achieving improvements beyond competitive models in all three datasets.
Pages: 1272-1277
Citations: 0
Depression Detection Using Combination of sMRI and fMRI Image Features
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00092
Marzieh Mousavian, Jianhua Chen, S. Greening
Automatic detection of Major Depressive Disorder (MDD) from brain MRI images with machine learning has been an active area of study. This paper explores several methods for MDD detection that combine features from structural and functional brain MRI images, and that combine Atlas-based and spatial cube-based features. Experiments demonstrate good classification performance on an imbalanced dataset. The paper also presents a visualization that captures the spatial overlap between the top discriminating spatial cube pairs and the regions of interest in the Harvard Atlas.
Pages: 552-557
Citations: 0
Deployment of Embedded Edge-AI for Wildlife Monitoring in Remote Regions
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00170
D. Schwartz, Jonathan Michael Gomes Selman, P. Wrege, A. Paepcke
Artificial intelligence is increasingly used in ecological contexts to monitor animal and insect populations. Species of interest are those in danger of extinction and those that play pivotal roles in agriculture. Noticing population declines or geographical shifts early enough for intervention can prevent local famine and disruption to the global food chain. Traditionally, data are collected in the field using human labor or sensors. Applicable classification models then analyze the data on central servers. The most expensive, and sometimes dangerous, part of the remote sensing solution is the human labor of visiting the sensors, retrieving data, and changing batteries. Constantly sending all readings by radio is costly in power. Instead, having AI in the sensors process the readings and transmit only the results could lead to an indefinitely autonomous, renewably powered solution. We implemented an elephant vocalization detector on a small processor board, and demonstrate that such a device can be operated at low enough power levels with considerable freedom of choice among AI technologies. We achieved a mean of 1.6 W, in the best case staying within 75% of memory limits. Measurements covered three inference models, two batch sizes, and two floating point word width settings.
Pages: 1035-1042
Citations: 2
Applications of Mobile Machine Learning for Detecting Bio-energy Crops Flowers
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00121
Wenjun Zeng, Bakhtiar Amen
Automated flower detection and control are important to crop production and precision agriculture. Several computer vision methods have been proposed for flower detection, but they perform unsatisfactorily on platforms with limited computing ability, such as mobile and embedded devices, and are therefore unsuitable for field applications. Herein we demonstrate two de novo approaches that can precisely detect the flowers of two bioenergy crops (potatoes and sweet potatoes) and distinguish them from the similar flowers of related species (eggplants and Ipomoea triloba) on mobile devices. In this work, a custom dataset of 495 manually labelled images is constructed for training and testing, and the latest state-of-the-art object detection model, YOLOv4, and its lightweight version, YOLOv4-tiny, are selected as the flower detection models. Other milestone object detection models, including YOLOv3, YOLOv3-tiny, SSD and Faster-RCNN, are chosen as benchmarks for performance comparison. The comparative experiments indicate that the retrained YOLOv4 model achieves a considerably high mean average precision (mAP = 91%) but a slower inference speed (FPS) on a mobile device, while the retrained YOLOv4-tiny has a lower mAP of 87% but reaches a higher FPS of 9 on a mobile device. Two mobile applications are then developed, one by deploying the YOLOv4-tiny model directly in a mobile app and the other by deploying YOLOv4 behind a web API. The testing experiments indicate that both applications not only achieve real-time and accurate detection but also reduce the computational burden on mobile devices.
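For readers unfamiliar with the mAP figures quoted in the abstract, the following is a minimal sketch of how average precision is commonly computed for one class and then averaged over classes. It is not the authors' evaluation code: the function names, the IoU threshold of 0.5, the greedy matching rule, and the toy boxes are all assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for a single class (simplified, hypothetical helper).

    detections: list of (confidence, box); ground_truth: list of boxes.
    Each ground-truth box can be matched by at most one detection.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    matched, tp_flags = set(), []
    for _, box in ranked:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        hit = best_idx is not None and best_iou >= iou_threshold
        if hit:
            matched.add(best_idx)
        tp_flags.append(hit)

    # Precision and recall after each ranked detection.
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(tp_flags, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth) if ground_truth else 0.0)

    # Interpolate: precision at a recall level is the best precision
    # achieved at that recall or any higher one.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP is the mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Toy example: two true flowers, three detections (one false positive).
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
ap = average_precision(dets, gt)  # 5/6, about 0.833
```

Published benchmarks differ in the details (for example, how duplicate detections are handled and which IoU thresholds are averaged), so numbers from this sketch are not directly comparable to the 91% and 87% reported above.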
Pages: 724-729
Citations: 1
Improved Attribute Manipulation in the Latent Space of StyleGAN for Semantic Face Editing
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00014
Aashish Rai, Clara Ducher, J. Cooperstock
With the recent popularization of generative frameworks for producing photorealistic face images, we now have the ability to create a convincing graphical match for any particular individual. It is unrealistic, however, to rely solely on such generative methods to randomly produce the facial characteristics we are seeking. Instead, manipulation of facial attributes in the latent space, enabled by the InterFaceGAN framework, allows us to “tweak” these characteristics in the desired direction to improve the quality of the match. The challenge in this process is that attributes are entangled, so changing one feature has an undesirable impact on others. We explore several strategies to improve the results of these manipulations, and demonstrate how the automatic conditioning of attributes can be used to minimize the impact of such entanglement and, further, to allow improved control over complex (non-binary) attributes such as race or face shape.
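The idea of conditioning one attribute on others can be illustrated with the projection commonly associated with InterFaceGAN-style editing: an attribute is edited by moving a latent code along a direction vector, and entanglement with another attribute is reduced by first removing the component of that direction that lies along the other attribute's direction. The sketch below is an illustration under stated assumptions, not the method proposed in the paper: the attribute names, the 512-dimensional latent size, and the random direction vectors are hypothetical stand-ins for directions that would be learned from labelled latent codes.

```python
import numpy as np


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def condition_direction(primary, conditions):
    """Remove from `primary` its components along each condition direction.

    The condition directions are orthonormalized first (Gram-Schmidt), so
    the result is orthogonal to all of them at once. Fails if `primary`
    lies entirely within the span of the conditions.
    """
    basis = []
    for c in conditions:
        c = np.asarray(c, dtype=float)
        for b in basis:
            c = c - np.dot(c, b) * b
        norm = np.linalg.norm(c)
        if norm > 1e-12:
            basis.append(c / norm)
    d = np.asarray(primary, dtype=float)
    for b in basis:
        d = d - np.dot(d, b) * b
    return unit(d)


def edit_latent(w, direction, alpha):
    """Move latent code `w` by `alpha` along an attribute direction."""
    return w + alpha * direction


# Hypothetical directions: "glasses" is built to correlate with "age".
rng = np.random.default_rng(0)
dim = 512
n_age = unit(rng.normal(size=dim))
n_glasses = unit(0.6 * n_age + 0.8 * unit(rng.normal(size=dim)))

# Age direction conditioned on glasses: editing age along it should not
# change the latent code's position along the glasses direction.
n_age_cond = condition_direction(n_age, [n_glasses])

w = rng.normal(size=dim)            # stand-in for a sampled latent code
w_older = edit_latent(w, n_age_cond, alpha=3.0)
```

Orthogonality in latent space only approximates disentanglement in image space, which is one reason a simple projection may not be sufficient for complex attributes.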
Pages: 38-43
Citations: 1