
2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA): Latest Publications

PcapGAN: Packet Capture File Generator by Style-Based Generative Adversarial Networks
Baik Dowoo, Yujin Jung, Changhee Choi
After the advent of GAN technology, many different models have been studied and applied to fields such as image and audio. However, in the field of cyber data, which suffers from the same data shortage, research on data augmentation remains insufficient. To address this problem, we propose PcapGAN, which augments pcap (packet capture) data, a type of network data. The proposed model consists of an encoder, a data generator, and a decoder. The encoder subdivides network data into four parts, the generator produces new data for each part, and the decoder combines the generated parts into realistic network data. We demonstrate the similarity between the generated and original data, and we validate the generated data by showing that it improves the performance of intrusion detection algorithms.
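The abstract does not say which four parts the encoder extracts. To make "pcap data" concrete, here is a minimal Python sketch that splits a capture file into per-packet fields: timestamp, captured length, original length, and raw bytes.

It rests on two assumptions:
- The file uses the classic little-endian libpcap format with microsecond timestamps (not pcapng).
- The decomposition itself, and the name `split_pcap_records`, are hypothetical illustrations, not PcapGAN's actual encoder.

```python
import struct

# Classic libpcap layout (assumed; pcapng is not handled):
# 24-byte global header = magic, version_major, version_minor,
#                         thiszone, sigfigs, snaplen, network
GLOBAL_HDR = struct.Struct("<IHHiIII")
# 16-byte per-record header = ts_sec, ts_usec, incl_len, orig_len
RECORD_HDR = struct.Struct("<IIII")


def split_pcap_records(data: bytes):
    """Yield (timestamp_seconds, captured_len, original_len, payload) per packet.

    Hypothetical decomposition for illustration only; it is not the
    four-part split used by PcapGAN's encoder.
    """
    magic = struct.unpack_from("<I", data, 0)[0]
    if magic != 0xA1B2C3D4:
        raise ValueError("only little-endian, microsecond-resolution pcap is handled")
    offset = GLOBAL_HDR.size
    while offset + RECORD_HDR.size <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = RECORD_HDR.unpack_from(data, offset)
        offset += RECORD_HDR.size
        payload = data[offset:offset + incl_len]  # captured bytes only
        offset += incl_len
        yield ts_sec + ts_usec / 1e6, incl_len, orig_len, payload
```

Each yielded field could, in principle, be modeled by a separate generator. Whether PcapGAN splits data this way is not stated in the abstract.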
DOI: 10.1109/ICMLA.2019.00191 (https://doi.org/10.1109/ICMLA.2019.00191), published 2019-12-01
Citations: 14
Educational Data Mining: Analysis of Drop out of Engineering Majors at the UnB - Brazil
R. Silveira, M. Holanda, M. Victorino, M. Ladeira
This paper analyzes data on dropout among undergraduate engineering students at the University of Brasília (UnB), Brazil. In Brazil, as in other countries, a substantial number of students enroll in engineering majors but do not graduate from them. Understanding the reasons for this phenomenon is important for university decision-makers who want to act on it. This paper addresses the research question: what are the main factors that lead engineering students to drop out of engineering majors at UnB? We collected social and performance data on engineering students from 2009 to 2019. Some of these data are rarely used in similar studies, such as students' distance from home to campus and their leave-of-absence requests, as opposed to performance factors. We used three data mining techniques: a Generalized Linear Model (GLM), a boosting algorithm (GBM), and Random Forest (RF). The results show that international students deserve particular attention from the university and that courses such as Physics 1 can be challenging for engineering students.
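As a rough illustration of the GLM technique named above, the sketch below fits a binomial GLM with a logit link (logistic regression) by batch gradient descent in plain Python.

It assumes a binary dropout label and numeric features. The function names, learning rate, and epoch count are hypothetical choices; the authors' actual model specification is not given in the abstract.

```python
import math


def fit_logistic_glm(X, y, lr=0.1, epochs=2000):
    """Fit a binomial GLM with logit link by batch gradient descent.

    X: list of feature lists; y: list of 0/1 labels (1 = dropped out, assumed).
    Returns (weights, intercept).
    """
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = predict_proba(w, b, xi)
            err = p - yi  # gradient of the log-loss w.r.t. the linear predictor
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over the data set, then take one step
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Inverse logit of the linear predictor b + w . x."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))
```

With standardized features, the sign and size of each fitted coefficient indicate how a factor, such as distance to campus, would shift the log-odds of dropout under this model.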
DOI: 10.1109/ICMLA.2019.00048 (https://doi.org/10.1109/ICMLA.2019.00048), published 2019-12-01
Citations: 9
Using Machine Learning to Improve Surgical Outcomes
Sindhura Bonthu, P. Armijo, Tiffany Tanner, Qiuming A. Zhu
Predicting the severity of a patient's condition helps clinicians provide accurate care. Mortality prediction is challenging because of the distinct characteristics of patient data, which are highly sparse, highly biased and imbalanced, and highly mixed. In this paper, we focus on processing large volumes of such data with neural networks. The results can then be analyzed to obtain useful insights, such as identifying the major features that contribute to particular outcomes, or classifying objects based on the presence of certain attributes and their measurements.
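The abstract names class imbalance as a core difficulty but does not say how it is handled. One common remedy, shown here as a hedged sketch rather than the authors' method, works in two steps:
- Weight each class inversely to its frequency, n_samples / (n_classes * class_count).
- Use those weights in a binary cross-entropy loss.

The function names are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def weighted_log_loss(y_true, p_pred, weights):
    """Class-weighted binary cross-entropy, normalized by the total weight."""
    total, norm = 0.0, 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        norm += w
    return total / norm
```

Take a cohort with 90 survivors (label 0) and 10 deaths (label 1). Each death gets weight 5.0 and each survivor about 0.556. Both classes therefore contribute equally to the weighted loss.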
DOI: 10.1109/ICMLA.2019.00233 (https://doi.org/10.1109/ICMLA.2019.00233), published 2019-12-01
Citations: 0
Mood Classification with Lyrics and ConvNets
Revanth Akella, Teng-Sheng Moh
The paper presents research outcomes on classifying music by mood and provides an end-to-end, open-source pipeline for mood classification using lyrics. It explores techniques that classify music from audio features and from lyrics, using various natural language processing and machine learning methods. The paper compares different classification models and mood frameworks. Linguistic aspects of the lyrics are explored and used as features for classification, to determine which model classifies mood most effectively. The results show that lyrics are a valuable source of information for music classification. Term frequency-inverse document frequency (TF-IDF) and word embeddings are explored to connect words to mood classes. Various machine learning and deep learning classifiers are tested across different arrangements of the mood labels. The paper demonstrates that models that learn from lyrics with current deep learning-based natural language processing methods achieve higher accuracy. Our final model achieves an accuracy of 71%.
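TF-IDF, one of the representations named above, can be sketched in a few lines of Python. This version makes three assumptions:
- Lyrics are already tokenized.
- Term frequency is the raw count normalized by document length.
- The idf is unsmoothed, log(N / df).

Libraries often use smoothed or sublinear variants, and the paper's exact settings are not stated in the abstract.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized (non-empty) document.

    tf = count / document length; idf = log(N / df), unsmoothed.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        vectors.append(
            {t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors
```

Terms that appear in every document get weight 0, so ubiquitous words contribute nothing. Words that occur in only a few songs are weighted up.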
DOI: 10.1109/ICMLA.2019.00095 (https://doi.org/10.1109/ICMLA.2019.00095), published 2019-12-01
Citations: 9
Classifying Humpback Whale Calls to Song and Non-Song Vocalizations using Bag of Words Descriptor on Acoustic Data
Hamed Mohebbi-Kalkhoran, Chenyang Zhu, Matthew Schinault, P. Ratilal
Humpback whale behavior, population distribution, and structure can be inferred from long-term passive underwater acoustic monitoring of their vocalizations. Here we develop automatic, machine learning-based approaches for classifying humpback whale vocalizations into two categories: song and non-song. The vocalization behavior of humpback whales was monitored in Fall 2006, instantaneously over vast areas of the Gulf of Maine and across multiple diel cycles. Monitoring used a large-aperture coherent hydrophone array and the passive ocean acoustic waveguide remote sensing technique. We use wavelet signal denoising and coherent array processing to enhance the signal-to-noise ratio. To build a feature vector for every time sequence of the beamformed signals, we apply a Bag of Words approach to time-frequency features. Finally, we apply a Support Vector Machine (SVM), neural networks, and Naive Bayes to classify the acoustic data and compare their performance. The best results are obtained with Mel-Frequency Cepstral Coefficient (MFCC) features and the SVM. This combination achieves 94% accuracy and a 72.73% F1-score for song versus non-song classification, showing the effectiveness of the proposed approach for real-time classification at sea.
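A Bag of Words descriptor for audio typically works in two steps:
- Quantize frame-level feature vectors (for example, per-frame MFCCs) against a codebook.
- Represent each sequence by the histogram of codeword assignments.

The sketch below assumes two things: the codebook has already been learned (for example, by k-means on training frames), and frames are equal-length numeric vectors. It illustrates the general technique, not the authors' exact pipeline, and needs Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter


def nearest_codeword(frame, codebook):
    """Index of the codeword closest to the frame in Euclidean distance."""
    return min(range(len(codebook)), key=lambda k: math.dist(frame, codebook[k]))


def bag_of_words(frames, codebook):
    """Normalized histogram of codeword assignments for one time sequence."""
    counts = Counter(nearest_codeword(f, codebook) for f in frames)
    n = len(frames)
    return [counts.get(k, 0) / n for k in range(len(codebook))]
```

The resulting fixed-length histogram is what a classifier such as an SVM would receive, regardless of how many frames the sequence contains.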
DOI: 10.1109/ICMLA.2019.00150 (https://doi.org/10.1109/ICMLA.2019.00150), published 2019-12-01
Citations: 7
AudiDoS: Real-Time Denial-of-Service Adversarial Attacks on Deep Audio Models
Taesik Gong, Alberto Gil C. P. Ramos, S. Bhattacharya, Akhil Mathur, F. Kawsar
Deep learning has enabled personal and IoT devices to rethink the microphone as a multi-purpose sensor for understanding conversation and the surrounding environment. This has led to a proliferation of Voice Controllable Systems (VCS) around us. The increasing popularity of such systems also tends to attract miscreants, who often want to exploit a VCS without the user's knowledge. Consequently, understanding the robustness of VCS, especially under adversarial attacks, has become an important research topic. Previous work on audio adversarial attacks exists, but its scope is limited to embedding attacks in pre-recorded music clips that make a VCS misbehave when played through speakers. Because the attack audio must be played, a human listener may suspect that such an attack is taking place. In this paper, we focus on audio-based Denial-of-Service (DoS) attacks, which have not been explored in the literature. In contrast to previous work, we show that real-time, over-the-air adversarial audio attacks are possible while a user interacts with a VCS. The attacks are effective regardless of the user's command and interaction timing. We present a first-of-its-kind imperceptible and always-on universal audio perturbation technique that enables such DoS attacks to succeed. We thoroughly evaluate the attack scheme across (i) two learning tasks, (ii) two model architectures, and (iii) three datasets. We demonstrate that the attack can introduce an error rate as high as 78% in audio recognition tasks.
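A universal perturbation is a single input-agnostic signal, so it can be looped continuously without knowing when the user speaks. The sketch below shows only that playback step, on three assumptions:
- An L-infinity budget `eps` stands in for imperceptibility.
- Samples lie in [-1, 1].
- Digital addition approximates over-the-air playback, which is a simplification.

How AudiDoS optimizes the perturbation itself is not described in the abstract and is not shown. The function names are hypothetical.

```python
def project_linf(delta, eps):
    """Clip every sample of the perturbation into [-eps, eps]."""
    return [max(-eps, min(eps, d)) for d in delta]


def apply_always_on(stream, delta, eps):
    """Add one fixed (universal) perturbation, looped end to end, to an audio stream.

    Because the same delta repeats regardless of position, the result does
    not depend on when a command starts. Output is clipped to [-1, 1].
    """
    delta = project_linf(delta, eps)
    out = []
    for i, s in enumerate(stream):
        x = s + delta[i % len(delta)]
        out.append(max(-1.0, min(1.0, x)))
    return out
```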
DOI: 10.1109/ICMLA.2019.00167 (https://doi.org/10.1109/ICMLA.2019.00167), published 2019-12-01
Citations: 4
Predictive Analytics and Statistical Learning for Waterflooding Operations in Reservoir Simulations
X. Liao, M. Tyagi
Recent improvements in technology and computational power have increased interest in applying data-driven modeling (DDM) in the petroleum industry. Evaluating recovery processes with numerical reservoir simulators is time-consuming and computationally intensive, involves many assumptions and uncertainties, and is inefficient for fast decision making. DDM has therefore been adopted as an alternative tool for predicting production performance under waterflooding, one of the most important techniques for improving oil recovery. A synthetic waterflooding dataset is constructed with the numerical reservoir simulator, including production profiles, operational parameters, reservoir properties, and well locations. Exploratory data analysis provides several insights into non-intuitive factors in building the reservoir model. K-means clustering analysis is performed to identify internal groupings among producers. Artificial neural networks (ANN) and support vector regression (SVR) are used to decipher the nonlinear relationships between input attributes and waterflooding production. The trained models are then used to predict cumulative oil production and water cut on unseen samples. Clustering analysis reveals that distance to the free water level has a dominant effect. Cluster assignment is controlled by the interplay among the input attributes that characterize reservoir properties and relative well locations. Good agreement between the models' predicted outputs and the simulation targets demonstrates the satisfactory generalization performance and predictive capability of the ANN and SVR methods. The ANN model with a single output provides the most accurate predictions on the test data, while the SVR models provide similar but slightly worse forecasts. The methodologies proposed in this work can serve as surrogate or complementary models to analyze and predict recovery processes in other reservoirs quickly and efficiently.
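K-means clustering, used above to group producers, can be sketched with Lloyd's algorithm in plain Python (3.8+ for `math.dist`).

The sketch assumes each producer is a tuple of numeric features already scaled to comparable ranges. The specific features (for example, distance to the free water level and well coordinates), the function name, and the defaults are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update.

    Returns (labels, centroids). Initial centroids are k distinct input points.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids
```

Because k-means uses Euclidean distance, features on very different scales (meters versus pressure units, say) should be standardized first. Otherwise one attribute dominates the grouping.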
DOI: 10.1109/ICMLA.2019.00249 (https://doi.org/10.1109/ICMLA.2019.00249), published 2019-12-01
Citations: 2
Optimal Ensembles for Deep Learning Classification: Theory and Practice
Wenjing Li, R. Paffenroth
Ensemble methods for classification problems construct a set of models, often called "learners", and then assign class labels to new data points by combining the predictions of these models. Ensemble methods are popular and widely used across problem domains because of their good performance. However, a theoretical understanding of the optimality of ensembles is, in many instances, an open problem. In particular, improving the performance of an ensemble requires understanding the subtle interplay between the accuracy of the individual learners and the diversity of the learners in the ensemble. For example, if all of the learners in an ensemble were identical, the accuracy of the ensemble could be no better than that of an individual learner, no matter how many learners were used. Accordingly, we develop a theory, from the perspective of statistical correlations, for understanding when ensembles are optimal in an appropriate sense by balancing individual accuracy against ensemble diversity. The theory we derive applies to many practical ensembles, and we provide a set of metrics for assessing the optimality of any given ensemble. Perhaps most interestingly, these metrics lead naturally to a set of novel loss functions that can be optimized with backpropagation, giving rise to optimal ensembles based on deep neural networks. We demonstrate the effectiveness of these deep neural network-based ensembles on standard benchmark data sets.
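To make the accuracy-versus-diversity trade-off concrete, the sketch below computes the classic majority-vote accuracy of m independent binary classifiers, each correct with probability p. This is the textbook Condorcet-style calculation, not the correlation-based theory or loss functions developed in the paper.

It assumes fully independent errors. By contrast, m identical (perfectly correlated) learners would give accuracy p for any m, as the abstract notes.

```python
from math import comb


def majority_vote_accuracy(p, m):
    """P(majority of m independent learners is correct), binary task, m odd.

    Sums the binomial probabilities of more than m/2 learners being correct.
    """
    if m % 2 == 0:
        raise ValueError("use an odd number of learners to avoid ties")
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m // 2 + 1, m + 1))
```

For p = 0.7, three independent learners reach 0.784 and five reach about 0.837. Correlated learners fall between this curve and the flat value p.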
DOI: 10.1109/ICMLA.2019.00271 (https://doi.org/10.1109/ICMLA.2019.00271), published 2019-12-01
Citations: 1
Evaluation of Deep Learning for Semantic Image Segmentation in Tool Condition Monitoring
Benjamin Lutz, Dominik Kißkalt, Daniel Regulin, Raven T. Reisch, A. Schiffler, J. Franke
Tool wear is one of the main drivers of manufacturing costs in subtractive manufacturing processes. To control manufacturing processes while taking tool wear into account, a variety of tool condition monitoring systems have been investigated. In this paper, we present a new approach that supports the manual analysis of tool wear images by means of semantic image segmentation. We use deep learning to evaluate images through semantic classification of different defect regions. In this study, a small dataset of 100 cutting tool inserts in different tool conditions, exhibiting various wear defects, is acquired and masked by a process expert. A sliding-window approach is used to extract small feature maps from the raw images, with the class of the center pixel as the label. The relationship between the features and the label is learned with a convolutional neural network. Our investigation shows that this network can predict the wear defect class of each pixel with an accuracy of over 91%. Compared to other approaches, the proposed solution can differentiate between various defect types, such as flank wear, groove formation, and built-up edge. From the resulting segmented image, different wear metrics are computed, such as the maximum flank wear width or the occurrence and size of other wear defects. This information is fed back to the machine operator to support the decision of whether to continue machining, adapt the cutting conditions, or exchange the insert.
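The sliding-window labeling step described above can be sketched as follows. Each odd-sized window around a pixel becomes one training sample, and the expert mask's class at the center pixel becomes its label.

The sketch makes two assumptions: images are single-channel lists of lists, and border pixels without a full window are skipped. The function name is hypothetical, and the CNN that consumes the patches is not shown.

```python
def extract_patches(image, mask, size):
    """Yield (patch, label) pairs: a size x size window and its centre pixel's class.

    image and mask are row-major lists of lists with identical shape.
    Windows that would cross the image border are skipped (no padding).
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the window has a centre pixel")
    r = size // 2
    height, width = len(image), len(image[0])
    for y in range(r, height - r):
        for x in range(r, width - r):
            patch = [row[x - r:x + r + 1] for row in image[y - r:y + r + 1]]
            yield patch, mask[y][x]
```

Skipping the border keeps every patch the same size. Padding the image (for example, by reflection) would instead yield one sample per pixel, at the cost of some artificial context near the edges.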
DOI: https://doi.org/10.1109/ICMLA.2019.00321 (published 2019-12-01)
Citations: 17
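The sliding-window labeling and the maximum flank wear width metric described in the abstract above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The class ids, the zero padding at image borders, and the assumption that the cutting edge runs horizontally are all illustrative assumptions. Under that last assumption, flank wear width is taken as the number of flank-wear pixels in each image column. The sketch also omits the CNN that the paper trains on these patches.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image, row-major
Mask = List[List[int]]     # per-pixel class labels (expert mask or prediction)

# Hypothetical class ids for illustration only; the paper's label encoding is not given.
BACKGROUND, FLANK_WEAR, GROOVE, BUILT_UP_EDGE = 0, 1, 2, 3


def extract_patches(image: Image, mask: Mask, half: int) -> List[Tuple[Image, int]]:
    """Slide a (2*half+1) x (2*half+1) window over every pixel.

    Each sample pairs the window contents with the class of its center pixel.
    Pixels outside the image are zero-padded (an assumption of this sketch).
    """
    h, w = len(image), len(image[0])

    def px(r: int, c: int) -> float:
        # Zero padding outside the image bounds.
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    samples = []
    for r in range(h):
        for c in range(w):
            window = [[px(r + dr, c + dc) for dc in range(-half, half + 1)]
                      for dr in range(-half, half + 1)]
            samples.append((window, mask[r][c]))
    return samples


def max_flank_wear_width(pred: Mask, pixel_size_um: float) -> float:
    """Largest per-column count of flank-wear pixels, converted to micrometres.

    Assumes the cutting edge is horizontal in the image, so wear width is
    measured along image columns.
    """
    h, w = len(pred), len(pred[0])
    widths = [sum(1 for r in range(h) if pred[r][c] == FLANK_WEAR) for c in range(w)]
    return max(widths, default=0) * pixel_size_um


# Hypothetical 3x3 example image and mask.
img = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
lbl = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
samples = extract_patches(img, lbl, half=1)
print(len(samples))                                    # 9: one 3x3 window per pixel
print(samples[4][1])                                   # 1: label of center pixel (1, 1)
print(max_flank_wear_width(lbl, pixel_size_um=10.0))  # 20.0
```

In a full pipeline, the `(window, label)` pairs would serve as the CNN's training samples. `max_flank_wear_width` would then run on the network's predicted mask rather than on the expert mask.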
Time Series Anomaly Detection from a Markov Chain Perspective
Iman Vasheghani Farahani, Alex Chien, R. King, M. Kay, Brad Klenz
This paper introduces a new method for the pattern-wise anomaly detection problem. The goal of this problem is to find segments of a time series whose behavior differs from that of the other segments, as opposed to finding single anomalous data points as in classic anomaly detection. An important motivation for studying this problem is to find anomalies whose data points are individually within the normal range but together form an unusual pattern. To this end, the normal characteristics of the data are learned by clustering the overlapping subsequences of the training dataset and analyzing the order in which they occur with Markov chains. The trained model is then used to assess how well the test dataset conforms to this baseline behavior. The framework can discover unusual patterns in both streaming (online) and stored (offline) data. The methodology is evaluated on three datasets from different fields: a medical dataset (electrocardiogram), a utility usage dataset, and a New York City taxi demand dataset. The anomaly detected in the medical data agrees with results reported in the literature. A domain expert confirmed the accuracy of the results for the utility usage data. The anomalies found in the New York City taxi demand data correspond to major US holidays.
DOI: https://doi.org/10.1109/ICMLA.2019.00170 (published 2019-12-01)
Citations: 8
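The training and scoring stages described in this abstract can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the paper's method. Plain k-means stands in for the unspecified clustering algorithm, the Markov chain is first-order with Laplace smoothing, and each transition is scored by its negative log-probability. The window length, `k`, the threshold, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]


def subsequences(series: Sequence[float], length: int) -> List[Vec]:
    """All overlapping windows of the given length (stride 1)."""
    return [list(series[i:i + length]) for i in range(len(series) - length + 1)]


def dist2(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vec, centroids: List[Vec]) -> int:
    """Index of the closest centroid, i.e. the Markov state of a subsequence."""
    return min(range(len(centroids)), key=lambda j: dist2(v, centroids[j]))


def kmeans(points: List[Vec], k: int, iters: int = 20) -> List[Vec]:
    """Plain Lloyd's k-means with deterministic, evenly spaced initialization
    (a stand-in for whatever clustering the paper actually uses)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of subsequences")
    step = max(1, len(points) // k)
    centroids = [list(points[i * step]) for i in range(k)]
    for _ in range(iters):
        groups: List[List[Vec]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster empties out
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def transition_matrix(states: List[int], k: int, alpha: float = 1.0) -> List[List[float]]:
    """First-order transition probabilities with add-alpha (Laplace) smoothing."""
    counts = [[alpha] * k for _ in range(k)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total for c in row])
    return probs


def transition_scores(series: Sequence[float], centroids: List[Vec],
                      P: List[List[float]], length: int) -> List[float]:
    """Negative log-probability of each state transition in the test series.
    Large values mark transitions the training data rarely or never showed."""
    states = [nearest(s, centroids) for s in subsequences(series, length)]
    return [-math.log(P[a][b]) for a, b in zip(states, states[1:])]


# Hypothetical usage: train on a repeating 0,0,1,1 pattern, then score a test
# series whose middle section (0,1,0,1) stays in range but breaks the pattern.
train = [0, 0, 1, 1] * 10
windows = subsequences(train, 2)
centroids = kmeans(windows, k=4)
P = transition_matrix([nearest(w, centroids) for w in windows], k=4)
scores = transition_scores([0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1], centroids, P, 2)
print([i for i, s in enumerate(scores) if s > 1.0])  # [3, 4, 5, 6]
```

In this sketch, each new subsequence needs only a nearest-centroid lookup and one matrix entry. The same scoring could therefore run incrementally on streaming data, which is the online setting the abstract mentions.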