2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA): Latest Papers

Conformal Prediction Using Random Survival Forests
Henrik Boström, L. Asker, R. Gurung, Isak Karlsson, Tony Lindgren, P. Papapetrou
Random survival forests constitute a robust approach to survival modeling, i.e., predicting the probability that an event will occur before or on a given point in time. As with most standard predictive models, no guarantee on the prediction error is provided for this model, which is instead typically evaluated empirically. Conformal prediction is a rather recent framework that allows the error of a model to be determined by a user-specified confidence level, which is achieved by considering set predictions rather than point predictions. The framework, which has been applied to some of the most popular classification and regression techniques, is here applied for the first time to survival modeling, through random survival forests. An empirical investigation is presented in which the technique is evaluated on datasets from two real-world applications: predicting component failure in trucks using operational data, and predicting survival and treatment of heart failure patients from administrative healthcare data. The experimental results show that the error levels are indeed very close to the provided confidence levels, as guaranteed by the conformal prediction framework, and that the error for predicting each outcome, i.e., event or no-event, can be controlled separately. The latter may, however, lead to less informative predictions, i.e., larger prediction sets, when the class distribution is heavily imbalanced.
DOI: https://doi.org/10.1109/ICMLA.2017.00-57 | Pages: 812-817 | Published: 2017-12-01
Citations: 7
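The set-prediction idea in the abstract above can be illustrated with a minimal sketch of class-conditional (Mondrian) inductive conformal prediction. This is not the authors' implementation: it assumes some already-trained model (for example a random survival forest) that outputs an event probability, and the nonconformity score (one minus the probability assigned to the label), the calibration data CAL, and all function names are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical calibration data: (predicted event probability, true label).
CAL: List[Tuple[float, int]] = [
    (0.9, 1), (0.8, 1), (0.7, 1), (0.6, 1),
    (0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
]


def nonconformity(p_event: float, label: int) -> float:
    """Assumed score: high when the model gives the label low probability."""
    p_label = p_event if label == 1 else 1.0 - p_event
    return 1.0 - p_label


def calibrate(cal: List[Tuple[float, int]]) -> Dict[int, List[float]]:
    """Collect calibration scores separately for each class."""
    scores: Dict[int, List[float]] = {0: [], 1: []}
    for p_event, label in cal:
        scores[label].append(nonconformity(p_event, label))
    return scores


def p_value(class_scores: List[float], alpha: float) -> float:
    """Share of calibration scores at least as nonconforming as alpha."""
    return (sum(s >= alpha for s in class_scores) + 1) / (len(class_scores) + 1)


def predict_set(p_event: float, scores: Dict[int, List[float]],
                confidence: float) -> List[int]:
    """Keep every label whose p-value exceeds the significance level."""
    eps = 1.0 - confidence
    return [y for y in (0, 1)
            if p_value(scores[y], nonconformity(p_event, y)) > eps]


scores = calibrate(CAL)
print(predict_set(0.95, scores, confidence=0.75))  # confident test case
print(predict_set(0.50, scores, confidence=0.90))  # uncertain test case
```

Because calibration scores are kept per class, the error for event and no-event is controlled separately. Raising the confidence level can only enlarge the prediction set, which is the loss of informativeness the abstract mentions.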
Predicting Hotel Bookings Cancellation with a Machine Learning Classification Model
N. António, Ana de Almeida, Luís Nunes
Booking cancellations have a significant impact on demand-management decisions in the hospitality industry. To mitigate the effect of cancellations, hotels implement rigid cancellation policies and overbooking tactics, which in turn can have a negative impact on revenue and on the hotel's reputation. To reduce this impact, a machine-learning-based system prototype was developed. It uses data from the hotel’s Property Management Systems and trains a classification model every day to predict which bookings are “likely to cancel”, and from that calculates net demand. The prototype was deployed in a production environment in two hotels. By enforcing A/B testing, it also makes it possible to measure the impact of actions taken on bookings predicted as “likely to cancel”. Results indicate good prototype performance and provide important indications for research progress, while showing that bookings contacted by hotels cancel less often than bookings not contacted.
DOI: https://doi.org/10.1109/ICMLA.2017.00-11 | Pages: 1049-1054 | Published: 2017-12-01
Citations: 21
Modeling Over-Dispersion for Network Data Clustering
Lu Wang, D. Zhu, Ming Dong, Yan Li
Over-dispersed network data mining has emerged as a central theme in data science, as evidenced by a sharp increase in the volume of real-world network data with imbalanced clusters. While most existing clustering methods are designed for discovering the number of clusters and class-specific connectivity patterns, few methods are available to uncover the imbalanced clusters that commonly exist in network communities and image segments. In this paper, we propose a generalized probabilistic modeling framework, SizeConnectivity, to estimate the over-dispersed cluster size distribution together with class-specific connectivity patterns from network data. We performed extensive synthetic and real-world experiments on clustering social network data and image data for detecting network communities and image segments. Our results demonstrate the superior performance of our SizeConnectivity clustering method in recovering the hidden structure of network data by modeling over-dispersion.
DOI: https://doi.org/10.1109/ICMLA.2017.0-180 | Pages: 42-49 | Published: 2017-12-01
Citations: 3
MapReduce Based Classification for Fault Detection in Big Data Applications
M. O. Shafiq, Maryam Fekri, Rami Ibrahim
Recently emerging software applications are large, complex, distributed and data-intensive, i.e., big data applications. This makes monitoring such applications a challenging task, because standards and techniques for modeling and analyzing the execution data (i.e., logs) they produce are lacking. Another challenge imposed by big data applications is that the execution data they produce also has high volume, velocity, and variety, and requires high veracity and value. In this paper, we present our monitoring solution, which performs real-time fault detection in big data applications. Our solution is two-fold. First, we prescribe a standard model for structuring execution logs. Second, we prescribe a Bayesian-classification-based analysis solution that is MapReduce compliant, distributed, parallel, single pass and incremental. This makes it possible for our proposed solution to be deployed and executed on cloud computing platforms to process logs produced by big data applications. We have carried out complexity, scalability, and usability analyses of our proposed solution to assess how efficiently and effectively it can perform fault detection in big data applications.
DOI: https://doi.org/10.1109/ICMLA.2017.00-89 | Pages: 637-642 | Published: 2017-12-01
Citations: 1
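The abstract above describes the classifier only as Bayesian, MapReduce compliant, single pass and incremental. One way such properties can be obtained, shown here purely as an assumed illustration and not as the authors' design, is a naive Bayes model whose sufficient statistics are additive counts: each partition is counted once (map) and the counts are summed (reduce). The log-record format, the token features, and every function name below are hypothetical.

```python
import math
from collections import Counter
from functools import reduce
from typing import Iterable, List, Tuple

LogRecord = Tuple[List[str], str]   # (tokens of a log line, label), assumed format
Model = Tuple[Counter, Counter]     # (class counts, (class, token) counts)


def map_partition(records: Iterable[LogRecord]) -> Model:
    """Map step: a single pass over one partition, emitting raw counts."""
    class_counts: Counter = Counter()
    token_counts: Counter = Counter()
    for tokens, label in records:
        class_counts[label] += 1
        for tok in tokens:
            token_counts[(label, tok)] += 1
    return class_counts, token_counts


def reduce_counts(a: Model, b: Model) -> Model:
    """Reduce step: counts are additive, so partitions merge in any order."""
    return a[0] + b[0], a[1] + b[1]


def classify(tokens: List[str], model: Model) -> str:
    """Naive Bayes decision with Laplace smoothing over the seen vocabulary."""
    class_counts, token_counts = model
    vocab = {tok for (_, tok) in token_counts}
    total = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label, n in class_counts.items():
        label_tokens = sum(c for (lab, _), c in token_counts.items() if lab == label)
        score = math.log(n / total)
        for tok in tokens:
            score += math.log((token_counts[(label, tok)] + 1)
                              / (label_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Two hypothetical log partitions, as if processed on different workers.
part1 = [(["disk", "error", "timeout"], "fault"), (["job", "completed", "ok"], "normal")]
part2 = [(["connection", "timeout", "error"], "fault"),
         (["task", "started", "ok"], "normal"), (["job", "started"], "normal")]

model = reduce(reduce_counts, [map_partition(part1), map_partition(part2)])
print(classify(["timeout", "error"], model))
```

Since merging is just addition, a new batch of logs can be folded into the existing model with one more call to reduce_counts, without revisiting old data.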
Comparing Transfer Learning and Traditional Learning Under Domain Class Imbalance
Karl R. Weiss, T. Khoshgoftaar
Transfer learning is a subclass of machine learning that uses training data (source) drawn from a different domain than that of the testing data (target). A transfer learning environment is characterized by the unavailability of labeled data from the target domain, due to data being rare or too expensive to obtain. However, abundant labeled data exists from a different but similar domain. These two domains are likely to have different distribution characteristics. Transfer learning algorithms attempt to align the distribution characteristics of the source and target domains to create high-performance classifiers. This paper provides a comparative performance analysis between state-of-the-art transfer learning algorithms and traditional machine learning algorithms under the domain class imbalance condition. The domain class imbalance condition is characterized by the source and target domains having different class probabilities, which can create marginal distribution differences between the source and target data. Statistical analysis is provided to show the significance of the results.
DOI: https://doi.org/10.1109/ICMLA.2017.0-138 | Pages: 337-343 | Published: 2017-12-01
Citations: 11
Feature Extraction and K-means Clustering Approach to Explore Important Features of Urban Identity
Mei-Chih Chang, Peter Bus, G. Schmitt
Public spaces play an important role in the processes of formation, generation and change of urban identity. Under present-day conditions, the identities of cities are rapidly deteriorating and vanishing. Therefore, the importance of urban design, which is a means of designing urban spaces and their physical and social aspects, is ever growing. This paper proposes a novel methodology that uses Principal Component Analysis (PCA) and K-means clustering to find important features of urban identity from public space. K. Lynch’s work and Space Syntax theory are reconstructed and integrated with POI (Points of Interest) to quantify the quality of the public space. A case study of the city of Zürich is used to test these redefinitions and features of urban identity. The results show that the PCA and K-means clustering approach can identify the urban identity and explore important features. This strategy could help to improve planning and design processes and the generation of new urban patterns with more appropriate features and qualities.
DOI: https://doi.org/10.1109/ICMLA.2017.00015 | Pages: 1139-1144 | Published: 2017-12-01
Citations: 14
Histogram-Based Asymmetric Relabeling for Learning from Only Positive and Unlabeled Data
Tom Arjannikov, G. Tzanetakis
In this paper, we demonstrate how to use asymmetric data relabeling based on feature histograms as a pre-processing step for improving the overall classification performance of different classifiers in situations when only positive and unlabeled data is available. Additionally, this strategy can be used to identify with some level of confidence those data instances that should probably be labeled as positive. Moreover, this approach can be adapted to assess the quality of a given dataset, in terms of how many positive instances are not labeled. We examine our approach using synthetic data and demonstrate its applicability using real, publicly available data.
DOI: https://doi.org/10.1109/ICMLA.2017.000-8 | Pages: 1065-1070 | Published: 2017-12-01
Citations: 0
Two-phase Parallel Learning to Identify Similar Structures Among Relational Databases
Debora G. Reis, Rommel N. Carvalho, Ricardo Silva Carvalho, M. Ladeira
The need for efficient techniques for dealing with large databases increases as the number of large databases grows. We propose a new two-phase parallel learning approach to quickly identify similar structures in relational databases. Each phase represents a level of relational metadata aggregation. To test the approach, we ran an experiment with several large databases of the Ministry of Social Development of Brazil to classify which relational databases have a similar structure of tables and columns, based on their metadata. The similarity measures considered were Levenshtein and cosine. Generalized Linear Model, Random Forest, and Gradient Boosting Machine (GBM) techniques are applied to develop the model. Each model was executed with sequential and with parallel processing, and the performance was compared. The parallel execution of GBM was at least ten times faster than the sequential processing. The results encourage further applications of propositional parallel learning in relational databases.
DOI: https://doi.org/10.1109/ICMLA.2017.00-17 | Pages: 1020-1023 | Published: 2017-12-01
Citations: 3
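The abstract above names Levenshtein and cosine as its similarity measures without saying how they are applied to the metadata. As a sketch under stated assumptions, the code below computes Levenshtein distance between two column names and cosine similarity between two tables represented as bags of column names. The example names and both function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cosine(a: List[str], b: List[str]) -> float:
    """Cosine similarity between two bags of column names."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


# Hypothetical column metadata from two tables.
table_a = ["id", "name", "email"]
table_b = ["id", "name", "phone"]

print(levenshtein("customer_id", "cust_id"))
print(round(cosine(table_a, table_b), 3))
```

Pairwise scores like these could then serve as input features for the classifiers the abstract lists, although the exact feature construction used in the paper is not stated in the abstract.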
A Noise Prediction and Time-Domain Subtraction Approach to Deep Neural Network Based Speech Enhancement
B. O. Odelowo, David V. Anderson
Deep neural networks (DNNs) have recently been successfully applied to the speech enhancement task; however, the low signal-to-noise ratio (SNR) performance of DNN-based speech enhancement systems remains less than desirable. In this paper, we study an approach to DNN-based speech enhancement based on noise prediction. Three speech enhancement models based on noise prediction are proposed, and their performance is compared to that of conventional spectral-mapping models in seen and unseen noise tests. Objective test results show that the proposed noise prediction models perform well in enhancing speech quality in seen noise conditions and in enhancing high SNR speech signals. They also perform well in enhancing speech intelligibility in both seen and unseen noise conditions, but do not outperform the conventional models on quality metrics in unseen noise conditions. Further analysis of the enhanced speech signals is undertaken to explain the observed results.
DOI: https://doi.org/10.1109/ICMLA.2017.0-133 | Pages: 372-377 | Published: 2017-12-01
Citations: 3
Forward Looking Sonar Scene Matching Using Deep Learning
P. Ribeiro, M. Santos, Paulo L. J. Drews-Jr, S. Botelho
Optical images display drastically reduced visibility due to underwater turbidity conditions. Sonar imaging presents an alternative form of environment perception for underwater vehicle navigation, mapping and localization. In this work we present a novel method for Acoustic Scene Matching. To that end, we developed and trained a new Deep Learning architecture designed to compare two acoustic images and decide whether they correspond to the same underwater scene. The network is named Sonar Matching Network (SMNet). The acoustic images used in this paper were obtained by a Forward Looking Sonar during a Remotely Operated Vehicle (ROV) mission. A Geographic Positioning System provided the ROV position for the ground truth score, which is used in the learning process of our network. The proposed method uses 36,000 samples of real data for validation. From a binary classification perspective, our method achieved 98% accuracy when two given scenes have more than ten percent of intersection.
DOI: https://doi.org/10.1109/ICMLA.2017.00-99. Pages 574-579. Published 2017-12-01.
Citations: 17