2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA): Latest Papers

Deep Learning for Microalgae Classification
Iago Correa, Paulo L. J. Drews-Jr, S. Botelho, M. S. Souza, V. Tavano
Microalgae are unicellular organisms that present limited physical characteristics, such as size, shape, or even the structures present. Classifying them manually may require great effort from experts, since thousands of microalgae can be found in a small sample of water. Furthermore, manual classification is a non-trivial operation. We propose a deep learning technique to solve the problem. We also created a labeled dataset that allows us to adopt this technique. To the best of our knowledge, the present work is the first to apply this kind of technique to the microalgae classification task. The results show that the method can properly classify the data using as input the low-resolution images acquired by a particle analyzer instead of pre-processed features. We also show the improvement provided by data augmentation.
DOI: https://doi.org/10.1109/ICMLA.2017.0-183 (pp. 20-25, published 2017-12-01)
Citations: 32
Integrating Prior Knowledge into Deep Learning
Michelangelo Diligenti, Soumali Roychowdhury, M. Gori
Deep learning makes it possible to develop feature representations and train classification models in a fully integrated way. However, learning deep networks is quite hard, and they improve over shallow architectures only if a large amount of training data is available. Injecting prior knowledge into the learner is a principled way to reduce the amount of required training data, as the learner does not need to induce the knowledge from the data itself. In this paper we propose a general and principled way to integrate prior knowledge when training deep networks. Semantic Based Regularization (SBR) is used as the underlying framework to represent the prior knowledge, which is expressed as a collection of first-order logic (FOL) clauses in which each task to be learned corresponds to a predicate in the knowledge base. The knowledge base correlates the tasks to be learned and is translated into a set of constraints that are integrated into the learning process via backpropagation. The experimental results show how the integration of the prior knowledge boosts the accuracy of a state-of-the-art deep network on an image classification task.
DOI: https://doi.org/10.1109/ICMLA.2017.00-37 (pp. 920-923, published 2017-12-01)
Citations: 68
OP-DCI: A Riskless K-Means Clustering for Influential User Identification in MOOC Forum
X. Hou, Chi-Un Lei, Yu-Kwong Kwok
Massive Open Online Courses (MOOCs) have recently become highly popular among learners worldwide, but it is challenging to manage and interpret the large-scale discussion forum, which is the dominant channel of online communication. K-Means clustering, a well-known unsupervised learning algorithm, could help instructors identify influential users in a MOOC forum in order to better understand and improve the online learning experience. However, traditional K-Means suffers from bias caused by outliers and from the risk of falling into a local optimum. In this paper we propose OP-DCI, an optimized K-Means algorithm that uses outlier post-labeling and distant centroid initialization. Outliers are not simply filtered out but are extracted as distinct objects for post-labeling, and distant centroid initialization eliminates the risk of falling into a local optimum. With OP-DCI, learners in a MOOC forum are clustered efficiently with satisfactory interpretation, and instructors can subsequently design personalized learning strategies for different clusters.
DOI: https://doi.org/10.1109/ICMLA.2017.00-34 (pp. 936-939, published 2017-12-01)
Citations: 3
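The OP-DCI abstract above names "distant centroid initialization" but does not say how the distant centroids are chosen. As a rough illustration only, the sketch below uses farthest-first traversal, a common heuristic that is swapped in here as an assumption and is not taken from the paper. The function name `distant_centroids`, the `seed` parameter, and the sample points are all hypothetical.

```python
import math
import random


def distant_centroids(points, k, seed=0):
    """Pick k starting centroids that are far apart (farthest-first traversal).

    Assumption: this is a generic heuristic, not the OP-DCI procedure itself.
    """
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: any point
    while len(centroids) < k:
        # For each point, take the distance to its nearest chosen centroid,
        # then add the point for which that distance is largest.
        farthest = max(
            points,
            key=lambda p: min(math.dist(p, c) for c in centroids),
        )
        centroids.append(farthest)
    return centroids


# Made-up 2-D data with three visually separate groups.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0)]
centroids = distant_centroids(points, k=3)
```

The returned centroids would then seed an ordinary K-Means loop. Spreading the starting points reduces the chance that two centroids begin inside the same group, which is one frequent cause of poor local optima.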
An Empirical Study of the Hidden Matrix Rank for Neural Networks with Random Weights
Pablo A. Henríquez, G. A. Ruz
Neural networks with random weights can be regarded as feed-forward neural networks built with a specific randomized algorithm, i.e., the input weights and biases are randomly assigned and fixed during the training phase, and the output weights are analytically evaluated by the least squares method. This paper presents an empirical study of the hidden matrix rank for neural networks with random weights. We study the impact of the scope of the random parameters on the model's performance, and show that assigning the input weights in the range [-1,1] is misleading. Experiments were conducted using two types of neural networks, yielding insights not only into the input weights but also into how these relate to different architectures.
DOI: https://doi.org/10.1109/ICMLA.2017.00-44 (pp. 883-888, published 2017-12-01)
Citations: 4
Relevancy Ranking of User Recommendations of Services Based on Browsing Patterns
Suresh Kumar Gudla, Joy Bose, Venugopal Gajam, S. Srinivasa
There are a number of inbound web services that recommend content to users. However, there is no way for such services to prioritize their recommendations according to the users' interests. Here we are not interested in generating new recommendations, but rather in organizing and prioritizing existing recommendations in order to increase the click rate. Since users have different browsing patterns that also change frequently, it is useful to have a system that prioritizes recommendations based on the current browsing patterns of individual users. In this paper we present such a system. We first generate clusters of article topics using URLs from the users' browsing history, which are then used to generate entropy-based relevancy scores for the recommendation services. The relevancy scores are then fed to the service providers, which use them to rank and prioritize their recommendations. We test the model using the browsing history of 10 users, and validate it by calculating the correlation of the generated relevancy scores with the users' manually provided topic preferences. We further use collaborative filtering to benchmark the usefulness of our ranking systems.
DOI: https://doi.org/10.1109/ICMLA.2017.00-66 (pp. 765-768, published 2017-12-01)
Citations: 4
NMF-Based Label Space Factorization for Multi-label Classification
Mohammad Firouzi, Mahmood Karimian, Mahdieh Soleymani
Multi-label classification is a learning task in which each data sample can belong to more than one class. Several methods based on reducing the dimensionality of the label space have been proposed so far. However, these methods have not used specific properties of the label space for this purpose. In this paper, we aim to find a hidden space in which both the input feature vectors and the label vectors are embedded. We propose a modified Non-Negative Matrix Factorization (NMF) method that is suitable for decomposing the label matrix and finding a proper hidden space by a feature-aware approach. We take into account that the label matrix is binary and that some labels that apply to an instance may not be set in this matrix (so-called missing labels). We conduct several experiments and show the superiority of our proposed methods over state-of-the-art multi-label classification methods.
DOI: https://doi.org/10.1109/ICMLA.2017.0-144 (pp. 297-303, published 2017-12-01)
Citations: 1
An Investigation of How Neural Networks Learn from the Experiences of Peers Through Periodic Weight Averaging
Joshua Smith, Michael S. Gashler
We investigate a method for cooperative learning, called weighted average model fusion, that enables neural networks to learn from the experiences of other networks as well as from their own. Modern machine learning methods have focused predominantly on learning from direct training, but many situations exist where the data cannot be aggregated, rendering direct learning impossible. However, we show that the simple approach of averaging weights with peer neural networks at periodic intervals enables neural networks to learn from second-hand experiences. We analyze the effects that several meta-parameters have on model fusion to provide deeper insights into how they affect cooperative learning in a variety of scenarios.
DOI: https://doi.org/10.1109/ICMLA.2017.00-72 (pp. 731-736, published 2017-12-01)
Citations: 2
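The idea of averaging weights with peers at periodic intervals, as described in the abstract above, can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' experimental setup: the models are two-parameter linear regressors rather than neural networks, the average is unweighted, and the names `average_weights`, `sgd_step`, and `fusion_interval` as well as the data are hypothetical.

```python
def average_weights(peer_weights):
    """Element-wise mean of several equally shaped weight vectors."""
    n = len(peer_weights)
    return [sum(ws) / n for ws in zip(*peer_weights)]


def sgd_step(w, x, y, lr=0.1):
    """One squared-error gradient step for the model y ~ w[0] * x + w[1]."""
    err = (w[0] * x + w[1]) - y
    return [w[0] - lr * err * x, w[1] - lr * err]


# Two peers see disjoint samples of the same target y = 2x + 1 and never
# exchange data, only weights.
data_a = [(0.0, 1.0), (0.5, 2.0)]
data_b = [(1.0, 3.0), (1.5, 4.0)]
peers = [[0.0, 0.0], [0.0, 0.0]]
fusion_interval = 10  # hypothetical meta-parameter: steps between fusions

for step in range(1, 2001):
    # Each peer trains only on its own local data.
    for i, data in enumerate((data_a, data_b)):
        x, y = data[step % len(data)]
        peers[i] = sgd_step(peers[i], x, y)
    # Periodically replace every peer's weights with the group average.
    if step % fusion_interval == 0:
        fused = average_weights(peers)
        peers = [list(fused) for _ in peers]
```

In this toy setting both peers end up close to slope 2 and intercept 1 even though neither has seen the other's samples. The fusion interval is one of the knobs a study of such schemes would presumably vary.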
Early Prediction of College Attrition Using Data Mining
L. C. B. Martins, Rommel N. Carvalho, Ricardo Silva Carvalho, M. Victorino, M. Holanda
College attrition is a chronic problem for institutions of higher education. In Brazilian public universities, attrition also accounts for a significant waste of public resources desperately needed in other sectors of society. Given the severity and persistence of this problem, several studies have been conducted in an attempt to mitigate undergraduate dropout rates. Using H2O software as a data mining tool, our study employed parameter tuning to train 321 models of three classification algorithms, and with Deep Learning it was possible to predict 71.1% of the cases of dropout given these characteristics. With this result, it will be possible to identify the attrition profiles of students and implement corrective measures when they begin their studies.
DOI: https://doi.org/10.1109/ICMLA.2017.000-6 (pp. 1075-1078, published 2017-12-01)
Citations: 28
RobustSPAM for Inference from Noisy Longitudinal Data and Preservation of Privacy
Anna Palczewska, Jan Palczewski, G. Aivaliotis, Lukasz Kowalik
The availability of complex temporal datasets in social, health and consumer contexts has driven the development of pattern mining techniques that enable the use of classical machine learning tools for model building. In this work we introduce a robust temporal pattern mining framework for finding predictive patterns in complex, timestamped, multivariate and noisy data. We design an algorithm, RobustSPAM, that enables mining of temporal patterns from data with noisy timestamps. We apply our algorithm to social care data from a local government body and investigate how the efficiency and accuracy of the method depend on the level of noise. We further explore the trade-off between the loss of predictivity due to perturbation of timestamps and the risk of person re-identification.
DOI: https://doi.org/10.1109/ICMLA.2017.0-137 (pp. 344-351, published 2017-12-01)
Citations: 3
Predictive Modelling Strategies to Understand Heterogeneous Manifestations of Asthma in Early Life
D. Belgrave, R. Cassidy, D. Stamate, A. Custovic, L. Fleming, A. Bush, S. Saglani
Wheezing is common among children, and ∼50% of those under 6 years of age are thought to experience at least one episode of wheeze. However, the heterogeneity of symptoms makes these children difficult to diagnose and treat. ‘Phenotype specific therapy’ is one possible avenue of treatment, whereby significant pathology and physiology are used to identify and treat pre-schoolers with wheeze. By applying feature selection algorithms and predictive modelling techniques, this study attempts to determine whether it is possible to robustly distinguish patient diagnostic categories among pre-school children. Univariate feature analysis identified more objective variables, and recursive feature elimination identified a larger number of subjective variables, as important in distinguishing between patient categories. Predictive modelling saw a drop in performance when subjective variables were removed from the analysis, indicating that these variables are important in distinguishing wheeze classes. We achieved 90%+ performance in AUC, sensitivity, specificity, and accuracy, and 80%+ in the kappa statistic, in distinguishing ill from healthy patients. Developed with a synergistic statistical and machine learning approach, our methodology also proposes a novel ROC Cross Evaluation method for model post-processing and evaluation. The stability of our predictive modelling was assessed in computationally intensive Monte Carlo simulations.
DOI: https://doi.org/10.1109/ICMLA.2017.0-176 (pp. 68-75, published 2017-12-01)
Citations: 8
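The asthma abstract above reports sensitivity, specificity, accuracy, and the kappa statistic for a two-class task (ill versus healthy). As a reminder of how these relate, the sketch below computes them from a 2x2 confusion matrix using the standard textbook definitions, with Cohen's kappa as the kappa statistic (an assumption, since the abstract does not name the variant). The function name `binary_metrics` and the counts are made up for illustration and are not the paper's data; AUC is omitted because it needs ranked scores rather than counts.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from a 2x2 confusion matrix (positive class = ill)."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # share of ill patients detected
    specificity = tn / (tn + fp)   # share of healthy patients cleared
    accuracy = (tp + tn) / total   # observed agreement

    # Agreement expected by chance, from the row and column marginals.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_expected = p_yes + p_no

    # Cohen's kappa: agreement beyond chance, rescaled to a maximum of 1.
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts: 50 ill and 50 healthy patients, 5 errors in each group.
metrics = binary_metrics(tp=45, fp=5, fn=5, tn=45)
```

With these balanced counts, 90% accuracy corresponds to a kappa of about 0.8, because half of the agreement would already be expected by chance. This is why a kappa figure usually sits below the accuracy figure for the same classifier.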