
2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA): Latest Papers

Exploring the Explicit Modelling of Bias in Machine Learning Classifiers: A Deep Multi-label ConvNet Approach *
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00277
Mashael Al-Luhaybi, S. Swift, S. Counsell, A. Tucker
This paper addresses the problem that many machine learning classifiers make decisions based on biased data and can therefore produce prejudiced decisions. For example, in education (the focus of this paper), a student may be rejected from a course because of historical decisions in the data that exist only due to historical biases in society or skewed sampling of the data. Other approaches to dealing with bias in data include resampling methods (to counter imbalanced samples) and dimensionality reduction (to focus only on features relevant to the classification task). In this paper, we explore issues of modelling bias explicitly so that we can identify the types of bias and whether they account for inflated predictive accuracies. In particular, we compare graphical model approaches to building classifiers, which are transparent in how they make decisions, with two forms of Deep Multi-label Convolutional Neural Networks to investigate whether models can be built that maximise accuracy and minimise bias. We carry out this comparison on student entry and performance data from a higher education institution.
Citations: 0
Multi-Learning Generalised Low-Rank Models
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00142
Francois Buet-Golfouse, Parth Pahwa
Multi-output supervised learning and multi-task learning are both instances of a broader learning paradigm in which features, parameters and objectives are shared to a certain extent. Examples of such approaches include reusing features from pre-existing models in a new algorithm, performing multi-label regression, or optimising for several tasks jointly. In this paper, we address this challenge by devising a generic framework based on generalised low-rank models ("GLRMs"), which include, broadly speaking, most techniques that can be expressed in terms of matrix factorisation. Importantly, GLRMs first and foremost tackle unsupervised learning problems and supervised linear models; here, we show that they can be extended to handle multivariate learning by introducing multivariate functionals and structure regularisation terms. This paper also proposes a coherent framework for designing multi-learning strategies that covers existing algorithms. Finally, we demonstrate the simplicity and effectiveness of our approach on empirical data.
Citations: 0
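For readers unfamiliar with generalised low-rank models, the matrix-factorisation view mentioned in the abstract above is usually written as the optimisation problem below. This is the standard GLRM objective from the general literature, not a formula taken from this paper; the symbols $A$, $X$, $Y$, $\Omega$, $L_{ij}$, $r_i$ and $\tilde{r}_j$ are notation assumed here for illustration.

```latex
% A in R^{m x n}: data matrix, observed on the index set Omega
% X in R^{m x k}, Y in R^{k x n}: low-rank factors (x_i = i-th row of X, y_j = j-th column of Y)
% L_{ij}: per-entry loss; r_i, \tilde{r}_j: regularisers on the factors
\min_{X,\,Y} \;
  \sum_{(i,j) \in \Omega} L_{ij}\!\left(x_i y_j,\; A_{ij}\right)
  \;+\; \sum_{i=1}^{m} r_i(x_i)
  \;+\; \sum_{j=1}^{n} \tilde{r}_j(y_j)
```

With a quadratic loss $L_{ij}(u, a) = (u - a)^2$ and no regularisation, this reduces to a rank-$k$ least-squares fit (PCA when $A$ is fully observed). The multivariate functionals and structure regularisation terms described in the abstract would presumably replace or augment the per-entry terms, but their exact form is not given here.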
Exploring Edge Machine Learning-based Stress Prediction using Wearable Devices
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00203
Sang-Hun Sim, Tara Paranjpe, Nicole Roberts, Ming Zhao
Stress is a central factor in our daily lives, impacting performance, decisions, well-being, and our interactions with others. With the development of IoT technology, smart wearable devices can handle diverse operations, including networking and recording biometric signals. The enhanced data processing capability of wearables has also allowed for increased stress awareness among users. Edge computing on such devices enables real-time feedback, which can provide an opportunity to prevent the severe consequences that might result if stress is left unaddressed. Edge computing can also strengthen privacy by implementing stress prediction on local devices without transferring personal information to the public cloud. This paper presents a framework for real-time stress prediction, specifically for police training cadets, using wearable devices and machine learning with support from cloud computing. We developed an application for Fitbit and the user's accompanying smartphone to collect heart rate fluctuations and corresponding stress levels entered by users, and a cloud backend for storing data and training models. Real-world data for this study was collected from police cadets during a police academy training program. Machine learning classifiers for stress prediction were built from this data using classic machine learning models and neural networks. To analyze efficiency across different environments, the models were optimized using model compression and other relevant techniques and tested in cloud and edge environments. Evaluation using real data and real devices showed that the highest accuracy came from XGBoost and TensorFlow neural network models, and that on-edge stress prediction models produced lower-latency results than in-cloud prediction.
Citations: 1
Deeper Bidirectional Neural Networks with Generalized Non-Vanishing Hidden Neurons
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00017
Olaoluwa Adigun, B. Kosko
The new NoVa hidden neurons have outperformed ReLU hidden neurons in deep classifiers on some large image test sets. The NoVa or nonvanishing logistic neuron additively perturbs the sigmoidal activation function so that its derivative is not zero. This helps avoid or delay the problem of vanishing gradients. We here extend the NoVa to the generalized perturbed logistic neuron and compare it to ReLU and several other hidden neurons on large image test sets that include CIFAR-100 and Caltech-256. Generalized NoVa classifiers allow deeper networks with better classification on the large datasets. This deep benefit holds for ordinary unidirectional backpropagation. It also holds for the more efficient bidirectional backpropagation that trains in both the forward and backward directions.
Citations: 0
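The idea of additively perturbing a logistic activation so that its derivative never reaches zero can be sketched in a few lines. The abstract above does not state the exact NoVa formula, so the linear perturbation term and the `slope` parameter below are assumptions chosen only to illustrate the mechanism.

```python
import math


def logistic(x: float) -> float:
    """Numerically stable logistic (sigmoid) function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logistic_grad(x: float) -> float:
    """Derivative of the logistic; tends to zero for large |x|."""
    s = logistic(x)
    return s * (1.0 - s)


def perturbed_logistic(x: float, slope: float = 0.1) -> float:
    """Logistic plus a small linear term (assumed perturbation, not the paper's exact form)."""
    return logistic(x) + slope * x


def perturbed_logistic_grad(x: float, slope: float = 0.1) -> float:
    """Derivative of the perturbed activation; bounded below by `slope`."""
    return logistic_grad(x) + slope
```

For a strongly saturated input such as `x = 50.0`, `logistic_grad` is numerically zero while `perturbed_logistic_grad` stays at or above `slope`, which is the property that helps gradients survive through many layers.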
Data-Efficient Automatic Model Selection in Unsupervised Anomaly Detection
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00227
Gautham Krishna Gudur, Raaghul R, Adithya K, Shrihari Vasudevan
Anomaly Detection is a widely used technique in machine learning that identifies context-specific outliers. Most real-world anomaly detection applications are unsupervised, owing to the bottleneck of obtaining labeled data for a given context. In this paper, we solve two important problems pertaining to unsupervised anomaly detection. First, we identify only the most informative subsets of data points and obtain ground truths from the domain expert (oracle); second, we perform efficient model selection using a Bayesian Inference framework and recommend the top-k models to be fine-tuned prior to deployment. To this end, we exploit multiple existing and novel acquisition functions, and successfully demonstrate the effectiveness of the proposed framework using a weighted Ranking Score (η) to accurately rank the top-k models. Our empirical results show a significant reduction in data points acquired (with at least 60% reduction) while not compromising on the efficiency of the top-k models chosen, with both uniform and non-uniform priors over models.
Citations: 0
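The two steps described in the abstract above (querying an oracle for only the most informative points, then ranking candidate models under a Bayesian posterior) might look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' method: the disagreement-based acquisition rule, the agreement likelihood with a `noise` parameter, and all function names are hypothetical, and the paper's acquisition functions and weighted Ranking Score (η) are not reproduced.

```python
def update_posterior(posterior, predictions, label, noise=0.1):
    """Bayes update: a model that agrees with the oracle label gets likelihood 1 - noise."""
    weights = [
        p * ((1.0 - noise) if pred == label else noise)
        for p, pred in zip(posterior, predictions)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def disagreement(posterior, predictions):
    """Posterior-weighted disagreement among models on one point (0 = full consensus)."""
    mass_anomaly = sum(p for p, pred in zip(posterior, predictions) if pred == 1)
    return min(mass_anomaly, 1.0 - mass_anomaly)


def select_top_k(preds_by_model, oracle_labels, prior, budget, k):
    """Query `budget` labels, update the posterior over models, return the top-k model indices."""
    posterior = list(prior)
    unlabeled = set(range(len(oracle_labels)))
    for _ in range(budget):
        # Assumed acquisition rule: query the point the models disagree on most.
        i = max(
            sorted(unlabeled),
            key=lambda j: disagreement(posterior, [m[j] for m in preds_by_model]),
        )
        posterior = update_posterior(
            posterior, [m[i] for m in preds_by_model], oracle_labels[i]
        )
        unlabeled.remove(i)
    ranked = sorted(range(len(posterior)), key=lambda m: -posterior[m])
    return ranked[:k], posterior
```

The `prior` argument accepts either a uniform or a non-uniform distribution over models, mirroring the two settings mentioned in the abstract; only `budget` points are ever labelled, which is where the data efficiency comes from.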
Simulating New and Old Twitter User Activity with XGBoost and Probabilistic Hybrid Models
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00026
Frederick Mubang, Lawrence O. Hall
The Volume Audience Match Simulator (VAM) is an end-to-end approach for predicting user-to-user interactions on a given social media platform. It comprises two components. The first is an XGBoost-driven volume prediction module that predicts the number of (1) total activities, (2) active old users, and (3) newly active users over the 24 hours following the start time of prediction. The second is a User-Assignment Module that takes the volume predictions as input and predicts the user-to-user interactions of the old and new users. In previous work, VAM has been used to predict Twitter discussions related to political crises. In this work, VAM was used to predict future activity on Twitter related to international economic affairs. We include more experiments and analyses than previous work performed with VAM. In this work, VAM is used to predict all types of retweets, including quotes and replies, unlike previous work, which focused only on regular retweets. Furthermore, we show that YouTube features, in addition to Reddit features, can improve prediction performance. We examine the importance of the time series features used in VAM’s Volume Prediction module. Lastly, we show that VAM is significantly more accurate than other approaches when predicting highly-skewed, lowly-skewed, highly-sparse, and lowly-sparse time series.
Citations: 0
Sentence Similarity Recognition in Portuguese from Multiple Embedding Models
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00029
Ana Carolina Rodrigues, R. Marcacini
Distinct pre-trained embedding models perform differently in sentence similarity recognition tasks. The current assumption is that they encode different features due to differences in algorithm design and in the characteristics of the datasets employed in the pre-training process. The prospect of benefiting from different encoded features to generate more suitable representations motivated the assembly of multiple embedding models, so-called meta-embedding. Meta-embedding methods combine different pre-trained embedding models to perform a task. Recently, multiple pre-trained language representations derived from Transformer-based systems have been shown to be effective in many downstream tasks. This paper introduces a supervised meta-embedding neural network that combines contextualized pre-trained models for sentence similarity recognition in Portuguese. Our results show that combining multiple pre-trained sentence embedding models outperforms single models and can be a promising alternative for improving performance on sentence similarity. Moreover, we also discuss the results in light of our simple extension of a model explainability method to the meta-embedding context, which allows visual identification of the impact of each token on the sentence similarity score.
Citations: 0
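To make the meta-embedding idea in the abstract above concrete, the sketch below uses the simplest possible combination: concatenating L2-normalised sentence vectors from several models and scoring similarity with the cosine. This is an unsupervised baseline swapped in for illustration, not the supervised meta-embedding network the paper introduces; the function names and the toy vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def meta_embed(embeddings):
    """Concatenate the normalised embeddings that several models produce for one sentence."""
    combined = []
    for emb in embeddings:
        combined.extend(l2_normalize(emb))
    return combined


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy example: two sentences, each embedded by two hypothetical models
# with different output dimensions (2 and 3).
sentence_a = meta_embed([[1.0, 0.0], [0.0, 2.0, 0.0]])
sentence_b = meta_embed([[1.0, 0.0], [0.0, 0.0, 3.0]])
similarity = cosine(sentence_a, sentence_b)
```

Because each part is normalised before concatenation, the resulting cosine equals the mean of the per-model cosines (provided no part is a zero vector), so every model contributes equally. A supervised combiner such as the one described in the abstract would instead learn how to weight and mix the models.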
Learning Task-independent Joint Control for Robotic Manipulators with Reinforcement Learning and Curriculum Learning
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00201
Lars Væhrens, D. D. Álvarez, U. Berger, Simon Boegh
We present a deep reinforcement learning-based approach to control robotic manipulators and construct task-independent trajectories for point-to-point motions. The research objective in this work is to learn control in the joint action space, which can be generalized to various industrial manipulators. The approach necessitates that the neural network learns a mapping from joint movements to the reward landscape determined by the distance to the goal and nearby obstacles. In addition, curriculum learning is embedded in this approach to facilitate learning by reducing the complexity of the environment. Conducted experiments demonstrate how the reinforcement learning-based approach can be applied to three different industrial manipulators in simulation with minimal configuration changes. The results of our contribution demonstrate that a model can be trained in a simulation environment, transferred to the real world, and used in complex environments. Furthermore, the Sim2Real transfer, augmented by curriculum learning, highlights that the robots behave in the same way in the real world as in the simulation and that the operations in the real world are safe from a control and trajectory point-of-view.
Citations: 2
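The abstract above describes a reward landscape determined by the distance to the goal and to nearby obstacles, with a curriculum that reduces environment complexity. A minimal sketch of that kind of reward might look as follows; the specific functional form, the `safe_dist` and `obstacle_weight` parameters, and the one-obstacle-per-stage curriculum are all assumptions for illustration, not the paper's definitions.

```python
import math


def reward(end_effector, goal, obstacles, safe_dist=0.1, obstacle_weight=1.0):
    """Negative distance to the goal, minus a penalty for each obstacle closer than safe_dist."""
    goal_term = -math.dist(end_effector, goal)
    penalty = 0.0
    for obstacle in obstacles:
        d = math.dist(end_effector, obstacle)
        if d < safe_dist:
            penalty += obstacle_weight * (safe_dist - d)
    return goal_term - penalty


def curriculum_obstacles(stage, all_obstacles):
    """Assumed curriculum: stage 0 has no obstacles, each later stage adds one more."""
    return list(all_obstacles[:max(0, stage)])
```

Starting training at stage 0 gives the agent a pure reaching task, and obstacles are introduced only once the simpler behaviour has been learned. In a joint-space controller, `end_effector` would be computed from the joint angles by forward kinematics, which is omitted here.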
Interpretability of ReLU for Inversion
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00192
Boaz Ilan, A. Ranganath, Jacqueline Alvarez, Shilpa Khatri, Roummel F. Marcia
Interpretability continues to be a focus of much research in deep neural networks. In this work, we focus on the mathematical interpretability of fully-connected neural networks, especially those that use a rectified linear unit (ReLU) activation function. Our analysis elucidates the difficulty of approximating the reciprocal function. Nevertheless, using the ReLU activation function halves the error compared with a linear model. In addition, one might have expected the errors to increase only towards the singular point x = 0, but both the linear and ReLU errors are fairly oscillatory and increase near both edge points.
Citations: 0
Are Post-Hoc Explanation Methods for Prostate Lesion Detection Effective for Radiology End Use?
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00191
Mehmet Akif Gulum, Christopher M. Trombley, M. Ozen, M. Kantardzic
Deep learning has demonstrated impressive performance on medical tasks such as cancer classification and lesion detection. Despite this performance, it is a black-box algorithm and is therefore difficult to interpret. Interpretation is especially important in high-risk fields such as medicine. Various methods have recently been proposed to interpret deep learning algorithms. However, few studies evaluate these explanation methods in clinical settings such as radiology. To that end, we conduct a pilot study that evaluates the effectiveness of explanation methods for radiology end use. We evaluate whether explanation methods improve diagnostic performance and which method radiologists prefer. We also glean insight into which characteristics radiologists deem explainable. We found that explanation methods increase diagnostic performance; however, the improvement depends on the individual method. We also find that the radiology cohort deems the themes of insight, visualization, and accuracy to be the most sought-after explainable characteristics. The insights garnered in this study have the potential to guide future development and study of explanation methods for clinical use.
Citations: 0