
2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA): Latest Papers

Empirical analysis of the convergence of Double DQN in relation to reward sparsity
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00102
Samuel Blad, Martin Längkvist, Franziska Klügl-Frohnmeyer, A. Loutfi
Q-Networks are used in Reinforcement Learning to model the expected return from every action at a given state. When training Q-Networks, external reward signals are propagated to the previously performed actions leading up to each reward. If many actions are required before experiencing a reward, the reward signal is distributed across all those actions, where some actions may have greater impact on the reward than others. As the number of significant actions between rewards increases, the relative importance of each action decreases. If the importance of individual actions becomes too small, their impact might be overshadowed by noise in a deep neural network model, potentially causing convergence issues. In this work, we empirically test the limits of increasing the number of actions leading up to a reward in a simple grid-world environment. We show in our experiments that even though the training error surpasses the reward signal attributed to each action, the model is still able to learn a smooth enough value representation.
Citations: 0
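For readers unfamiliar with the method named in the entry above, the following is a minimal sketch of the standard Double DQN bootstrapped target for a single transition. It is not taken from the paper: the function name `double_dqn_target` and its parameters are illustrative assumptions, and plain Python lists stand in for the outputs of the online and target networks.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    done: bool,
    gamma: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
) -> float:
    """Bootstrapped target for one transition under Double DQN.

    q_online_next and q_target_next are hypothetical stand-ins for the
    Q-values that the online and target networks assign to the next state.
    """
    if done:
        # Terminal transition: there is no future return to bootstrap from.
        return reward
    # The online network selects the greedy next action ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... and the target network evaluates that action.
    return reward + gamma * q_target_next[best_action]
```

With sparse rewards, `reward` is zero for most transitions, so the target depends almost entirely on the bootstrapped term. This is the setting in which the abstract expects small per-action signals to compete with approximation noise.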
Imitation from Observation using RL and Graph-based Representation of Demonstrations
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00202
Y. Manyari, P. Callet, Laurent Dollé
Teaching robots behavioral skills by leveraging examples provided by an expert, also referred to as Imitation Learning from Observation (IfO or ILO), is a promising approach for learning novel tasks without requiring a task-specific reward function to be engineered. We propose an RL-based framework to teach robots manipulation tasks from observation-only expert demonstrations. First, a representation model is trained to extract spatial and temporal features from demonstrations. Graph Neural Networks (GNNs) are used to encode spatial patterns, while LSTMs and Transformers are used to encode temporal features. Second, based on an off-the-shelf RL algorithm, the demonstrations are leveraged through the trained representation to guide the policy training towards solving the task demonstrated by the expert. We show that our approach compares favorably to state-of-the-art IfO algorithms with a 99% success rate and transfers well to the real world.
Citations: 0
DeepWafer: A Generative Wafermap Model with Deep Adversarial Networks
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00025
H. Mahyar, Peter Tulala, E. Ghalebi, R. Grosu
Semiconductor manufacturing processes are characterized by a certain amount of process deviation. Automated detection of these production issues followed by an automated root cause analysis has the potential to increase the effectiveness of semiconductor production. Manufacturing defects exhibit typical patterns in measured wafer test data, e.g., rings, spots, repetitive textures, or scratches. Recognizing these patterns is an essential step for finding the root cause of production issues. This paper demonstrates that combining Information Maximizing Generative Adversarial Network (InfoGAN) and Wasserstein GAN (WGAN) with a new loss function is suitable for extracting the most characteristic features from extensive real-world sensory wafer test data, and that this combination outperforms traditional unsupervised techniques in various aspects. These features are then used in subsequent clustering tasks to group wafers into clusters according to the patterns they exhibit. The primary outcome of this work is a statistical generative model for recognizing spatial wafermap patterns using deep adversarial neural networks. We experimentally evaluate the performance of the proposed approach over a real dataset.
Citations: 0
Multi-stream Deep Residual Network for Cloud Imputation Using Multi-resolution Remote Sensing Imagery
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00021
Yifan Zhao, Xian Yang, Ranga Raju Vatsavai
For more than five decades, remote sensing imagery has been providing critical information for many applications such as crop monitoring, disaster assessment, and urban planning. Unfortunately, more than 50% of optical remote sensing images are contaminated by clouds, which severely affects object identification. However, thanks to recent advances in remote sensing instruments and an increase in the number of operational satellites, we now have petabytes of multi-sensor observations covering the globe. Historically, cloud imputation techniques were designed for single-sensor images; existing benchmarks were therefore mostly limited to single-sensor images, which precludes the design and validation of cloud imputation techniques on multi-sensor data. In this paper, we introduce a new benchmark dataset consisting of images from two widely used and publicly available satellite sources, Landsat-8 and Sentinel-2, and a new multi-stream deep residual network (MDRN). The newly introduced benchmark dataset fills an important gap in the existing benchmark datasets by allowing exploitation of multi-resolution spectral information from the cloud-free regions of temporally nearby images, and the MDRN algorithm addresses imputation using the multi-resolution data. Both quantitative and qualitative experiments show the utility of our benchmark dataset as well as the efficacy of our MDRN architecture in cloud imputation. The MDRN outperforms the closest competing method by 14.1%.
Citations: 2
BlinkNet: Software-Defined Deep Learning Analytics with Bounded Resources
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00037
Brian Koga, Theresa VanderWeide, Xinghui Zhao, Xuechen Zhang
Deep neural networks (DNNs) have recently gained unprecedented success in various domains. In resource-constrained edge systems (e.g., mobile devices and IoT devices), QoS-aware DNNs are required to meet the latency and memory/storage requirements of mission-critical deep learning applications. However, none of the existing DNNs has been designed to satisfy both latency and memory bounds simultaneously as specified by end-users in resource-constrained systems. This paper proposes a runtime system, BlinkNet, which can guarantee both latency and memory/storage bounds for one or multiple DNNs via efficient QoS-aware per-layer approximation. We implement BlinkNet in Apache TVM and evaluate it using CaffeNet, CIFAR-10-quick, and VGG16 network models on both CPU and GPU platforms. Our experimental results show that BlinkNet can enforce various latency and memory bounds set by end-users with real-world datasets.
Citations: 0
Attention-based Partial Decoupling of Policy and Value for Generalization in Reinforcement Learning
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00011
N. Nafi, Creighton Glasscock, W. Hsu
In this work, we introduce Attention-based Partially Decoupled Actor-Critic (APDAC), an actor-critic architecture for generalization in reinforcement learning, which partially separates the policy and the value functions. To learn directly from images, traditional actor-critic architectures use a shared network to represent the policy and value functions. While a shared representation allows parameter and feature sharing, it can also lead to overfitting that catastrophically damages generalization performance. On the other hand, two separate networks for policy and value can help to avoid overfitting and reduce the generalization gap, but at the cost of added complexity both in terms of architecture design and computation time. APDAC is a hybrid architecture that builds upon the combined strengths of both architectures by sharing initial layer blocks of the network and separating the later ones for policy and value. APDAC incorporates an attention mechanism to enable robust representation learning. We present meaningful visualization of the policy and value that explains the perception of the trained agent. Our empirical analysis, including an ablation study, shows that APDAC significantly outperforms the standard PPO baseline on the challenging RL generalization benchmark Procgen and achieves performance that is competitive with the recent state-of-the-art method (IDAAC) while using fewer convolutional layers and requiring less computational time. Our code is available at https://github.com/nasiknafi/apdac.
Citations: 4
Automatic Key Information Extraction from Visually Rich Documents
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00020
Charles De Trogoff, Rim Hantach, Gisela Lechuga, P. Calvez
The analysis of business documents, particularly invoices, currently plays a vital role in companies, especially large ones. These documents have the particularity of being visually rich, with low text quantity and many different layouts. As such, processing them with traditional techniques remains inefficient. Hence, one of the key challenges is to exploit visual patterns between entities of interest. After an overview of the state of the art in this domain, we propose a graph-based model that recognizes specific text in invoices. First, an Encoder module creates a multimodal embedding for each text sequence based on textual, visual, and spatial information. This representation is then passed through a multi-layer graph attention network, before being subjected to a simple classification task. Experiments were conducted in order to improve the performance of the proposed approach.
Citations: 0
Transfer Learning model for Social Emotion Prediction using Writers Emotions in Comments
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00063
Abdullah Alsaedi, S. Thomason, F. Grasso, Phillip Brooker
Social emotion prediction is concerned with the prediction of the reader’s emotion when exposed to a text. In this paper, we propose a transfer learning approach to social emotion prediction, where the source task is writer’s emotion prediction, an area in which models are advanced due to the rich literature and availability of large and high-quality training datasets. We utilized a pre-trained writer’s emotion prediction model to predict the writer’s emotion in comments, then we aggregated the emotions and trained a classifier to predict social emotion for posts. Results show that pre-trained models for writer’s emotion prediction can improve the prediction of social emotion. Furthermore, we demonstrate that our proposed model outperforms popular models in terms of F1-score and performs similarly to the best model in terms of Acc@1.
Citations: 0
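The social emotion entry above describes aggregating per-comment writer emotions into a post-level signal and reports Acc@1. The sketch below illustrates one way this could look. The abstract does not say how emotions are aggregated, so simple averaging is an assumption, and the names `aggregate_comment_emotions` and `acc_at_1` are hypothetical.

```python
from typing import Dict, List


def aggregate_comment_emotions(
    comment_probs: List[Dict[str, float]],
) -> Dict[str, float]:
    """Average per-comment emotion distributions into one post-level vector.

    Averaging is an assumed aggregation; the paper may use a different one.
    """
    if not comment_probs:
        return {}
    labels = sorted({label for probs in comment_probs for label in probs})
    count = len(comment_probs)
    return {
        label: sum(probs.get(label, 0.0) for probs in comment_probs) / count
        for label in labels
    }


def acc_at_1(predicted_top: List[str], true_top: List[str]) -> float:
    """Share of posts whose top predicted emotion equals the top-ranked true emotion."""
    if len(predicted_top) != len(true_top) or not true_top:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == t for p, t in zip(predicted_top, true_top))
    return hits / len(true_top)
```

In a pipeline like the one described, the aggregated vector would serve as the input features of the post-level classifier.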
PerMTL: A Multi-Task Learning Framework for Skilled Human Performance Assessment
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00177
Indrajeet Ghosh, Avijoy Chakma, S. R. Ramamurthy, Nirmalya Roy, Nicholas R. Waytowich
Intelligent and complex human motion analysis can help design next-generation IoT and AR/VR systems for automated human performance assessment. Such an automated system can promote the interpretability and translatability of complex human motions, intelligent motion feedback, and fine-grained motion skill assessment to design next-generation interactive human-machine teaming systems. Motivated by this, we design a wearable sensing framework for assessing the players’ performance and consider a live badminton game as our use case. Generally, the players on the field try to improve their performance by focusing on fast and synchronous coordination of their limbs’ reflex actions to have the ideal body postures to perform the desired shot. Learning the minute dissimilarities and distinctive traits from each limb of the players simultaneously can help assess the players’ performance and specific skillsets during a game. This paper proposes a multi-task learning framework, PerMTL, to learn the shared features from each player’s limb. PerMTL comprises a task-specific regressor output layer that helps to determine the dissimilarities and distinctive traits between the player’s limbs for collective inference in a body sensor network (BSN) environment. We evaluate the PerMTL framework using the publicly available Badminton Activity Recognition (BAR) and Daily and Sports Activities (DSA) datasets. Empirical results indicate that PerMTL achieves an R² score of ≈ 82% in predicting the players’ performance.
Citations: 0
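The PerMTL abstract mentions shared features, a task-specific regressor output layer, and an R² score. As a rough illustration only, the sketch below shows one common way to arrange a shared layer with one linear regression head per task, plus the standard R² formula. The class name `SharedTrunkRegressor`, the layer sizes, the limb names, and the random weights are all hypothetical; this is not the authors' architecture or training procedure.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


class SharedTrunkRegressor:
    """Hypothetical multi-task model: one shared layer, one linear head per task."""

    def __init__(self, n_features, n_hidden, task_names, seed=0):
        rng = np.random.default_rng(seed)
        # Weights shared by every task (assumed here: one task per limb).
        self.w_shared = rng.normal(scale=0.1, size=(n_features, n_hidden))
        # One task-specific regression head per task.
        self.heads = {
            name: rng.normal(scale=0.1, size=n_hidden) for name in task_names
        }

    def forward(self, x):
        hidden = np.tanh(x @ self.w_shared)  # shared representation
        return {name: hidden @ w for name, w in self.heads.items()}


# Toy usage with made-up sensor features: 5 samples, 6 features.
limbs = ["left_arm", "right_arm", "left_leg", "right_leg"]
model = SharedTrunkRegressor(n_features=6, n_hidden=8, task_names=limbs)
outputs = model.forward(np.ones((5, 6)))
```

Each entry of `outputs` holds one untrained prediction per sample for that task; in practice the weights would be fitted and `r2_score` applied to held-out targets.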
The Performance-Actionability Trade-Off in Retention Prediction at Middle School
Pub Date : 2022-12-01 DOI: 10.1109/ICMLA55696.2022.00087
Susana Lavado, Miguel Mateus, Leid Zejnilovic
Predicting students’ retention risk is one of the major trends in machine learning applications in education. While early identification of at-risk students allows timely planning and implementation of measures to prevent adverse outcomes, there is a trade-off between the predictive model’s performance and the prediction window size, that is, between model performance and actionability. In this study, we used a dataset of 83,596 unique Portuguese students in grades 5 to 9 to predict retention at or before the end of 9th grade. We explored how different prediction window sizes affect the predictive model’s performance, the feature importance, and the model’s bias. The models with shorter prediction windows performed better in terms of precision, but the model with the largest prediction window showed a higher lift over the existing rule-based model. Prediction window size affected the importance of demographic features and the model’s fairness. Our results contribute to the existing discussion on predicting retention by adding empirical evidence about the models’ added value in performance versus existing practice, suggesting types of data to collect and use, and discussing education-specific challenges of responsible data science.
Citations: 0
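The retention abstract compares models by precision and by lift over an existing rule-based model. The sketch below shows how these two quantities could be computed on a toy example. It assumes "lift" means the ratio of the model's precision to the rule-based baseline's precision; the abstract does not state the exact definition, and the function names and data are invented for illustration.

```python
def precision(y_true, flagged):
    """Share of flagged students who were actually retained."""
    true_positives = sum(1 for t, f in zip(y_true, flagged) if t and f)
    n_flagged = sum(1 for f in flagged if f)
    return true_positives / n_flagged if n_flagged else 0.0


def lift_over_baseline(y_true, model_flags, rule_flags):
    """Assumed definition: model precision divided by baseline precision."""
    baseline = precision(y_true, rule_flags)
    if baseline == 0:
        raise ValueError("baseline precision is zero; lift is undefined")
    return precision(y_true, model_flags) / baseline


# Made-up labels: 1 = student was retained, 0 = student was not.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
model_flags = [1, 0, 1, 0, 0, 0, 1, 0]  # hypothetical ML model flags
rule_flags = [1, 1, 0, 1, 0, 0, 1, 0]   # hypothetical rule-based flags

model_precision = precision(y_true, model_flags)
lift = lift_over_baseline(y_true, model_flags, rule_flags)
```

In this toy data the model flags three students, two of them correctly, while the rule flags four with one correct, so the model's precision is higher than the baseline's. A larger prediction window would typically change both numbers.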