Proceedings of the 2023 6th International Conference on Machine Vision and Applications: Latest Publications

Object-Based Vehicle Color Recognition in Uncontrolled Environment
Panumate Chetprayoon, Theerat Sakdejayont, Monchai Lertsutthiwong
The demand for vehicle recognition has increased significantly in recent decades, with impact on many businesses. This paper focuses on the vehicle color attribute and introduces a novel method that addresses three challenges of vehicle color recognition. The first is the uncontrolled environment, with shadow, brightness variation, and reflection. The second is that similar colors are hard to tell apart. The third is that few research works are dedicated to multi-color vehicle recognition: previous works provide color information only for the whole vehicle, not at the vehicle-part level. In this study, a new approach for recognizing vehicle colors at the part level is introduced. It uses object detection techniques to identify colors for different objects (in this research, the parts of a vehicle). In addition, a novel generic post-processing step is proposed to improve robustness in uncontrolled environments and to support not only single-color but also multi-color vehicles. Experimental results show that the method identifies color effectively under the three challenges above, achieving 99% accuracy for single-color vehicles, outperforming the seven baseline models, and 76% accuracy for multi-color vehicles.
doi: 10.1145/3589572.3589585 · Published: 2023-03-10
Citations: 0
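The abstract does not spell out the generic post-processing, so the following is only a minimal Python sketch of one plausible aggregation rule: keep every color supported by enough detected parts, yielding one label for single-color vehicles and several for multi-color ones. The function name, threshold, and part names are illustrative assumptions, not the authors' method.

```python
# Hypothetical aggregation of per-part color detections into a
# vehicle-level label; the paper's actual post-processing may differ.
from collections import Counter

def aggregate_vehicle_colors(part_colors, min_part_ratio=0.25):
    """part_colors: list of (part_name, predicted_color) pairs from a detector."""
    counts = Counter(color for _, color in part_colors)
    total = sum(counts.values())
    # Keep every color supported by at least `min_part_ratio` of the parts,
    # so a genuine second color survives but a stray misdetection does not.
    return [c for c, n in counts.most_common() if n / total >= min_part_ratio]

detections = [("hood", "white"), ("door", "white"),
              ("roof", "black"), ("trunk", "white")]
print(aggregate_vehicle_colors(detections))  # ['white', 'black']
```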
SkeletonGAN: Fine-Grained Pose Synthesis of Human-Object Interactions
Qixuan Sun, Nanxi Chen, Ruipeng Zhang, Jiamao Li, Xiaolin Zhang
Synthesizing Human-Object Interactions (HOI) is a challenging problem since the human body has a complex and versatile representation. Existing solutions can generate individual objects or faces very well but still have difficulty generating realistic human bodies and their interactions with multiple objects. In this work, we focus on synthesizing human poses from HOI descriptive triplets and introduce a novel perspective that decomposes every action between humans and objects into sub-actions of human body parts, generating body poses in a fine-grained way. We propose SkeletonGAN, a conditional generative adversarial model that performs body-part-level control over the interaction between humans and objects. SkeletonGAN is trained and evaluated on the HICO-DET dataset, a knowledge base consisting of complex interaction poses of various human-object actions in realistic scenarios. We show through qualitative and quantitative evaluations that the model is capable of generating diverse and plausible poses consistent with the given semantic features; notably, it can also predict the relative position of the object with respect to the body pose. We also explore synthesizing composite poses that include co-occurring human actions, indicating that the model can learn multimodal relationships between human poses and the given conditional semantic features.
doi: 10.1145/3589572.3589579 · Published: 2023-03-10
Citations: 0
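As a rough illustration of a conditional pose generator of this kind, the PyTorch sketch below maps a noise vector plus a condition embedding (standing in for the HOI triplet) to K 2-D keypoints. The dimensions and MLP design are assumptions for illustration, not the actual SkeletonGAN architecture.

```python
# Minimal conditional generator: noise + condition -> 2-D keypoints.
import torch
import torch.nn as nn

class PoseGenerator(nn.Module):
    def __init__(self, noise_dim=64, cond_dim=32, num_keypoints=17):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.net = nn.Sequential(
            nn.Linear(noise_dim + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_keypoints * 2),   # (x, y) per keypoint
        )

    def forward(self, z, cond):
        out = self.net(torch.cat([z, cond], dim=1))
        return out.view(-1, self.num_keypoints, 2)

gen = PoseGenerator()
z = torch.randn(4, 64)       # noise batch
cond = torch.randn(4, 32)    # hypothetical HOI-triplet embeddings
print(gen(z, cond).shape)    # torch.Size([4, 17, 2])
```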
Multi-Scale Feature Enhancement Network for Face Forgery Detection
Zhiyuan Ma, Xue Mei, Hao Chen, Jienan Shen
Nowadays, synthesizing realistic fake face images and videos has become easy thanks to advances in generation technology. With the spread of face forgery, abuse of the technology occurs from time to time, making research on face forgery detection urgent. To deal with the potential risks, we propose a face forgery detection method based on multi-scale feature enhancement. Specifically, we analyze forgery traces from the texture and frequency-domain perspectives, respectively. We find that forgery traces are hard for human eyes to perceive but are noticeable in the shallow layers of CNNs and in the middle- and high-frequency domains. Hence, to retain more forgery information, we design a texture feature enhancement module and a frequency-domain feature enhancement module. Experiments on the FaceForensics++ and Celeb-DF datasets show that our method surpasses most existing networks and methods, demonstrating strong classification ability.
doi: 10.1145/3589572.3589577 · Published: 2023-03-10
Citations: 0
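The frequency-domain module motivates a small sketch of the underlying idea: isolating the mid- and high-frequency content of an image, where the abstract reports forgery traces are most noticeable. This PyTorch fragment is a generic FFT band filter with a made-up cutoff radius, not the paper's module.

```python
# Generic high-pass FFT filter: suppress frequencies near DC and keep the rest.
import torch

def high_freq_component(img, radius=8):
    """img: (B, C, H, W) float tensor; zero out low frequencies near DC."""
    f = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    _, _, H, W = img.shape
    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    dist = (((yy - H // 2) ** 2 + (xx - W // 2) ** 2).float()).sqrt()
    mask = (dist > radius).to(img.dtype)          # 1 outside the low-freq disc
    out = torch.fft.ifft2(torch.fft.ifftshift(f * mask, dim=(-2, -1)))
    return out.real

x = torch.randn(1, 3, 64, 64)                     # dummy face crop
print(high_freq_component(x).shape)               # torch.Size([1, 3, 64, 64])
```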
A Multistage Framework for Detection of Very Small Objects
Duleep Rathgamage Don, Ramazan S. Aygun, M. Karakaya
Small object detection is one of the most challenging problems in computer vision. Algorithms based on state-of-the-art object detection methods such as R-CNN, SSD, FPN, and YOLO fail to detect objects of very small size. In this study, we propose a novel method to detect very small objects, smaller than 8×8 pixels, that appear against a complex background. The proposed method is a multistage framework consisting of an unsupervised algorithm and three separately trained supervised algorithms. The unsupervised algorithm extracts ROIs from a high-resolution image. The ROIs are then upsampled using SRGAN, and the enhanced ROIs are detected by our two-stage cascade classifier based on two ResNet50 models. The maximum size of the images used for training the proposed framework is 32×32 pixels. The experiments are conducted on the rescaled German Traffic Sign Recognition Benchmark (GTSRB) and the downsampled German Traffic Sign Detection Benchmark (GTSDB). Unlike the MS COCO and DOTA datasets, the resulting GTSDB turns out to be very challenging for any small object detection algorithm, owing not only to the size of the objects of interest but also to the complex textures of the background. Our experimental results show that the proposed method detects small traffic signs with an average precision of 0.332 at an intersection-over-union threshold of 0.3.
doi: 10.1145/3589572.3589574 · Published: 2023-03-10
Citations: 0
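To make the two-stage cascade concrete, here is a hedged PyTorch/torchvision sketch in which a first ResNet50 rejects background ROIs and a second classifies the survivors. The threshold, class counts, and untrained weights are placeholders, and the SRGAN upsampling stage is assumed to have already produced the input patches.

```python
# Two-stage cascade: stage 1 filters objectness, stage 2 assigns a class.
import torch
from torchvision.models import resnet50

stage1 = resnet50(weights=None, num_classes=2)    # object vs. background
stage2 = resnet50(weights=None, num_classes=43)   # e.g. GTSRB sign classes
stage1.eval()
stage2.eval()

@torch.no_grad()
def cascade_classify(rois, keep_thresh=0.5):
    """rois: (N, 3, 32, 32) SRGAN-enhanced candidate patches."""
    p_obj = stage1(rois).softmax(dim=1)[:, 1]     # objectness from stage 1
    survivors = rois[p_obj > keep_thresh]
    if survivors.numel() == 0:
        return survivors, None
    labels = stage2(survivors).argmax(dim=1)      # fine class from stage 2
    return survivors, labels

rois = torch.randn(8, 3, 32, 32)                  # dummy candidate ROIs
kept, labels = cascade_classify(rois)
print(kept.shape, None if labels is None else labels.shape)
```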
On the use of synthetic images in deep learning for defect recognition in industrial infrastructures
Clément Mailhé, A. Ammar, F. Chinesta
The use of synthetic images in deep learning for object detection applications is recognized as a key technological lever for reducing the time and cost constraints associated with data-driven processes. In this work, the applicability of training an instance recognition algorithm on a synthetic database in an industrial context is assessed through the detection of dents in pipes. Photo-realistic artificial images are procedurally generated with rendering software and used to train the YOLOv5 object recognition algorithm. Its prediction effectiveness is assessed on a small test set in different configurations to identify improvement steps towards the reliable use of artificial data in computer vision.
doi: 10.1145/3589572.3589584 · Published: 2023-03-10
Citations: 0
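For readers who want to approximate the training setup, the sketch below writes a YOLOv5-style dataset file that points synthetic renders at training and real images at validation. The paths and single 'dent' class are assumptions; the paper's rendering pipeline and hyperparameters are not reproduced.

```python
# Write a YOLOv5 dataset config wiring synthetic train / real val splits.
from pathlib import Path
import textwrap

cfg = textwrap.dedent("""\
    train: synthetic/images/train   # procedurally rendered images (assumed path)
    val: real/images/val            # small real test set (assumed path)
    nc: 1
    names: ['dent']
""")
Path("dents.yaml").write_text(cfg)

# Typical YOLOv5 invocation (run from an ultralytics/yolov5 checkout):
#   python train.py --img 640 --batch 16 --epochs 100 \
#       --data dents.yaml --weights yolov5s.pt
```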
Automatically Design Lightweight Neural Architectures for Facial Expression Recognition
Xiaoyu Han
Facial expression recognition (FER) is a popular research direction in the field of human-computer interaction. Recently, most work on FER has relied on convolutional neural networks (CNNs). However, most of the CNNs used for FER are designed by humans, and the design process is time-consuming and relies heavily on domain expertise. To address this problem, methods based on neural architecture search (NAS) have been proposed that can design neural architectures automatically. Nevertheless, those methods mainly focus on recognition accuracy, and the model size of the designed architecture is often large, which limits deployment on devices with limited computing resources, such as mobile devices. In this paper, a novel approach named AutoFER-L is proposed for automatically designing lightweight CNNs for FER. Specifically, both recognition accuracy and model size are considered in the objective functions, so the resulting architectures can be both accurate and lightweight. We conduct experiments on CK+ and FER2013, which are popular benchmark datasets for FER. The experimental results show that the CNN architectures designed by the proposed method are more accurate and lighter than handcrafted models and models derived by standard NAS.
doi: 10.1145/3589572.3589587 · Published: 2023-03-10
Citations: 0
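The bi-objective formulation (accuracy and model size) implies a selection rule over candidate architectures. As a minimal illustration, the pure-Python sketch below keeps the Pareto-optimal candidates; the paper's actual search procedure is not described in the abstract and is not reproduced here.

```python
# Keep architectures not dominated in (higher accuracy, smaller size).
def pareto_front(candidates):
    """candidates: list of (name, accuracy, num_params) tuples."""
    front = []
    for i, (name, acc, size) in enumerate(candidates):
        dominated = any(
            a >= acc and s <= size and (a > acc or s < size)
            for j, (_, a, s) in enumerate(candidates) if j != i
        )
        if not dominated:
            front.append(name)
    return front

archs = [("A", 0.71, 2.1e6), ("B", 0.69, 1.2e6), ("C", 0.68, 3.0e6)]
print(pareto_front(archs))  # ['A', 'B']; C is dominated by A
```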
Detection of Conversational Health in a Multimodal Conversation Graph by Measuring Emotional Concordance
Kruthika Suresh, Mayuri D Patil, Shrikar Madhu, Yousha Mahamuni, Bhaskarjyoti Das
With the advent of social media and technology, the increased connections between individuals and organizations have led to a similar increase in the number of conversations. These conversations are, in most cases, bimodal in nature, consisting of both images and text. Existing work in multimodal conversation typically focuses on individual utterances rather than the overall dialogue. Conversational health matters in many real-world conversational use cases, including the emerging world of the Metaverse. The work described in this paper investigates conversational health from the viewpoint of emotional concordance in bimodal conversations modelled as graphs. Using this framework, an existing multimodal dialogue dataset has been reformatted as a graph dataset labelled with the emotional concordance score. In this work, the determination of conversational health is framed as a graph classification problem. A graph neural network model using algorithms such as the Graph Convolutional Network and Graph Attention Network is then used to detect emotional concordance or discordance in the multimodal conversation provided. The model proposed in this paper achieves an overall F1 score of 0.71 with equally sized classes for training and testing, improving on previous models that use the same benchmark dataset.
doi: 10.1145/3589572.3589588 · Published: 2023-03-10
Citations: 0
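A minimal sketch of the graph-classification setup, assuming PyTorch Geometric is available: utterance nodes carry fused text-image features, two GCN layers propagate context along reply edges, and a pooled readout predicts concordance versus discordance. All sizes and the feature fusion are illustrative assumptions.

```python
# Graph classification over a dialogue graph with PyTorch Geometric.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class ConcordanceGCN(torch.nn.Module):
    def __init__(self, in_dim=128, hidden=64, num_classes=2):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, num_classes)

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        return self.head(global_mean_pool(x, batch))  # one prediction per graph

model = ConcordanceGCN()
x = torch.randn(5, 128)                                   # 5 utterance nodes
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])   # reply edges
batch = torch.zeros(5, dtype=torch.long)                  # one dialogue graph
print(model(x, edge_index, batch).shape)                  # torch.Size([1, 2])
```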
Vision-based mobile analysis of roadside guardrail structures
Csaba Beleznai, Kai Göbel, C. Stefan, P. Dorninger, A. Pusica
Vision-based analysis of roadside infrastructure is a research field of growing relevance, since autonomous driving and roadside asset digitization and mapping are key emerging applications. The advancement of deep learning for vision-based environment perception is a core enabling technology for interpreting scenes in terms of their objects and spatial relations. In this paper we present a multi-sensor mobile analysis system concept that targets the structural classification of roadside guardrail structures and allows for digital measurements of the scene surrounding the guardrail objects. We propose an RGB-D vision-based analysis pipeline that performs semantic segmentation and metric dimension estimation of the key structural elements of a given guardrail segment. We demonstrate that the semantic segmentation task can be learned entirely in the synthetic domain and deployed with high accuracy in the real domain. Based on guardrail structural measurements aggregated and tracked over time, our pipeline estimates one or several type labels for the observed guardrail structure, drawing on a prior catalog of all possible types. The paper presents qualitative and quantitative results from experiments using our measurement vehicle, covering 100 km in total. The results demonstrate that the presented mobile analysis framework can delineate roadside guardrail structures spatially and propose a limited set of type candidates. The paper also discusses failure modes and possible future improvements towards digital mapping and recognition of safety-critical roadside assets.
doi: 10.1145/3589572.3589597 · Published: 2023-03-10
Citations: 0
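The metric-measurement step can be illustrated with a pinhole-camera estimate: given a segmentation mask and a depth map, the guardrail's vertical pixel extent converts to metres by similar triangles. The intrinsics and data below are fabricated for the sketch; the paper's RGB-D pipeline is considerably more elaborate.

```python
# Metric height from a segmentation mask plus depth, pinhole model.
import numpy as np

def metric_height(mask, depth, fy):
    """mask: (H, W) bool segmentation; depth: (H, W) metres; fy: focal length in px."""
    rows = np.where(mask.any(axis=1))[0]
    pixel_extent = rows[-1] - rows[0] + 1        # vertical size in pixels
    z = np.median(depth[mask])                   # robust object distance
    return pixel_extent * z / fy                 # similar-triangles estimate

mask = np.zeros((480, 640), dtype=bool)
mask[200:320, 100:500] = True                    # fake guardrail segment
depth = np.full((480, 640), 6.0)                 # 6 m away everywhere
print(f"{metric_height(mask, depth, fy=525.0):.2f} m")  # ~1.37 m
```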
An Efficient Noisy Label Learning Method with Semi-supervised Learning
Jihee Kim, Sangki Park, Si-Dong Roh, Ki-Seok Chung
Even though deep learning models succeed in many application areas, it is well known that they are vulnerable to data noise. Therefore, research has been active on models that detect and remove noisy data or that operate robustly in its presence. However, most existing approaches have limitations: either important information may be discarded while noisy data are cleaned up, or prior information about the dataset is required that may not be readily available. In this paper, we propose an effective semi-supervised learning method with model ensemble and parameter scheduling techniques. Our experimental results show that the proposed method achieves the best accuracy under 20% and 40% noise-ratio conditions. The proposed model is robust to data noise, suffering only 2.08% accuracy degradation when the noise ratio increases from 20% to 60% on CIFAR-10. We additionally perform an ablation study to verify the net accuracy gain from applying each technique in turn.
doi: 10.1145/3589572.3589596 · Published: 2023-03-10
Citations: 0
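The abstract mentions parameter scheduling without details; one common ingredient in semi-supervised pipelines is a sigmoid ramp-up of the unlabeled-loss weight, sketched below as a plausible (assumed, not confirmed) instance.

```python
# Sigmoid ramp-up schedule for the unsupervised loss weight.
import math

def rampup_weight(epoch, max_weight=1.0, rampup_epochs=30):
    """Grow the unlabeled-loss weight slowly so early, possibly noisy
    pseudo-labels do not dominate training."""
    if epoch >= rampup_epochs:
        return max_weight
    phase = 1.0 - epoch / rampup_epochs
    return max_weight * math.exp(-5.0 * phase * phase)

for e in (0, 10, 20, 30):
    print(e, round(rampup_weight(e), 3))   # 0 -> 0.007 ... 30 -> 1.0
```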
Predicting Stenosis in Coronary Arteries based on Deep Neural Network using Non-Contrast and Contrast Cardiac CT images
Masaki Aono, Testuya Asakawa, Hiroki Shinoda, K. Shimizu, T. Komoda
In this paper, we demonstrate two different methods to predict stenosis, given non-contrast and contrast cardiac CT scan images, respectively. As far as we know, non-contrast cardiac CT images have hardly been used for predicting stenosis, since they generally do not show the coronary arteries (LCX, LAD, RCA, LMT) distinctly. However, if stenosis can be predicted from non-contrast CT images, we believe this benefits patients, who are spared the side effects of contrast agents. Our method for non-contrast CT images relies on the relationship between calcification and stenosis: according to physicians, 90% of stenoses are accompanied by calcification in the coronary arteries. We have also conducted experiments with contrast cardiac CT scan images, where the coronary arteries are rendered as straightened circumferentially; this second approach reduces to a binary classification problem. Our experiments show that both approaches, formulated as a multi-label, multi-class classification problem on non-contrast CT images and a binary classification problem on contrast CT images, with deep neural networks as classifiers, are very promising. We also note that our non-contrast and contrast CT data include both healthy subjects and patients, which makes us believe the methods are practical when incorporated into a system that supports real stenosis diagnosis.
doi: 10.1145/3589572.3589595 · Published: 2023-03-10
Citations: 0
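The multi-label formulation for non-contrast CT can be sketched as one independent sigmoid output per artery (LCX, LAD, RCA, LMT). The toy backbone, input shape, and labels below are placeholders meant only to show the loss wiring, not the paper's network.

```python
# Multi-label stenosis head: one independent logit per coronary artery.
import torch
import torch.nn as nn

backbone = nn.Sequential(                      # stand-in feature extractor
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
head = nn.Linear(16, 4)                        # 4 arteries, multi-label
criterion = nn.BCEWithLogitsLoss()             # independent per-artery labels

ct_slices = torch.randn(2, 1, 256, 256)        # dummy non-contrast slices
labels = torch.tensor([[1., 0., 0., 0.],       # stenosis in LCX only
                       [0., 1., 1., 0.]])      # stenosis in LAD and RCA
logits = head(backbone(ct_slices))
print(criterion(logits, labels).item())
```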