
Latest publications from the 2023 18th International Conference on Machine Vision and Applications (MVA)

Padding Investigations for CNNs in Scene Parsing Tasks
Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10216084
Yu-Hui Huang, M. Proesmans, L. Gool
Zero padding is widely used in convolutional neural networks (CNNs) to prevent the size of feature maps from diminishing too fast. However, it has been claimed to disturb the statistics at the border [9]. In this work, we compare various padding methods for the scene parsing task and propose an alternative padding method (CApadding) that extends the image to alleviate the border issue. Experiments on Cityscapes [2] and DeepGlobe [3] show that models with the proposed padding method achieve a higher mean Intersection-over-Union (IoU) than the zero-padding-based models.
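As a minimal PyTorch sketch of the kind of comparison involved (not the paper's CApadding), the snippet below contrasts zero padding with a padding mode that extends image content (replication), showing that only the border of the output differs:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: contrast zero padding with a content-extending padding
# (replication) before a 3x3 convolution. CApadding itself is not implemented here.
x = torch.rand(1, 3, 8, 8)  # a small RGB feature map

conv = nn.Conv2d(3, 16, kernel_size=3, padding=0, bias=False)

zero_padded = nn.functional.pad(x, (1, 1, 1, 1), mode="constant", value=0.0)
extended = nn.functional.pad(x, (1, 1, 1, 1), mode="replicate")  # extends border content

out_zero = conv(zero_padded)
out_ext = conv(extended)

print(out_zero.shape, out_ext.shape)                       # both (1, 16, 8, 8)
# The interior of the output is identical; only the outermost rows/columns differ.
print((out_zero - out_ext).abs()[..., 1:-1, 1:-1].max())
```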
Citations: 0
Using Unconditional Diffusion Models in Level Generation for Super Mario Bros
Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10215856
Hyeon Joon Lee, E. Simo-Serra
This study introduces a novel methodology for generating levels in the iconic video game Super Mario Bros. using a diffusion model based on a UNet architecture. The model is trained on existing levels, represented as a categorical distribution, to accurately capture the game’s fundamental mechanics and design principles. The proposed approach demonstrates notable success in producing high-quality and diverse levels, with a significant proportion being playable by an artificial agent. This research emphasizes the potential of diffusion models as an efficient tool for procedural content generation and highlights their potential impact on the development of new video games and the enhancement of existing games through generated content.
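As a loose illustration of the general recipe (not the authors' model), a level can be encoded as a one-hot tile grid and a denoiser trained with the standard epsilon-prediction objective; the tile vocabulary, level size, noise-schedule value, and the toy conv net standing in for the UNet are all invented:

```python
import torch
import torch.nn as nn

# Illustrative sketch only: encode a tile-based level as a categorical (one-hot) grid
# and apply one forward-diffusion step; the denoiser is a toy conv net, not a UNet.
NUM_TILES, H, W = 5, 16, 64                      # hypothetical tile types and level size
level = torch.randint(0, NUM_TILES, (1, H, W))   # stand-in for a real level layout
x0 = nn.functional.one_hot(level, NUM_TILES).permute(0, 3, 1, 2).float()  # (1, C, H, W)

# Forward diffusion: mix the clean grid with Gaussian noise at some timestep.
alpha_bar = 0.5                                  # cumulative noise-schedule value (made up)
noise = torch.randn_like(x0)
xt = alpha_bar ** 0.5 * x0 + (1 - alpha_bar) ** 0.5 * noise

denoiser = nn.Sequential(                        # toy stand-in for a UNet
    nn.Conv2d(NUM_TILES, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, NUM_TILES, 3, padding=1),
)
loss = nn.functional.mse_loss(denoiser(xt), noise)  # epsilon-prediction objective
loss.backward()
print(float(loss))
```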
Citations: 0
Enhancing Retail Product Recognition: Fine-Grained Bottle Size Classification
Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10215699
Katarina Tolja, M. Subašić, Z. Kalafatić, S. Lončarić
In this paper, we propose two innovative approaches to tackle the key challenges in product size classification, with a specific focus on bottles. Our research is particularly interesting in that we leverage the bottle cap as a reference object, which allows bottle size classification to overcome challenges arising from the distance between the capturing device and the retail shelf, the viewing angle, and the arrangement of bottles on the shelves. We showcase the use of the reference object in explicit and implicit novel approaches and discuss the benefits and limitations of the proposed methods.
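A minimal sketch of the explicit use of the cap as a reference object, assuming bounding boxes for the bottle and its cap are already detected; the height-ratio thresholds and size labels are invented for illustration:

```python
# Hypothetical sketch: classify bottle size from the ratio of bottle height to cap height,
# which cancels out camera distance and scale; thresholds and labels are invented.
def classify_bottle_size(bottle_box, cap_box):
    """Boxes are (x1, y1, x2, y2) in pixels for the same bottle instance."""
    bottle_h = bottle_box[3] - bottle_box[1]
    cap_h = cap_box[3] - cap_box[1]
    ratio = bottle_h / max(cap_h, 1e-6)   # scale-invariant reference measurement
    if ratio < 8.0:
        return "0.5 L"
    elif ratio < 11.0:
        return "1.0 L"
    return "1.5 L"

print(classify_bottle_size((100, 40, 180, 460), (125, 40, 155, 75)))
```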
Citations: 0
Image Impression Estimation by Clustering People with Similar Tastes
Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10216055
Banri Kojima, Takahiro Komamizu, Yasutomo Kawanishi, Keisuke Doman, I. Ide
This paper proposes a method for estimating impressions from images according to the personal attributes of users so that they can find the desired images based on their tastes. Our previous work, which considered gender and age as personal attributes, showed promising results, but it also showed that users sharing these attributes do not necessarily share similar tastes. Therefore, other attributes should be considered to capture the personal tastes of each user well. However, taking more attributes into account leads to a problem in which an insufficient amount of data is available to the classifiers due to the explosion of the number of attribute combinations. To tackle this problem, we propose an aggregation-based method to condense training data for impression estimation while considering personal attribute information. For evaluation, a dataset of 4,000 carpet images annotated with 24 impression words was prepared. Experimental results showed that using combinations of personal attributes improved the accuracy of impression estimation, which indicates the effectiveness of the proposed approach.
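A hedged sketch of the aggregation idea: group users by the similarity of their impression ratings and condense training targets per group. The data shapes, cluster count, and use of scikit-learn KMeans are illustrative assumptions, not the authors' pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative only: 100 hypothetical users each rate 24 impression words on 50 images,
# flattened into one taste vector per user; users are then grouped by taste similarity.
rng = np.random.default_rng(0)
ratings = rng.random((100, 50 * 24))              # (users, images x impression words)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(ratings)
clusters = kmeans.labels_                         # taste group of each user

# Aggregate (condense) training targets within each taste cluster.
for c in range(5):
    members = ratings[clusters == c]
    pooled = members.mean(axis=0)                 # one condensed label vector per cluster
    print(c, members.shape[0], pooled.shape)
```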
Citations: 0
Low-Level Feature Aggregation Networks for Disease Severity Estimation of Coffee Leaves
Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10215626
Takuhiro Okada, Yuantian Huang, Guoqing Hao, S. Iizuka, K. Fukui
This paper presents a deep learning-based approach for the severity classification of coffee leaf diseases. Coffee leaf diseases are one of the significant problems in the coffee industry, where estimating the health status of coffee leaves based on their appearance is crucial in the production process. However, there have been few studies on this task, and cases of misclassification have been reported due to the inability to detect slight color differences when classifying the disease severity. In this work, we propose a low-level feature aggregation technique for neural network-based classifiers to capture the discolored distribution of the entire coffee leaf, which effectively supports discrimination of the severity. This feature aggregation is achieved by incorporating attention mechanisms in the shallow layers of the network that extract low-level features such as color. The attention mechanism in the shallow layers provides the network with information on global dependencies of the color features of the leaves, allowing the network to more easily identify the disease severity. We use an efficient computational technique for the attention modules to reduce memory and computational cost, which enables us to introduce the attention mechanisms in large-sized feature maps in the shallow layers. We conduct in-depth validation experiments on the coffee leaf disease datasets and demonstrate the effectiveness of our proposed model compared to state-of-the-art image classification models in accurately classifying the severity of coffee leaf diseases.
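As a rough sketch (not the paper's architecture), a lightweight attention block can be attached to a shallow convolutional stage so that low-level color features attend over the whole leaf; pooling the spatial resolution before attention stands in for the efficient computation technique and is an assumption:

```python
import torch
import torch.nn as nn

class ShallowAttention(nn.Module):
    """Toy self-attention over a downsampled shallow feature map (illustrative only)."""
    def __init__(self, channels, pooled=16):
        super().__init__()
        self.pooled = pooled
        self.pool = nn.AdaptiveAvgPool2d(pooled)          # shrink HxW to keep attention cheap
        self.attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        z = self.pool(x).flatten(2).transpose(1, 2)       # (B, pooled*pooled, C)
        z, _ = self.attn(z, z, z)                         # global dependencies of color features
        z = z.transpose(1, 2).reshape(b, c, self.pooled, self.pooled)
        z = nn.functional.interpolate(z, size=(h, w), mode="bilinear", align_corners=False)
        return x + z                                      # residual aggregation of shallow features

shallow = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), ShallowAttention(64))
print(shallow(torch.rand(1, 3, 128, 128)).shape)          # torch.Size([1, 64, 128, 128])
```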
Citations: 0
Small Object Detection for Birds with Swin Transformer
Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10216093
Da Huo, Marc A. Kastner, Tingwei Liu, Yasutomo Kawanishi, Takatsugu Hirayama, Takahiro Komamizu, I. Ide
Object detection is the task of detecting objects in an image. In this task, the detection of small objects is particularly difficult. Beyond the small size, it is also complicated by blur, occlusion, and so on. Current small object detection methods are tailored to small and dense situations, such as pedestrians in a crowd or distant objects in remote sensing scenarios. However, when the target object is small and sparse, there is a lack of objects available for training, making it more difficult to learn effective features. In this paper, we propose a specialized method for detecting a specific category of small objects: birds. In particular, we improve the features learned by the neck (the sub-network between the backbone and the prediction head) with a hierarchical design to learn more effective features. We employ Swin Transformer to upsample the image features. Moreover, we change the shifted window size to adapt to small objects. Experiments show that the proposed Swin Transformer-based neck combined with CenterNet achieves good performance by changing the window sizes. We further find that smaller window sizes (default 2) benefit mAP for small object detection.
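To illustrate the role of the shifted-window size discussed above (not the full neck design), here is a toy window-attention sketch that partitions a feature map into small non-overlapping windows and attends within each; the window size of 2 mirrors the default mentioned in the abstract, while the feature sizes and head count are invented:

```python
import torch
import torch.nn as nn

def window_partition(x, win):
    """Split (B, H, W, C) into (num_windows*B, win*win, C) non-overlapping windows."""
    b, h, w, c = x.shape
    x = x.view(b, h // win, win, w // win, win, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, win * win, c)

# Toy window attention with a small window size (2), as favored for small objects above.
WIN, C = 2, 96
feat = torch.rand(1, 32, 32, C)                       # (B, H, W, C) feature map
attn = nn.MultiheadAttention(C, num_heads=3, batch_first=True)

windows = window_partition(feat, WIN)                 # (256, 4, 96)
out, _ = attn(windows, windows, windows)              # attention restricted to each window
print(out.shape)
```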
Citations: 0
Grid Sample Based Temporal Iteration and Compactness-coefficient Distance for High Frame and Ultra-low Delay SLIC Segmentation System
Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10215797
Yuan Li, Tingting Hu, Ryuji Fuchikami, T. Ikenaga
High frame rate and ultra-low delay vision systems, which process 1000 FPS video within a 1 ms/frame delay, play an increasingly important role in fields such as robotics and factory automation. Among them, an image segmentation system is necessary, as segmentation is a crucial pre-processing step for various applications. Much recent research focuses on superpixel segmentation, but little of it attempts to reach high processing speed. To achieve this target, this paper proposes: (A) Grid-sample-based temporal iteration, which leverages the high-frame-rate video property to distribute iterations into the temporal domain, ensuring the entire system stays within a one-frame delay. Additionally, grid sampling is proposed to add initialization information to the temporal iteration for the stability of superpixels. (B) A compactness-coefficient distance, which adds information from the entire superpixel instead of using only the information of the center point. The evaluation results demonstrate that the proposed superpixel segmentation system achieves boundary recall and under-segmentation error comparable to the original SLIC superpixel segmentation system. In terms of label consistency, the proposed system is more than 0.02 higher than the original system.
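For reference, the standard SLIC distance with a compactness coefficient m, on which a compactness-coefficient distance builds; this generic sketch does not reproduce the paper's whole-superpixel extension:

```python
import math

def slic_distance(pixel, center, S, m=10.0):
    """Generic SLIC distance: pixel/center are (l, a, b, x, y); S is the grid interval,
    m the compactness coefficient trading off color vs. spatial proximity."""
    d_color = math.sqrt(sum((pixel[i] - center[i]) ** 2 for i in range(3)))
    d_space = math.sqrt((pixel[3] - center[3]) ** 2 + (pixel[4] - center[4]) ** 2)
    return math.sqrt(d_color ** 2 + (d_space / S) ** 2 * m ** 2)

print(slic_distance((50, 10, -5, 12, 20), (48, 12, -4, 10, 18), S=16))
```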
Citations: 0
Unsupervised Fall Detection on Edge Devices
Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10215993
Takuya Nakabayashi, H. Saito
Automatic fall detection is a crucial task in healthcare as falls pose a significant risk to the health of elderly individuals. This paper presents a lightweight acceleration-based fall detection method that can be implemented on edge devices. The proposed method uses Autoencoders, a type of unsupervised learning, within the framework of anomaly detection, allowing for network training without requiring extensive labeled fall data. One of the challenges in fall detection is the difficulty in collecting fall data. However, our proposed method can overcome this limitation by training the neural network without fall data, using the anomaly detection framework of Autoencoders. Additionally, this method employs an extremely lightweight Autoencoder that can run independently on an edge device, eliminating the need to transmit data to a server and minimizing privacy concerns. We conducted experiments comparing the performance of our proposed method with that of a baseline method using a unique fall detection dataset. Our results confirm that our method outperforms the baseline method in detecting falls with higher accuracy.
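A minimal sketch of the anomaly-detection setup, assuming fixed-length windows of 3-axis acceleration: the autoencoder is trained only on normal activity, and a high reconstruction error flags a fall. The window length, architecture, and threshold are invented, and the model is kept tiny to reflect the edge-device constraint:

```python
import torch
import torch.nn as nn

WINDOW = 64                                   # hypothetical samples per 3-axis window
autoencoder = nn.Sequential(                  # deliberately tiny, edge-friendly model
    nn.Linear(WINDOW * 3, 32), nn.ReLU(),
    nn.Linear(32, 8), nn.ReLU(),              # bottleneck
    nn.Linear(8, 32), nn.ReLU(),
    nn.Linear(32, WINDOW * 3),
)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

adl_data = torch.rand(256, WINDOW * 3)        # stand-in for normal (non-fall) activity only
for _ in range(5):                            # train to reconstruct normal motion
    recon = autoencoder(adl_data)
    loss = nn.functional.mse_loss(recon, adl_data)
    opt.zero_grad(); loss.backward(); opt.step()

def is_fall(window, threshold=0.1):           # threshold would be tuned on validation data
    err = nn.functional.mse_loss(autoencoder(window), window)
    return bool(err > threshold)              # high reconstruction error => anomaly (fall)

print(is_fall(torch.rand(1, WINDOW * 3)))
```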
Citations: 0
Weakly-Supervised Deep Image Hashing based on Cross-Modal Transformer
Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10216160
Ching-Ching Yang, W. Chu, S. Dubey
Weakly-supervised image hashing has emerged recently because web images associated with contextual text or tags are abundant. Text information weakly related to images can be utilized to guide the learning of a deep hashing network. In this paper, we propose Weakly-supervised deep Hashing based on Cross-Modal Transformer (WHCMT). First, cross-scale attention between image patches is discovered to form more effective visual representations. A baseline transformer is also adopted to find self-attention of tags and form tag representations. Second, the cross-modal attention between images and tags is discovered by the proposed cross-modal transformer. Effective hash codes are then generated by embedding layers. WHCMT is tested on semantic image retrieval, and we show that new state-of-the-art results can be obtained on the MIRFLICKR-25K and NUS-WIDE datasets.
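A hedged sketch of turning a learned cross-modal embedding into binary hash codes (the cross-modal transformer itself is not reproduced); the feature dimension and code length are invented:

```python
import torch
import torch.nn as nn

# Illustrative only: map fused image/tag features to a K-bit hash code.
# tanh keeps training differentiable; sign binarizes at retrieval time.
D, K = 512, 64                                 # hypothetical feature and code lengths
hash_head = nn.Sequential(nn.Linear(D, K), nn.Tanh())

features = torch.rand(4, D)                    # stand-in for cross-modal transformer output
soft_codes = hash_head(features)               # in (-1, 1), used for the training loss
binary_codes = torch.sign(soft_codes)          # {-1, +1} codes used for retrieval

# Hamming-like distance via inner product of binary codes.
dist = 0.5 * (K - binary_codes @ binary_codes.t())
print(binary_codes.shape, dist.shape)
```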
Citations: 0
ViTVO: Vision Transformer based Visual Odometry with Attention Supervision
Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10215538
Chu-Chi Chiu, Hsuan-Kung Yang, Hao-Wei Chen, Yu-Wen Chen, Chun-Yi Lee
In this paper, we develop a Vision Transformer based visual odometry (VO) method, called ViTVO. ViTVO introduces an attention mechanism to perform visual odometry. Due to the nature of VO, Transformer-based VO models tend to over-concentrate on a few points, which may result in a degradation of accuracy. In addition, noise from dynamic objects usually causes difficulties in performing VO tasks. To overcome these issues, we propose an attention loss during training, which utilizes ground truth masks or self-supervision to guide the attention maps to focus more on static regions of an image. In our experiments, we demonstrate the superior performance of ViTVO on the Sintel validation set and validate the effectiveness of our attention supervision mechanism in performing VO tasks.
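A minimal sketch of supervising an attention map with a static-region mask, written as a generic binary cross-entropy term under invented shapes; it is not necessarily the paper's exact loss formulation:

```python
import torch
import torch.nn.functional as F

# Illustrative attention supervision: push the (H x W) attention map toward a binary
# mask of static regions, so dynamic objects receive little attention. Shapes invented.
attention_logits = torch.randn(2, 1, 48, 64, requires_grad=True)  # model's attention logits
static_mask = (torch.rand(2, 1, 48, 64) > 0.3).float()            # 1 = static pixel (ground truth)

attn_loss = F.binary_cross_entropy_with_logits(attention_logits, static_mask)  # attention loss term
attn_loss.backward()
print(float(attn_loss))
```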
Citations: 1