
Joint learning of images and videos with a single Vision Transformer

2023 18th International Conference on Machine Vision and Applications (MVA); Pub Date: 2023-07-23; DOI: 10.23919/MVA57639.2023.10215661

Shuki Shimizu, Toru Tamaki

Citations: 0

Abstract

In this study, we propose a method for jointly learning images and videos with a single model. Images and videos are usually handled by separate models, each trained on its own modality. The proposed method, IV-ViT, feeds a batch of images to a Vision Transformer and also accepts a set of video frames, which it aggregates temporally by late fusion. We present experimental results on two image datasets and two action recognition datasets.
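The abstract's idea of sharing one model between image batches and video frames can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_encoder` is a hypothetical stand-in for a Vision Transformer, the names `predict_images` and `predict_videos` are invented for illustration, and averaging per-frame outputs is only one assumed form of late fusion (the abstract does not specify the aggregation).

```python
from statistics import fmean


def toy_encoder(frame):
    """Hypothetical stand-in for a ViT: maps one frame (a list of floats) to 2 logits."""
    return [sum(frame), max(frame) - min(frame)]


def predict_images(images, encoder=toy_encoder):
    # Image path: every image in the batch is an independent sample.
    return [encoder(image) for image in images]


def predict_videos(videos, encoder=toy_encoder):
    # Video path: flatten B clips of T frames each into one batch of frames,
    # so the same encoder handles them exactly as it handles images.
    frames_per_clip = [len(clip) for clip in videos]
    flat_frames = [frame for clip in videos for frame in clip]
    flat_logits = predict_images(flat_frames, encoder)

    # Late fusion (assumed here to be a mean): regroup the per-frame logits
    # by clip and average them over time.
    fused, start = [], 0
    for count in frames_per_clip:
        clip_logits = flat_logits[start:start + count]
        fused.append([fmean(column) for column in zip(*clip_logits)])
        start += count
    return fused


image_batch = [[1.0, 2.0], [3.0, 5.0]]
video_batch = [[[1.0, 2.0], [3.0, 5.0]]]  # one clip with two frames
print(predict_images(image_batch))
print(predict_videos(video_batch))
```

Because fusion happens only after the encoder, the encoder itself never needs to know whether a frame came from an image dataset or from a video clip, which is what allows a single set of weights to serve both.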


