
2023 18th International Conference on Machine Vision and Applications (MVA): Latest Publications

PALF: Pre-Annotation and Camera-LiDAR Late Fusion for the Easy Annotation of Point Clouds
Pub Date : 2023-04-13 DOI: 10.23919/MVA57639.2023.10216156
Yucheng Zhang, Masaki Fukuda, Yasunori Ishii, Kyoko Ohshima, Takayoshi Yamashita
3D object detection has become indispensable in the field of autonomous driving. To date, gratifying breakthroughs have been recorded in 3D object detection research, attributed to deep learning. However, deep learning algorithms are data-driven and require large amounts of annotated point cloud data for training and evaluation. Unlike 2D images, point cloud data is difficult to annotate because of its sparsity, irregularity, and low resolution; it demands more manual work, and annotation efficiency is much lower than for 2D images. Therefore, we propose an annotation algorithm for point cloud data that combines pre-annotation with camera-LiDAR late fusion to enable easy and accurate annotation.
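The abstract describes a pre-annotation pipeline refined by camera-LiDAR late fusion. The sketch below is not the authors' implementation; it is a minimal illustration of the late-fusion idea, assuming a hypothetical `late_fuse` step in which 3D boxes pre-annotated by a LiDAR detector are projected into the image and kept only when they overlap a 2D camera detection. The calibration matrix, corner layout, and IoU threshold are assumptions.

```python
# Minimal camera-LiDAR late-fusion sketch for filtering pre-annotated 3D boxes.
# All function names and the 0.3 IoU threshold are illustrative assumptions.
import numpy as np

def project_points(points_3d, calib):
    """Project Nx3 LiDAR points into the image using a 3x4 projection matrix."""
    pts = np.hstack([points_3d, np.ones((points_3d.shape[0], 1))])  # Nx4 homogeneous
    uvw = pts @ calib.T                                             # Nx3
    return uvw[:, :2] / uvw[:, 2:3]                                 # Nx2 pixel coordinates

def box2d_from_corners(corners_3d, calib):
    """Axis-aligned 2D box enclosing the projected corners of a 3D box."""
    uv = project_points(corners_3d, calib)
    return np.array([uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()])

def iou_2d(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def late_fuse(lidar_corners, camera_boxes, calib, iou_thr=0.3):
    """Keep LiDAR pre-annotations whose image projection matches a camera detection."""
    kept = []
    for i, corners in enumerate(lidar_corners):        # each item: 8x3 box corners
        proj = box2d_from_corners(corners, calib)
        if any(iou_2d(proj, cam) >= iou_thr for cam in camera_boxes):
            kept.append(i)
    return kept
```

In this reading, the LiDAR pre-annotations give the annotator 3D box candidates, and the camera detections serve as a second, independent vote that prunes false positives before manual correction.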
Citations: 1
Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned
Pub Date : 2022-09-26 DOI: 10.23919/MVA57639.2023.10215754
Ahmed Sabir
This paper focuses on enhancing the captions generated by image captioning systems. We propose an approach for improving caption generation systems by choosing the output most closely related to the image rather than the most likely output produced by the model. Our model revises the language generation output beam search from a visual context perspective. We employ a visual semantic measure at the word and sentence level to match the proper caption to the related information in the image. This approach can be applied to any caption system as a post-processing method.
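Since the method is described as a post-processing step over beam-search outputs, a minimal re-ranking sketch follows. It is an assumption about the general mechanism, not the paper's code: candidates are re-scored by a weighted mix of language-model likelihood and visual-semantic similarity to the image context. The `embed` encoder, the `alpha` weight, and the use of object labels as visual context are all hypothetical choices.

```python
# Sketch of visual-semantic re-ranking of beam-search captions (illustrative only).
from typing import Callable, Sequence
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def rerank_captions(
    candidates: Sequence[str],          # captions produced by the beam search
    lm_scores: Sequence[float],         # their language-model log-probabilities
    visual_context: Sequence[str],      # e.g. labels of objects detected in the image
    embed: Callable[[str], np.ndarray], # any text encoder mapping text -> vector
    alpha: float = 0.5,                 # weight between fluency and visual grounding
) -> str:
    """Return the candidate that balances LM likelihood and visual similarity."""
    ctx_vec = np.mean([embed(w) for w in visual_context], axis=0)
    def score(caption: str, lm: float) -> float:
        sim = cosine(embed(caption), ctx_vec)   # sentence-level visual similarity
        return alpha * lm + (1.0 - alpha) * sim
    best = max(zip(candidates, lm_scores), key=lambda cl: score(*cl))
    return best[0]
```

Because the re-ranker only consumes candidate strings and scores, it can sit behind any captioning model, which matches the abstract's claim that the approach works as a model-agnostic post-processing method.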
Citations: 0
QAHOI: Query-Based Anchors for Human-Object Interaction Detection
Pub Date : 2021-12-16 DOI: 10.23919/MVA57639.2023.10215534
Junwen Chen, Keiji Yanai
Human-object interaction (HOI) detection, as a downstream task of object detection, requires localizing pairs of humans and objects and recognizing the interaction between them. Recent one-stage approaches focus on detecting possible interaction points or filtering human-object pairs, ignoring the variability in the location and size of different objects at spatial scales. In this paper, we propose a transformer-based method, QAHOI (Query-Based Anchors for Human-Object Interaction detection), which leverages a multi-scale architecture to extract features from different spatial scales and uses query-based anchors to predict all the elements of an HOI instance. We further investigate that a powerful backbone significantly increases accuracy for QAHOI, and QAHOI with a transformer-based backbone outperforms recent state-of-the-art methods by large margins on the HICO-DET benchmark.
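The phrase "query-based anchors predict all the elements of an HOI instance" suggests per-query prediction heads on top of the transformer decoder. The sketch below is an assumption about that output structure, not the released QAHOI code: each decoder query regresses a human box, an object box, an object class, and an interaction (verb) class. The head sizes use the HICO-DET label counts (80 object classes, 117 verbs); the module name and layer choices are hypothetical.

```python
# Illustrative query-based HOI prediction heads (not the authors' implementation).
import torch
import torch.nn as nn

class HOIQueryHead(nn.Module):
    def __init__(self, d_model=256, num_obj_classes=80, num_verb_classes=117):
        super().__init__()
        self.human_box = nn.Linear(d_model, 4)                      # (cx, cy, w, h), normalized
        self.object_box = nn.Linear(d_model, 4)
        self.object_cls = nn.Linear(d_model, num_obj_classes + 1)   # +1 for "no object"
        self.verb_cls = nn.Linear(d_model, num_verb_classes)        # multi-label verbs

    def forward(self, decoder_queries):                 # (batch, num_queries, d_model)
        return {
            "human_boxes": self.human_box(decoder_queries).sigmoid(),
            "object_boxes": self.object_box(decoder_queries).sigmoid(),
            "object_logits": self.object_cls(decoder_queries),
            "verb_logits": self.verb_cls(decoder_queries),
        }

# Usage with dummy decoder output standing in for multi-scale transformer features.
queries = torch.randn(2, 100, 256)          # 2 images, 100 queries
outputs = HOIQueryHead()(queries)
print(outputs["human_boxes"].shape)         # torch.Size([2, 100, 4])
```

Predicting all four elements from one query is what lets a single anchor represent a complete HOI instance, avoiding the separate pairing or interaction-point matching stages the abstract criticizes.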
Citations: 22
Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning
Pub Date : 2021-12-07 DOI: 10.23919/MVA57639.2023.10216260
Srijan Das, M. Ryoo
In this paper, we address the challenge of obtaining large-scale unlabelled video datasets for contrastive representation learning in real-world applications. We present a novel video augmentation technique for self-supervised learning, called Cross-Modal Manifold Cutmix (CMMC), which generates augmented samples by combining different modalities in videos. By embedding a video tesseract into another across two modalities in the feature space, our method enhances the quality of learned video representations. We perform extensive experiments on two small-scale video datasets, UCF101 and HMDB51, for action recognition and video retrieval tasks. Our approach is also shown to be effective on the NTU dataset with limited domain knowledge. Our CMMC achieves comparable performance to other self-supervised methods while using less training data for both downstream tasks.
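The abstract describes embedding a video "tesseract" from one modality into another in feature space. The following is a minimal sketch of that idea under stated assumptions, not the authors' CMMC implementation: a random spatio-temporal block is cut from the intermediate feature map of one modality (e.g. optical flow) and pasted into the feature map of another (e.g. RGB). The tensor layout, `ratio` parameter, and returned mixing coefficient are illustrative.

```python
# Illustrative cross-modal feature-space cutmix (assumed mechanics, not CMMC's code).
import torch

def cross_modal_cutmix(feat_a, feat_b, ratio=0.5):
    """feat_a, feat_b: (batch, channels, T, H, W) features from two modalities."""
    assert feat_a.shape == feat_b.shape
    _, _, t, h, w = feat_a.shape
    # Size of the spatio-temporal cube to transplant, controlled by the mixing ratio.
    ct, ch, cw = max(1, int(t * ratio)), max(1, int(h * ratio)), max(1, int(w * ratio))
    # Random location of the cube inside the feature volume.
    t0 = torch.randint(0, t - ct + 1, (1,)).item()
    h0 = torch.randint(0, h - ch + 1, (1,)).item()
    w0 = torch.randint(0, w - cw + 1, (1,)).item()
    mixed = feat_a.clone()
    mixed[:, :, t0:t0 + ct, h0:h0 + ch, w0:w0 + cw] = \
        feat_b[:, :, t0:t0 + ct, h0:h0 + ch, w0:w0 + cw]
    lam = 1.0 - (ct * ch * cw) / (t * h * w)   # fraction of feat_a that was kept
    return mixed, lam

# Usage with dummy intermediate features of an RGB clip and an optical-flow clip.
rgb = torch.randn(4, 64, 8, 14, 14)
flow = torch.randn(4, 64, 8, 14, 14)
aug, lam = cross_modal_cutmix(rgb, flow, ratio=0.5)
```

Mixing in feature space rather than pixel space is what makes the augmentation "manifold" cutmix, and drawing the transplanted block from a different modality forces the contrastive objective to align representations across modalities.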
Citations: 0