
ACM Multimedia Asia: Latest Publications

NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels
Pub Date: 2021-10-13 DOI: 10.1145/3469877.3490580
Mohit Sharma, Rajkumar Patra, Harshali Desai, Shruti Vyas, Y. Rawat, R. Shah
Deep learning has shown remarkable progress across a wide range of problems. However, efficient training of such models requires large-scale datasets, and obtaining annotations for such datasets is challenging and costly. In this work, we explore freely available, user-generated labels from web videos for video understanding. We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information. We utilize the collected dataset for action classification and demonstrate its usefulness alongside the existing small-scale annotated datasets UCF101 and HMDB51. We study different loss functions and two pretraining strategies: simple and self-supervised learning. We also show how a network pretrained on the proposed dataset can help guard against video corruption and label noise in downstream datasets. We present this as a benchmark dataset for noisy learning in video understanding. The dataset, code, and trained models are publicly available here for future research. A longer version of our paper is also available here.
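The abstract mentions studying different loss functions for training under label noise. As one concrete illustration of this family of objectives (the entry above does not name the specific losses evaluated), below is a minimal PyTorch sketch of symmetric cross-entropy, a common noise-robust loss; the function name and the UCF101-style class count are assumptions for the example.

```python
# Minimal sketch of a noise-robust loss, using symmetric cross-entropy
# (Wang et al., 2019) as the illustration; not necessarily a loss the
# NoisyActions2M paper itself evaluates.
import torch
import torch.nn.functional as F

def symmetric_cross_entropy(logits, targets, alpha=0.1, beta=1.0, num_classes=101):
    # Standard cross-entropy: strong gradients, but sensitive to mislabels.
    ce = F.cross_entropy(logits, targets)
    # Reverse cross-entropy: clamping the one-hot labels keeps log() finite
    # and bounds the penalty contributed by mislabelled samples.
    pred = F.softmax(logits, dim=1).clamp(min=1e-7)
    one_hot = F.one_hot(targets, num_classes).float().clamp(min=1e-4)
    rce = -(pred * one_hot.log()).sum(dim=1).mean()
    return alpha * ce + beta * rce
```

Bounding the reverse term is what makes such losses attractive for web-scale datasets with user-generated labels.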
Citations: 2
RoadAtlas: Intelligent Platform for Automated Road Defect Detection and Asset Management
Pub Date: 2021-09-08 DOI: 10.1145/3469877.3493589
Zhuoxiao Chen, Yiyun Zhang, Yadan Luo, Zijian Wang, Jinjiang Zhong, Anthony Southon
With the rapid development of deep-learning-based intelligent detection algorithms, much progress has been made in automatic road defect recognition and road marking parsing. Such automation can effectively replace the expensive and time-consuming process of having professional inspectors review streets manually. Towards this goal, we present RoadAtlas, a novel end-to-end integrated system that supports 1) road defect detection, 2) road marking parsing, 3) a web-based dashboard for presenting and inputting data, and 4) a backend containing a well-structured database and developed APIs.
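As a concrete illustration of the described architecture, here is a hypothetical sketch of a backend endpoint that a dashboard like RoadAtlas's could call to store a detection; the route, schema, and field names are illustrative assumptions, and FastAPI is used here even though the paper does not state its backend stack.

```python
# Hypothetical sketch of a RoadAtlas-style backend API; the paper does not
# document its actual routes, schema, or framework.
from typing import List
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class DefectReport(BaseModel):
    image_id: str
    defect_type: str      # e.g. "crack" or "pothole"
    confidence: float     # detector score in [0, 1]
    bbox: List[float]     # [x1, y1, x2, y2] in pixel coordinates

@app.post("/api/defects")
async def submit_detection(report: DefectReport) -> dict:
    # A real backend would persist the detection to the well-structured
    # database that the web dashboard queries.
    return {"status": "stored", "image_id": report.image_id}
```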
Citations: 2
Conditional Extreme Value Theory for Open Set Video Domain Adaptation
Pub Date: 2021-09-01 DOI: 10.1145/3469877.3490600
Zhuoxiao Chen, Yadan Luo, Mahsa Baktash
With the advent of media streaming, video action recognition has become progressively important for various applications, yet at the high cost of requiring large-scale data labelling. To overcome the expense of data labelling, domain adaptation techniques have been proposed, which transfer knowledge from fully labelled data (i.e., the source domain) to unlabelled data (i.e., the target domain). The majority of video domain adaptation algorithms are designed for closed-set scenarios in which all classes are shared among the domains. In this work, we propose an open-set video domain adaptation approach to mitigate the domain discrepancy between the source and target data, allowing the target data to contain additional classes that do not belong to the source domain. Unlike previous works, which focus only on improving accuracy for the shared classes, we aim to jointly enhance the alignment of the shared classes and the recognition of unknown samples. Towards this goal, class-conditional extreme value theory is applied to enhance unknown-sample recognition. Specifically, the entropy values of target samples are modelled as generalised extreme value distributions, which allows separating unknown samples lying in the tail of the distribution. To alleviate the negative-transfer issue, weights computed from the distance between the sample entropy and the threshold are leveraged in adversarial learning, so that confident source and target samples are aligned while unconfident samples are pushed away. The proposed method has been thoroughly evaluated on both small-scale and large-scale cross-domain video datasets and achieves state-of-the-art performance.
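To make the entropy-plus-extreme-value idea concrete, the sketch below fits a generalised extreme value (GEV) distribution to prediction entropies and derives per-sample weights from the distance to a tail threshold. The use of scipy's genextreme, the 0.95 tail quantile, and the absolute-distance weighting are assumptions for illustration, not the authors' exact formulation.

```python
# Sketch of GEV-based unknown-sample rejection over prediction entropies.
# Threshold choice and weighting here are illustrative assumptions.
import numpy as np
from scipy.stats import genextreme

def prediction_entropy(probs, eps=1e-12):
    # Shannon entropy of softmax outputs; high entropy hints "unknown".
    return -(probs * np.log(probs + eps)).sum(axis=1)

# Stand-in for target-domain softmax outputs (512 samples, 10 classes).
target_probs = np.random.dirichlet(np.ones(10), size=512)
entropies = prediction_entropy(target_probs)

# Fit a GEV to the entropy values; samples deep in its tail are
# candidates for the unknown class.
shape, loc, scale = genextreme.fit(entropies)
tau = genextreme.ppf(0.95, shape, loc=loc, scale=scale)

# Distance from the threshold gives a confidence weight: confident
# samples (far below tau) are aligned in adversarial training, while
# samples beyond tau are treated as unknown and pushed away.
weights = np.abs(entropies - tau)
unknown_mask = entropies > tau
```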
Citations: 8
Cross-layer Navigation Convolutional Neural Network for Fine-grained Visual Classification
Pub Date: 2021-06-21 DOI: 10.1145/3469877.3490579
Chenyu Guo, Jiyang Xie, K. Liang, Xian Sun, Zhanyu Ma
Fine-grained visual classification (FGVC) aims to classify sub-classes of objects within the same super-class (e.g., species of birds, models of cars). For FGVC tasks, the essential challenge is to find discriminative, subtle information about the target in local regions. Traditional FGVC models prefer refined features, i.e., high-level semantic information, for recognition and rarely use low-level information. However, low-level information, which contains rich detail, also helps improve performance. Therefore, in this paper, we propose a cross-layer navigation convolutional neural network for feature fusion. First, the feature maps extracted by the backbone network are fed into a convolutional long short-term memory model sequentially from high level to low level to perform feature aggregation. Then, attention mechanisms are applied after feature fusion to extract spatial and channel information while linking the high-level semantic information with the low-level texture features, which better locates the discriminative regions for FGVC. In the experiments, three commonly used FGVC datasets (CUB-200-2011, Stanford-Cars, and FGVC-Aircraft) are used for evaluation, and comparisons with other reference FGVC methods demonstrate that the proposed method achieves superior results. https://github.com/PRIS-CV/CN-CNN.git
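For reference, below is a minimal PyTorch sketch of the high-to-low ConvLSTM aggregation step described above; channel counts, spatial sizes, and the assumption that backbone features are first projected to a common shape are illustrative, and the linked repository contains the authors' actual implementation.

```python
# Minimal ConvLSTM-based cross-layer aggregation, under the simplifying
# assumption that all backbone feature maps share one (C, H, W) shape.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # One conv produces the input/forget/output/candidate gates.
        self.gates = nn.Conv2d(2 * channels, 4 * channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

def aggregate_high_to_low(feature_maps, cell):
    # feature_maps: list ordered high-level -> low-level, each (B, C, H, W).
    b, ch, hgt, wid = feature_maps[0].shape
    h = torch.zeros(b, ch, hgt, wid)
    c = torch.zeros_like(h)
    for fm in feature_maps:
        h, c = cell(fm, (h, c))
    return h  # fused map, fed to the spatial/channel attention modules

cell = ConvLSTMCell(channels=256)
feats = [torch.randn(2, 256, 14, 14) for _ in range(3)]
fused = aggregate_high_to_low(feats, cell)  # shape: (2, 256, 14, 14)
```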
Citations: 1