Mohit Sharma, Rajkumar Patra, Harshali Desai, Shruti Vyas, Y. Rawat, R. Shah
Deep learning has shown remarkable progress on a wide range of problems. However, efficient training of such models requires large-scale datasets, and obtaining annotations for such datasets is challenging and costly. In this work, we explore user-generated, freely available labels from web videos for video understanding. We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information. We utilize the collected dataset for action classification and demonstrate its usefulness on the existing small-scale annotated datasets UCF101 and HMDB51. We study different loss functions and two pretraining strategies, simple and self-supervised learning. We also show that a network pretrained on the proposed dataset improves robustness to video corruption and label noise in downstream datasets. We present this as a benchmark dataset for learning from noisy labels in video understanding. The dataset, code, and trained models are publicly available for future research. A longer version of our paper is also available.
{"title":"NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels","authors":"Mohit Sharma, Rajkumar Patra, Harshali Desai, Shruti Vyas, Y. Rawat, R. Shah","doi":"10.1145/3469877.3490580","DOIUrl":"https://doi.org/10.1145/3469877.3490580","url":null,"abstract":"Deep learning has shown remarkable progress in a wide range of problems. However, efficient training of such models requires large-scale datasets, and getting annotations for such datasets can be challenging and costly. In this work, we explore user-generated freely available labels from web videos for video understanding. We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information. We utilize the collected dataset for action classification and demonstrate its usefulness with existing small-scale annotated datasets, UCF101 and HMDB51. We study different loss functions and two pretraining strategies, simple and self-supervised learning. We also show how a network pretrained on the proposed dataset can help against video corruption and label noise in downstream datasets. We present this as a benchmark dataset in noisy learning for video understanding. The dataset, code, and trained models are publicly available here for future research. A longer version of our paper is also available here.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"525 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132967101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the rapid development of intelligent detection algorithms based on deep learning, much progress has been made in automatic road defect recognition and road marking parsing. Such methods can effectively replace the expensive and time-consuming process in which professional inspectors review streets manually. Towards this goal, we present RoadAtlas, a novel end-to-end integrated system that supports 1) road defect detection, 2) road marking parsing, 3) a web-based dashboard for presenting and entering data, and 4) a backend containing a well-structured database and supporting APIs.
{"title":"RoadAtlas: Intelligent Platform for Automated Road Defect Detection and Asset Management","authors":"Zhuoxiao Chen, Yiyun Zhang, Yadan Luo, Zijian Wang, Jinjiang Zhong, Anthony Southon","doi":"10.1145/3469877.3493589","DOIUrl":"https://doi.org/10.1145/3469877.3493589","url":null,"abstract":"With the rapid development of intelligent detection algorithms based on deep learning, much progress has been made in automatic road defect recognition and road marking parsing. This can effectively address the issue of an expensive and time-consuming process for professional inspectors to review the street manually. Towards this goal, we present RoadAtlas, a novel end-to-end integrated system that can support 1) road defect detection, 2) road marking parsing, 3) a web-based dashboard for presenting and inputting data by users, and 4) a backend containing a well-structured database and developed APIs.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133015368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the advent of media streaming, video action recognition has become progressively important for various applications, yet it comes at the high expense of large-scale data labelling. To overcome the cost of data labelling, domain adaptation techniques have been proposed, which transfer knowledge from fully labelled data (i.e., the source domain) to unlabelled data (i.e., the target domain). The majority of video domain adaptation algorithms are designed for closed-set scenarios in which all classes are shared between the domains. In this work, we propose an open-set video domain adaptation approach to mitigate the domain discrepancy between the source and target data, allowing the target data to contain additional classes that do not belong to the source domain. Unlike previous works, which only focus on improving accuracy on the shared classes, we aim to jointly enhance the alignment of the shared classes and the recognition of unknown samples. Towards this goal, class-conditional extreme value theory is applied to enhance unknown recognition. Specifically, the entropy values of target samples are modelled with generalised extreme value distributions, which allows separating unknown samples lying in the tail of the distribution. To alleviate the negative transfer issue, weights computed from the distance between a sample's entropy and the threshold are leveraged in adversarial learning, so that confident source and target samples are aligned while unconfident samples are pushed away. The proposed method has been thoroughly evaluated on both small-scale and large-scale cross-domain video datasets and achieves state-of-the-art performance.
{"title":"Conditional Extreme Value Theory for Open Set Video Domain Adaptation","authors":"Zhuoxiao Chen, Yadan Luo, Mahsa Baktash","doi":"10.1145/3469877.3490600","DOIUrl":"https://doi.org/10.1145/3469877.3490600","url":null,"abstract":"With the advent of media streaming, video action recognition has become progressively important for various applications, yet at the high expense of requiring large-scale data labelling. To overcome the problem of expensive data labelling, domain adaptation techniques have been proposed, which transfer knowledge from fully labelled data (i.e., source domain) to unlabelled data (i.e., target domain). The majority of video domain adaptation algorithms are proposed for closed-set scenarios in which all the classes are shared among the domains. In this work, we propose an open-set video domain adaptation approach to mitigate the domain discrepancy between the source and target data, allowing the target data to contain additional classes that do not belong to the source domain. Different from previous works, which only focus on improving accuracy for shared classes, we aim to jointly enhance the alignment of the shared classes and recognition of unknown samples. Towards this goal, class-conditional extreme value theory is applied to enhance the unknown recognition. Specifically, the entropy values of target samples are modelled as generalised extreme value distributions, which allows separating unknown samples lying in the tail of the distribution. To alleviate the negative transfer issue, weights computed by the distance from the sample entropy to the threshold are leveraged in adversarial learning in the sense that confident source and target samples are aligned, and unconfident samples are pushed away. The proposed method has been thoroughly evaluated on both small-scale and large-scale cross-domain video datasets and achieved the state-of-the-art performance.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134381953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chenyu Guo, Jiyang Xie, K. Liang, Xian Sun, Zhanyu Ma
Fine-grained visual classification (FGVC) aims to classify sub-classes of objects within the same super-class (e.g., species of birds, models of cars). For FGVC tasks, the essential step is to find discriminative, subtle information about the target in local regions. Traditional FGVC models prefer refined features, i.e., high-level semantic information, for recognition and rarely use low-level information. However, low-level information, which contains rich details, also helps improve performance. Therefore, in this paper, we propose a cross-layer navigation convolutional neural network for feature fusion. First, the feature maps extracted by the backbone network are fed into a convolutional long short-term memory model sequentially from high level to low level to perform feature aggregation. Then, attention mechanisms are applied after feature fusion to extract spatial and channel information while linking the high-level semantic information and the low-level texture features, which better locates the discriminative regions for FGVC. In the experiments, three commonly used FGVC datasets, CUB-200-2011, Stanford-Cars, and FGVC-Aircraft, are used for evaluation, and comparisons with other FGVC methods show that the proposed method achieves superior results. Code: https://github.com/PRIS-CV/CN-CNN.git
{"title":"Cross-layer Navigation Convolutional Neural Network for Fine-grained Visual Classification","authors":"Chenyu Guo, Jiyang Xie, K. Liang, Xian Sun, Zhanyu Ma","doi":"10.1145/3469877.3490579","DOIUrl":"https://doi.org/10.1145/3469877.3490579","url":null,"abstract":"Fine-grained visual classification (FGVC) aims to classify sub-classes of objects in the same super-class (e.g., species of birds, models of cars). For the FGVC tasks, the essential solution is to find discriminative subtle information of the target from local regions. Traditional FGVC models preferred to use the refined features, i.e., high-level semantic information for recognition and rarely use low-level information. However, it turns out that low-level information which contains rich detail information also has effect on improving performance. Therefore, in this paper, we propose cross-layer navigation convolutional neural network for feature fusion. First, the feature maps extracted by the backbone network are fed into a convolutional long short-term memory model sequentially from high-level to low-level to perform feature aggregation. Then, attention mechanisms are used after feature fusion to extract spatial and channel information while linking the high-level semantic information and the low-level texture features, which can better locate the discriminative regions for the FGVC. In the experiments, three commonly used FGVC datasets, including CUB-200-2011, Stanford-Cars, and FGVC-Aircraft datasets, are used for evaluation and we demonstrate the superiority of the proposed method by comparing it with other referred FGVC methods to show that this method achieves superior results. https://github.com/PRIS-CV/CN-CNN.git","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126167097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}