
Latest publications in Proceedings of the ACM Multimedia Asia

Multi-Objective Particle Swarm Optimization for ROI based Video Coding
Pub Date : 2019-12-15 DOI: 10.1145/3338533.3366608
Guangjie Ren, Feiyang Liu, Daiqin Yang, Yiyong Zha, Yunfei Zhang, Xin Liu
In this paper, we propose a new algorithm for High Efficiency Video Coding (HEVC) based on multi-objective particle swarm optimization (MOPSO) to enhance the visual quality of the ROI while ensuring a certain overall quality. Based on the R-λ model of the detected ROI, the fitness function in MOPSO is designed as the distortion of the ROI and that of the overall frame. Each particle consists of the ROI's bit rate and the other regions' bit rate. After iterating the multi-objective particle swarm optimization algorithm, the Pareto front is obtained, and the final bit allocation, i.e., the appropriate bit rates for the ROI and non-ROI regions, is selected from this set. Finally, the coding parameters are determined from the R-λ model for encoding. The experimental results show that the proposed algorithm improves the visual quality of the ROI while guaranteeing the overall visual quality.
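As a rough illustration of the bit-allocation step described above, the sketch below evaluates candidate ROI/non-ROI bit splits under a simple rate-distortion model, keeps the non-dominated candidates as a Pareto front, and picks one allocation from it. The hyperbolic R-D model, all constants, and the closest-to-ideal-point selection rule are illustrative assumptions, not the paper's exact R-λ-based fitness or MOPSO update rules.

```python
# Hypothetical sketch: two-objective bit allocation between ROI and non-ROI.
# The R-D model D(R) = c * R^(-k) and all constants are assumptions for
# illustration; the paper derives its fitness from the HEVC R-lambda model.
import random

TOTAL_BITS = 1000.0          # assumed frame-level bit budget
C_ROI, K_ROI = 80.0, 0.9     # assumed R-D parameters of the ROI
C_BG,  K_BG  = 60.0, 0.8     # assumed R-D parameters of the background

def distortion(rate, c, k):
    """Simple hyperbolic rate-distortion model (assumed form)."""
    return c * rate ** (-k)

def fitness(r_roi):
    """Two objectives: ROI distortion and whole-frame distortion."""
    r_bg = TOTAL_BITS - r_roi
    d_roi = distortion(r_roi, C_ROI, K_ROI)
    d_frame = d_roi + distortion(r_bg, C_BG, K_BG)
    return d_roi, d_frame

# Sample candidate allocations (stand-in for the evolved particle swarm).
particles = [random.uniform(50.0, TOTAL_BITS - 50.0) for _ in range(200)]
evaluated = [(r, fitness(r)) for r in particles]

# Keep the non-dominated candidates: the Pareto front over the two objectives.
pareto = [(r, f) for r, f in evaluated
          if not any(g[0] <= f[0] and g[1] <= f[1] and g != f
                     for _, g in evaluated)]

# Pick one allocation from the front, e.g. the one closest to the ideal point.
ideal = (min(f[0] for _, f in pareto), min(f[1] for _, f in pareto))
best_r_roi, _ = min(pareto, key=lambda p: (p[1][0] - ideal[0]) ** 2
                                        + (p[1][1] - ideal[1]) ** 2)
print(f"ROI bits: {best_r_roi:.1f}, non-ROI bits: {TOTAL_BITS - best_r_roi:.1f}")
```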
Citations: 2
Multimodal Attribute and Feature Embedding for Activity Recognition
Pub Date : 2019-12-15 DOI: 10.1145/3338533.3366592
Weiming Zhang, Yi Huang, Wanting Yu, Xiaoshan Yang, Wei Wang, J. Sang
Human Activity Recognition (HAR) automatically recognizes human activities, such as daily-life and work activities, from digital records, which is of great significance to the medical and health fields. Egocentric video and human acceleration data comprehensively describe human activity patterns from different aspects, laying a foundation for activity recognition based on multimodal behavior data. However, on the one hand, the low-level multimodal signal structures differ greatly and the mapping to high-level activities is complicated. On the other hand, activity labeling based on multimodal behavior data is costly and the amount of labeled data is limited, which restricts technical development in this field. In this paper, an activity recognition model, MAFE, based on multimodal attribute feature embedding is proposed. Before activity recognition, middle-level attribute features are extracted from the low-level signals of the different modalities. On the one hand, this reduces the mapping complexity from low-level signals to high-level activities; on the other hand, a large amount of middle-level attribute labeling data can be used to reduce the dependency on activity labeling data. We conducted experiments on the Stanford-ECM dataset to verify the effectiveness of the proposed MAFE method.
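A minimal sketch of the two-stage structure described in the abstract is given below: each modality is first mapped to middle-level attribute scores, and the activity is then classified from the fused attribute embeddings. The layer sizes, the sigmoid attribute outputs, and fusion by concatenation are assumptions for illustration, not the exact MAFE architecture.

```python
# Hypothetical sketch of the two-stage idea: map each modality's low-level
# signal to mid-level attribute scores first, then classify the activity from
# the fused attribute embeddings. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class AttributeBranch(nn.Module):
    """Maps one modality (e.g. video or acceleration features) to attributes."""
    def __init__(self, in_dim, num_attributes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, num_attributes), nn.Sigmoid())  # attribute scores

    def forward(self, x):
        return self.net(x)

class ActivityHead(nn.Module):
    """Predicts the activity from the concatenated attribute embeddings."""
    def __init__(self, num_attributes, num_modalities, num_activities):
        super().__init__()
        self.fc = nn.Linear(num_attributes * num_modalities, num_activities)

    def forward(self, attribute_list):
        return self.fc(torch.cat(attribute_list, dim=-1))

video_branch = AttributeBranch(in_dim=512, num_attributes=40)
accel_branch = AttributeBranch(in_dim=64, num_attributes=40)
head = ActivityHead(num_attributes=40, num_modalities=2, num_activities=23)

video_feat = torch.randn(8, 512)   # stand-in egocentric video features
accel_feat = torch.randn(8, 64)    # stand-in acceleration features
logits = head([video_branch(video_feat), accel_branch(accel_feat)])
print(logits.shape)                # torch.Size([8, 23])
```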
Citations: 3
An Automated Lung Nodule Segmentation Method Based On Nodule Detection Network and Region Growing
Pub Date : 2019-12-15 DOI: 10.1145/3338533.3366604
Yanhao Tan, K. Lu, Jian Xue
Segmentation of a specific organ or tissue plays an important role in medical image analysis with the rapid development of clinical decision support systems. With medical imaging equipment, segmenting the lung nodules in images can help physicians diagnose lung cancer and formulate proper treatment schemes. Therefore, research on lung nodule segmentation has attracted a lot of attention in recent years. However, this task faces several challenges, including the intensity similarity between lung nodules and vessels, inaccurate boundaries, and the presence of noise in most images. In this paper, an automated segmentation method is proposed for lung nodules in CT images. At the first stage, a nodule detection network is used to generate region proposals and locate the bounding boxes of nodules, which are employed as the initial input for the subsequent segmentation. Then, at the second stage, the nodules are segmented within the bounding boxes. Since the image scale for region growing is reduced by locating the nodule in advance, the efficiency of segmentation is improved. Moreover, because the nodule is localized before segmentation, tissues with similar intensity can be excluded from the object region. The proposed method is evaluated on a public lung nodule dataset, and the experimental results indicate its effectiveness and efficiency.
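The second-stage region growing can be sketched as follows: starting from a seed voxel inside the detected bounding box, neighboring voxels are added while their intensity stays close to the seed's. The intensity tolerance, 4-connectivity, and the toy image are illustrative assumptions; the detection network that supplies the box is not shown.

```python
# Hypothetical sketch of stage two: grow a nodule region inside a detected
# bounding box from a seed voxel, using an intensity-similarity rule.
import numpy as np
from collections import deque

def region_grow(ct_slice, box, seed, tol=60.0):
    """Return a boolean mask of pixels connected to `seed` within `box`
    whose intensity stays within `tol` of the seed intensity."""
    y0, x0, y1, x1 = box
    mask = np.zeros_like(ct_slice, dtype=bool)
    seed_val = float(ct_slice[seed])
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if y0 <= ny < y1 and x0 <= nx < x1 and not mask[ny, nx]:
                if abs(float(ct_slice[ny, nx]) - seed_val) <= tol:
                    mask[ny, nx] = True
                    queue.append((ny, nx))
    return mask

# Toy example: a bright blob inside a detected box.
img = np.full((64, 64), -800.0)           # background (air-like intensity)
img[28:36, 30:38] = -100.0                # nodule-like intensities
nodule_mask = region_grow(img, box=(24, 26, 40, 42), seed=(32, 34))
print(nodule_mask.sum(), "pixels segmented")
```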
Citations: 2
Social Font Search by Multimodal Feature Embedding
Pub Date : 2019-12-15 DOI: 10.1145/3338533.3366595
Saemi Choi, Shun Matsumura, K. Aizawa
A typical tag/keyword-based search system retrieves the documents in which a given query term q occurs. However, when applying such systems to a real-world font web community, practical challenges arise: font tags are more subjective than those in other benchmark datasets, which magnifies the tag mismatch problem. To address these challenges, we propose a tag dictionary space built with word embeddings, which relates undefined words that have similar meanings. Even if a query is not defined in the tag dictionary, we can represent it as a vector in the tag dictionary space. The proposed system supports multimodal inputs, accepting both textual and image queries. By integrating a visual sentiment concept model that classifies affective concepts as adjective-noun pairs for a given image and uses them as a query, users can interact with the search system in a multimodal way. We used crowdsourcing to collect user ratings for the retrieved fonts and observed that the fonts retrieved with the proposed method obtained higher scores than those of other methods.
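The tag-dictionary projection can be sketched as follows: a query word outside the dictionary is represented by its cosine similarities to the defined tags in an embedding space. The toy tags and random embeddings below are assumptions; in practice, pretrained word embeddings over the community's font tags would be used.

```python
# Hypothetical sketch of projecting a free-form query onto a tag-dictionary
# space with word embeddings, so undefined but related words still get a
# useful vector. Tags and embeddings here are stand-ins.
import numpy as np

rng = np.random.default_rng(0)
tag_dictionary = ["elegant", "bold", "handwritten", "retro", "playful"]
# Stand-in word embeddings (in practice, e.g. 300-d pretrained vectors).
embeddings = {w: rng.normal(size=50)
              for w in tag_dictionary + ["graceful"]}

def tag_space_vector(query_word):
    """Represent a query as cosine similarities to the defined tags."""
    q = embeddings[query_word]
    sims = []
    for tag in tag_dictionary:
        t = embeddings[tag]
        sims.append(q @ t / (np.linalg.norm(q) * np.linalg.norm(t)))
    return np.array(sims)

# "graceful" is not a defined tag, but it still lands in the tag space.
vec = tag_space_vector("graceful")
for tag, s in zip(tag_dictionary, vec):
    print(f"{tag:12s} {s:+.3f}")
```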
Citations: 0
A Cascade Sequence-to-Sequence Model for Chinese Mandarin Lip Reading
Pub Date : 2019-08-14 DOI: 10.1145/3338533.3366579
Ya Zhao, Rui Xu, Mingli Song
Lip reading aims at decoding text from the movement of a speaker's mouth. In recent years, lip reading methods have made great progress for English, at both the word level and the sentence level. Unlike English, however, Chinese Mandarin is a tonal language and relies on pitch to distinguish lexical or grammatical meaning, which significantly increases the ambiguity of the lip reading task. In this paper, we propose a Cascade Sequence-to-Sequence Model for Chinese Mandarin (CSSMCM) lip reading, which explicitly models tones when predicting sentences. Tones are modeled from visual information and syntactic structure, and are then used, together with the visual information and syntactic structure, to predict sentences. In order to evaluate CSSMCM, a dataset called CMLR (Chinese Mandarin Lip Reading) is collected and released, consisting of over 100,000 natural sentences from the China Network Television website. When trained on the CMLR dataset, the proposed CSSMCM surpasses the performance of state-of-the-art lip reading frameworks, which confirms the effectiveness of explicitly modeling tones for Chinese Mandarin lip reading.
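A minimal sketch of the cascade idea follows: a first sub-network predicts tone logits from visual features, and the character decoder then conditions on both the visual features and those predicted tones. The GRU encoders, feature sizes, and per-frame outputs are illustrative assumptions; the published model is a full attention-based sequence-to-sequence cascade that also uses syntactic structure, which is not sketched here.

```python
# Hypothetical sketch of the cascade: tones first, then characters conditioned
# on the visual features and the predicted tones. Sizes are assumptions.
import torch
import torch.nn as nn

class ToneBranch(nn.Module):
    def __init__(self, vis_dim=256, hidden=128, num_tones=5):
        super().__init__()
        self.rnn = nn.GRU(vis_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_tones)

    def forward(self, vis):                 # vis: (B, T, vis_dim)
        h, _ = self.rnn(vis)
        return self.out(h)                  # tone logits per frame

class CharBranch(nn.Module):
    def __init__(self, vis_dim=256, num_tones=5, hidden=128, vocab=3000):
        super().__init__()
        self.rnn = nn.GRU(vis_dim + num_tones, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, vis, tone_logits):
        x = torch.cat([vis, tone_logits.softmax(-1)], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)                  # character logits per frame

vis = torch.randn(2, 75, 256)               # stand-in mouth-region features
tone_logits = ToneBranch()(vis)
char_logits = CharBranch()(vis, tone_logits)
print(tone_logits.shape, char_logits.shape)
```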
Citations: 33
Exploring Semantic Segmentation on the DCT Representation
Pub Date : 2019-07-23 DOI: 10.1145/3338533.3366557
Shao-Yuan Lo, H. Hang
Typical convolutional networks are trained and run on RGB images. However, in real-world applications, images are often compressed for memory savings and efficient transmission. In this paper, we explore methods for performing semantic segmentation on the discrete cosine transform (DCT) representation defined by the JPEG standard. We first rearrange the DCT coefficients to form a preferred input type, then tailor an existing network to the DCT inputs. The proposed method achieves accuracy close to that of the RGB model at about the same network complexity. Moreover, we investigate the impact of selecting different DCT components on segmentation performance. With a proper selection, one can achieve the same level of accuracy using only 36% of the DCT coefficients. We further show the robustness of our method under quantization errors. To our knowledge, this paper is the first to explore semantic segmentation on the DCT representation.
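The coefficient rearrangement and selection described above can be sketched as follows: each 8×8 block's DCT coefficients become channels, turning an H×W image into an (H/8, W/8, 64) tensor, of which only the first coefficients in zig-zag order are kept. The random stand-in coefficients below are an assumption; real inputs would come from a partial JPEG decode.

```python
# Hypothetical sketch of the input rearrangement: per-block DCT coefficients
# become channels, and only the first N coefficients in JPEG zig-zag order
# are kept. The DCT values here are random stand-ins, not a real JPEG decode.
import numpy as np

def zigzag_order(n=8):
    """Indices of an n x n block in JPEG zig-zag order."""
    return sorted(((y, x) for y in range(n) for x in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def rearrange_dct(blocks, keep=36):
    """blocks: (H/8, W/8, 8, 8) DCT coefficients -> (H/8, W/8, keep)."""
    order = zigzag_order()
    channels = np.stack([blocks[:, :, y, x] for y, x in order], axis=-1)
    return channels[..., :keep]

dct_blocks = np.random.randn(60, 80, 8, 8)   # stand-in for a 480x640 image
net_input = rearrange_dct(dct_blocks, keep=36)
print(net_input.shape)                        # (60, 80, 36)
```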
Citations: 15
Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Pub Date : 2019-07-23 DOI: 10.1145/3338533.3366569
Fan Yang, S. Sakti, Yang Wu, Satoshi Nakamura
Although skeleton-based action recognition has achieved great success in recent years, most existing methods may suffer from a large model size and slow execution speed. To alleviate this issue, we analyze the properties of skeleton sequences and propose a Double-feature Double-motion Network (DD-Net) for skeleton-based action recognition. With a lightweight network structure (i.e., 0.15 million parameters), DD-Net can reach a very high speed: 3,500 FPS on an ordinary GPU (e.g., GTX 1080Ti), or 2,000 FPS on an ordinary CPU (e.g., Intel E5-2620). By employing robust features, DD-Net achieves state-of-the-art performance on our experimental datasets: SHREC (hand actions) and JHMDB (body actions). Our code is available at https://github.com/fandulu/DD-Net.
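One ingredient of DD-Net, the two-scale motion input, can be sketched as follows: frame-to-frame differences give a fast-motion stream and larger-stride differences give a slow-motion stream, both derived from the (frames × joints × dimensions) skeleton array. The stride value is an illustrative assumption, and DD-Net's other input feature (joint-distance based) is not shown.

```python
# Hypothetical sketch of the "double motion" input: a fast-motion stream from
# frame-to-frame differences and a slow-motion stream from a larger stride.
import numpy as np

def motion_features(skeleton, slow_stride=2):
    """skeleton: (T, J, D) joint coordinates -> fast and slow motion arrays."""
    fast = skeleton[1:] - skeleton[:-1]                       # (T-1, J, D)
    slow = skeleton[slow_stride:] - skeleton[:-slow_stride]   # (T-s, J, D)
    return fast, slow

# Toy 15-joint clip of 32 frames in 3-D coordinates.
seq = np.cumsum(np.random.randn(32, 15, 3) * 0.01, axis=0)
fast, slow = motion_features(seq)
print(fast.shape, slow.shape)   # (31, 15, 3) (30, 15, 3)
```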
Citations: 103
Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation
Pub Date : 2018-09-17 DOI: 10.1145/3338533.3366558
Shao-Yuan Lo, H. Hang, S. Chan, Jing-Jhih Lin
Real-time semantic segmentation plays an important role in practical applications such as self-driving cars and robots. Most semantic segmentation research focuses on improving estimation accuracy with little consideration of efficiency. Several previous studies that emphasize high-speed inference often fail to produce high-accuracy segmentation results. In this paper, we propose a novel convolutional network named Efficient Dense modules with Asymmetric convolution (EDANet), which employs an asymmetric convolution structure and incorporates dilated convolution and dense connectivity to achieve high efficiency at low computational cost and model size. EDANet is 2.7 times faster than the existing fast segmentation network ICNet, while achieving a similar mIoU score without any additional context module, post-processing scheme, or pretrained model. We evaluate EDANet on the Cityscapes and CamVid datasets and compare it with other state-of-the-art systems. Our network can run with high-resolution inputs at 108 FPS on a single GTX 1080Ti.
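A sketch of one asymmetric, densely connected module in the spirit of EDANet is given below: a 3×3 convolution is factorized into 3×1 and 1×3 convolutions, a dilated pair enlarges the receptive field, and the module output is concatenated with its input. The channel widths, the 1×1 reduction, and the layer ordering are assumptions rather than the published configuration.

```python
# Hypothetical sketch of an asymmetric, dilated, densely connected module.
# Widths and ordering are illustrative assumptions, not the EDANet paper's.
import torch
import torch.nn as nn

class EDAModuleSketch(nn.Module):
    def __init__(self, in_ch, growth=40, dilation=2):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, growth, kernel_size=1)
        self.conv3x1 = nn.Conv2d(growth, growth, (3, 1), padding=(1, 0))
        self.conv1x3 = nn.Conv2d(growth, growth, (1, 3), padding=(0, 1))
        self.dconv3x1 = nn.Conv2d(growth, growth, (3, 1),
                                  padding=(dilation, 0), dilation=(dilation, 1))
        self.dconv1x3 = nn.Conv2d(growth, growth, (1, 3),
                                  padding=(0, dilation), dilation=(1, dilation))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.act(self.reduce(x))
        y = self.act(self.conv1x3(self.act(self.conv3x1(y))))       # asymmetric pair
        y = self.act(self.dconv1x3(self.act(self.dconv3x1(y))))     # dilated pair
        return torch.cat([x, y], dim=1)      # dense connection: channels grow

block = EDAModuleSketch(in_ch=64)
out = block(torch.randn(1, 64, 128, 256))
print(out.shape)                             # torch.Size([1, 104, 128, 256])
```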
Citations: 136