
Latest publications from the Proceedings of the 26th ACM International Conference on Multimedia

Learning Semantic Structure-preserved Embeddings for Cross-modal Retrieval
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240521
Yiling Wu, Shuhui Wang, Qingming Huang
This paper learns semantic embeddings for multi-label cross-modal retrieval. Our method exploits the semantic structure represented by label vectors to guide the learning of embeddings. First, we construct a semantic graph based on label vectors that incorporates data from both modalities, and enforce the embeddings to preserve the local structure of this semantic graph. Second, we enforce the embeddings to reconstruct the labels well, i.e., the global semantic structure. In addition, we encourage the embeddings to preserve the local geometric structure of each modality. Accordingly, local and global semantic structure consistency, as well as local geometric structure consistency, are enforced simultaneously. The mappings between inputs and embeddings are designed as nonlinear neural networks, which offer larger capacity and more flexibility. The overall objective function is optimized by stochastic gradient descent for scalability on large datasets. Experiments conducted on three real-world datasets clearly demonstrate the superiority of our proposed approach over state-of-the-art methods.
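To make the objective concrete, here is a minimal sketch (our own simplification, not the authors' code) of the two semantic terms the abstract describes: a local term that pulls together embeddings connected in the label-based semantic graph, and a global term that scores how well the labels can be linearly reconstructed from the embeddings. The function name, the least-squares reconstruction, and the weights `alpha`/`beta` are illustrative assumptions.

```python
import numpy as np

def semantic_structure_loss(Z, Y, W, alpha=1.0, beta=1.0):
    """Hypothetical simplification of a structure-preserving objective.

    Z: (n, d) embeddings of n items (from either modality)
    Y: (n, c) multi-label vectors
    W: (n, n) semantic graph weights built from label similarity
    """
    # Local semantic structure: sum_ij W[i, j] * ||z_i - z_j||^2
    sq_dists = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    local = (W * sq_dists).sum()
    # Global semantic structure: best linear reconstruction of labels
    # from embeddings (least squares), penalizing the residual.
    P, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    recon = ((Z @ P - Y) ** 2).sum()
    return alpha * local + beta * recon
```

In the paper the mapping to `Z` is a nonlinear network trained by SGD; this sketch only evaluates the two consistency terms for a fixed embedding.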
Citations: 30
Unsupervised Learning of 3D Model Reconstruction from Hand-Drawn Sketches
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240699
Lingjing Wang, Cheng Qian, Jifei Wang, Yi Fang
3D object modeling has gained considerable attention in the visual computing community. We propose a low-cost unsupervised learning model for reconstructing 3D objects from hand-drawn sketches. Recent advances in deep learning have opened new opportunities to learn high-quality 3D objects from 2D sketches via supervised networks. However, the limited availability of labeled 2D hand-drawn sketch data (i.e., sketches with corresponding 3D ground-truth models) hinders the training of supervised methods. In this paper, driven by a novel design that combines retrieval and reconstruction, we develop a learning paradigm that reconstructs 3D objects from hand-drawn sketches without using well-labeled hand-drawn sketch data at any point during training. Specifically, the paradigm begins by training an adaptation network, via an autoencoder with adversarial loss, that embeds the unpaired 2D rendered-image domain and the hand-drawn sketch domain into a shared latent vector space. Then, from this latent space, for each test sketch we retrieve a few (e.g., five) nearest neighbors from the training 3D dataset as prior knowledge for a 3D Generative Adversarial Network. Our experiments verify the network's robust and superior performance in generating 3D volumetric objects from a single hand-drawn sketch without requiring any 3D ground-truth labels.
Citations: 32
Attentive Interactive Convolutional Matching for Community Question Answering in Social Multimedia
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240626
Jun Hu, Shengsheng Qian, Quan Fang, Changsheng Xu
Nowadays, community-based question answering (CQA) services have accumulated millions of users who share valuable knowledge. An essential function in CQA tasks is accurately matching answers with respect to given questions. Existing methods usually ignore the redundant, heterogeneous, and multi-modal properties of CQA systems. In this paper, we propose a multi-modal attentive interactive convolutional matching method (MMAICM) that jointly models the multi-modal content and social context of questions and answers in a unified framework for CQA retrieval, exploring the redundant, heterogeneous, and multi-modal properties of CQA systems together. A well-designed attention mechanism focuses on useful word-pair interactions and neglects meaningless and noisy ones. Moreover, a multi-modal interaction matrix method and a novel meta-path-based network representation approach are proposed to consider the multi-modal content and social context, respectively. The attentive interactive convolutional matching network infers the relevance between questions and answers, capturing both the lexical and the sequential information of the contents. Experimental results on two real-world datasets demonstrate the superior performance of MMAICM compared with other state-of-the-art algorithms.
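The word-pair interaction idea can be sketched as follows: build an interaction matrix from question and answer word embeddings, then attend over the pairs so that weak (noisy) interactions contribute less to the matching score. This is a minimal sketch under our own assumptions (dot-product interactions and a flat softmax), not the paper's exact attention mechanism.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_interaction(q_emb, a_emb):
    """Attended matching score between a question and an answer.

    q_emb: (len_q, d) question word embeddings
    a_emb: (len_a, d) answer word embeddings
    """
    M = q_emb @ a_emb.T                       # (len_q, len_a) word-pair interactions
    A = softmax(M.ravel()).reshape(M.shape)   # attention weights over all pairs
    return (A * M).sum()                      # noisy pairs are down-weighted
```

Since the weights form a convex combination, the score always lies between the weakest and strongest pairwise interaction, skewed toward the strong ones.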
Citations: 19
Attention and Language Ensemble for Scene Text Recognition with Convolutional Sequence Modeling
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240571
Shancheng Fang, Hongtao Xie, Zhengjun Zha, Nannan Sun, Jianlong Tan, Yongdong Zhang
Recent dominant approaches for scene text recognition are mainly based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs), where the CNN processes images and the RNN generates character sequences. In contrast to these methods, we propose an attention-based architecture built entirely on CNNs. The distinctive characteristics of our method are: (1) it follows an encoder-decoder architecture, in which the encoder is a two-dimensional residual CNN and the decoder is a deep one-dimensional CNN; (2) an attention module that captures visual cues and a language module that models linguistic rules are designed on equal footing in the decoder, so attention and language can be viewed as an ensemble that boosts predictions jointly; (3) instead of a single language-side loss, multiple losses from the attention and language modules are accumulated to train the networks end-to-end. We conduct experiments on standard datasets for scene text recognition, including Street View Text, IIIT5K, and the ICDAR datasets. The experimental results show that our CNN-based method achieves state-of-the-art performance on several benchmark datasets, even without using an RNN.
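Point (3) above, accumulating losses from both branches, can be sketched per character position as the sum of two cross-entropy terms, one on the attention module's logits and one on the language module's logits. Function names and the exact weighting are our assumptions; the sketch only illustrates the multi-loss idea.

```python
import numpy as np

def cross_entropy(logits, target):
    """Softmax cross-entropy for one character position (numerically stable)."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target]

def ensemble_loss(attn_logits, lang_logits, target):
    """Accumulate the attention-branch and language-branch losses so that
    both modules receive a training signal for the same target character."""
    return cross_entropy(attn_logits, target) + cross_entropy(lang_logits, target)
```

In training, this sum would be averaged over all character positions and backpropagated through both branches at once.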
Citations: 60
ALERT
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3241912
K. Bahirat, Umang Shah, A. Cárdenas, B. Prabhakaran
Citations: 33
DASH for 3D Networked Virtual Environment
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240701
Thomas Forgione, A. Carlier, Géraldine Morin, Wei Tsang Ooi, V. Charvillat, P. Yadav
DASH is now a widely deployed standard for streaming video content due to its simplicity, scalability, and ease of deployment. In this paper, we explore the use of DASH for a different type of media content: networked virtual environments (NVEs), which have different properties and requirements. We organize a polygon soup with textures into a structure compatible with the DASH MPD (Media Presentation Description), with a minimal set of view-independent metadata that lets the client make intelligent decisions about which data to download at which resolution. We also present a DASH-based NVE client that uses a view- and network-dependent utility metric to decide what to download, based only on the information in the MPD file. We show that DASH can be used for streaming 3D content in an NVE. Our work opens up the possibility of using DASH for highly interactive applications, beyond its current use in video streaming.
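A view- and network-dependent selection rule of the kind described can be sketched as: among the representations the MPD advertises for a piece of content, pick the highest-utility one that fits the bandwidth budget, with utility discounted by the viewpoint's distance. The dictionary keys and the specific utility formula are our assumptions for illustration, not the paper's metric.

```python
def choose_representation(representations, bandwidth_bps, distance):
    """Pick a representation for one chunk of 3D content.

    representations: list of dicts with 'bits_per_s' (cost advertised in
                     the MPD) and 'quality' (intrinsic quality score)
    bandwidth_bps:   current estimated network budget
    distance:        viewpoint distance to the chunk (far content is
                     worth less, so its utility is discounted)
    """
    feasible = [r for r in representations if r["bits_per_s"] <= bandwidth_bps]
    if not feasible:
        # Nothing fits: fall back to the cheapest representation.
        return min(representations, key=lambda r: r["bits_per_s"])
    return max(feasible, key=lambda r: r["quality"] / (1.0 + distance))
```

The distance discount matters when comparing utilities across chunks: a nearby chunk outbids a distant one for the same bandwidth.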
Citations: 15
Partial Multi-view Subspace Clustering
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240679
Nan Xu, Yanqing Guo, Xin Zheng, Qianyu Wang, Xiangyang Luo
In many real-world multimedia applications, data are described by multiple views, so research on multi-view learning is of great significance. Traditional multi-view clustering methods assume that each view has complete data. However, missing or partial data are common in real tasks, giving rise to partial multi-view learning. We therefore propose a novel multi-view clustering method, called Partial Multi-view Subspace Clustering (PMSC), to address the partial multi-view problem. Unlike most existing partial multi-view clustering methods, which only learn a new representation of the original data, our method seeks the latent space and performs data reconstruction simultaneously to learn the subspace representation. The learned subspace representation can reveal the underlying subspace structure embedded in the original data, leading to a more comprehensive data description. In addition, we enforce the subspace representation to be non-negative, yielding an intuitive weight interpretation among different data. The proposed method can be optimized by the Augmented Lagrange Multiplier (ALM) algorithm. Experiments on one synthetic dataset and four benchmark datasets validate the effectiveness of PMSC in the partial multi-view scenario.
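One ingredient of subspace clustering with a non-negative representation can be sketched as follows: learn a non-negative self-expressive coefficient matrix C, where each sample is reconstructed from the others, via projected gradient descent. This is a deliberately simplified stand-in; the actual PMSC method also learns the latent space, handles missing views, and uses ALM rather than this toy solver.

```python
import numpy as np

def nonneg_self_expression(X, steps=500, lr=0.01):
    """Minimize 0.5 * ||X - C X||_F^2 subject to C >= 0, diag(C) = 0.

    X: (n, d) data matrix, one sample per row. The learned C[i, j] acts
    as a non-negative weight for how much sample j explains sample i,
    which is the intuitive interpretation mentioned in the abstract.
    """
    n = X.shape[0]
    C = np.zeros((n, n))
    for _ in range(steps):
        grad = (C @ X - X) @ X.T
        C = np.maximum(C - lr * grad, 0.0)  # projection onto C >= 0
        np.fill_diagonal(C, 0.0)            # forbid trivial self-reconstruction
    return C
```

On clustered data, large entries of C concentrate on same-cluster pairs, so C (symmetrized) can feed a spectral clustering step.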
Citations: 28
Session details: FF-1
Pub Date : 2018-10-15 DOI: 10.1145/3286915
C. Changwen
Citations: 0
ChipGAN
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240655
Bin He, Feng Gao, Daiqian Ma, Boxin Shi, Ling-yu Duan
Style transfer has been successfully applied to photos to generate realistic western paintings. However, because of the inherently different painting techniques adopted by Chinese and western paintings, directly applying existing methods cannot generate satisfactory results for Chinese ink wash painting style transfer. This paper proposes ChipGAN, an end-to-end Generative Adversarial Network based architecture for photo to Chinese ink wash painting style transfer. The core modules of ChipGAN enforce three constraints -- voids, brush strokes, and ink wash tone and diffusion -- to address three key techniques commonly adopted in Chinese ink wash painting. We conduct a stylization perceptual study to score the similarity of generated paintings to real paintings by consulting with professional artists, based on the newly built Chinese ink wash photo and image dataset. The advantages in visual quality compared with state-of-the-art networks, together with high stylization perceptual study scores, show the effectiveness of the proposed method.
{"title":"ChipGAN","authors":"Bin He, Feng Gao, Daiqian Ma, Boxin Shi, Ling-yu Duan","doi":"10.1145/3240508.3240655","DOIUrl":"https://doi.org/10.1145/3240508.3240655","url":null,"abstract":"Style transfer has been successfully applied on photos to generate realistic western paintings. However, because of the inherently different painting techniques adopted by Chinese and western paintings, directly applying existing methods cannot generate satisfactory results for Chinese ink wash painting style transfer. This paper proposes ChipGAN, an end-to-end Generative Adversarial Network based architecture for photo to Chinese ink wash painting style transfer. The core modules of ChipGAN enforce three constraints -- voids, brush strokes, and ink wash tone and diffusion -- to address three key techniques commonly adopted in Chinese ink wash painting. We conduct stylization perceptual study to score the similarity of generated paintings to real paintings by consulting with professional artists based on the newly built Chinese ink wash photo and image dataset. The advantages in visual quality compared with state-of-the-art networks and high stylization perceptual study scores show the effectiveness of the proposed method.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121678958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 49
Online Action Tube Detection via Resolving the Spatio-temporal Context Pattern
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240659
Jingjia Huang, Nannan Li, Jia-Xing Zhong, Thomas H. Li, Ge Li
At present, spatio-temporal action detection in video is still a challenging problem, considering the complexity of the background, the variety of actions, and viewpoint changes in unconstrained environments. Most current approaches solve the problem via two-step processing: first detecting actions at each frame, then linking them, which neglects the continuity of the action and operates in an offline, batch-processing manner. In this paper, we attempt to build an online action detection model that exploits the spatio-temporal coherence existing among action regions when performing action category inference and position localization. Specifically, we represent the spatio-temporal context pattern by establishing an encoder-decoder model based on a convolutional recurrent network. The model accepts a video snippet as input and encodes the dynamic information of the action in the forward pass. During the backward pass, it resolves this information at each time instant for action detection by fusing the current static or motion cue. Additionally, we propose an incremental action tube generation algorithm, which accomplishes action bounding-box association, action label determination, and temporal trimming in a single pass. Our model takes appearance, motion, or fused signals as input and is tested on two prevailing datasets, UCF-Sports and UCF-101. The experimental results demonstrate the effectiveness of our method, which achieves performance superior or comparable to existing approaches.
{"title":"Online Action Tube Detection via Resolving the Spatio-temporal Context Pattern","authors":"Jingjia Huang, Nannan Li, Jia-Xing Zhong, Thomas H. Li, Ge Li","doi":"10.1145/3240508.3240659","DOIUrl":"https://doi.org/10.1145/3240508.3240659","url":null,"abstract":"At present, spatio-temporal action detection in the video is still a challenging problem, considering the complexity of the background, the variety of the action or the change of the viewpoint in the unconstrained environment. Most of current approaches solve the problem via a two-step processing: first detecting actions at each frame; then linking them, which neglects the continuity of the action and operates in an offline and batch processing manner. In this paper, we attempt to build an online action detection model that introduces the spatio-temporal coherence existed among action regions when performing action category inference and position localization. Specifically, we seek to represent the spatio-temporal context pattern via establishing an encoder-decoder model based on the convolutional recurrent network. The model accepts a video snippet as input and encodes the dynamic information of the action in the forward pass. During the backward pass, it resolves such information at each time instant for action detection via fusing the current static or motion cue. Additionally, we propose an incremental action tube generation algorithm, which accomplishes action bounding-boxes association, action label determination and the temporal trimming in a single pass. Our model takes in the appearance, motion or fused signals as input and is tested on two prevailing datasets, UCF-Sports and UCF-101. 
The experiment results demonstrate the effectiveness of our method which achieves a performance superior or comparable to compared existing approaches.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125253743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
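The abstract above mentions an incremental tube generation algorithm that associates per-frame bounding boxes into action tubes in a single pass. A common baseline for that association step is greedy IoU-based linking; the sketch below is such a baseline, not the authors' algorithm, and all names and thresholds are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def link_tubes(frames, iou_thresh=0.3):
    """Greedy single-pass association of per-frame detections into tubes.

    frames: list over time of lists of (box, score) detections.
    Returns tubes as lists of (frame_index, box, score).
    Illustrative stand-in for incremental tube generation.
    """
    tubes = []
    for t, dets in enumerate(frames):
        unmatched = list(dets)
        for tube in tubes:
            last_t, last_box, _ = tube[-1]
            # Only extend tubes that are still "alive" at the previous frame.
            if last_t != t - 1 or not unmatched:
                continue
            best = max(unmatched, key=lambda d: iou(last_box, d[0]))
            if iou(last_box, best[0]) >= iou_thresh:
                tube.append((t, best[0], best[1]))
                unmatched.remove(best)
        # Any detection left unmatched starts a new tube.
        for box, score in unmatched:
            tubes.append([(t, box, score)])
    return tubes
```

Because each frame's detections are consumed as they arrive, this linking can run online; temporal trimming would then amount to cutting tubes whose per-frame scores stay below a threshold.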