
Proceedings of the 2nd ACM International Conference on Multimedia in Asia — Latest Publications

Improving face recognition in surveillance video with judicious selection and fusion of representative frames
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446259
Zhaozhen Ding, Qingfang Zheng, Chunhua Hou, Guang Shen
Face recognition in unconstrained surveillance videos is challenging due to varying acquisition settings and large face variations. We propose to exploit the complementary correlation between multiple frames to improve face recognition performance. We design an algorithm that builds a representative frame set from the video sequence, selecting faces of high quality and large appearance diversity. We also devise a refined Deep Residual Equivariant Mapping (DREAM) block to improve the discriminative power of the extracted deep features. Extensive experiments on two relevant face recognition benchmarks, YouTube Faces and IJB-A, show the effectiveness of the proposed method. Our method is also lightweight and can be easily embedded into existing CNN-based face recognition systems.
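The abstract does not spell out the selection criterion, but a quality/diversity trade-off of this kind is commonly implemented as a greedy set construction followed by feature fusion. Below is a minimal sketch under that assumption; `alpha`, the greedy strategy, and the averaging fusion are illustrative choices, not the authors' exact algorithm.

```python
import numpy as np

def select_representative_frames(features, quality, k=5, alpha=0.5):
    """Greedily pick k frames trading off face quality against appearance
    diversity (cosine distance to the already-picked frames).

    features: (N, D) L2-normalised face embeddings, one per frame
    quality:  (N,) face-quality scores in [0, 1]
    """
    selected = [int(np.argmax(quality))]           # start with the best-quality face
    while len(selected) < min(k, len(quality)):
        sims = features @ features[selected].T     # cosine similarity to chosen set
        diversity = 1.0 - sims.max(axis=1)         # distance to nearest chosen frame
        score = alpha * quality + (1 - alpha) * diversity
        score[selected] = -np.inf                  # never re-pick a frame
        selected.append(int(np.argmax(score)))
    return selected

def fuse(features, idx):
    """Fuse the representative set into one face template by averaging."""
    v = features[idx].mean(axis=0)
    return v / np.linalg.norm(v)

# toy usage with random embeddings
rng = np.random.default_rng(0)
f = rng.normal(size=(30, 128)); f /= np.linalg.norm(f, axis=1, keepdims=True)
q = rng.uniform(size=30)
template = fuse(f, select_representative_frames(f, q))
```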
Citations: 0
Local structure alignment guided domain adaptation with few source samples
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446327
Yuying Cai, Jinfeng Li, Baodi Liu, Weifeng Liu, Kai Zhang, Changsheng Xu
Domain adaptation has received a lot of attention for its efficiency in dealing with cross-domain learning tasks. Most existing domain adaptation methods adopt strategies that rely on large amounts of source label information, which limits their applications in the real world, where only a few labeled samples are available. We exploit local geometric connections to tackle this problem and propose a Local Structure Alignment (LSA) guided domain adaptation method in this paper. LSA leverages the Nyström method to describe the distribution difference from a geometric perspective and then performs distribution alignment between domains. Specifically, LSA constructs a domain-invariant Hessian matrix that locally connects the data of the two domains by minimizing the Nyström approximation error. It then integrates the domain-invariant Hessian matrix with semi-supervised learning and finally builds an adaptive semi-supervised model. Extensive experimental results validate that the proposed LSA outperforms traditional domain adaptation methods, especially when only sparse source label information is available.
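The Nyström step can be made concrete: with only a handful of source samples as landmarks, a full kernel matrix over both domains can be approximated cheaply. This is a generic sketch of the Nyström approximation only, not the paper's Hessian construction; the RBF kernel, `gamma`, and the landmark choice are assumptions.

```python
import numpy as np

def rbf(X, Y, gamma=0.1):
    """RBF kernel between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(X, landmarks, gamma=0.1):
    """Nyström approximation K ~= C W^+ C^T built from a few landmarks."""
    C = rbf(X, landmarks, gamma)          # (n, m) cross-kernel
    W = rbf(landmarks, landmarks, gamma)  # (m, m) landmark kernel
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(0)
source = rng.normal(size=(5, 10))          # only a few labeled source samples
target = rng.normal(size=(100, 10)) + 0.3  # shifted target domain
X = np.vstack([source, target])
K_approx = nystrom(X, source)              # joint geometry of both domains
print(K_approx.shape)                      # (105, 105)
```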
Citations: 0
Intermediate coordinate based pose non-perspective estimation from line correspondences
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446299
Yujia Cao, Zhichao Cui, Yuehu Liu, Xiaojun Lv, K.C.C. Peng
In this paper, a non-iterative solution to non-perspective pose estimation from line correspondences is proposed. Specifically, the proposed method uses an intermediate camera frame and an intermediate world frame, which simplifies the expression of the rotation matrix R by reducing its degrees of freedom from three to two; the pose estimation problem is then formulated as an optimization problem. Our method solves for the rotation parameters by building fifteenth-order and fourth-order univariate polynomials. The proposed method can also be applied to pose estimation for perspective cameras. We use both simulated and real data to conduct comparative experiments. The experimental results show that the proposed method is comparable to or better than existing methods in terms of accuracy, stability, and efficiency.
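The final step of solvers in this family, extracting candidate rotations as the real roots of a univariate polynomial, can be sketched with a generic root finder. The toy coefficients below are illustrative, not the paper's actual polynomial; in practice each real root is a rotation-parameter candidate scored against reprojection error.

```python
import numpy as np

def real_roots(coeffs, tol=1e-8):
    """Real roots of a univariate polynomial, highest-degree coefficient
    first, as produced once the pose constraints are reduced to a single
    unknown rotation parameter."""
    r = np.roots(coeffs)
    return np.sort(r[np.abs(r.imag) < tol].real)

# toy 4th-order polynomial: (x^2 - 1)(x^2 - 4) = x^4 - 5x^2 + 4
print(real_roots([1, 0, -5, 0, 4]))   # [-2. -1.  1.  2.]
```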
Citations: 0
Robust visual tracking via scale-aware localization and peak response strength
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446274
Ying Wang, Luo Xiong, Kaiwen Du, Yan Yan, Hanzi Wang
Existing regression-based deep trackers usually localize a target based on a response map, where the highest peak corresponds to the predicted target location. Nevertheless, when background distractors appear or the target scale changes frequently, the response map is prone to producing multiple sub-peak responses that interfere with model prediction. In this paper, we propose a robust online tracking method via Scale-Aware localization and Peak Response strength (SAPR), which learns a discriminative model predictor to estimate the target state accurately. Specifically, to cope with large scale variations, we propose a Scale-Aware Localization (SAL) module that provides multi-scale response maps based on a scale pyramid scheme. Furthermore, to focus on the target response, we propose a simple yet effective Peak Response Strength (PRS) module that fuses the multi-scale response maps with the response maps generated by a correlation filter. According to the response map with the maximum classification score, the model predictor iteratively updates its filter weights for accurate target state estimation. Experimental results on three benchmark datasets, OTB100, VOT2018 and LaSOT, demonstrate that the proposed SAPR accurately estimates the target state, achieving favorable performance against several state-of-the-art trackers.
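As a rough illustration of how peak response strength can arbitrate between per-scale response maps, the sketch below weights each map by its peak-to-sidelobe ratio before fusing and localizing. The PSR proxy and the weighting scheme are assumptions for illustration, not the paper's exact PRS module.

```python
import numpy as np

def psr(resp):
    """Peak-to-sidelobe ratio: a simple proxy for peak response strength."""
    peak = resp.max()
    side = np.delete(resp.ravel(), resp.argmax())   # everything but the peak
    return (peak - side.mean()) / (side.std() + 1e-8)

def fuse_and_localize(responses):
    """responses: list of (H, W) response maps, one per pyramid scale.
    Weight each map by its peak strength, then take the global argmax."""
    weights = np.array([psr(r) for r in responses])
    weights /= weights.sum()
    fused = sum(w * r for w, r in zip(weights, responses))
    loc = np.unravel_index(fused.argmax(), fused.shape)
    best_scale = int(np.argmax([r.max() for r in responses]))
    return loc, best_scale

rng = np.random.default_rng(0)
maps = [rng.random((32, 32)) * 0.2 for _ in range(3)]
maps[1][16, 16] = 1.0                      # plant a clear peak at scale 1
print(fuse_and_localize(maps))             # ((16, 16), 1)
```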
Citations: 0
A background-induced generative network with multi-level discriminator for text-to-image generation
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446291
Ping Wang, Li Liu, Huaxiang Zhang, Tianshi Wang
Most existing text-to-image generation methods focus on synthesizing images from text descriptions alone, which cannot meet the requirement of generating desired objects against given backgrounds. In this paper, we propose a Background-induced Generative Network (BGNet) that combines attention mechanisms, background synthesis, and a multi-level discriminator to generate realistic images with given backgrounds according to text descriptions. BGNet takes multi-stage generation as its basic framework to generate fine-grained images and introduces a hybrid attention mechanism to capture the local semantic correlation between texts and images. To adjust the impact of the given backgrounds on the synthesized images, synthesis blocks are added at each stage of image generation; they appropriately combine the foreground objects generated from the text descriptions with the given background images. Besides, a multi-level discriminator and its corresponding loss function are proposed to optimize the synthesized images. The experimental results on the CUB bird dataset demonstrate the superiority of our method and its ability to generate realistic images with given backgrounds.
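A synthesis block of this kind is often realized as a mask-guided blend of foreground and background feature maps. The PyTorch sketch below shows that idea only; the layer layout and the mask predictor are illustrative assumptions, not BGNet's actual architecture.

```python
import torch
import torch.nn as nn

class SynthesisBlock(nn.Module):
    """Blend text-generated foreground features with encoded background
    features through a predicted soft mask (a sketch of the idea)."""
    def __init__(self, channels):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels * 2, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, fg, bg):
        m = self.mask(torch.cat([fg, bg], dim=1))   # where the object should appear
        return self.refine(m * fg + (1 - m) * bg)   # composite, then refine

fg = torch.randn(1, 64, 32, 32)   # features decoded from the text description
bg = torch.randn(1, 64, 32, 32)   # features encoded from the given background
out = SynthesisBlock(64)(fg, bg)
print(out.shape)                  # torch.Size([1, 64, 32, 32])
```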
Citations: 1
Transfer non-stationary texture with complex appearance
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446297
Cheng Peng, Na Qi, Qing Zhu
Texture transfer has been successfully applied in computer vision and computer graphics. Since non-stationary textures are usually complex and anisotropic, transferring them with simple supervised methods is challenging. In this paper, we propose a general solution for non-stationary texture transfer that preserves both the local structure and the visual richness of textures. The inputs of our framework are a source texture and a semantic annotation pair. We record different semantics as different regions and obtain color and distribution information from each region, which is used to guide the low-level texture transfer algorithm. Specifically, we exploit these local distributions to regularize the texture transfer objective function, which is minimized by iterative search and voting steps. In the search step, we search the nearest neighbor fields from the source image to the target image with the Generalized PatchMatch (GPM) algorithm. In the voting step, we calculate histogram weights and coherence weights for different semantic regions to ensure color accuracy and texture continuity, and to further transfer the textures from the source to the target. Comparisons with state-of-the-art algorithms demonstrate the effectiveness and superiority of our technique on various non-stationary textures.
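The search/vote loop can be illustrated with a brute-force stand-in: each iteration matches every target patch to its nearest source patch (search) and rebuilds the target by averaging the matches over their overlaps (vote). Real GPM is far faster, and the paper additionally applies histogram and coherence weights per semantic region; this grayscale sketch omits both.

```python
import numpy as np

def extract_patches(img, p):
    """All p x p patches of a grayscale image, flattened row-wise."""
    H, W = img.shape
    return np.stack([img[i:i+p, j:j+p].ravel()
                     for i in range(H - p + 1) for j in range(W - p + 1)])

def search_vote(target, source, p=5, iters=3):
    """Brute-force stand-in for the search/vote loop."""
    src = extract_patches(source, p)
    H, W = target.shape
    for _ in range(iters):
        acc = np.zeros_like(target)
        cnt = np.zeros_like(target)
        for i in range(H - p + 1):
            for j in range(W - p + 1):
                q = target[i:i+p, j:j+p].ravel()
                best = src[np.argmin(((src - q) ** 2).sum(axis=1))]  # search
                acc[i:i+p, j:j+p] += best.reshape(p, p)              # vote
                cnt[i:i+p, j:j+p] += 1
        target = acc / cnt   # average overlapping votes
    return target

rng = np.random.default_rng(0)
out = search_vote(rng.random((24, 24)), rng.random((32, 32)))
```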
Citations: 0
A novel system architecture and an automatic monitoring method for remote production
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446277
Yasuhiro Mochida, D. Shirai, Takahiro Yamaguchi, S. Kuwabara, H. Nishizawa
Remote production is an emerging concept concerning the outside-broadcasting workflow enabled by Internet Protocol (IP)-based production systems, and it is expected to be much more efficient than the conventional workflow. However, long-distance transmission of uncompressed video signals and time synchronization of distributed IP-video devices are challenging. A system architecture for remote production using optical transponders (capable of long-distance, large-capacity optical communication) is proposed. A field experiment confirmed that uncompressed video signals can be transmitted successfully with this architecture. Monitoring the status of uncompressed video transmission in remote production is also challenging. To address this challenge, a method for automatically monitoring the status of IP-video devices is also proposed. The monitoring system was implemented using whitebox transponders, and it was confirmed that the system can automatically register IP-video devices, generate an IP-video flow model, and detect traffic anomalies.
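In its simplest form, traffic-anomaly detection for an IP-video flow might track each registered flow's recent bitrate and raise an alarm when a sample deviates from the running mean. The window, tolerance, and bitrate figures below are illustrative assumptions; the paper's flow model is more elaborate.

```python
from collections import deque

class FlowMonitor:
    """Flag an IP-video flow whose measured bitrate drifts from its
    recent history (a toy stand-in for flow-model anomaly detection)."""
    def __init__(self, window=30, tolerance=0.05):
        self.samples = deque(maxlen=window)
        self.tolerance = tolerance           # allowed fractional deviation

    def update(self, bitrate_mbps):
        self.samples.append(bitrate_mbps)
        mean = sum(self.samples) / len(self.samples)
        deviation = abs(bitrate_mbps - mean) / mean
        return deviation > self.tolerance    # True -> raise an alarm

mon = FlowMonitor()
for t, rate in enumerate([9.0] * 20 + [4.5]):   # flow rate drops suddenly
    if mon.update(rate):
        print(f"anomaly at sample {t}: {rate} Mb/s")
```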
Citations: 1
Multi-focus noisy image fusion based on gradient regularized convolutional sparse representation
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446325
Xuanjing Shen, Yunqi Zhang, Haipeng Chen, Di Gai
This paper proposes a multi-focus noisy image fusion algorithm combining gradient regularized convolutional sparse representation and spatial frequency. First, each source image is decomposed into a base layer and a detail layer through two-scale image decomposition. For the detail layer, the Alternating Direction Method of Multipliers (ADMM) is used to solve for the convolutional sparse coefficients with gradient penalties and complete the fusion of detail-layer coefficients. For the base layer, the spatial frequency is used to judge the focus area, and the spatial frequency together with a "choose-max" strategy yields the multi-focus fusion result of the base layer. Finally, the fused image is computed as a superposition of the base layer and the detail layer. Experimental results show that, compared with other algorithms, this algorithm provides excellent subjective visual perception and objective evaluation metrics.
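The base-layer rule can be made concrete: spatial frequency, computed per block from row and column gradient energy, serves as the focus measure, and "choose-max" keeps the blocks of whichever source scores higher. A minimal sketch, assuming grayscale inputs and a fixed block size:

```python
import numpy as np

def spatial_frequency(img, win=8):
    """Block-wise spatial frequency: sqrt(RF^2 + CF^2) over win x win blocks."""
    rf = np.zeros_like(img)
    cf = np.zeros_like(img)
    rf[:, 1:] = (img[:, 1:] - img[:, :-1]) ** 2   # row-frequency term
    cf[1:, :] = (img[1:, :] - img[:-1, :]) ** 2   # column-frequency term
    H, W = img.shape
    sf = np.zeros((H // win, W // win))
    for bi in range(H // win):
        for bj in range(W // win):
            blk = slice(bi * win, (bi + 1) * win), slice(bj * win, (bj + 1) * win)
            sf[bi, bj] = np.sqrt(rf[blk].mean() + cf[blk].mean())
    return sf

def fuse_base(base_a, base_b, win=8):
    """'Choose-max': per block, keep the base-layer pixels of the image
    whose spatial frequency (focus measure) is higher."""
    mask = spatial_frequency(base_a, win) >= spatial_frequency(base_b, win)
    mask = np.repeat(np.repeat(mask, win, axis=0), win, axis=1)  # block -> pixel
    return np.where(mask, base_a, base_b)

rng = np.random.default_rng(0)
a, b = rng.random((64, 64)), rng.random((64, 64))
fused = fuse_base(a, b)
```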
Citations: 0
10 years of video browser showdown
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3450215
K. Schoeffmann, Jakub Lokoč, W. Bailer
The Video Browser Showdown (VBS) has influenced the multimedia community for 10 years now. More than 30 unique teams from over 21 countries have participated in the VBS since 2012. In 2021, we celebrate the 10th anniversary of VBS, where 17 international teams compete against each other in an unprecedented contest of fast and accurate multimedia retrieval. In this tutorial we discuss the motivation and details of the VBS contest, including its history, rules, evaluation metrics, and achievements in multimedia retrieval. We talk about the properties of specific VBS retrieval systems and their unique characteristics, as well as existing open-source tools that can be used as a starting point for first-time participants. Participants of this tutorial get a detailed understanding of the VBS and its search systems, and see the latest developments in interactive video retrieval.
Citations: 0
Story segmentation for news broadcast based on primary caption
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446298
Heling Chen, Zhongyuan Wang, Yingjiao Pei, Baojin Huang, Weiping Tu
In the information explosion era, people want to access only the news they are interested in, so news broadcast story segmentation is strongly needed as an essential basis for personalized delivery and short videos. Existing advanced story boundary segmentation methods utilize the semantic similarity of subtitles and thus entail complex semantic computation. The title texts of news broadcast programs include headline (or primary) captions, dialogue captions, and the channel logo, and in most news broadcasts the clips of one story render only a single primary caption. Inspired by this fact, we propose a simple method for story segmentation based on the primary caption, which combines YOLOv3-based primary caption extraction with preliminary boundary localization. In particular, we introduce a mean hash to achieve fast and reliable comparison of the detected small primary-caption blocks. We further incorporate scene recognition to refine the preliminary boundaries, because the primary captions always appear later than the story boundary. Experimental results on two Chinese news broadcast datasets show that our method achieves high accuracy in terms of recall (R), precision (P), and F1 measures.
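A mean (average) hash reduces each detected caption block to a coarse binary fingerprint, so frames sharing a primary caption can be matched by Hamming distance. A minimal sketch, with the resize method, hash size, and distance threshold as illustrative assumptions:

```python
import numpy as np

def mean_hash(block):
    """64-bit average hash of a caption region: subsample to 8x8
    (nearest neighbour, for brevity), then threshold at the mean."""
    h, w = block.shape
    rows = np.linspace(0, h - 1, 8).astype(int)
    cols = np.linspace(0, w - 1, 8).astype(int)
    small = block[np.ix_(rows, cols)]
    return (small > small.mean()).ravel()

def same_caption(block_a, block_b, max_dist=8):
    """Two frames belong to the same story if their primary-caption
    hashes are within a small Hamming distance."""
    return np.count_nonzero(mean_hash(block_a) != mean_hash(block_b)) <= max_dist

rng = np.random.default_rng(0)
cap = rng.random((40, 300))                                  # grayscale caption crop
noisy = cap + rng.normal(scale=0.005, size=cap.shape)        # same caption, slight noise
print(same_caption(cap, noisy))                              # True
```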
Citations: 1