Miggi Zwicklbauer, W. Lamm, Martin Gordon, Konstantinos Apostolidis, Basil Philipp, V. Mezaris
This paper presents a method to interactively create new Sandmännchen stories. We built an application, deployed on a smart speaker, that interacts with a user, selects appropriate segments from a database of Sandmännchen episodes, and combines them to generate a new story compatible with the user's requests. The underlying video analysis technologies are presented and evaluated. We additionally showcase example results from using the complete application as a proof of concept.
{"title":"Video Analysis for Interactive Story Creation: The Sandmännchen Showcase","authors":"Miggi Zwicklbauer, W. Lamm, Martin Gordon, Konstantinos Apostolidis, Basil Philipp, V. Mezaris","doi":"10.1145/3422839.3423061","DOIUrl":"https://doi.org/10.1145/3422839.3423061","url":null,"abstract":"This paper presents a method to interactively create new Sandmännchen stories. We built an application, deployed on a smart speaker, that interacts with a user, selects appropriate segments from a database of Sandmännchen episodes, and combines them to generate a new story compatible with the user's requests. The underlying video analysis technologies are presented and evaluated. We additionally showcase example results from using the complete application as a proof of concept.","PeriodicalId":270338,"journal":{"name":"Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132727825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
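The segment selection and recombination described in the abstract above can be illustrated with a minimal sketch. The `Segment` schema, the narrative roles, and the concept-matching rule below are assumptions for illustration only, not the paper's actual pipeline:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    episode: str
    start: float           # segment start time in seconds
    end: float             # segment end time in seconds
    concepts: frozenset    # visual concepts annotated on the segment
    role: str              # hypothetical narrative role: "intro", "main", "outro"


def build_story(segments, requested_concepts):
    """Assemble a playlist: one intro, all main segments matching the
    user's requested concepts, and one outro."""
    requested = frozenset(requested_concepts)
    intro = next(s for s in segments if s.role == "intro")
    outro = next(s for s in segments if s.role == "outro")
    mains = [s for s in segments if s.role == "main" and requested & s.concepts]
    return [intro, *mains, outro]
```

A real system would additionally score segments for visual continuity; this sketch only shows the concept-based filtering step.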
Syeda Maryam Fatima, Marina Shehzad, Syed Sami Murtuza, S. S. Raza
This paper demonstrates CNN-based neural style transfer on an audio dataset to make storytelling a personalized experience: users record a few sentences, which are used to mimic their voice. User audio is converted to spectrograms, whose style is transferred to the spectrogram of a base voice narrating the story, analogous to style transfer on images. This approach stands out because it needs only a small dataset and therefore also takes less time to train the model. The project is intended specifically for children who prefer digital interaction and are increasingly leaving the storytelling culture behind, and for working parents who are unable to spend enough time with their children. By using a parent's initial recording to narrate a given story, it is designed to serve as a bridge between storytelling and screen time, engaging children through the implicit ethical themes of the stories and connecting them to their loved ones while ensuring an innocuous and meaningful learning experience.
{"title":"Neural Style Transfer Based Voice Mimicking for Personalized Audio Stories","authors":"Syeda Maryam Fatima, Marina Shehzad, Syed Sami Murtuza, S. S. Raza","doi":"10.1145/3422839.3423063","DOIUrl":"https://doi.org/10.1145/3422839.3423063","url":null,"abstract":"This paper demonstrates CNN-based neural style transfer on an audio dataset to make storytelling a personalized experience: users record a few sentences, which are used to mimic their voice. User audio is converted to spectrograms, whose style is transferred to the spectrogram of a base voice narrating the story, analogous to style transfer on images. This approach stands out because it needs only a small dataset and therefore also takes less time to train the model. The project is intended specifically for children who prefer digital interaction and are increasingly leaving the storytelling culture behind, and for working parents who are unable to spend enough time with their children. By using a parent's initial recording to narrate a given story, it is designed to serve as a bridge between storytelling and screen time, engaging children through the implicit ethical themes of the stories and connecting them to their loved ones while ensuring an innocuous and meaningful learning experience.","PeriodicalId":270338,"journal":{"name":"Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129940240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
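The spectrogram representation that lets image-style transfer techniques operate on audio can be sketched as follows. The STFT parameters and the Gram-matrix style statistic (the standard "style" feature in image style transfer) are illustrative assumptions; the paper's actual network and losses are not reproduced here:

```python
import numpy as np


def spectrogram(audio, n_fft=512, hop=128):
    """Log-magnitude STFT spectrogram, treated as a 2-D 'image'
    (frequency bins x time frames)."""
    window = np.hanning(n_fft)
    frames = [audio[i:i + n_fft] * window
              for i in range(0, len(audio) - n_fft, hop)]
    stft = np.fft.rfft(np.asarray(frames), axis=1)
    return np.log1p(np.abs(stft)).T


def gram_matrix(spec):
    """Gram matrix over spectrogram rows -- the second-order statistic
    a style loss would match between the user's voice and the base voice."""
    flat = spec - spec.mean()
    return flat @ flat.T / flat.size
```

In a Gatys-style pipeline, the generated spectrogram would be optimized so its Gram matrix approaches the user's while its content features stay close to the base narration, after which audio is recovered by inverting the spectrogram.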
Humans perform intelligent tasks by productively leveraging relevant information from numerous sensory and experiential inputs, and recent scientific and hardware advances have made it increasingly possible for machines to attempt this as well. However, improved resource availability does not automatically give rise to humanlike performance in complex tasks [1]. In this talk, I discuss recent work towards three tasks that benefit from an elegant synthesis of linguistic and visual input: visual storytelling, visual question answering (VQA), and affective content analysis. I focus primarily on visual storytelling, a burgeoning task with the goal of generating coherent, sensible narratives for sequences of input images [2]. I analyze recent work in this area, and then introduce a novel visual storytelling approach that employs a hierarchical context-based network, with a co-attention mechanism that jointly attends to patterns in visual (image) and linguistic (description) input. Following this, I describe ongoing work in VQA, another inherently multimodal task with the goal of producing accurate, sensible answers to questions about images. I explore a formulation in which the VQA model generates unconstrained, free-form text, providing preliminary evidence that harnessing the linguistic patterns latent in language models results in competitive task performance [3]. Finally, I introduce some intriguing new work that investigates the utility of linguistic patterns in a task that is not inherently multimodal: analyzing the affective content of images. I close by suggesting some exciting future directions for each of these tasks as they pertain to multimodal media analysis.
{"title":"And, Action! Towards Leveraging Multimodal Patterns for Storytelling and Content Analysis","authors":"Natalie Parde","doi":"10.1145/3422839.3423060","DOIUrl":"https://doi.org/10.1145/3422839.3423060","url":null,"abstract":"Humans perform intelligent tasks by productively leveraging relevant information from numerous sensory and experiential inputs, and recent scientific and hardware advances have made it increasingly possible for machines to attempt this as well. However, improved resource availability does not automatically give rise to humanlike performance in complex tasks [1]. In this talk, I discuss recent work towards three tasks that benefit from an elegant synthesis of linguistic and visual input: visual storytelling, visual question answering (VQA), and affective content analysis. I focus primarily on visual storytelling, a burgeoning task with the goal of generating coherent, sensible narratives for sequences of input images [2]. I analyze recent work in this area, and then introduce a novel visual storytelling approach that employs a hierarchical context-based network, with a co-attention mechanism that jointly attends to patterns in visual (image) and linguistic (description) input. Following this, I describe ongoing work in VQA, another inherently multimodal task with the goal of producing accurate, sensible answers to questions about images. I explore a formulation in which the VQA model generates unconstrained, free-form text, providing preliminary evidence that harnessing the linguistic patterns latent in language models results in competitive task performance [3]. Finally, I introduce some intriguing new work that investigates the utility of linguistic patterns in a task that is not inherently multimodal: analyzing the affective content of images. I close by suggesting some exciting future directions for each of these tasks as they pertain to multimodal media analysis.","PeriodicalId":270338,"journal":{"name":"Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130408925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Session 1: Video Analytics and Storytelling","authors":"V. Mezaris","doi":"10.1145/3429509","DOIUrl":"https://doi.org/10.1145/3429509","url":null,"abstract":"","PeriodicalId":270338,"journal":{"name":"Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130934480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Keynote & Invited Talks","authors":"Raphael Troncy","doi":"10.1145/3429508","DOIUrl":"https://doi.org/10.1145/3429508","url":null,"abstract":"","PeriodicalId":270338,"journal":{"name":"Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124532180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we present a Bidirectional LSTM neural network with a Conditional Random Field layer on top, which utilizes word, character, and morph embeddings to perform named entity recognition on various Finnish datasets. To overcome the lack of annotated training corpora that arises when dealing with low-resource languages like Finnish, we tried a knowledge-transfer technique to transfer tags from an Estonian dataset. On the human-annotated, in-domain Digitoday dataset, our system achieved an F1 score of 84.73. On the out-of-domain Wikipedia set, it achieved an F1 score of 67.66. To see how well the system performs on speech data, we used two datasets containing automatic speech recognition outputs. Since we do not have true labels for those datasets, we used a rule-based system to annotate them and used those annotations as reference labels. On the first dataset, which contains Finnish parliament sessions, we obtained an F1 score of 42.09, and on the second, which contains talks from Yle Pressiklubi, we obtained an F1 score of 74.54.
{"title":"Named Entity Recognition for Spoken Finnish","authors":"Dejan Porjazovski, Juho Leinonen, M. Kurimo","doi":"10.1145/3422839.3423066","DOIUrl":"https://doi.org/10.1145/3422839.3423066","url":null,"abstract":"In this paper we present a Bidirectional LSTM neural network with a Conditional Random Field layer on top, which utilizes word, character, and morph embeddings to perform named entity recognition on various Finnish datasets. To overcome the lack of annotated training corpora that arises when dealing with low-resource languages like Finnish, we tried a knowledge-transfer technique to transfer tags from an Estonian dataset. On the human-annotated, in-domain Digitoday dataset, our system achieved an F1 score of 84.73. On the out-of-domain Wikipedia set, it achieved an F1 score of 67.66. To see how well the system performs on speech data, we used two datasets containing automatic speech recognition outputs. Since we do not have true labels for those datasets, we used a rule-based system to annotate them and used those annotations as reference labels. On the first dataset, which contains Finnish parliament sessions, we obtained an F1 score of 42.09, and on the second, which contains talks from Yle Pressiklubi, we obtained an F1 score of 74.54.","PeriodicalId":270338,"journal":{"name":"Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126729978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
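F1 scores like those reported above are conventionally computed at the entity level, where a prediction counts only if both span and type match exactly. A minimal sketch, assuming entities are represented as `(start, end, type)` tuples (an illustrative representation, not the paper's evaluation code):

```python
def entity_f1(gold, pred):
    """Entity-level F1: an entity is a true positive only when its
    span boundaries and its type both match a gold entity exactly."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

This strict matching explains why noisy ASR transcripts (which distort entity words and boundaries) can depress the score so sharply, as in the parliament-session result.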
{"title":"Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery","authors":"","doi":"10.1145/3422839","DOIUrl":"https://doi.org/10.1145/3422839","url":null,"abstract":"","PeriodicalId":270338,"journal":{"name":"Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122249481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TV broadcasters and other organizations with online media collections conduct digital marketing activities to extend the reach of, and engagement with, their media assets. Marketing success depends on the relevance of the media content's topics to the audience, which is even harder to ensure when planning future marketing activities, since one needs to know which topics the future audience will be interested in. This paper presents an innovative application of AI-based predictive analytics that identifies the topics likely to be more popular among future audiences, and its use in the digital content marketing strategy of media organisations.
{"title":"Predicting Your Future Audience's Popular Topics to Optimize TV Content Marketing Success","authors":"L. Nixon","doi":"10.1145/3422839.3423062","DOIUrl":"https://doi.org/10.1145/3422839.3423062","url":null,"abstract":"TV broadcasters and other organizations with online media collections conduct digital marketing activities to extend the reach of, and engagement with, their media assets. Marketing success depends on the relevance of the media content's topics to the audience, which is even harder to ensure when planning future marketing activities, since one needs to know which topics the future audience will be interested in. This paper presents an innovative application of AI-based predictive analytics that identifies the topics likely to be more popular among future audiences, and its use in the digital content marketing strategy of media organisations.","PeriodicalId":270338,"journal":{"name":"Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125678491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Session 2: Video Annotation and Summarization","authors":"Jorma T. Laaksonen","doi":"10.1145/3429510","DOIUrl":"https://doi.org/10.1145/3429510","url":null,"abstract":"","PeriodicalId":270338,"journal":{"name":"Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133623506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yashaswi Rauthan, Vatsala Singh, Rishabh Agrawal, Satej Kadlay, N. Pedanekar, Shirish S. Karande, Manasi Malik, Iaphi Tariang
Crisis situations often require authorities to convey important messages to a large population of varying demographics. An example of such a message is "maintain a distance of 6 ft from others" during the present COVID-19 crisis. In this paper, we propose a method to programmatically place such messages in existing entertainment media as overlays at semantically relevant locations. For this purpose, we use generic semantic annotations on the media and subsequent spatio-temporal querying on these annotations to find candidate locations for message placement. We then propose choosing the final locations optimally using parameters such as the spacing of messages, the length of the messages, and the confidence of query results. We present preliminary results for optimal placement of messages in popular entertainment media.
{"title":"Avoid Crowding in the Battlefield: Semantic Placement of Social Messages in Entertainment Programs","authors":"Yashaswi Rauthan, Vatsala Singh, Rishabh Agrawal, Satej Kadlay, N. Pedanekar, Shirish S. Karande, Manasi Malik, Iaphi Tariang","doi":"10.1145/3422839.3423065","DOIUrl":"https://doi.org/10.1145/3422839.3423065","url":null,"abstract":"Crisis situations often require authorities to convey important messages to a large population of varying demographics. An example of such a message is \"maintain a distance of 6 ft from others\" during the present COVID-19 crisis. In this paper, we propose a method to programmatically place such messages in existing entertainment media as overlays at semantically relevant locations. For this purpose, we use generic semantic annotations on the media and subsequent spatio-temporal querying on these annotations to find candidate locations for message placement. We then propose choosing the final locations optimally using parameters such as the spacing of messages, the length of the messages, and the confidence of query results. We present preliminary results for optimal placement of messages in popular entertainment media.","PeriodicalId":270338,"journal":{"name":"Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131552246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
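The final step described in the abstract above, choosing placement locations from scored candidates under a spacing constraint, can be sketched with a simple greedy heuristic. The `(time, confidence)` candidate representation and the greedy strategy are assumptions for illustration, not the paper's actual optimization:

```python
def place_messages(candidates, min_gap, max_messages):
    """Greedily pick high-confidence candidate timestamps while keeping
    every chosen pair at least `min_gap` seconds apart.

    candidates: list of (time_seconds, confidence) tuples from the
    spatio-temporal queries; returns the chosen tuples in time order.
    """
    chosen = []
    for t, conf in sorted(candidates, key=lambda c: -c[1]):
        if len(chosen) == max_messages:
            break
        if all(abs(t - u) >= min_gap for u, _ in chosen):
            chosen.append((t, conf))
    return sorted(chosen)
```

A true optimal placement would also weigh message length and could be solved exactly (e.g. by dynamic programming over candidates sorted by time); the greedy pass above just illustrates how the spacing and confidence parameters interact.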