{"title":"基于深度学习的多模态元数据扩展用于视频剪辑","authors":"Woo-Hyeon Kim, Geon-Woo Kim, Joo-Chang Kim","doi":"10.18517/ijaseit.14.1.19047","DOIUrl":null,"url":null,"abstract":"General video search and recommendation systems primarily rely on metadata and personal information. Metadata includes file names, keywords, tags, and genres, among others, and is used to describe the video's content. The video platform assesses the relevance of user search queries to the video metadata and presents search results in order of highest relevance. Recommendations are based on videos with metadata judged to be similar to the one the user is currently watching. Most platforms offer search and recommendation services by employing separate algorithms for metadata and personal information. Therefore, metadata plays a vital role in video search. Video service platforms develop various algorithms to provide users with more accurate search results and recommendations. Quantifying video similarity is essential to enhance the accuracy of search results and recommendations. Since content producers primarily provide basic metadata, it can be abused. Additionally, the resemblance between similar video segments may diminish depending on its duration. This paper proposes a metadata expansion model that utilizes object recognition and Speech-to-Text (STT) technology. The model selects key objects by analyzing the frequency of their appearance in the video, extracts audio separately, transcribes it into text, and extracts the script. Scripts are quantified by tokenizing them into words using text-mining techniques. By augmenting metadata with key objects and script tokens, various video content search and recommendation platforms are expected to deliver results closer to user search terms and recommend related content.","PeriodicalId":14471,"journal":{"name":"International Journal on Advanced Science, Engineering and Information Technology","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-Modal Deep Learning based Metadata Extensions for Video Clipping\",\"authors\":\"Woo-Hyeon Kim, Geon-Woo Kim, Joo-Chang Kim\",\"doi\":\"10.18517/ijaseit.14.1.19047\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"General video search and recommendation systems primarily rely on metadata and personal information. Metadata includes file names, keywords, tags, and genres, among others, and is used to describe the video's content. The video platform assesses the relevance of user search queries to the video metadata and presents search results in order of highest relevance. Recommendations are based on videos with metadata judged to be similar to the one the user is currently watching. Most platforms offer search and recommendation services by employing separate algorithms for metadata and personal information. Therefore, metadata plays a vital role in video search. Video service platforms develop various algorithms to provide users with more accurate search results and recommendations. Quantifying video similarity is essential to enhance the accuracy of search results and recommendations. Since content producers primarily provide basic metadata, it can be abused. Additionally, the resemblance between similar video segments may diminish depending on its duration. This paper proposes a metadata expansion model that utilizes object recognition and Speech-to-Text (STT) technology. 
The model selects key objects by analyzing the frequency of their appearance in the video, extracts audio separately, transcribes it into text, and extracts the script. Scripts are quantified by tokenizing them into words using text-mining techniques. By augmenting metadata with key objects and script tokens, various video content search and recommendation platforms are expected to deliver results closer to user search terms and recommend related content.\",\"PeriodicalId\":14471,\"journal\":{\"name\":\"International Journal on Advanced Science, Engineering and Information Technology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal on Advanced Science, Engineering and Information Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18517/ijaseit.14.1.19047\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Agricultural and Biological Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal on Advanced Science, Engineering and Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18517/ijaseit.14.1.19047","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Agricultural and Biological Sciences","Score":null,"Total":0}
Multi-Modal Deep Learning based Metadata Extensions for Video Clipping
General video search and recommendation systems rely primarily on metadata and personal information. Metadata includes file names, keywords, tags, and genres, among others, and describes a video's content. A video platform assesses the relevance of a user's search query to the video metadata and presents search results in descending order of relevance; recommendations are based on videos whose metadata is judged similar to that of the video the user is currently watching. Most platforms provide search and recommendation services by employing separate algorithms for metadata and personal information, so metadata plays a vital role in video search. Video service platforms develop various algorithms to give users more accurate search results and recommendations, and quantifying video similarity is essential to improving that accuracy. Because basic metadata is supplied primarily by content producers, it is open to abuse. In addition, the resemblance between similar video segments may diminish depending on their duration. This paper proposes a metadata expansion model that uses object recognition and Speech-to-Text (STT) technology. The model selects key objects by analyzing how frequently they appear in the video, extracts the audio track, transcribes it into text, and obtains the script. Scripts are quantified by tokenizing them into words with text-mining techniques. By augmenting metadata with key objects and script tokens, video content search and recommendation platforms are expected to deliver results closer to users' search terms and to recommend related content.
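For illustration, the following is a minimal Python sketch (not the authors' code) of the two quantification steps the abstract describes: selecting key objects by appearance frequency and tokenizing an STT transcript into word tokens. The object detector and STT engine are assumed to have run already, so their outputs (per-frame label lists and a transcript string) are taken as inputs; the function names and the top_k cutoff are hypothetical choices, not from the paper.

```python
from collections import Counter
import re

def select_key_objects(frame_detections, top_k=5):
    """Keep the most frequently appearing object labels as key objects.

    frame_detections: a list of per-frame label lists, e.g. the output
    of an object detector run on sampled video frames.
    """
    counts = Counter(label for labels in frame_detections for label in labels)
    return [label for label, _ in counts.most_common(top_k)]

def tokenize_script(transcript):
    """Split an STT transcript into lowercase word tokens; a simple
    stand-in for the paper's text-mining tokenization."""
    return re.findall(r"[a-z']+", transcript.lower())

def expand_metadata(base_metadata, frame_detections, transcript):
    """Augment producer-supplied metadata with key objects and script tokens."""
    expanded = dict(base_metadata)
    expanded["key_objects"] = select_key_objects(frame_detections)
    expanded["script_tokens"] = tokenize_script(transcript)
    return expanded

# Toy usage: three sampled frames and a short transcript.
meta = expand_metadata(
    {"title": "beach vlog", "tags": ["travel"]},
    [["person", "dog"], ["person", "ball"], ["person", "dog"]],
    "Today we took the dog to the beach.",
)
print(meta["key_objects"])    # ['person', 'dog', 'ball']
print(meta["script_tokens"])  # ['today', 'we', 'took', 'the', 'dog', ...]
```

In a real pipeline the per-frame labels would come from a detector and the transcript from an STT engine; the frequency-based selection and word-level tokenization shown here are what turn those raw outputs into searchable metadata fields.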
Journal Introduction:
International Journal on Advanced Science, Engineering and Information Technology (IJASEIT) is an international peer-reviewed journal dedicated to the exchange of high-quality research results in all aspects of science, engineering, and information technology. The journal publishes state-of-the-art papers on fundamental theory, experiments and simulation, and applications, featuring a systematically proposed method, a sufficient review of previous work, expanded discussion, and a concise conclusion. As part of its commitment to the advancement of science and technology, IJASEIT follows an open access policy that makes published articles freely available online without any subscription. The journal's scope includes (but is not limited to) the following:
- Science: Bioscience & Biotechnology, Chemistry & Food Technology, Environmental, Health Science, Mathematics & Statistics, Applied Physics
- Engineering: Architecture, Chemical & Process, Civil & Structural, Electrical, Electronic & Systems, Geological & Mining Engineering, Mechanical & Materials
- Information Science & Technology: Artificial Intelligence, Computer Science, E-Learning & Multimedia, Information System, Internet & Mobile Computing