Video Label Enhancing and Standardization through Transcription and WikiId Mapping Techniques

2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC) | Pub Date: 2023-05-04 | DOI: 10.1109/ESDC56251.2023.10149851

Dinu Thomas, David Pratap, B. Sudha


Citations: 0

Abstract


Video now surpasses every other content type on the internet by volume. According to reports from several sources, video traffic accounted for 82% of internet usage in 2022. Video will become even more important in the coming years for user engagement, advertising and marketing, news, education, and more, which makes video information retrieval an important problem to solve. An accurate and fast video tagging system supports good content recommendation for end users. It also helps audit content automatically, so that platforms can control content that is politically or morally harmful. Few fast or cost-effective mechanisms currently exist for tagging user-generated videos. Manual tagging is costly and highly time-consuming, and a delay in indexing videos such as news and sports reduces their freshness and relevance. Deep learning techniques have matured for content such as text and images, but not for video. Deep learning models need more resources to handle video because of its multimodal nature and temporal behavior. In addition, few large-scale video datasets are available. YouTube-8M is currently the largest publicly available dataset, and much research has been carried out on it. Our study finds that all of this work shares a potential limitation: YouTube-8M has only around 3.8K video labels, which cover neither all real-world tags nor the new domains that have emerged with the surge in content traffic. This study addresses the problem of tag creation through the different methods available, thereby extending the labels to a much wider set. It also aims to produce a scalable tagging pipeline that uses multiple retrieval mechanisms and combines their results, and to standardize the retrieved tokens across languages. The work derives a dataset from 'WikiData' that can be used for any NLP-based standardization use case. Disambiguation is attempted through WikiId embedding: a new WikiData embedding is created in this work, which can be used to eliminate noisy tags.
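The abstract does not describe how the pipeline is implemented, so the following is only an illustrative sketch of the three ideas it names: combining tags from several retrieval mechanisms, standardizing multilingual surface forms to a single id, and using embeddings to drop noisy tags. Everything in it is an assumption made for illustration: the alias table, the placeholder ids (`W1`, `W2`, `W3`, which are not real Wikidata ids), the 2-D vectors, the function names, and the mean-cosine-similarity rule with a 0.5 threshold are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt

# Hypothetical alias table: lower-cased surface form (any language) -> placeholder id.
# The ids are made up for this sketch; they are NOT real Wikidata identifiers.
ALIAS_TO_ID = {
    "football": "W1",
    "soccer": "W1",
    "fútbol": "W1",
    "goalkeeper": "W2",
    "banana": "W3",
}

# Hypothetical toy embeddings, one vector per placeholder id.
EMBEDDINGS = {
    "W1": (0.9, 0.1),  # football
    "W2": (0.8, 0.3),  # goalkeeper
    "W3": (0.0, 1.0),  # banana
}


def standardize(tags):
    """Map raw tags to ids, silently dropping tags with no known alias."""
    normalized = (tag.strip().lower() for tag in tags)
    return [ALIAS_TO_ID[tag] for tag in normalized if tag in ALIAS_TO_ID]


def combine(*tag_lists):
    """Merge tags from several retrievers, counting one vote per retriever per id."""
    votes = Counter()
    for tags in tag_lists:
        votes.update(set(standardize(tags)))
    return votes


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def drop_noisy(ids, threshold=0.5):
    """Keep ids whose mean similarity to the other candidate ids reaches the threshold."""
    ids = list(ids)
    kept = []
    for current in ids:
        others = [other for other in ids if other != current]
        if not others:
            kept.append(current)  # a single candidate has nothing to be compared with
            continue
        mean_sim = sum(
            cosine(EMBEDDINGS[current], EMBEDDINGS[other]) for other in others
        ) / len(others)
        if mean_sim >= threshold:
            kept.append(current)
    return kept


# Usage: tags from two hypothetical retrievers (a transcript and a title).
transcript_tags = ["Soccer", "goalkeeper", "banana"]
title_tags = ["fútbol", "unknown-word"]
votes = combine(transcript_tags, title_tags)
clean_ids = drop_noisy(sorted(votes))
```

In this toy setup, "Soccer" and "fútbol" collapse to the same id and so receive two votes, while the id for "banana" sits far from the other two in the embedding space and is filtered out. A real system would need a much larger alias table and learned embeddings; the threshold here is arbitrary.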
