{"title":"通过转录和维基id映射技术增强和标准化视频标签","authors":"Dinu Thomas, David Pratap, B. Sudha","doi":"10.1109/ESDC56251.2023.10149851","DOIUrl":null,"url":null,"abstract":"Volume of video content surpass all other content types in internet. As per the reports from different sources, video traffic had acquired 82% of internet usage in 2022. Video is going to be more important in the years to come for user engagement, advertisement & marketing, news, education etc. Video information retrieval becomes an important problem to solve in this context. An accurate and fast video tagging system can aid a good content recommendation to the end users. It helps to audit the content automatically thereby platforms can control the contents which are politically and morally harmful. There are not many faster or cost-effective mechanisms to tag user generated videos at this moment. Manual tagging is a costly and highly time taking task. A delay in indexing the videos like news, sports etc., shall reduce its freshness and relevancy. Deep learning techniques have reached its maturity in the contents like text and images, but it is not the case with videos. Deep learning models need more resources to deal with videos due to its multi-modality nature, and temporal behavior. Apart from that, there are not many large-scale video datasets available at this moment. Youtube-8M is the largest dataset which is publicly available as of now. Much research works happened over Youtube-8M dataset. From our study, all these have a potential limitation. For example, in Youtube-8M, Video labels are only around 3.8K which are not covering all real-world tags. It is not covering the new domains which are created along with the surge in the content traffic. This study aims to handle this problem of tag creation through different methods available thereby enhancing the labels to a much wider set. 
This work also aims to produce a scalable tagging pipeline which uses multiple retrieval mechanisms, combine their results. The work aims to standardize the retrieved tokens across languages. This work creates a dataset as an outcome from ‘WikiData’, which can be used for any NLP based standardization use cases. An attempt has been made to do disambiguation through WikiId embedding. A new WikiData embedding is created in this work, which can be used for eliminating the tags which are noisy.","PeriodicalId":354855,"journal":{"name":"2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC)","volume":"240 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Video Label Enhancing and Standardization through Transcription and WikiId Mapping Techniques\",\"authors\":\"Dinu Thomas, David Pratap, B. Sudha\",\"doi\":\"10.1109/ESDC56251.2023.10149851\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Volume of video content surpass all other content types in internet. As per the reports from different sources, video traffic had acquired 82% of internet usage in 2022. Video is going to be more important in the years to come for user engagement, advertisement & marketing, news, education etc. Video information retrieval becomes an important problem to solve in this context. An accurate and fast video tagging system can aid a good content recommendation to the end users. It helps to audit the content automatically thereby platforms can control the contents which are politically and morally harmful. There are not many faster or cost-effective mechanisms to tag user generated videos at this moment. Manual tagging is a costly and highly time taking task. A delay in indexing the videos like news, sports etc., shall reduce its freshness and relevancy. 
Deep learning techniques have reached its maturity in the contents like text and images, but it is not the case with videos. Deep learning models need more resources to deal with videos due to its multi-modality nature, and temporal behavior. Apart from that, there are not many large-scale video datasets available at this moment. Youtube-8M is the largest dataset which is publicly available as of now. Much research works happened over Youtube-8M dataset. From our study, all these have a potential limitation. For example, in Youtube-8M, Video labels are only around 3.8K which are not covering all real-world tags. It is not covering the new domains which are created along with the surge in the content traffic. This study aims to handle this problem of tag creation through different methods available thereby enhancing the labels to a much wider set. This work also aims to produce a scalable tagging pipeline which uses multiple retrieval mechanisms, combine their results. The work aims to standardize the retrieved tokens across languages. This work creates a dataset as an outcome from ‘WikiData’, which can be used for any NLP based standardization use cases. An attempt has been made to do disambiguation through WikiId embedding. 
A new WikiData embedding is created in this work, which can be used for eliminating the tags which are noisy.\",\"PeriodicalId\":354855,\"journal\":{\"name\":\"2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC)\",\"volume\":\"240 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ESDC56251.2023.10149851\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ESDC56251.2023.10149851","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Video Label Enhancing and Standardization through Transcription and WikiId Mapping Techniques
The volume of video content surpasses every other content type on the internet; according to reports from multiple sources, video traffic accounted for 82% of internet usage in 2022. Video will only grow in importance in the coming years for user engagement, advertising and marketing, news, education, and other areas, which makes video information retrieval an important problem to solve. An accurate and fast video tagging system supports good content recommendations for end users. It also enables automatic content auditing, so that platforms can control politically and morally harmful content. At present, few fast or cost-effective mechanisms exist for tagging user-generated videos. Manual tagging is costly and highly time-consuming, and any delay in indexing time-sensitive videos such as news or sports reduces their freshness and relevance. Deep learning techniques have matured for content such as text and images, but this is not yet the case for video: deep learning models need more resources to handle video because of its multi-modal nature and temporal behavior. In addition, few large-scale video datasets are currently available. YouTube-8M is the largest publicly available dataset to date, and much research has been carried out on it. From our study, all of these works share a potential limitation. For example, YouTube-8M has only around 3.8K video labels, which do not cover all real-world tags, nor the new domains that have emerged with the surge in content traffic. This study aims to address this tag-creation problem through the different methods available, thereby enhancing the labels to a much wider set. This work also aims to produce a scalable tagging pipeline that uses multiple retrieval mechanisms and combines their results, and to standardize the retrieved tokens across languages.
As an outcome, this work creates a dataset from WikiData that can be used for any NLP-based standardization use case. An attempt has also been made at disambiguation through WikiId embeddings: a new WikiData embedding is created in this work, which can be used to eliminate noisy tags.
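The noisy-tag elimination described above can be sketched as an embedding-similarity filter: each candidate WikiId tag's embedding is compared against an embedding of the video's overall context, and tags below a similarity threshold are dropped. The vectors, names, and threshold below are toy illustrations, not the paper's actual embeddings.

```python
# Hypothetical sketch of the noisy-tag filtering idea: drop candidate tags
# whose embedding is dissimilar to the video's context embedding. The toy
# vectors and the 0.5 threshold are illustrative assumptions.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two dense vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_noisy_tags(tag_vecs: dict[str, np.ndarray],
                      context_vec: np.ndarray,
                      threshold: float = 0.5) -> list[str]:
    """Keep only tags whose embedding is close enough to the video context."""
    return [tag for tag, vec in tag_vecs.items()
            if cosine(vec, context_vec) >= threshold]

context = np.array([1.0, 0.0, 0.0])       # toy video-context embedding
tags = {
    "tag_on_topic": np.array([0.9, 0.1, 0.0]),   # similar -> kept
    "tag_noisy":    np.array([0.0, 0.0, 1.0]),   # orthogonal -> dropped
}
print(filter_noisy_tags(tags, context))  # -> ['tag_on_topic']
```

The design choice here is that a single context vector acts as an anchor, so the filter scales linearly with the number of candidate tags rather than requiring pairwise comparisons among them.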