EnTube: Exploring key video features for advancing YouTube engagement

Truong Le, Minh-Vuong Nguyen-Thi, Minh-Tu Le, Hien-Vi Nguyen-Thi, Tung Le, Huy Tien Nguyen

Entertainment Computing, Volume 53, Article 100934, March 2025
DOI: 10.1016/j.entcom.2025.100934
URL: https://www.sciencedirect.com/science/article/pii/S187595212500014X
Citations: 0
Abstract
The proliferation of video sharing on platforms like YouTube has highlighted the importance of accurately predicting video engagement. Existing models for predicting video appeal face challenges in transparency and accuracy. This study proposes a multi-modal deep learning approach to forecast video engagement on YouTube. We utilize a multi-modal deep learning model that integrates video titles, audio, thumbnails, content, and tags for engagement prediction, classifying videos into three engagement categories: Engage, Neutral, and Not Engage. A unique dataset, the EnTube dataset, was compiled, featuring 23,738 videos from various genres and 72 Vietnamese YouTube channels. This dataset aids in overcoming the obstacles of data collection and analysis for video engagement. Our approach demonstrates the potential of multi-modal features in enhancing prediction accuracy beyond single-feature models. Explainable Artificial Intelligence techniques are employed to interpret the factors influencing video engagement, offering insights for content optimization. The study’s findings hold promise for applications in video recommendation systems and content strategy adjustments.
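The abstract does not detail the model, but its description of fusing title, audio, thumbnail, content, and tag features into a three-way classifier (Engage, Neutral, Not Engage) can be illustrated with a minimal late-fusion sketch. The code below is an assumption-laden illustration, not the authors' architecture: it presumes pre-extracted per-modality embeddings, and the modality names, embedding dimensions, and layer sizes are hypothetical.

```python
# Minimal sketch of a multi-modal engagement classifier in the spirit of the
# abstract. Each modality is assumed to arrive as a fixed-size embedding
# (e.g., from a text encoder for titles/tags, an image encoder for thumbnails,
# and audio/video encoders). Dimensions are illustrative, not from the paper.
import torch
import torch.nn as nn


class MultiModalEngagementClassifier(nn.Module):
    def __init__(self, dims, hidden=256, num_classes=3):
        super().__init__()
        # One linear projection per modality (title, audio, thumbnail, content, tags).
        self.projections = nn.ModuleDict(
            {name: nn.Linear(dim, hidden) for name, dim in dims.items()}
        )
        # Fused representation -> three classes: Engage / Neutral / Not Engage.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(hidden * len(dims), hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, features):
        # features: dict mapping modality name -> embedding tensor of shape (batch, dim).
        fused = torch.cat(
            [self.projections[name](features[name]) for name in sorted(features)],
            dim=-1,
        )
        return self.classifier(fused)


# Usage with made-up embedding sizes for each modality.
dims = {"title": 768, "audio": 128, "thumbnail": 512, "content": 1024, "tags": 768}
model = MultiModalEngagementClassifier(dims)
batch = {name: torch.randn(4, dim) for name, dim in dims.items()}
logits = model(batch)            # shape: (4, 3)
pred = logits.argmax(dim=-1)     # class indices; the label ordering here is arbitrary
```

A late-fusion design of this kind keeps each modality's encoder independent before combining them, which is one common way to integrate heterogeneous video features; the actual EnTube model may fuse modalities differently.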
Journal description:
Entertainment Computing publishes original, peer-reviewed research articles and serves as a forum for stimulating and disseminating innovative research ideas, emerging technologies, empirical investigations, and state-of-the-art methods and tools across all aspects of digital entertainment, new media, entertainment computing, gaming, robotics, toys, and related applications, bringing together researchers, engineers, social scientists, artists, and practitioners. Theoretical, technical, empirical, and survey articles, as well as case studies, are all appropriate to the journal.