Deconfounded Cross-modal Matching for Content-based Micro-video Background Music Recommendation

ACM Transactions on Intelligent Systems and Technology (IF 7.2; CAS Zone 4, Computer Science; JCR Q1, Computer Science, Artificial Intelligence) · Published: 2024-03-06 · DOI: 10.1145/3650042
Jing Yi, Zhenzhong Chen
Citations: 0

Abstract

Object-oriented micro-video background music recommendation is a complicated task in which the degree of matching between videos and background music is a major issue. However, music selections in user-generated content (UGC) are prone to selection bias caused by the historical preferences of uploaders. Since historical preferences are not fully reliable and may reflect obsolete behaviors, over-reliance on them should be avoided as knowledge and interests dynamically evolve. In this paper, we propose a Deconfounded Cross-Modal (DecCM) matching model to mitigate such bias. Specifically, uploaders' personal preferences for music genres are identified as confounders that spuriously correlate music embeddings with background music selections, causing the learned system to over-recommend music from majority groups. To resolve these confounders, backdoor adjustment is used to deconfound the spurious correlation between music embeddings and prediction scores. We further use a Monte Carlo (MC) estimator with a batch-level average as an approximation, which avoids integrating over the entire confounder space required by the adjustment. Furthermore, we design a teacher-student network that exploits the matching in music videos, which are professionally generated content (PGC) with professionally made video-music pairings, to better recommend content-matching background music. A teacher network models the PGC data and guides a student network, which learns from uploader-selected UGC data, through Kullback-Leibler (KL)-based knowledge transfer. Extensive experiments on the TT-150k-genre dataset demonstrate the effectiveness of the proposed method. The code is publicly available at: https://github.com/jing-1/DecCM.
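The backdoor adjustment and its batch-level MC approximation described above can be illustrated with a small sketch. This is not the DecCM repository code; it assumes, purely for illustration, that each uploader's genre preference is a distribution z over k genres, that a hypothetical matrix W projects z into the embedding space, and that the matching score is a sigmoid of a dot product. The adjusted score P(Y | do(X = m)) = Σ_z P(Y | m, z) P(z) is approximated by treating each uploader in the mini-batch as one sample from P(z) and averaging.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conditional_score(v, m, z, W):
    """Biased score P(Y=1 | X=m, Z=z): conditions on one uploader's genre preference.

    v: (d,) video embedding, m: (d,) music embedding
    z: (k,) genre-preference distribution of one uploader (hypothetical encoding)
    W: (k, d) hypothetical projection of genre preferences into the embedding space
    """
    return float(sigmoid(v @ m + z @ (W @ m)))


def deconfounded_score(v, m, Z_batch, W):
    """Backdoor adjustment P(Y=1 | do(X=m)) = sum_z P(Y=1 | m, z) P(z).

    Instead of integrating over the full confounder space, each row of Z_batch
    (the genre preferences of the uploaders in the current mini-batch) is treated
    as one sample from P(z), and the conditional scores are averaged (MC estimate).
    """
    logits = v @ m + Z_batch @ (W @ m)  # (B,) one logit per sampled confounder value
    return float(sigmoid(logits).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, B = 8, 4, 32  # embedding size, number of genres, batch size
    v, m = rng.normal(size=d), rng.normal(size=d)
    W = rng.normal(size=(k, d))
    Z_batch = rng.dirichlet(np.ones(k), size=B)  # each row sums to 1
    print("conditioned on one uploader:", round(conditional_score(v, m, Z_batch[0], W), 3))
    print("batch-level backdoor estimate:", round(deconfounded_score(v, m, Z_batch, W), 3))
```

The batch average costs one extra matrix-vector product per batch, which is the practical appeal of the MC approximation over summing across every possible confounder value.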

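The abstract's KL-based transfer from the PGC teacher to the UGC student can be sketched as a standard distillation loss. This is an assumption-laden illustration, not the paper's implementation: it assumes both networks score the same set of candidate tracks for each video, that scores are softened with a temperature (2.0 here, an arbitrary choice), that the KL direction is KL(p_teacher || p_student), and that the loss is scaled by T² as in common distillation practice.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    """Row-wise log-softmax of matching scores, softened by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def kl_transfer_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean KL(p_teacher || p_student) over a batch of videos.

    teacher_logits, student_logits: (B, N) scores of B videos against the same N
    candidate tracks. The teacher (trained on PGC music videos) is a fixed target;
    in an autodiff framework only the student (trained on UGC) would get gradients.
    """
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl_per_video = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    # T^2 scaling keeps the loss magnitude comparable across temperatures.
    return float(kl_per_video.mean() * temperature**2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(4, 10))  # 4 videos x 10 candidate tracks
    student = rng.normal(size=(4, 10))
    print("KL transfer loss:", round(kl_transfer_loss(teacher, student), 4))
```

Matching full score distributions, rather than only the top-ranked track, lets the student inherit the teacher's relative preferences among plausible tracks.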
Source journal
ACM Transactions on Intelligent Systems and Technology
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INFORMATION SYSTEMS
CiteScore: 9.30
Self-citation rate: 2.00%
Articles published: 131
About the journal: ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest-quality papers on intelligent systems, applicable algorithms, and technology from a multidisciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) so that integrated systems can perceive, reason, learn, and act intelligently in the real world. ACM TIST is published bimonthly (six issues a year). Each issue has 8-11 regular papers, each around 20 published journal pages or 10,000 words long. Additional references, proofs, graphs, or detailed experimental results can be submitted as a separate appendix; excessively lengthy papers will be rejected automatically. Authors can include online-only appendices with additional content for their published papers and are encouraged to share their code and/or data with other readers.
Latest articles in this journal
A Survey of Trustworthy Federated Learning: Issues, Solutions, and Challenges
DeepSneak: User GPS Trajectory Reconstruction from Federated Route Recommendation Models
WC-SBERT: Zero-Shot Topic Classification Using SBERT and Light Self-Training on Wikipedia Categories
Self-supervised Text Style Transfer using Cycle-Consistent Adversarial Networks
Federated Learning Survey: A Multi-Level Taxonomy of Aggregation Techniques, Experimental Insights, and Future Frontiers