Pub Date: 2024-06-05 DOI: 10.1109/tmm.2024.3410133
Ke Liu, Jiwei Wei, Jie Zou, Peng Wang, Yang Yang, Heng Tao Shen
Title: Improving Pre-trained Model-based Speech Emotion Recognition from a Low-level Speech Feature Perspective
Journal: IEEE Transactions on Multimedia (Journal Article; Impact Factor 7.3; Category: Computer Science)
Pub Date: 2024-06-05 DOI: 10.1109/tmm.2024.3410129
Jiayi Li, Min Jiang, Jun Kong, Xuefeng Tao, Xi Luo
Title: Learning Semantic Polymorphic Mapping for Text-Based Person Retrieval
Journal: IEEE Transactions on Multimedia (Journal Article; Impact Factor 7.3; Category: Computer Science)
Pub Date: 2024-06-05 DOI: 10.1109/tmm.2024.3410139
Zhenglong Cui, Da Yang, Hao Sheng, Sizhe Wang, Rongshan Chen, Ruixuan Cong, Wei Ke
Title: Triple Consistency for Transparent Cheating Problem in Light Field Depth Estimation
Journal: IEEE Transactions on Multimedia (Journal Article; Impact Factor 7.3; Category: Computer Science)
Pub Date: 2024-06-05 DOI: 10.1109/TMM.2024.3410116
Chao Jiao, Huanqiang Zeng, Jing Chen, Chih-Hsien Hsia, Tianlei Wang, Kai-Kuang Ma
Screen content coding (SCC) in Versatile Video Coding (VVC) significantly improves the coding efficiency of screen content videos (SCVs), but it incurs high computational complexity because of the quad-tree plus multi-type tree (QTMT) structure used for coding unit (CU) partitioning. We therefore make the first attempt to reduce the encoding complexity of SCC in VVC from the perspective of CU partitioning. To this end, a fast CU partition prediction method is developed for VVC-SCC. First, to address the lack of sufficient SCC training data, SCVs are collected to establish a database containing CUs of various sizes and their corresponding partition labels. Second, to determine the partition decision in advance, a novel WA-CNN model is proposed, which is capable of predicting two large CUs for VVC-SCC by adjusting the feature channels based on the size of the input CU blocks. Finally, because the different partition decisions occur in imbalanced proportions, a weighted loss function that equalizes the contribution of the imbalanced data is formulated to train the proposed WA-CNN model. Experimental results show that the proposed model reduces the SCC intra-encoding time by 35.65% $\sim$