{"title":"MMMLP:时序推荐的多模态多层感知器","authors":"Jiahao Liang, Xiangyu Zhao, Muyang Li, Zijian Zhang, Wanyu Wang, Haochen Liu, Zitao Liu","doi":"10.1145/3543507.3583378","DOIUrl":null,"url":null,"abstract":"Sequential recommendation aims to offer potentially interesting products to users by capturing their historical sequence of interacted items. Although it has facilitated extensive physical scenarios, sequential recommendation for multi-modal sequences has long been neglected. Multi-modal data that depicts a user’s historical interactions exists ubiquitously, such as product pictures, textual descriptions, and interacted item sequences, providing semantic information from multiple perspectives that comprehensively describe a user’s preferences. However, existing sequential recommendation methods either fail to directly handle multi-modality or suffer from high computational complexity. To address this, we propose a novel Multi-Modal Multi-Layer Perceptron (MMMLP) for maintaining multi-modal sequences for sequential recommendation. MMMLP is a purely MLP-based architecture that consists of three modules - the Feature Mixer Layer, Fusion Mixer Layer, and Prediction Layer - and has an edge on both efficacy and efficiency. Extensive experiments show that MMMLP achieves state-of-the-art performance with linear complexity. We also conduct ablating analysis to verify the contribution of each component. Furthermore, compatible experiments are devised, and the results show that the multi-modal representation learned by our proposed model generally benefits other recommendation models, emphasizing our model’s ability to handle multi-modal information. We have made our code available online to ease reproducibility1.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"MMMLP: Multi-modal Multilayer Perceptron for Sequential Recommendations\",\"authors\":\"Jiahao Liang, Xiangyu Zhao, Muyang Li, Zijian Zhang, Wanyu Wang, Haochen Liu, Zitao Liu\",\"doi\":\"10.1145/3543507.3583378\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sequential recommendation aims to offer potentially interesting products to users by capturing their historical sequence of interacted items. Although it has facilitated extensive physical scenarios, sequential recommendation for multi-modal sequences has long been neglected. Multi-modal data that depicts a user’s historical interactions exists ubiquitously, such as product pictures, textual descriptions, and interacted item sequences, providing semantic information from multiple perspectives that comprehensively describe a user’s preferences. However, existing sequential recommendation methods either fail to directly handle multi-modality or suffer from high computational complexity. To address this, we propose a novel Multi-Modal Multi-Layer Perceptron (MMMLP) for maintaining multi-modal sequences for sequential recommendation. MMMLP is a purely MLP-based architecture that consists of three modules - the Feature Mixer Layer, Fusion Mixer Layer, and Prediction Layer - and has an edge on both efficacy and efficiency. Extensive experiments show that MMMLP achieves state-of-the-art performance with linear complexity. We also conduct ablating analysis to verify the contribution of each component. Furthermore, compatible experiments are devised, and the results show that the multi-modal representation learned by our proposed model generally benefits other recommendation models, emphasizing our model’s ability to handle multi-modal information. We have made our code available online to ease reproducibility1.\",\"PeriodicalId\":296351,\"journal\":{\"name\":\"Proceedings of the ACM Web Conference 2023\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-04-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM Web Conference 2023\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3543507.3583378\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM Web Conference 2023","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3543507.3583378","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MMMLP: Multi-modal Multilayer Perceptron for Sequential Recommendations
Sequential recommendation aims to offer potentially interesting products to users by capturing their historical sequence of interacted items. Although it has facilitated extensive physical scenarios, sequential recommendation for multi-modal sequences has long been neglected. Multi-modal data that depicts a user’s historical interactions exists ubiquitously, such as product pictures, textual descriptions, and interacted item sequences, providing semantic information from multiple perspectives that comprehensively describe a user’s preferences. However, existing sequential recommendation methods either fail to directly handle multi-modality or suffer from high computational complexity. To address this, we propose a novel Multi-Modal Multi-Layer Perceptron (MMMLP) for maintaining multi-modal sequences for sequential recommendation. MMMLP is a purely MLP-based architecture that consists of three modules - the Feature Mixer Layer, Fusion Mixer Layer, and Prediction Layer - and has an edge on both efficacy and efficiency. Extensive experiments show that MMMLP achieves state-of-the-art performance with linear complexity. We also conduct ablating analysis to verify the contribution of each component. Furthermore, compatible experiments are devised, and the results show that the multi-modal representation learned by our proposed model generally benefits other recommendation models, emphasizing our model’s ability to handle multi-modal information. We have made our code available online to ease reproducibility1.