Sequential recommendation by reprogramming pretrained transformer
Min Tang, Shujie Cui, Zhe Jin, Shiuan-ni Liang, Chenliang Li, Lixin Zou
Information Processing & Management, volume 62, issue 1, Article 103938. Publication date: 2024-11-01. DOI: 10.1016/j.ipm.2024.103938
Citations: 0
Abstract
Inspired by the success of pre-trained language models (PLMs), numerous sequential recommenders have attempted to replicate their achievements by adopting PLMs' efficient architectures to build large models and by using self-supervised learning to broaden the training data. Despite this progress, how to develop a large-scale sequential recommender system remains an open question, because existing methods either build models within a single dataset or rely on text as an intermediary to align different datasets. Yet directly pre-training a large foundation model may not be feasible, owing to the sparsity of user–item interactions, the misalignment between different datasets, and the lack of global information in sequential recommendation.
To this end, we propose RecPPT, which uses GPT-2 to model historical interaction sequences and trains the input item embeddings and the output layer from scratch, thereby avoiding training a large model on sparse user–item interactions. To ease the misalignment problem, RecPPT is equipped with a reprogramming module that reprograms the target embeddings onto existing, well-trained proto-embeddings. Furthermore, RecPPT injects global information into sequences by initializing the item embeddings with an SVD-based initializer. Extensive experiments on four datasets show that RecPPT achieves average improvements over the baselines of 6.5% in NDCG@5, 6.2% in NDCG@10, 6.1% in Recall@5, and 5.4% in Recall@10. In few-shot scenarios in particular, significant improvements in NDCG@10 confirm the superiority of the proposed method.
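The abstract does not specify how the SVD-based initializer or the reprogramming module is built. The NumPy sketch below shows one plausible reading under explicit assumptions. Item embeddings are initialized from a truncated SVD of a user-by-item interaction matrix. Reprogramming is modeled as single-head cross-attention in which item embeddings query a fixed bank of proto-embeddings, for example frozen GPT-2 token embeddings. The function names `svd_item_init` and `reprogram`, the projection matrices `w_q` and `w_k`, and every shape are hypothetical and do not describe RecPPT's actual implementation.

```python
import numpy as np


def svd_item_init(interactions: np.ndarray, dim: int) -> np.ndarray:
    """Hypothetical SVD-based initializer (an assumption, not RecPPT's code).

    Factorizes the (num_users x num_items) interaction matrix as
    U @ diag(s) @ Vt and uses the top right singular vectors, scaled by
    sqrt(s), as item embeddings that summarize global co-occurrence.
    """
    _, s, vt = np.linalg.svd(interactions, full_matrices=False)
    k = min(dim, s.shape[0])
    item_emb = vt[:k].T * np.sqrt(s[:k])  # (num_items, k)
    if k < dim:
        # Zero-pad when the matrix has fewer singular values than `dim`.
        item_emb = np.pad(item_emb, ((0, 0), (0, dim - k)))
    return item_emb


def reprogram(target: np.ndarray, proto: np.ndarray,
              w_q: np.ndarray, w_k: np.ndarray) -> np.ndarray:
    """Hypothetical reprogramming step modeled as cross-attention.

    Each target embedding becomes a softmax-weighted mix of frozen
    proto-embeddings, so the output lives in the proto-embedding space.
    """
    q = target @ w_q                              # (n, d_k)
    k = proto @ w_k                               # (p, d_k)
    scores = q @ k.T / np.sqrt(q.shape[1])        # (n, p)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # rows sum to 1
    return weights @ proto                        # (n, d_model)


rng = np.random.default_rng(0)
# Toy binary interactions: 50 users, 30 items, about 10% density.
interactions = (rng.random((50, 30)) < 0.1).astype(float)
item_emb = svd_item_init(interactions, dim=16)
# Stand-in for frozen pretrained token embeddings (GPT-2 small would use 768).
proto = rng.standard_normal((100, 64))
w_q = rng.standard_normal((16, 32)) * 0.1
w_k = rng.standard_normal((64, 32)) * 0.1
aligned = reprogram(item_emb, proto, w_q, w_k)
print(item_emb.shape, aligned.shape)  # (30, 16) (30, 64)
```

Because the attention weights in each row sum to one, every reprogrammed vector is a convex combination of proto-embeddings and therefore stays within the region spanned by the pretrained model's input space. This is one way to interpret "reprogramming onto well-trained proto-embeddings." In a full model, `aligned` would be fed to the frozen transformer, and a separately trained output layer would score items.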
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.