Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference

Najmeh Forouzandehmehr, Nima Farrokhsiar, Ramin Giahi, Evren Korpeoglu, Kannan Achan

arXiv - CS - Information Retrieval, published 2024-09-18 (arXiv:2409.12150)
Personalized outfit recommendation remains a complex challenge, demanding
both fashion compatibility understanding and trend awareness. This paper
presents a novel framework that harnesses the expressive power of large
language models (LLMs) for this task, mitigating their "black box" and static
nature through fine-tuning and direct feedback integration. We bridge the
visual-textual gap in item descriptions by employing image captioning with a
Multimodal Large Language Model (MLLM). This enables the LLM to extract style
and color characteristics from human-curated fashion images, forming the basis
for personalized recommendations. The LLM is efficiently fine-tuned on the
open-source Polyvore dataset of curated fashion images, optimizing its ability
to recommend stylish outfits. A direct preference mechanism using negative
examples is employed to enhance the LLM's decision-making process. This creates
a self-enhancing AI feedback loop that continuously refines recommendations in
line with seasonal fashion trends. Our framework is evaluated on the Polyvore
dataset, demonstrating its effectiveness in two key tasks: fill-in-the-blank,
and complementary item retrieval. These evaluations underline the framework's
ability to generate stylish, trend-aligned outfit suggestions, continuously
improving through direct feedback. The evaluation results demonstrate that our
proposed framework significantly outperforms the base LLM, creating more
cohesive outfits. The improved performance in these tasks underscores the
proposed framework's potential to enhance the shopping experience with accurate
suggestions, proving its effectiveness over vanilla LLM-based outfit
generation.
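The "direct preference mechanism using negative examples" described above is, in spirit, a DPO-style objective: the fine-tuned policy is pushed to assign higher likelihood to a human-curated outfit than to a negative example, relative to a frozen reference model. A minimal sketch of such a loss, assuming scalar summed token log-probabilities as inputs (the function name and signature are ours for illustration, not the paper's):

```python
import math

def preference_loss(policy_chosen_logp, policy_rejected_logp,
                    ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO-style loss over one (chosen, rejected) outfit pair.

    Inputs are summed token log-probabilities of the curated outfit
    (chosen) and the negative example (rejected) under the policy being
    fine-tuned and a frozen reference model. beta scales the margin.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(margin): shrinks as the policy widens its preference
    # for the chosen outfit over the rejected one
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy prefers the chosen outfit more strongly than the reference does, the margin is positive and the loss falls below log 2; preferring the rejected outfit drives the loss above log 2, which is the feedback signal that refines recommendations over time.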
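Of the two evaluation tasks, fill-in-the-blank reduces to scoring each candidate item as a completion of a partial outfit and picking the best. A minimal sketch, assuming a hypothetical `score_fn` that stands in for the fine-tuned LLM's compatibility judgment (higher is better; all names here are ours):

```python
def fill_in_the_blank(partial_outfit, candidates, score_fn):
    """Pick the candidate that best completes a partial outfit.

    partial_outfit: list of item descriptions already in the outfit.
    candidates: list of candidate item descriptions for the blank.
    score_fn: compatibility scorer for a complete outfit (stand-in for
    the fine-tuned LLM's likelihood of the completed outfit).
    """
    return max(candidates, key=lambda item: score_fn(partial_outfit + [item]))
```

Complementary item retrieval follows the same pattern with a larger candidate pool drawn from the catalog rather than a small multiple-choice set.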