NeSHFS: Neighborhood Search with Heuristic-based Feature Selection for Click-Through Rate Prediction
Authors: Dogukan Aksu, Ismail Hakki Toroslu, Hasan Davulcu
Published in: arXiv - CS - Information Retrieval
Publication date: 2024-09-13
DOI: arxiv-2409.08703
Citations: 0
Abstract
Click-through-rate (CTR) prediction plays an important role in online
advertising and ad recommender systems. In the past decade, maximizing CTR has
been the main focus of model development and solution creation. Therefore,
researchers and practitioners have proposed various models and solutions to
enhance the effectiveness of CTR prediction. Most of the existing literature
focuses on capturing either implicit or explicit feature interactions. Although
some studies capture implicit interactions successfully, explicit interactions
remain a challenge, because achieving high CTR requires extracting both
low-order and high-order feature interactions. Unnecessary and irrelevant
features may increase computation time and reduce prediction performance.
Furthermore, certain features may perform well with specific predictive models
while underperforming with others. In addition, feature distributions may
fluctuate due to traffic variations. Most importantly, in live production environments,
resources are limited, and the time for inference is just as crucial as
training time. For all these reasons, feature selection is one of the
most important factors in improving CTR prediction model performance. Simple
filter-based feature selection algorithms perform poorly and are insufficient.
An effective and efficient feature selection algorithm is needed to
consistently select the most useful features during the live CTR prediction
process. In this paper, we propose a heuristic algorithm named Neighborhood
Search with Heuristic-based Feature Selection (NeSHFS) to enhance CTR
prediction performance while reducing dimensionality and training time costs.
We conduct comprehensive experiments on three public datasets to validate the
efficiency and effectiveness of our proposed solution.