Entire Regularization Path for Sparse Nonnegative Interaction Model
Mirai Takayanagi, Yasuo Tabei, Hiroto Saigo
2018 IEEE International Conference on Data Mining (ICDM), 2018-11-01
DOI: 10.1109/ICDM.2018.00168 (https://doi.org/10.1109/ICDM.2018.00168)
Citations: 3
Abstract
Building sparse combinatorial models under non-negativity constraints is essential for real-world problems such as those in biology, where the target response is often formulated as an additive linear combination of feature variables. This paper presents a solution to this problem by combining itemset mining with non-negative least squares. However, once modern regularization is incorporated, a naive solution must solve an expensive enumeration problem repeatedly, once for every value of the regularization parameter. In this paper, we devise a regularization path tracking algorithm in which combinatorial features are searched for and added to the solution set one by one. Our contribution is a set of novel bounds specifically designed for this feature search problem. On a synthetic dataset, the proposed method runs orders of magnitude faster than a naive counterpart that does not employ tree pruning. We also show empirically that non-negativity constraints yield far fewer active features than LASSO, leading to significant speed-ups in pattern search. In experiments on an HIV-1 drug resistance dataset, the proposed method successfully modeled the rapidly increasing drug resistance triggered by the accumulation of mutations in HIV-1 genetic sequences. We also demonstrate the effectiveness of non-negativity constraints in suppressing false positive features, resulting in a model with fewer features and thereby improved interpretability.
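To make the combination of itemset mining and non-negative least squares concrete, the following is a minimal sketch of the naive baseline only: it enumerates every itemset up to a fixed size and fits non-negative weights with SciPy's `nnls`. It is not the paper's path tracking algorithm; it has no regularization and no pruning bounds. The function names `itemset_features` and `fit_nonnegative_interactions`, the `max_size` cap, and the toy data are assumptions made for illustration.

```python
from itertools import combinations, product

import numpy as np
from scipy.optimize import nnls


def itemset_features(X, max_size=2):
    """Exhaustively enumerate itemsets (hypothetical helper, no pruning).

    Each itemset becomes one binary column that is 1 for a sample only
    when every item in the set is present in that sample.
    """
    n_items = X.shape[1]
    itemsets, columns = [], []
    for size in range(1, max_size + 1):
        for items in combinations(range(n_items), size):
            column = X[:, list(items)].all(axis=1).astype(float)
            if column.any():  # skip itemsets that occur in no sample
                itemsets.append(items)
                columns.append(column)
    return itemsets, np.column_stack(columns)


def fit_nonnegative_interactions(X, y, max_size=2, tol=1e-6):
    """Fit non-negative weights over all enumerated itemset features."""
    itemsets, Z = itemset_features(X, max_size)
    weights, _residual_norm = nnls(Z, np.asarray(y, dtype=float))
    # Keep only the active (strictly positive) features.
    return {s: float(w) for s, w in zip(itemsets, weights) if w > tol}


# Toy data (assumed): every binary pattern over 4 items, with a response
# driven by item 0 alone plus the co-occurrence of items 1 and 2.
X = np.array(list(product([0, 1], repeat=4)))
y = 1.0 * X[:, 0] + 2.0 * (X[:, 1] & X[:, 2])

model = fit_nonnegative_interactions(X, y)
for itemset, weight in sorted(model.items()):
    print(itemset, round(weight, 3))
```

The number of candidate columns grows combinatorially with `max_size`, which is the enumeration cost that the abstract says the proposed bounds and tree pruning are designed to avoid.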