Generalized Hierarchical Sparse Model for Arbitrary-Order Interactive Antigenic Sites Identification in Flu Virus Data

Lei Han, Yu Zhang, Xiu-Feng Wan, Tong Zhang
Published in KDD: Proceedings. International Conference on Knowledge Discovery & Data Mining, 2016-08-01.
DOI: 10.1145/2939672.2939786 (https://doi.org/10.1145/2939672.2939786)
Citations: 3

Abstract

Recent statistical evidence has shown that a regression model incorporating interactions among the original covariates (features) can significantly improve interpretability for biological data. One major challenge is that the feature space expands exponentially when high-order feature interactions are added to the model. To tackle this dimensionality, hierarchical sparse models (HSM) enforce sparsity under heredity structures in the interactions among the covariates. However, existing methods consider only pairwise interactions, so discovering important high-order interactions remains a non-trivial open problem. In this paper, we propose a generalized hierarchical sparse model (GHSM), a generalization of HSM that handles arbitrary-order interactions. The GHSM applies the ℓ1 penalty to all model coefficients under the constraint that, for any covariate, if none of its associated kth-order interactions contributes to the regression model, then neither do its associated higher-order interactions. The resulting objective function is non-convex, and the main difficulty lies in the coupled variables that appear in the arbitrary-order hierarchical constraints; we devise an efficient optimization algorithm that solves the problem directly. Specifically, we decouple the variables in the constraints via the general iterative shrinkage and thresholding (GIST) method and the alternating direction method of multipliers (ADMM), obtaining three subproblems, each of which is proved to admit an efficient analytical solution. We evaluate the GHSM on both a synthetic problem and the antigenic site identification problem for influenza virus data, where we expand the feature space up to 5th-order interactions. Empirical results demonstrate the effectiveness and efficiency of the proposed method, and the learned high-order interactions show meaningful synergistic covariate patterns in influenza virus antigenicity.
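
The ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' GHSM algorithm, and every function name in it is hypothetical. It assumes that a kth-order interaction is the product of k distinct covariates (order 1 being the covariate itself), shows how quickly the candidate feature set grows, gives the standard soft-thresholding operator associated with an ℓ1 penalty, and checks the hierarchy constraint as it is worded in the abstract.

```python
from itertools import combinations
from math import comb

import numpy as np


def num_interaction_features(d, max_order):
    """Count products of 1..max_order distinct covariates chosen from d."""
    return sum(comb(d, k) for k in range(1, max_order + 1))


def interaction_features(X, max_order):
    """Map each index tuple (i1 < ... < ik) to the product of those columns.

    Assumption: an interaction is an elementwise product of covariates.
    """
    d = X.shape[1]
    feats = {}
    for k in range(1, max_order + 1):
        for idx in combinations(range(d), k):
            feats[idx] = np.prod(X[:, list(idx)], axis=1)
    return feats


def soft_threshold(w, lam):
    """Proximal operator of lam * ||w||_1: shrink each entry toward zero."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


def satisfies_hierarchy(coefs, d, max_order):
    """Check the constraint as worded in the abstract.

    coefs maps index tuples to coefficients. For every covariate i and
    order k, if no kth-order term containing i is nonzero, then no
    higher-order term containing i may be nonzero either.
    """
    for i in range(d):
        for k in range(1, max_order):
            active_k = any(
                c != 0 for idx, c in coefs.items() if len(idx) == k and i in idx
            )
            active_higher = any(
                c != 0 for idx, c in coefs.items() if len(idx) > k and i in idx
            )
            if not active_k and active_higher:
                return False
    return True
```

Under these assumptions, `num_interaction_features(10, 5)` counts 637 candidate terms for only 10 covariates, which is why a sparsity-inducing penalty and a hierarchy constraint are needed to keep the search tractable.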
