HB-PLS: A statistical method for identifying biological process or pathway regulators by integrating Huber loss and Berhu penalty with partial least squares regression.

Forestry Research, vol. 1, p. 6. Published 2021-03-30 (eCollection date 2021-01-01). DOI: 10.48130/FR-2021-0006
Wenping Deng, Kui Zhang, Cheng He, Sanzhen Liu, Hairong Wei
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11524267/pdf/

Abstract

Gene expression data are characterized by high dimensionality, multicollinearity, and non-Gaussian noise, all of which hinder the identification of the true regulatory genes that control a biological process or pathway. In this study, we integrated the Huber loss function and the Berhu penalty (HB) into the partial least squares (PLS) framework to handle the high dimensionality and multicollinearity of gene expression data, and developed a new method, HB-PLS regression, to model the relationships between regulatory genes and pathway genes. To solve the Huber-Berhu optimization problem, we developed an accelerated proximal gradient descent algorithm that is at least 10 times faster than the general-purpose convex optimization solver CVX. Applying HB-PLS to identify pathway regulators of lignin biosynthesis and photosynthesis in Arabidopsis thaliana recovered many known positive pathway regulators that had previously been validated experimentally. Sparse partial least squares (SPLS) regression is an efficient method for variable selection and dimension reduction under multicollinearity. Compared with SPLS, HB-PLS identified more known positive regulators and showed a much higher but slightly less sensitivity/(1-specificity) in ranking the true positive known regulators at the top of the output regulatory gene lists for the two pathways. In addition, each method identified some unique regulators that the other method could not. Overall, our results show that HB-PLS slightly outperforms SPLS, but both methods are instrumental for identifying real pathway regulators from high-throughput gene expression data. This suggests that integrating statistics, machine learning, and convex optimization can yield highly effective methods and is worth further exploration.
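
The abstract does not state the exact HB-PLS objective, so the following is only a minimal sketch under stated assumptions. It uses the textbook Huber loss (quadratic for residuals with |r| <= M, linear beyond) and Owen's Berhu penalty (|b| for |b| <= delta, (b^2 + delta^2)/(2*delta) beyond), and minimizes their sum for a plain linear regression with FISTA, a standard accelerated proximal gradient method. It is not the authors' implementation: it omits the PLS component extraction entirely, and the names huber_berhu_fista, berhu_prox, lam, M, and delta, as well as their defaults, are hypothetical choices for illustration.

```python
import numpy as np


def huber_grad(r, M):
    """Derivative of the Huber loss: identity inside [-M, M], clipped to +/-M outside."""
    return np.clip(r, -M, M)


def berhu_prox(v, lam, delta):
    """Elementwise proximal operator of lam * Berhu_delta (assumed Owen-style Berhu).

    Berhu(b) = |b| if |b| <= delta, else (b**2 + delta**2) / (2 * delta).
    """
    a = np.abs(v)
    soft = np.sign(v) * np.maximum(a - lam, 0.0)  # lasso-like region: soft-thresholding
    shrink = v * delta / (delta + lam)            # ridge-like region: proportional shrinkage
    return np.where(a <= lam + delta, soft, shrink)


def huber_berhu_fista(X, y, lam=0.1, M=1.345, delta=1.0, n_iter=500, tol=1e-8):
    """Minimize sum_i Huber_M(y_i - x_i^T b) + lam * sum_j Berhu_delta(b_j) with FISTA.

    Hypothetical helper for illustration; default M and delta are assumptions.
    """
    p = X.shape[1]
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the Huber part's gradient
    step = 1.0 / L
    b = np.zeros(p)
    z = b.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = -X.T @ huber_grad(y - X @ z, M)                 # gradient of the smooth part at z
        b_new = berhu_prox(z - step * grad, step * lam, delta)  # proximal step on the penalty
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0        # Nesterov momentum schedule
        z = b_new + ((t - 1.0) / t_new) * (b_new - b)
        converged = np.linalg.norm(b_new - b) <= tol * max(1.0, np.linalg.norm(b))
        b, t = b_new, t_new
        if converged:
            break
    return b


# Usage on synthetic data: 3 true predictors out of 20, plus a few gross outliers.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(100)
y[:5] += 20.0  # outliers whose influence the Huber loss caps at M
b = huber_berhu_fista(X, y, lam=1.0)
```

The Berhu proximal step behaves like lasso soft-thresholding for small inputs (producing exact zeros) and like ridge shrinkage for large ones, which is why this penalty is attractive when predictors are correlated. Because the Huber gradient is a clipped residual, each outlier contributes at most M to the gradient, so the five corrupted samples do not dominate the fit.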

Latest articles in this journal
Characterization of UGT71, a major glycosyltransferase family for triterpenoids, flavonoids and phytohormones-biosynthetic in plants.
CRISPR/Cas9 ribonucleoprotein mediated DNA-free genome editing in larch.
The revelation of genomic breed composition using target capture sequencing: a case of Taxodium.
Responses of isolated balsam-fir stem segments to exogenous ACC, IAA, and IBA.
Combating browning: mechanisms and management strategies in in vitro culture of economic woody plants.