On potential limitations of differential expression analysis with non-linear machine learning models

G. Sabbatini, L. Manganaro
Journal: EMBnet.journal
Published: 2023-03-08
DOI: 10.14806/ej.28.0.1035 (https://doi.org/10.14806/ej.28.0.1035)
Citations: 1

Abstract

Bioinformatics has recently seen growing interest in applying increasingly complex machine learning models to next-generation sequencing data, with the goal of disease subtyping (i.e., patient stratification based on molecular features) or risk-based classification for specific endpoints, such as survival. With gene-expression data, the emerging groups are commonly characterised through differential expression analysis, which selects relevant gene sets, coupled with pathway enrichment analysis, which provides insight into the underlying biological processes. However, when non-linear machine learning models are involved, differential expression analysis can be limiting: because it is linear in nature, the patient groupings identified by the model may rest on a set of genes that remain hidden from it, which affects subsequent biological characterisation and validation. The aim of this study is to provide a proof-of-concept example demonstrating this limitation. Moreover, we suggest that the issue could be overcome by adopting the paradigm of eXplainable Artificial Intelligence, which consists of building an additional explainer to obtain a trustworthy interpretation of the model outputs and a reliable set of genes characterising each group, one that also preserves non-linear relations, for use in downstream analysis and validation.
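The limitation described in the abstract can be illustrated with a small synthetic sketch. It is not the authors' proof-of-concept: the data, the gene0 x gene1 interaction rule, the `predict` function standing in for a trained non-linear model, and the use of permutation importance as a simple model-agnostic explainer are all assumptions made for illustration. Group membership depends only on whether two genes are jointly high or jointly low, so neither gene shows a shift in mean expression between groups, yet both are essential to the model.

```python
import random
import statistics

def simulate(n_samples=400, n_genes=5, seed=0):
    """Toy expression matrix; groups depend on a gene0 x gene1 interaction."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    # Assumed grouping rule: both genes high or both low -> group 1.
    y = [1 if row[0] * row[1] > 0 else 0 for row in X]
    return X, y

def welch_t(a, b):
    """Welch's t statistic: a per-gene test for a shift in mean expression."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

def differential_expression(X, y):
    """One t statistic per gene, comparing group 1 against group 0."""
    t_stats = []
    for g in range(len(X[0])):
        group1 = [row[g] for row, label in zip(X, y) if label == 1]
        group0 = [row[g] for row, label in zip(X, y) if label == 0]
        t_stats.append(welch_t(group1, group0))
    return t_stats

def predict(row):
    """Hypothetical stand-in for a trained non-linear model."""
    return 1 if row[0] * row[1] > 0 else 0

def accuracy(X, y):
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(X, y, seed=1):
    """Accuracy drop after shuffling one gene at a time across samples."""
    rng = random.Random(seed)
    baseline = accuracy(X, y)
    drops = []
    for g in range(len(X[0])):
        column = [row[g] for row in X]
        rng.shuffle(column)
        X_perm = [row[:g] + [v] + row[g + 1:] for row, v in zip(X, column)]
        drops.append(baseline - accuracy(X_perm, y))
    return drops

X, y = simulate()
t_stats = differential_expression(X, y)
importances = permutation_importance(X, y)
for g, (t, imp) in enumerate(zip(t_stats, importances)):
    print(f"gene{g}: t = {t:+.2f}, permutation importance = {imp:.2f}")
```

In this construction the per-gene t statistics should stay close to zero for every gene, because the interaction leaves each gene's mean unchanged between groups, while shuffling gene0 or gene1 should cut the model's accuracy to roughly chance level. The genes the model relies on would therefore be missed by a mean-shift test but recovered by the explainer.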