Machine learning approaches for prediction of ovarian cancer driver genes from mutational and network analysis

IF 1.7 | Zone 4 (Computer Science) | JCR Q3 (Computer Science, Information Systems) | Data Technologies and Applications | Pub Date: 2023-04-28 | DOI: 10.1108/dta-03-2022-0096
Rucha Wadapurkar, S. Bapat, Rupali A. Mahajan, R. Vyas
Citations: 0

Abstract

Purpose: Ovarian cancer (OC) is the most common type of gynecologic cancer in the world with a high rate of mortality. Due to manifestation of generic symptoms and absence of specific biomarkers, OC is usually diagnosed at a late stage. Machine learning models can be employed to predict driver genes implicated in causative mutations.

Design/methodology/approach: In the present study, a comprehensive next generation sequencing (NGS) analysis of whole exome sequences of 47 OC patients was carried out to identify clinically significant mutations. Nine functional features of 708 mutations identified were input into a machine learning classification model by employing the eXtreme Gradient Boosting (XGBoost) classifier method for prediction of OC driver genes.

Findings: The XGBoost classifier model yielded a classification accuracy of 0.946, which was superior to that obtained by other classifiers such as decision tree, Naive Bayes, random forest and support vector machine. Further, an interaction network was generated to identify and establish correlations with cancer-associated pathways and gene ontology data.

Originality/value: The final results revealed 12 putative candidate cancer driver genes, namely LAMA3, LAMC3, COL6A1, COL5A1, COL2A1, UGT1A1, BDNF, ANK1, WNT10A, FZD4, PLEKHG5 and CYP2C9, that may have implications in clinical diagnosis.
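The abstract compares classifiers by classification accuracy but does not say how the models were validated. The following is a minimal sketch of how such a comparison could be scored, assuming k-fold cross-validation (an assumption, not stated in the abstract). The data are synthetic placeholders with the same shape as the study's input (708 rows, 9 features), and the two classifiers are simple stand-ins chosen so the sketch needs only the Python standard library. They are not XGBoost or any of the other classifiers named above, and all function names are hypothetical.

```python
import random
from statistics import mean

N_MUTATIONS, N_FEATURES = 708, 9  # sizes taken from the abstract


def make_synthetic(seed=0):
    """Placeholder data: NOT the study's mutations or features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(N_MUTATIONS):
        label = i % 2  # hypothetical driver (1) / non-driver (0) label
        # Class 1 rows are shifted by +2.0 so the two classes are separable.
        X.append([rng.gauss(2.0 * label, 1.0) for _ in range(N_FEATURES)])
        y.append(label)
    return X, y


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_baseline(X_train, y_train, X_test):
    """Stand-in classifier 1: always predict the most frequent training label."""
    top = max(set(y_train), key=y_train.count)
    return [top] * len(X_test)


def nearest_centroid(X_train, y_train, X_test):
    """Stand-in classifier 2: predict the class whose feature-wise mean is closest."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in X_test]


def cross_val_accuracy(fit_predict, X, y, k=5):
    """Mean held-out accuracy over k folds."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in fold])
        scores.append(accuracy([y[i] for i in fold], preds))
    return mean(scores)


X, y = make_synthetic()
results = {
    "majority_baseline": cross_val_accuracy(majority_baseline, X, y),
    "nearest_centroid": cross_val_accuracy(nearest_centroid, X, y),
}
```

Any classifier that follows the same `fit_predict(X_train, y_train, X_test)` shape could be added to `results`, so each model is scored on the same folds. The accuracies produced here describe only the synthetic data and say nothing about the study's reported 0.946.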
Source journal
Data Technologies and Applications (Social Sciences: Library and Information Sciences)
CiteScore: 3.80
Self-citation rate: 6.20%
Articles published: 29
Journal description: Previously published as: Program. Online from: 2018. Subject Area: Information & Knowledge Management, Library Studies.