Optimizing machine-learning models for mutagenicity prediction through better feature selection.

IF 2.5 · CAS Zone 4 (Medicine) · JCR Q3, GENETICS & HEREDITY · Mutagenesis, pages 191-202 · Pub Date: 2022-10-26 · DOI: 10.1093/mutage/geac010
Nicolas K Shinada, Naoki Koyama, Megumi Ikemori, Tomoki Nishioka, Seiji Hitaoka, Atsushi Hakura, Shoji Asakura, Yukiko Matsuoka, Sucheendra K Palaniappan
Citations: 1

Abstract

Assessing a compound's mutagenicity using machine learning is an important activity in the drug discovery and development process. Traditional methods of mutagenicity detection, such as the Ames test, are expensive, time-consuming, and labor-intensive. In this context, in silico methods that predict a compound's mutagenicity with high accuracy are important. Machine-learning (ML) models have increasingly been proposed to improve the accuracy of mutagenicity prediction. While these models are used in practice, there is scope to improve their accuracy further. We hypothesize that choosing the right features to train the model can lead to better accuracy. We systematically consider and evaluate a combination of novel structural and molecular features that have the greatest impact on model accuracy. We rigorously evaluate these features against multiple classification models (from classical ML models to deep neural network models). Model performance was assessed using 5- and 10-fold cross-validation. We show that our approach, which uses the molecule structure, molecular properties, and structural alerts as feature sets, outperforms the state-of-the-art methods for mutagenicity prediction on the Hansen et al. benchmark dataset, with an area under the receiver operating characteristic curve of 0.93. More importantly, our framework shows how combining features can improve model accuracy.
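The evaluation idea in the abstract (concatenate several feature families, then score with k-fold cross-validated ROC AUC) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are synthetic random numbers rather than the Hansen et al. benchmark, the nearest-centroid scorer is a toy stand-in for the paper's classical ML and deep models, and the helper names (`concat_features`, `roc_auc`, `stratified_folds`, `cross_val_auc`) are hypothetical. It is not the authors' pipeline.

```python
from typing import List, Sequence
import random

Vector = List[float]


def concat_features(*blocks: Sequence[Vector]) -> List[Vector]:
    """Join per-molecule feature blocks column-wise: one long vector per molecule."""
    return [[v for part in row for v in part] for row in zip(*blocks)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int) -> List[List[int]]:
    """Deal each class round-robin into k folds so every fold holds both classes."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def centroid_scores(train_x, train_y, test_x) -> List[float]:
    """Toy stand-in classifier: higher score = closer to the positive-class centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    c_pos = centroid([x for x, y in zip(train_x, train_y) if y == 1])
    c_neg = centroid([x for x, y in zip(train_x, train_y) if y == 0])
    return [dist2(x, c_neg) - dist2(x, c_pos) for x in test_x]


def cross_val_auc(x, y, k: int = 5) -> float:
    """Mean held-out ROC AUC over k stratified folds."""
    aucs = []
    for held_out in stratified_folds(y, k):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        scores = centroid_scores([x[i] for i in train], [y[i] for i in train],
                                 [x[i] for i in held_out])
        aucs.append(roc_auc([y[i] for i in held_out], scores))
    return sum(aucs) / k


# Synthetic stand-ins for the three feature families (NOT real chemistry data).
rng = random.Random(0)
y = [i % 2 for i in range(200)]  # alternating labels: 1 = "mutagenic" placeholder


def block(width: int) -> List[Vector]:
    """Random features whose mean shifts slightly with the label."""
    return [[rng.gauss(0.5 * label, 1.0) for _ in range(width)] for label in y]


structure, properties, alerts = block(8), block(3), block(2)
combined = concat_features(structure, properties, alerts)

for name, feats in [("alerts only", alerts), ("combined", combined)]:
    print(f"{name}: 5-fold AUC = {cross_val_auc(feats, y, k=5):.2f}, "
          f"10-fold AUC = {cross_val_auc(feats, y, k=10):.2f}")
```

In a real study the three blocks would come from a cheminformatics toolkit (for example fingerprints, computed descriptors, and substructure-alert flags), the folds would be shuffled, and the scorer would be replaced by the models under comparison. The sketch only shows how feature blocks are joined and how a cross-validated AUC is computed for each feature set.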
Source journal
Mutagenesis (Biology: Toxicology)
CiteScore: 5.90
Self-citation rate: 3.70%
Articles published: 22
Review time: 6-12 weeks
About the journal: Mutagenesis is an international multi-disciplinary journal designed to bring together research aimed at the identification, characterization and elucidation of the mechanisms of action of physical, chemical and biological agents capable of producing genetic change in living organisms and the study of the consequences of such changes.