Challenges with the application and adoption of artificial intelligence for drug discovery
Ghita Ghislat, Saiveth Hernandez-Hernandez, Chayanit Piwajanusorn, Pedro J. Ballester
arXiv:2407.05150 (arXiv - QuanBio - Other Quantitative Biology), published 2024-07-06
Abstract
Artificial intelligence (AI) is exhibiting tremendous potential to reduce the
massive costs and long timescales of drug discovery. There are, however,
important challenges limiting the impact and scope of AI models. Typically,
these models are evaluated on benchmarks that are unlikely to anticipate their
prospective performance, which inadvertently misguides their development.
Indeed, while all the developed models excel on a selected benchmark, only a
small proportion of them are ultimately reported to have prospective value
(e.g. by discovering potent and innovative drug leads for a therapeutic
target). Here we discuss a range of data issues (bias, inconsistency, skewness,
irrelevance, small size, high dimensionality), how they challenge AI models and
which issue-specific mitigations have been effective. Next, we point out the
challenges faced by uncertainty quantification techniques aimed at enhancing
these AI models. We also discuss how conceptual errors, unrealistic benchmarks
and performance misestimation can confound the evaluation of models and thus
their development. Lastly, we explain how human bias, whether from AI experts
or drug discovery experts, constitutes another challenge that can be alleviated
with prospective studies.