Modeling bioconcentration factors in fish with explainable deep learning
Linlin Zhao, Floriane Montanari, Henry Heberle, Sebastian Schmidt
Artificial Intelligence in the Life Sciences, volume 2, Article 100047. Published 2022-12-01. DOI: 10.1016/j.ailsci.2022.100047
https://www.sciencedirect.com/science/article/pii/S2667318522000174
Citation count: 1
Abstract
The Bioconcentration Factor (BCF) is an important parameter in the environmental risk assessment of chemicals, relevant for industrial and academic research as well as required in many regulatory contexts. It represents the potential of a substance to accumulate in organic tissues or whole animals and is most frequently measured in fish. However, animal welfare concerns, throughput limitations, and costs drive the need for alternative methods that allow accurate and reliable estimation of BCF in silico. We present a new deep learning model to predict BCF values from chemical structures that outperforms currently available models (R² of 0.68 and RMSE of 0.59 log units on an external test set; R² of 0.70 and RMSE of 0.74 log units in a demanding cluster split validation). The model is based on molecular representations encoded as CDDD descriptors and exploits a large in-house dataset with measured logD values as an auxiliary task.
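For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how R² and RMSE can be computed on log-scale BCF values. It assumes R² means the coefficient of determination (1 − SS_res/SS_tot); the abstract does not state which definition was used. The function names and the sample values are illustrative and are not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the inputs (here log10 BCF)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up log10(BCF) values for illustration only; not data from the paper.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

print(f"RMSE = {rmse(measured, predicted):.2f} log units")
print(f"R^2  = {r_squared(measured, predicted):.2f}")
```

Because RMSE is computed on log-transformed values, an RMSE of 0.59 log units corresponds to a typical error of roughly a factor of 10^0.59 (about 4) in the untransformed BCF.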
Additionally, we developed a post-hoc explainability method based on SMILES character substitutions to accompany our predictions with atom-level interpretations. These sensitivity scores highlight the most influential moieties in the molecule and can help users understand the predictions better and design new molecules.
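The general idea of a substitution-based sensitivity score can be sketched as follows. This is one plausible reading of the approach, not the authors' implementation: the abstract does not specify the substitution alphabet, the aggregation, or the tokenization. Everything named here (`ATOM_PATTERN`, `ALIPHATIC`, `AROMATIC`, `TOY_WEIGHTS`, `toy_predict`, `sensitivity_scores`) is hypothetical, and the toy predictor merely stands in for a trained model.

```python
import re

# Atom tokens handled by this sketch: two-letter halogens first, then single
# letters. Bracket atoms such as [Na] or [nH] are not handled (simplification).
ATOM_PATTERN = re.compile(r"Cl|Br|[CNOSFcnos]")

# Hypothetical substitution alphabets; aliphatic and aromatic atoms are kept apart.
ALIPHATIC = ["C", "N", "O", "S", "F", "Cl", "Br"]
AROMATIC = ["c", "n", "o", "s"]

# Toy per-token weights standing in for a trained model (assumption, not from the paper).
TOY_WEIGHTS = {
    "C": 0.3, "c": 0.3, "Cl": 0.7, "Br": 0.8, "F": 0.2,
    "S": 0.1, "s": 0.1, "N": -0.4, "n": -0.4, "O": -0.5, "o": -0.5,
}


def toy_predict(smiles):
    """Stand-in predictor: sums fixed weights over the atom tokens of a SMILES string."""
    return sum(TOY_WEIGHTS[m.group()] for m in ATOM_PATTERN.finditer(smiles))


def sensitivity_scores(smiles, predict):
    """Return a list of (token, start_index, score), one entry per atom token.

    The score is the mean absolute change in the prediction when the token is
    replaced, one substitution at a time, by every other token of its class.
    """
    baseline = predict(smiles)
    scores = []
    for match in ATOM_PATTERN.finditer(smiles):
        token = match.group()
        alphabet = AROMATIC if token.islower() else ALIPHATIC
        deltas = []
        for substitute in alphabet:
            if substitute == token:
                continue
            mutated = smiles[:match.start()] + substitute + smiles[match.end():]
            deltas.append(abs(predict(mutated) - baseline))
        scores.append((token, match.start(), sum(deltas) / len(deltas)))
    return scores


# Usage: score each atom token of chlorobenzene with the toy predictor.
for token, index, score in sensitivity_scores("Clc1ccccc1", toy_predict):
    print(f"{token:>2} at position {index}: sensitivity {score:.3f}")
```

With a real model, some substituted strings will not be chemically valid SMILES (for example, because of valence violations), so a practical version would need a policy for such cases. The sketch ignores this because the toy predictor accepts any string.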
Artificial Intelligence in the Life Sciences
Pharmacology, Biochemistry, Genetics and Molecular Biology (General), Computer Science Applications, Health Informatics, Drug Discovery, Veterinary Science and Veterinary Medicine (General)