This study presents a comparative assessment of predictive methods for ready biodegradability, using a curated dataset of REACH experimental data for 2684 industrial chemicals. A large part of these structures is absent from the training and validation sets of the models, which allows an unbiased external validation of the models. We evaluated several readily usable QSAR models, including Biowin, Opera, Vega, Catalogic, and a recent model by Körner et al. The models were compared on how well their training sets span the industrial chemical space, on their predictive performance, and on their applicability domain coverage. Balanced accuracy ranged from 0.600 to 0.771, while sensitivity for identifying non-readily biodegradable substances varied between 0.217 and 0.848, reflecting the expected trade-off with specificity. Applicability domain coverage ranged from 28.5% to nearly the entire chemical space. Consensus models based on majority voting were developed to explore the interplay between sensitivity and specificity when model predictions are combined. They did not yield appreciable increases in balanced accuracy or F1 score, although they increased the reliability of non-readily biodegradable predictions to the detriment of applicability domain coverage. This work underscores the potential of in silico methods for predicting the fate properties of substances, even before they are synthesised or commercialised, thereby helping to fulfil regulatory information requirements and to prioritise substances for testing. However, further development is needed to reach a predictive performance comparable to the variability of the experimental test itself. The curated dataset has been made publicly available as supporting information to facilitate the further development and validation of predictive methods.
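The metrics named above, and the way a majority-vote consensus can trade applicability domain coverage for reliability, can be illustrated with a minimal sketch. It assumes a hypothetical label encoding (1 = non-readily biodegradable, treated as the positive class; 0 = readily biodegradable; None = compound outside a model's applicability domain), and the helper names `evaluate`, `majority_vote` and the parameter `min_in_domain` are illustrative only. The tie rule and the minimum number of in-domain models are assumptions, not necessarily the consensus rules used in the study.

```python
from typing import List, Optional, Sequence

# Hypothetical label encoding (assumption, for illustration only):
#   1    = non-readily biodegradable (NRB), the positive class
#   0    = readily biodegradable (RB)
#   None = compound outside the model's applicability domain (no prediction)
Label = Optional[int]


def _ratio(num: int, den: int) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def evaluate(y_true: Sequence[int], y_pred: Sequence[Label]) -> dict:
    """Compute metrics on in-domain predictions; coverage counts all compounds."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if p is None:
            continue  # out-of-domain predictions are excluded from the confusion matrix
        if t == 1:
            if p == 1:
                tp += 1
            else:
                fn += 1
        elif p == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = _ratio(tp, tp + fn)  # share of NRB substances correctly flagged
    specificity = _ratio(tn, tn + fp)  # share of RB substances correctly identified
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),  # equals 2PR/(P+R) for the NRB class
        "coverage": _ratio(sum(p is not None for p in y_pred), len(y_pred)),
    }


def majority_vote(model_preds: Sequence[Sequence[Label]], min_in_domain: int = 1) -> List[Label]:
    """Strict majority over in-domain models for each compound.

    Abstains (None) on ties or when fewer than `min_in_domain` models cover the
    compound; raising `min_in_domain` lowers the consensus coverage.
    """
    consensus: List[Label] = []
    for votes in zip(*model_preds):
        in_domain = [v for v in votes if v is not None]
        if len(in_domain) < min_in_domain:
            consensus.append(None)
            continue
        nrb = sum(in_domain)
        rb = len(in_domain) - nrb
        consensus.append(1 if nrb > rb else 0 if rb > nrb else None)
    return consensus


# Toy data (made up, not from the study): six compounds, three models.
y_true = [1, 1, 0, 0, 1, 0]
model_a = [1, 0, 0, 1, 1, None]
model_b = [1, 1, 0, None, 0, 0]
model_c = [None, 1, 1, 0, 1, 0]

print(evaluate(y_true, model_a))
print(majority_vote([model_a, model_b, model_c], min_in_domain=2))  # [1, 1, 0, None, 1, 0]
print(majority_vote([model_a, model_b, model_c], min_in_domain=3))  # [None, 1, 0, None, 1, None]
```

In this toy example, requiring all three models to be in domain halves the consensus coverage (3 of 6 compounds), which mirrors in miniature the coverage cost of stricter consensus rules described above.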
