New clues on carcinogenicity-related substructures derived from mining two large datasets of chemical compounds

IF 1.2 | Zone 4, Environmental Science and Ecology | Q4 ENVIRONMENTAL SCIENCES | Journal of Environmental Science and Health Part C-Toxicology and Carcinogenesis | Pub Date: 2016-03-17 | DOI: 10.1080/10590501.2016.1166879
A. Golbamaki, E. Benfenati, N. Golbamaki, A. Manganaro, Erinc Merdivan, A. Roncaglioni, G. Gini
Volume: 34 1 | Pages: 97-113 | Publication type: Journal Article | Open access: no
Citations: 27

Abstract

In this study, new molecular fragments associated with genotoxic and nongenotoxic carcinogens are introduced to estimate the carcinogenic potential of compounds. Two rule-based carcinogenesis models were developed with the aid of SARpy: model R (from rodent experimental data) and model E (from human carcinogenicity data). SARpy's structural alert extraction method works in a completely automated and unbiased manner, with statistical significance. The carcinogenicity models developed in this study are collections of potentially carcinogenic fragments extracted from two carcinogenicity databases: the ANTARES carcinogenicity dataset, with information from bioassays on rats, and the combination of the ISSCAN and CGX datasets, which take into account human-based assessment. The performance of the two models was evaluated by cross-validation and by external validation on a 258-compound case study dataset. Combining the R and E predictions, and scoring a positive or negative result when both models are concordant on a prediction, increased accuracy to 72% and specificity to 79% on the external test set. The carcinogenic fragments present in the two models were compared and analyzed by chemical class. The results of this study show that the developed rule sets will be a useful tool to identify some new structural alerts of carcinogenicity and provide effective information on the molecular structures of carcinogenic chemicals.
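The consensus step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and does not use SARpy. It assumes each model returns a binary label (1 = carcinogen, 0 = non-carcinogen), that discordant compounds are left unscored, and that accuracy and specificity are computed only over the scored compounds. The function names and the toy labels are hypothetical and are not data from the study.

```python
from typing import List, Optional, Sequence, Tuple


def consensus(pred_r: int, pred_e: int) -> Optional[int]:
    """Return the shared label when both models agree, else None (no call)."""
    return pred_r if pred_r == pred_e else None


def accuracy_and_specificity(
    y_true: Sequence[int], y_pred: Sequence[Optional[int]]
) -> Tuple[float, float, int]:
    """Compute accuracy and specificity over compounds that received a call.

    Compounds whose prediction is None (models discordant) are skipped.
    Returns (accuracy, specificity, number_of_scored_compounds).
    """
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred is None:
            continue  # discordant models: compound is not scored
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    scored = tp + tn + fp + fn
    accuracy = (tp + tn) / scored if scored else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, specificity, scored


# Toy labels, invented purely for illustration (assumption, not study data).
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
model_r = [1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical output of model R
model_e = [1, 1, 0, 0, 0, 0, 1, 0]  # hypothetical output of model E

calls: List[Optional[int]] = [consensus(r, e) for r, e in zip(model_r, model_e)]
acc, spec, n_scored = accuracy_and_specificity(y_true, calls)
print(f"scored {n_scored} of {len(y_true)}: accuracy={acc:.2f}, specificity={spec:.2f}")
```

A consensus of this kind trades coverage for reliability: compounds on which the two models disagree receive no call, so the reported metrics refer to a subset of the test set.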
Source journal: CiteScore 4.60 | Self-citation rate 0.00% | Articles published: 10