
Artificial Intelligence in the Life Sciences: latest articles

Computational prediction of frequent hitters in target-based and cell-based assays
Pub Date: 2021-12-01 DOI: 10.1016/j.ailsci.2021.100007
Conrad Stork, Neann Mathai, Johannes Kirchmair

Compounds interfering with high-throughput screening (HTS) assay technologies (also known as “badly behaving compounds”, “bad actors”, “nuisance compounds” or “PAINS”) pose a major challenge to early-stage drug discovery. Many of these problematic compounds are “frequent hitters”, and we have recently published a set of machine learning models (“Hit Dexter 2.0”) for flagging such compounds.

Here we present a new generation of machine learning models derived from a large, manually curated and annotated data set. For the first time, these models cover cell-based assays in addition to target-based assays. Our experiments show that cell-based assays indeed behave differently from target-based assays with respect to hit rates and frequent hitters, and that dedicated models are required to produce meaningful predictions. In addition to these extensions and refinements, we explored a variety of additional modeling setups, including the combination of four machine learning classifiers (k-nearest neighbors (KNN), extra trees, random forest and multilayer perceptron) with four sets of descriptors (Morgan2 fingerprints, Morgan3 fingerprints, MACCS keys and 2D physicochemical property descriptors).
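The classifier and descriptor combinations named above can be enumerated as a simple grid. The following is only a sketch: the labels are copied from the abstract, while the function name `enumerate_setups` and the idea of treating the setups as a full Cartesian product are assumptions made for illustration, not a description of the authors' code.

```python
from itertools import product

# Labels taken from the abstract; the grid itself is an illustrative assumption.
CLASSIFIERS = [
    "k-nearest neighbors",
    "extra trees",
    "random forest",
    "multilayer perceptron",
]
DESCRIPTORS = [
    "Morgan2 fingerprints",
    "Morgan3 fingerprints",
    "MACCS keys",
    "2D physicochemical properties",
]


def enumerate_setups(classifiers, descriptors):
    """Return every (classifier, descriptor set) pair as a list of tuples."""
    return list(product(classifiers, descriptors))


setups = enumerate_setups(CLASSIFIERS, DESCRIPTORS)
for classifier, descriptor in setups:
    print(f"{classifier} + {descriptor}")
```

Four classifiers crossed with four descriptor sets give sixteen candidate setups, which is the size of the comparison implied by the abstract.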

Testing on holdout data, as well as on data sets of “dark chemical matter” (i.e. compounds that have been extensively tested in biological assays but have never shown activity) and known bad actors, shows that multilayer perceptron classifiers in combination with Morgan2 fingerprints outperform other setups in most cases. The best multilayer perceptron classifiers obtained Matthews correlation coefficients of up to 0.648 on holdout data. These models are available via a free web service.
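For readers unfamiliar with the Matthews correlation coefficient (MCC) reported above, a minimal sketch of the standard binary formula follows. The function name and the convention of returning 0.0 when the denominator vanishes are choices made here for illustration; the counts in the example are made up and are not results from the paper.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Binary MCC from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention,
    assumed here), since the coefficient is undefined in that case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Made-up counts, purely to show the call.
print(round(matthews_corrcoef(tp=40, fp=10, tn=45, fn=5), 3))
```

The MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why it is often preferred over plain accuracy on imbalanced screening data.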

Citations: 2
Introducing artificial intelligence in the life sciences
Pub Date: 2021-12-01 DOI: 10.1016/j.ailsci.2021.100001
Mingyue Zheng, Carolina Horta Andrade, Jürgen Bajorath
Citations: 1
An in silico pipeline for the discovery of multitarget ligands: A case study for epi-polypharmacology based on DNMT1/HDAC2 inhibition
Pub Date: 2021-12-01 DOI: 10.1016/j.ailsci.2021.100008
Fernando D. Prieto-Martínez, Eli Fernández-de Gortari, José L. Medina-Franco, L. Michel Espinoza-Fonseca

The search for novel therapeutic compounds remains an overwhelming task owing to the time-consuming and expensive nature of the drug development process and its low success rates. Traditional methodologies that rely on the one drug-one target paradigm have proven insufficient for the treatment of multifactorial diseases, leading to a shift toward multitarget approaches. In this emerging paradigm, molecules with off-target and promiscuous interactions may result in preferred therapies. In this study, we developed a general pipeline combining machine learning algorithms and a deep generator network to train a dual inhibitor classifier capable of identifying putative pharmacophoric traits. As a case study, we focused on dual inhibitors targeting DNA methyltransferase 1 (DNMT1) and histone deacetylase 2 (HDAC2), two enzymes that play a central role in epigenetic regulation. We used this approach to identify dual inhibitors from a novel large natural product database in the public domain. We used docking and atomistic simulations as complementary approaches to establish the ligand-interaction profiles between the best hits and DNMT1/HDAC2. By using the combined ligand- and structure-based approaches, we discovered two promising novel scaffolds that can be used to simultaneously target both DNMT1 and HDAC2. We conclude that the proposed pipeline is flexible and adaptable, has predictive capabilities comparable to those of similar or derivative methods, and is readily applicable to the discovery of small molecules targeting many other therapeutically relevant proteins.

Citations: 3
Chemistry-centric explanation of machine learning models
Pub Date: 2021-12-01 DOI: 10.1016/j.ailsci.2021.100009
Raquel Rodríguez-Pérez, Jürgen Bajorath
Citations: 9
Novel computational models offer alternatives to animal testing for assessing eye irritation and corrosion potential of chemicals
Pub Date: 2021-12-01 DOI: 10.1016/j.ailsci.2021.100028
Arthur C. Silva, Joyce V.V.B. Borba, Vinicius M. Alves, Steven U.S. Hall, Nicholas Furnham, Nicole Kleinstreuer, Eugene Muratov, Alexander Tropsha, Carolina Horta Andrade

Eye irritation and corrosion are fundamental considerations in developing chemicals to be used in or near the eye, from cleaning products to ophthalmic solutions. Unfortunately, animal testing is currently the standard method to identify compounds that cause eye irritation or corrosion. Yet, there is growing pressure on the part of regulatory agencies both in the USA and abroad to develop New Approach Methodologies (NAMs) that help reduce the need for animal testing and address the unmet need to modernize safety evaluation of chemical hazards. To further the development and application of computational NAMs in chemical safety assessment, in this study we collected the largest expertly curated dataset of compounds tested for eye irritation and corrosion, and used these data to build and validate binary and multi-classification Quantitative Structure-Activity Relationship (QSAR) models that can reliably assess the eye irritation/corrosion potential of novel untested compounds. QSAR models were generated with Random Forest (RF) and Multi-Descriptor Read Across (MuDRA) machine learning (ML) methods, and validated using a 5-fold external cross-validation protocol. These models demonstrated high balanced accuracy (CCR of 0.68–0.88), sensitivity (SE of 0.61–0.84), positive predictive value (PPV of 0.65–0.90), specificity (SP of 0.56–0.91), and negative predictive value (NPV of 0.68–0.85). Overall, MuDRA models outperformed RF models and were applied to predict compounds’ irritation/corrosion potential from the Inactive Ingredient Database, which contains components present in FDA-approved drug products, and from the Cosmetic Ingredient Database, the European Commission source of information on cosmetic substances. All models built and validated in this study are publicly available at the STopTox web portal (https://stoptox.mml.unc.edu/). These models can be employed as reliable tools for identifying potential eye irritant/corrosive compounds.
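The five statistics quoted above all derive from a binary confusion matrix. The sketch below shows the standard definitions, taking CCR as the mean of sensitivity and specificity, in line with the abstract's description of it as balanced accuracy. The function name and the example counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute SE, SP, PPV, NPV and CCR from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 for an empty denominator (a convention assumed here).
        return numerator / denominator if denominator else 0.0

    se = ratio(tp, tp + fn)   # sensitivity: irritants correctly flagged
    sp = ratio(tn, tn + fp)   # specificity: non-irritants correctly cleared
    ppv = ratio(tp, tp + fp)  # positive predictive value
    npv = ratio(tn, tn + fn)  # negative predictive value
    ccr = (se + sp) / 2       # balanced accuracy (correct classification rate)
    return {"SE": se, "SP": sp, "PPV": ppv, "NPV": npv, "CCR": ccr}


# Hypothetical counts, chosen only to illustrate the arithmetic.
for name, value in binary_metrics(tp=80, fp=30, tn=70, fn=20).items():
    print(f"{name}: {value:.2f}")
```

Reporting CCR alongside PPV and NPV matters for this kind of data because irritant and non-irritant classes are rarely balanced, and plain accuracy can hide poor performance on the minority class.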

Citations: 6
Current status of active learning for drug discovery
Pub Date: 2021-12-01 DOI: 10.1016/j.ailsci.2021.100023
Jie Yu, Xutong Li, Mingyue Zheng

Active learning (AL) has been widely used in drug discovery and design in recent years. In this viewpoint, we briefly summarize applications of AL for drug discovery and propose two potential limitations of research in this field.
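As background on what an active learning loop selects, here is a minimal sketch of uncertainty sampling, one common query strategy. The abstract does not state which strategies the viewpoint covers, so this choice, the function name `select_most_uncertain`, and the example probabilities are all assumptions for illustration.

```python
def select_most_uncertain(probabilities, batch_size):
    """Return indices of pool compounds whose predicted activity
    probability lies closest to 0.5, i.e. where a binary model is least sure.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:batch_size]


# Hypothetical model outputs for five unlabeled compounds.
pool_probabilities = [0.95, 0.52, 0.10, 0.45, 0.70]
print(select_most_uncertain(pool_probabilities, batch_size=2))
```

In a full loop, the selected compounds would be tested experimentally, added to the training set, and the model retrained before the next selection round.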

Citations: 7
Machine learning in agriculture domain: A state-of-art survey
Pub Date: 2021-12-01 DOI: 10.1016/j.ailsci.2021.100010
Vishal Meshram, Kailas Patil, Vidula Meshram, Dinesh Hanchate, S.D. Ramkteke

Food is a basic human need that is met through farming. Agriculture not only fulfills this basic need but is also a source of employment worldwide. In developing countries such as India, agriculture is considered the backbone of the economy and a major source of employment; it contributes 15.4% of India's GDP. Agricultural activities are broadly categorized into three major areas: pre-harvesting, harvesting and post-harvesting. Advances in machine learning have helped improve gains in agriculture. Machine learning is a current technology that helps farmers minimize farming losses by providing rich recommendations and insights about crops. This paper presents an extensive survey of the latest machine learning applications in agriculture that aim to alleviate problems in the three areas of pre-harvesting, harvesting and post-harvesting. Applying machine learning in agriculture allows more efficient and precise farming, with less manpower and high-quality production.

Citations: 107
BeeToxAI: An artificial intelligence-based web app to assess acute toxicity of chemicals to honey bees
Pub Date: 2021-12-01 DOI: 10.1016/j.ailsci.2021.100013
José T. Moreira-Filho, Rodolpho C. Braga, Jade Milhomem Lemos, Vinicius M. Alves, Joyce V.V.B. Borba, Wesley S. Costa, Nicole Kleinstreuer, Eugene N. Muratov, Carolina Horta Andrade, Bruno J. Neves

Chemically induced toxicity is the leading cause of recent extinction of honey bees. In this regard, we developed an innovative artificial intelligence-based web app (BeeToxAI) for assessing the acute toxicity of chemicals to Apis mellifera. Initially, we developed and externally validated QSAR models for classification (external set accuracy ∼91%) through the combination of Random Forest and molecular fingerprints to predict the potential for chemicals to cause acute contact toxicity and acute oral toxicity to honey bees. Then, we developed and externally validated regression QSAR models (R² = 0.75) using Feedforward Neural Networks (FNNs). Afterward, the best models were implemented in the publicly available BeeToxAI web app (http://beetoxai.labmol.com.br/). The outputs of BeeToxAI are toxicity predictions with estimated confidence, applicability domain estimation, and color-coded maps of relative structure fragment contributions to toxicity. As an additional assessment of BeeToxAI performance, we collected an external set of pesticides with known bee toxicity that were not included in our modeling dataset. BeeToxAI classification models were able to predict four out of five pesticides correctly. The acute contact toxicity model correctly predicted all of the eight pesticides. Here we demonstrate that BeeToxAI can be used as a rapid new approach methodology for predicting acute toxicity of chemicals in honey bees.
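The regression models above are summarized by a coefficient of determination. A minimal sketch of the usual definition, one minus the ratio of residual to total sum of squares, is given below. The abstract does not say which variant of the statistic was used for external validation, so treat this as the textbook form only; the function name and the example values are invented for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Invented observed and predicted toxicity values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.2, 1.9, 3.3, 3.8]
print(round(r_squared(observed, predicted), 3))
```

A value of 1 means the predictions reproduce the observations exactly, while 0 means the model does no better than always predicting the mean of the observed values.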

Citations: 6
Quantifying sources of uncertainty in drug discovery predictions with probabilistic models
Pub Date: 2021-12-01 DOI: 10.1016/j.ailsci.2021.100004
Stanley E. Lazic, Dominic P. Williams

Knowing the uncertainty in a prediction is critical when making expensive investment decisions and when patient safety is paramount, but machine learning (ML) models in drug discovery typically only provide a single best estimate and ignore all sources of uncertainty. Predictions from these models may therefore be over-confident, which can put patients at risk and waste resources when compounds that are destined to fail are further developed. Probabilistic predictive models (PPMs) can incorporate all sources of uncertainty and they return a distribution of predicted values that represents the uncertainty in the prediction. We describe seven sources of uncertainty in PPMs: data, distribution function, mean function, variance function, link function(s), parameters, and hyperparameters. We use toxicity prediction as a running example, but the same principles apply for all prediction models. The consequences of ignoring uncertainty and how PPMs account for uncertainty are also described. We aim to make the discussion accessible to a broad non-mathematical audience. Equations are provided to make ideas concrete for mathematical readers (but can be skipped without loss of understanding) and code is available for computational researchers (https://github.com/stanlazic/ML_uncertainty_quantification).
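To make the idea of returning a distribution rather than a single estimate concrete, the sketch below draws Monte Carlo samples from a toy linear model. It propagates only two of the seven sources listed above (parameters and data noise), and every number in it, including the assumed normal distributions for the intercept, slope and noise, is hypothetical. It is not the authors' code, which is available at the repository linked in the abstract.

```python
import random


def predictive_samples(x, n_draws, rng, include_noise=True):
    """Draw predicted values for input x from a toy probabilistic model.

    Parameter uncertainty: intercept and slope are sampled from assumed
    normal distributions. Data uncertainty: optional observation noise.
    """
    draws = []
    for _ in range(n_draws):
        intercept = rng.gauss(0.0, 0.2)  # hypothetical posterior
        slope = rng.gauss(1.0, 0.1)      # hypothetical posterior
        value = intercept + slope * x
        if include_noise:
            value += rng.gauss(0.0, 0.5)  # hypothetical noise level
        draws.append(value)
    return draws


def central_interval(draws, lower=0.025, upper=0.975):
    """Return an approximate central interval from sorted samples."""
    ordered = sorted(draws)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
low, high = central_interval(predictive_samples(2.0, 5000, rng))
print(f"95% predictive interval at x=2: [{low:.2f}, {high:.2f}]")
```

Comparing the interval with and without `include_noise` shows how ignoring one source of uncertainty narrows the predictive distribution and makes the model look more confident than it should be.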

Cited by: 8
AutoGGN: A gene graph network AutoML tool for multi-omics research
Pub Date : 2021-12-01 DOI: 10.1016/j.ailsci.2021.100019
Lei Zhang, Wen Shen, Ping Li, Chi Xu, Denghui Liu, Wenjun He, Zhimeng Xu, Deyong Wang, Chenyi Zhang, Hualiang Jiang, Mingyue Zheng, Nan Qiao

Omics data can be used to identify biological characteristics from genetic to phenotypic levels during the life span of a living being, while molecular interaction networks have a fundamental impact on life activities. Integrating omics data and molecular interaction networks will help researchers delve into comprehensive information hidden in the data. Here, we propose a new multimodal method — AutoGGN — to integrate multi-omics data with molecular interaction networks based on graph convolutional neural networks (GCNs). We evaluated AutoGGN using three classification tasks: single-cell embryonic developmental stage classification, pan-cancer type classification, and breast cancer subtyping. On all three tasks, AutoGGN showed better performance than other methods. This means AutoGGN has the potential to extract insights more effectively by means of integrating molecular interaction networks with multi-omics data. Additionally, in order to provide a better understanding of how our model makes predictions, we utilized the SHAP module and identified the key genes contributing to the classification, providing insight for the design of downstream biological experiments.

Cited by: 0