
Artificial intelligence in the life sciences: latest articles

Novel computational models offer alternatives to animal testing for assessing eye irritation and corrosion potential of chemicals
Pub Date : 2021-12-01 DOI: 10.1016/j.ailsci.2021.100028
Arthur C. Silva, Joyce V.V.B. Borba, Vinicius M. Alves, Steven U.S. Hall, Nicholas Furnham, Nicole Kleinstreuer, Eugene Muratov, Alexander Tropsha, Carolina Horta Andrade

Eye irritation and corrosion are fundamental considerations in developing chemicals to be used in or near the eye, from cleaning products to ophthalmic solutions. Unfortunately, animal testing is currently the standard method to identify compounds that cause eye irritation or corrosion. Yet there is growing pressure from regulatory agencies, both in the USA and abroad, to develop New Approach Methodologies (NAMs) that help reduce the need for animal testing and address the unmet need to modernize safety evaluation of chemical hazards. To further the development and application of computational NAMs in chemical safety assessment, in this study we collected the largest expertly curated dataset of compounds tested for eye irritation and corrosion, and used these data to build and validate binary and multi-class Quantitative Structure-Activity Relationship (QSAR) models that can reliably assess the eye irritation/corrosion potential of novel untested compounds. QSAR models were generated with Random Forest (RF) and Multi-Descriptor Read Across (MuDRA) machine learning (ML) methods, and validated using a 5-fold external cross-validation protocol. These models demonstrated high balanced accuracy (CCR of 0.68–0.88), sensitivity (SE of 0.61–0.84), positive predictive value (PPV of 0.65–0.90), specificity (SP of 0.56–0.91), and negative predictive value (NPV of 0.68–0.85). Overall, MuDRA models outperformed RF models and were applied to predict the irritation/corrosion potential of compounds from the Inactive Ingredient Database, which contains components present in FDA-approved drug products, and from the Cosmetic Ingredient Database, the European Commission source of information on cosmetic substances. All models built and validated in this study are publicly available at the STopTox web portal (https://stoptox.mml.unc.edu/). These models can be employed as reliable tools for identifying potential eye irritant/corrosive compounds.
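
The five statistics quoted in this abstract can all be derived from a binary confusion matrix. The sketch below is only an illustration of those standard definitions, with CCR computed as balanced accuracy, (SE + SP) / 2, as the abstract describes it. The class and function names are hypothetical, and the counts are made up; they are not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary irritant (positive) vs. non-irritant (negative) evaluation."""
    tp: int  # irritants predicted as irritants
    fp: int  # non-irritants predicted as irritants
    tn: int  # non-irritants predicted as non-irritants
    fn: int  # irritants predicted as non-irritants


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return SE, SP, PPV, NPV and CCR (balanced accuracy) for one evaluation fold."""
    se = c.tp / (c.tp + c.fn)    # sensitivity: share of true irritants that were caught
    sp = c.tn / (c.tn + c.fp)    # specificity: share of non-irritants correctly cleared
    ppv = c.tp / (c.tp + c.fp)   # positive predictive value
    npv = c.tn / (c.tn + c.fn)   # negative predictive value
    ccr = (se + sp) / 2          # balanced accuracy
    return {"SE": se, "SP": sp, "PPV": ppv, "NPV": npv, "CCR": ccr}


# Made-up counts for illustration only.
example = classification_metrics(ConfusionCounts(tp=80, fp=20, tn=60, fn=40))
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

In a 5-fold external cross-validation such as the one the abstract mentions, a function like this would typically be applied to each held-out fold and the results reported as a range or average.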

Citations: 6
Current status of active learning for drug discovery
Pub Date : 2021-12-01 DOI: 10.1016/j.ailsci.2021.100023
Jie Yu, Xutong Li, Mingyue Zheng

Active learning (AL) has been widely used in drug discovery and design in recent years. In this viewpoint, we briefly summarize applications of AL for drug discovery and propose two potential limitations of research in this field.

Citations: 7
Machine learning in agriculture domain: A state-of-art survey
Pub Date : 2021-12-01 DOI: 10.1016/j.ailsci.2021.100010
Vishal Meshram, Kailas Patil, Vidula Meshram, Dinesh Hanchate, S.D. Ramkteke

Food is a basic human need that is satisfied through farming. Agriculture not only fulfills humans’ basic needs but is also a source of employment worldwide. In developing countries such as India, agriculture is considered the backbone of the economy and a source of employment; it contributes 15.4% of India's GDP. Agricultural activities are broadly categorized into three major areas: pre-harvesting, harvesting, and post-harvesting. Advances in machine learning have helped improve gains in agriculture. Machine learning helps farmers minimize farming losses by providing rich recommendations and insights about crops. This paper presents an extensive survey of the latest machine learning applications in agriculture that address problems in the three areas of pre-harvesting, harvesting, and post-harvesting. Applying machine learning in agriculture allows more efficient and precise farming, with less manual labor and high-quality production.

Citations: 107
BeeToxAI: An artificial intelligence-based web app to assess acute toxicity of chemicals to honey bees
Pub Date : 2021-12-01 DOI: 10.1016/j.ailsci.2021.100013
José T. Moreira-Filho, Rodolpho C. Braga, Jade Milhomem Lemos, Vinicius M. Alves, Joyce V.V.B. Borba, Wesley S. Costa, Nicole Kleinstreuer, Eugene N. Muratov, Carolina Horta Andrade, Bruno J. Neves

Chemically induced toxicity is the leading cause of recent extinction of honey bees. In this regard, we developed an innovative artificial intelligence-based web app (BeeToxAI) for assessing the acute toxicity of chemicals to Apis mellifera. Initially, we developed and externally validated QSAR models for classification (external set accuracy ∼91%) through the combination of Random Forest and molecular fingerprints to predict the potential for chemicals to cause acute contact toxicity and acute oral toxicity to honey bees. Then, we developed and externally validated regression QSAR models (R² = 0.75) using Feedforward Neural Networks (FNNs). Afterward, the best models were implemented in the publicly available BeeToxAI web app (http://beetoxai.labmol.com.br/). The outputs of BeeToxAI are: toxicity predictions with estimated confidence, applicability domain estimation, and color-coded maps of relative structure fragment contributions to toxicity. As an additional assessment of BeeToxAI performance, we collected an external set of pesticides with known bee toxicity that were not included in our modeling dataset. BeeToxAI classification models were able to predict four out of five pesticides correctly. The acute contact toxicity model correctly predicted all of the eight pesticides. Here we demonstrate that BeeToxAI can be used as a rapid new approach methodology for predicting acute toxicity of chemicals in honey bees.
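
The regression models in this abstract are summarized by a coefficient of determination. As a reminder of what that number measures, the sketch below computes the textbook form, 1 minus the ratio of residual to total sum of squares. The abstract does not say which variant of external-set R² was used, so this is an assumption, and the function name and toy values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # unexplained variation
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)              # total variation
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1 - ss_res / ss_tot


# Toy values, not data from the BeeToxAI study.
toy_r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5])
print(round(toy_r2, 3))
```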

Citations: 6
Quantifying sources of uncertainty in drug discovery predictions with probabilistic models
Pub Date : 2021-12-01 DOI: 10.1016/j.ailsci.2021.100004
Stanley E. Lazic, Dominic P. Williams

Knowing the uncertainty in a prediction is critical when making expensive investment decisions and when patient safety is paramount, but machine learning (ML) models in drug discovery typically only provide a single best estimate and ignore all sources of uncertainty. Predictions from these models may therefore be over-confident, which can put patients at risk and waste resources when compounds that are destined to fail are further developed. Probabilistic predictive models (PPMs) can incorporate all sources of uncertainty, and they return a distribution of predicted values that represents the uncertainty in the prediction. We describe seven sources of uncertainty in PPMs: data, distribution function, mean function, variance function, link function(s), parameters, and hyperparameters. We use toxicity prediction as a running example, but the same principles apply to all prediction models. The consequences of ignoring uncertainty and how PPMs account for uncertainty are also described. We aim to make the discussion accessible to a broad non-mathematical audience. Equations are provided to make ideas concrete for mathematical readers (but can be skipped without loss of understanding) and code is available for computational researchers (https://github.com/stanlazic/ML_uncertainty_quantification).
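
To make the contrast between a single best estimate and a predictive distribution concrete, here is a minimal sketch that is not taken from the authors' repository. It assumes the simplest possible model, a normal likelihood with known noise SD and a normal prior on the mean, so that only two of the listed sources appear: parameter uncertainty and observation noise. The function name and all numbers are hypothetical.

```python
from statistics import NormalDist


def posterior_predictive(observations, noise_sd, prior_mean, prior_sd):
    """Normal likelihood with known noise SD and a normal prior on the mean.

    Returns (posterior_mean, posterior_sd, predictive_sd).
    """
    n = len(observations)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / noise_sd ** 2
    # Parameter uncertainty: how well the mean is known after seeing the data.
    posterior_var = 1.0 / (prior_precision + data_precision)
    posterior_mean = posterior_var * (
        prior_mean * prior_precision + sum(observations) / noise_sd ** 2
    )
    # A new observation is uncertain because of the parameter AND the noise.
    predictive_var = posterior_var + noise_sd ** 2
    return posterior_mean, posterior_var ** 0.5, predictive_var ** 0.5


# Toy measurements, for illustration only.
post_mean, post_sd, pred_sd = posterior_predictive(
    [4.0, 5.0, 6.0], noise_sd=1.0, prior_mean=0.0, prior_sd=10.0
)
predictive = NormalDist(mu=post_mean, sigma=pred_sd)
interval_95 = (predictive.inv_cdf(0.025), predictive.inv_cdf(0.975))
print(f"point estimate: {post_mean:.2f}")
print(f"95% predictive interval: {interval_95[0]:.2f} to {interval_95[1]:.2f}")
```

Reporting only `post_mean` corresponds to the single best estimate the abstract warns about; the interval is wider than the noise alone would suggest because the uncertainty in the estimated mean is added to it.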

Citations: 8
AutoGGN: A gene graph network AutoML tool for multi-omics research
Pub Date : 2021-12-01 DOI: 10.1016/j.ailsci.2021.100019
Lei Zhang, Wen Shen, Ping Li, Chi Xu, Denghui Liu, Wenjun He, Zhimeng Xu, Deyong Wang, Chenyi Zhang, Hualiang Jiang, Mingyue Zheng, Nan Qiao

Omics data can be used to identify biological characteristics from genetic to phenotypic levels during the life span of a living being, while molecular interaction networks have a fundamental impact on life activities. Integrating omics data and molecular interaction networks will help researchers delve into comprehensive information hidden in the data. Here, we propose a new multimodal method — AutoGGN — to integrate multi-omics data with molecular interaction networks based on graph convolutional neural networks (GCNs). We evaluated AutoGGN using three classification tasks: single-cell embryonic developmental stage classification, pan-cancer type classification, and breast cancer subtyping. On all three tasks, AutoGGN showed better performance than other methods. This means AutoGGN has the potential to extract insights more effectively by means of integrating molecular interaction networks with multi-omics data. Additionally, in order to provide a better understanding of how our model makes predictions, we utilized the SHAP module and identified the key genes contributing to the classification, providing insight for the design of downstream biological experiments.
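
For readers unfamiliar with graph convolution, the sketch below shows one propagation step over a small gene interaction graph. It assumes the widely used symmetric-normalized form, ReLU(D^-1/2 (A + I) D^-1/2 X W); the abstract does not state which GCN variant AutoGGN uses, so this is an illustration of the general idea and not the tool's architecture. The graph, features, and weights are toy values, and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj):
    """Add self-loops, then scale each entry by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj, features, weights):
    """One graph convolution: mix each gene's features with its neighbors', project, apply ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    transformed = matmul(propagated, weights)
    return [[max(0.0, v) for v in row] for row in transformed]


# Three genes in a chain (0-1, 1-2), one omics value per gene, two output channels.
adjacency = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
omics = [[1.0], [2.0], [3.0]]
weights = [[1.0, -1.0]]
hidden = gcn_layer(adjacency, omics, weights)
print(hidden)
```

The adjacency matrix is where the molecular interaction network enters, and the feature matrix is where the omics measurements enter, which is the integration the abstract describes.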

Citations: 0
Fast-bonito: A faster deep learning based basecaller for nanopore sequencing
Pub Date : 2021-12-01 DOI: 10.1016/j.ailsci.2021.100011
Zhimeng Xu, Yuting Mai, Denghui Liu, Wenjun He, Xinyuan Lin, Chi Xu, Lei Zhang, Xin Meng, Joseph Mafofo, Walid Abbas Zaher, Ashish Koshy, Yi Li, Nan Qiao

Nanopore sequencing from Oxford Nanopore Technologies (ONT) is a promising third-generation sequencing (TGS) technology that generates relatively longer sequencing reads compared to the next-generation sequencing (NGS) technology. A basecaller is a piece of software that translates the original electrical current signals into nucleotide sequences. The accuracy of the basecaller is crucially important to downstream analysis. Bonito is a deep learning-based basecaller recently developed by ONT. Its neural network architecture is composed of a single convolutional layer followed by three stacked bidirectional gated recurrent unit (GRU) layers. Although Bonito has achieved state-of-the-art base calling accuracy, its speed is too slow to be used in production. We therefore developed Fast-Bonito, by using the neural architecture search (NAS) technique to search for a brand-new neural network backbone, and trained it from scratch using several advanced deep learning model training techniques. The new Fast-Bonito model balanced performance in terms of speed and accuracy. Fast-Bonito was 153.8% faster than the original Bonito on NVIDIA V100 GPU. When running on HUAWEI Ascend 910 NPU, Fast-Bonito was 565% faster than the original Bonito. The accuracy of Fast-Bonito was also slightly higher than that of Bonito. We have made Fast-Bonito open source, hoping it will boost the adoption of TGS in both academia and industry.
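
Percentages such as "153.8% faster" are easy to misread. The short sketch below converts them into speedup factors and relative runtimes, under the assumption that "X% faster" means throughput increased by X% over the baseline. The abstract does not define the measure, so this reading, like the function names, is an assumption.

```python
def speedup_factor(percent_faster):
    """Throughput multiple implied by 'X% faster' (assumed to mean +X% throughput)."""
    return 1 + percent_faster / 100


def relative_runtime(percent_faster):
    """Fraction of the baseline runtime needed for the same workload."""
    return 1 / speedup_factor(percent_faster)


for device, pct in [("NVIDIA V100 GPU", 153.8), ("HUAWEI Ascend 910 NPU", 565.0)]:
    print(f"{device}: {speedup_factor(pct):.3f}x throughput, "
          f"{relative_runtime(pct):.1%} of baseline runtime")
```

Under this reading, the two figures correspond to roughly 2.5 times and 6.7 times the throughput of the original Bonito.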

Citations: 16
Second-generation artificial intelligence approaches for life science research
Pub Date : 2021-12-01 DOI: 10.1016/j.ailsci.2021.100026
Jürgen Bajorath
Citations: 3
Learning functional group chemistry from molecular images leads to accurate prediction of activity cliffs
Pub Date : 2021-12-01 DOI: 10.1016/j.ailsci.2021.100022
Javed Iqbal, Martin Vogt, Jürgen Bajorath

Advances in image analysis through deep learning have catalyzed the recent use of molecular images in chemoinformatics and drug design for predictive modeling of compound properties and other applications. For image analysis and representation learning from molecular graphs, convolutional neural networks (CNNs) represent a preferred computational architecture. In this work, we have investigated whether functional groups (FGs) and their distinguishing chemical features can be learned from compound images using CNNs of different complexity, and whether such knowledge might be transferable to other prediction tasks. We have shown that frequently occurring FGs were comprehensively learned, leading to highly accurate multi-label FG predictions. Furthermore, we have determined that the FG knowledge acquired by CNNs was sufficient for accurate prediction of compound activity cliffs (ACs) via transfer learning. Re-training of FG prediction models on AC data optimized convolutional layer weights and further improved prediction accuracy. Through feature weight analysis and visualization, a rationale was provided for the ability of CNNs to learn FG chemistry and transfer this knowledge for effective AC prediction.

Citations: 4
AutoGenome: An AutoML tool for genomic research
Pub Date : 2021-12-01 DOI: 10.1016/j.ailsci.2021.100017
Denghui Liu, Chi Xu, Wenjun He, Zhimeng Xu, Wenqi Fu, Lei Zhang, Jie Yang, Zhihao Wang, Bing Liu, Guangdun Peng, Dali Han, Xiaolong Bai, Nan Qiao

Deep learning has achieved great success in traditional fields like computer vision (CV), natural language processing (NLP), speech processing, and more. These advancements have greatly inspired researchers in genomics and made deep learning in genomics an exciting and popular topic. The convolutional neural network (CNN) and recurrent neural network (RNN) are frequently used to solve genomic sequencing and prediction problems, and multilayer perceptrons (MLP) and auto-encoders (AE) are frequently used for genomic profiling data like RNA expression data and gene mutation data. Here, we introduce a new neural network architecture, the residual fully-connected neural network (RFCN), and describe its advantage in modeling genomic profiling data. We also incorporate AutoML algorithms and implement AutoGenome, an end-to-end, automated deep learning framework for genomic studies. By utilizing the proposed RFCN architecture, automatic hyper-parameter search, and neural architecture search algorithms, AutoGenome can automatically train high-performance deep learning models for various kinds of genomic profiling data. To help researchers better understand the trained models, AutoGenome can assess the importance of different features and export the most critical features for supervised learning tasks and the representative latent vectors for unsupervised learning tasks. We expect AutoGenome will become a popular tool in genomic studies.
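
The abstract does not spell out the layer layout of the RFCN, so the following is only a minimal sketch of the general idea of a residual fully-connected block, assuming the common pattern of two dense layers whose output is added back to the input through a skip connection. The function names (`dense`, `residual_fc_block`) and the weight layout are hypothetical and are not taken from AutoGenome.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(x, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def residual_fc_block(x, w1, b1, w2, b2):
    """Assumed residual block: y = x + W2 * relu(W1 * x + b1) + b2.

    The skip connection requires the block output to have the same
    length as the input vector `x`.
    """
    hidden = relu(dense(x, w1, b1))
    out = dense(hidden, w2, b2)
    return [xi + oi for xi, oi in zip(x, out)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zero_bias = [0.0, 0.0]
    # A toy two-feature expression profile passed through one block.
    print(residual_fc_block([1.0, 2.0], identity, zero_bias, identity, zero_bias))
```

One property of this design is that a block whose weights are all zero passes its input through unchanged, which is the usual argument for why skip connections make deeper stacks of layers easier to train.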

Citations: 0