Artificial intelligence in the life sciences: Latest articles

Elucidating dynamic cell lineages and gene networks in time-course single cell differentiation
Pub Date : 2023-03-25 DOI: 10.1016/j.ailsci.2023.100068
Mengrui Zhang , Yongkai Chen , Dingyi Yu , Wenxuan Zhong , Jingyi Zhang , Ping Ma

Single cell RNA sequencing (scRNA-seq) technologies provide researchers with an unprecedented opportunity to explore cell heterogeneity. For example, sequenced stem and progenitor cells belong to various cell lineages that may have different cell fates, and these cells may differentiate into various mature cell types during differentiation. To trace cell differentiation, researchers reconstruct cell lineages and predict cell fates by ordering cells chronologically along a trajectory indexed by a pseudo-time. However, scRNA-seq experiments provide no cell-to-cell correspondences across time points from which to reconstruct cell lineages, which poses a significant challenge for cell lineage tracing and cell fate prediction. Methods that can accurately reconstruct dynamic cell lineages and predict cell fates are therefore highly desirable.

In this article, we develop an innovative machine-learning framework called Cell Smoothing Transformation (CellST) to elucidate dynamic cell fate paths and construct gene networks in cell differentiation processes. Unlike existing methods, which construct a single bulk cell trajectory, CellST builds trajectories and tracks behaviors for each individual cell. CellST can also predict cell fates for less frequent cell types. Based on the individual cell fate trajectories, CellST can further construct dynamic gene networks that model gene-gene relationships along the differentiation process and identify critical genes that may regulate cells toward various mature cell types.
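
The pseudo-time ordering mentioned above can be illustrated with a generic sketch. This is not CellST: it is a common baseline, assumed here purely for illustration, in which each cell is linked to its k nearest neighbours in expression space and its pseudo-time is the shortest-path distance from a chosen root cell. The helper names `knn_graph` and `pseudotime`, the value of k, and the toy coordinates are all hypothetical.

```python
import heapq
import math

# Generic kNN-graph pseudo-time baseline (hypothetical helpers, NOT CellST).

def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    adj = {i: {} for i in range(n)}
    for i in range(n):
        # Distances from cell i to every other cell, nearest first.
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            # Store the edge in both directions so the graph is undirected.
            adj[i][j] = d
            adj[j][i] = d
    return adj

def pseudotime(points, root, k=2):
    """Pseudo-time of each cell = shortest-path (Dijkstra) distance from the root."""
    adj = knn_graph(points, k)
    dist = {i: math.inf for i in adj}
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return [dist[i] for i in range(len(points))]

# Toy "expression profiles": five cells along a line, listed out of order.
cells = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (4.0, 0.0), (3.0, 0.0)]
pt = pseudotime(cells, root=0)
order = sorted(range(len(cells)), key=lambda i: pt[i])
print(order)
```

Such a baseline yields one ordering for the whole population; per-cell trajectories of the kind the abstract describes would require a different model.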

Artificial intelligence in the life sciences, vol. 3, Article 100068. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10328540/pdf/
Citations: 0
Data science and data analytics in life science research
Pub Date : 2023-02-27 DOI: 10.1016/j.ailsci.2023.100067
Jürgen Bajorath
Artificial intelligence in the life sciences, vol. 3, Article 100067.
Citations: 1
Natural products subsets: Generation and characterization
Pub Date : 2023-02-26 DOI: 10.1016/j.ailsci.2023.100066
Ana L. Chávez-Hernández, José L. Medina-Franco

Natural products are attractive for drug discovery because of their distinctive chemical structures, such as a large overall fraction of sp³ carbon atoms and chiral centers (both features associated with structural complexity), large chemical scaffolds, and diverse functional groups. Natural products are also used in de novo design and have inspired the development of pseudo-natural products using generative models. Public databases such as the Collection of Open NatUral ProdUcTs and the Universal Natural Product Database (UNPD) are rich sources of structures for generative models and other applications. In this work, we report the selection and characterization of the most diverse natural products in the UNPD using the MaxMin algorithm. The subsets of 14,994, 7,497, and 4,998 compounds are publicly available at https://github.com/DIFACQUIM/Natural-products-subsets-generation. We anticipate that the subsets will be particularly useful for research groups building generative models based on natural products, especially groups with limited access to extensive supercomputing resources.
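
A minimal sketch of greedy MaxMin diversity picking, assuming fingerprints are represented as Python sets of on-bit indices and compared with the Tanimoto distance. The fingerprint type, the distance, and the choice of the first compound as seed are assumptions for illustration, not details taken from the paper; production work would more likely use a toolkit implementation such as RDKit's `MaxMinPicker`.

```python
# Greedy MaxMin picking over set-based fingerprints (illustrative assumptions:
# fingerprints = sets of on-bits, Tanimoto distance, seed = compound 0).

def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union

def maxmin_pick(fps, n_pick, seed_index=0):
    """Start from a seed, then repeatedly add the compound whose distance
    to its nearest already-picked compound is largest."""
    picked = [seed_index]
    # min_dist[i] = distance from compound i to its closest picked compound
    min_dist = [tanimoto_distance(fp, fps[seed_index]) for fp in fps]
    while len(picked) < min(n_pick, len(fps)):
        candidates = [i for i in range(len(fps)) if i not in picked]
        best = max(candidates, key=lambda i: min_dist[i])
        picked.append(best)
        # Refresh each compound's distance to the enlarged picked set.
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fps[best]))
    return picked

# Toy fingerprints (hypothetical on-bit sets).
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {7, 8}]
print(maxmin_pick(fps, 3))
```

Each iteration only needs distances to the newly picked compound, which is why MaxMin scales to large libraries without a full pairwise distance matrix.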

Artificial intelligence in the life sciences, vol. 3, Article 100066.
Citations: 2
An improved 3D quantitative structure-activity relationships (QSAR) of molecules with CNN-based partial least squares model
Pub Date : 2023-02-24 DOI: 10.1016/j.ailsci.2023.100065
Xuxiang Huo , Jun Xu , Mingyuan Xu , Hongming Chen

Ligand-based virtual screening plays an important role when protein structures are not available. Among ligand-based methods, accurate and fast prediction of protein-ligand binding affinity is crucial for reducing computational cost and exploring chemical space efficiently. Here we propose a CNN-based method, termed L3D-PLS, for building quantitative structure-activity relationships without target structures. In L3D-PLS, a CNN module extracts key interaction features from grids around aligned ligands, and a partial least squares (PLS) model fits binding affinity to the features extracted by the pre-trained CNN module. On 30 publicly available pre-aligned molecular datasets, L3D-PLS outperformed the traditional CoMFA method. These results highlight that L3D-PLS can be useful for lead optimization based on small datasets, a common situation in drug discovery campaigns.
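
The regression stage described above, a PLS model fitted on feature vectors, can be sketched in pure Python. This is a generic single-response PLS (NIPALS algorithm) written for illustration only: the class name `PLS1` is hypothetical, the toy matrix stands in for CNN-extracted features, and it is not the authors' code. In practice an established implementation such as scikit-learn's `PLSRegression` would be the usual choice.

```python
# Generic single-response PLS via NIPALS (hypothetical class, NOT L3D-PLS itself).

def _mean(v):
    return sum(v) / len(v)

class PLS1:
    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X, y):
        n, m = len(X), len(X[0])
        self.x_mean = [_mean([row[j] for row in X]) for j in range(m)]
        self.y_mean = _mean(y)
        # Centred working copies; deflated after each latent component.
        E = [[row[j] - self.x_mean[j] for j in range(m)] for row in X]
        f = [yi - self.y_mean for yi in y]
        self.W, self.P, self.q = [], [], []
        for _ in range(self.n_components):
            # Weight vector w ~ E^T f: direction of maximal covariance with y.
            w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
            norm = sum(v * v for v in w) ** 0.5
            if norm == 0:
                break  # y is already fully explained
            w = [v / norm for v in w]
            t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
            tt = sum(v * v for v in t)
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
            q = sum(f[i] * t[i] for i in range(n)) / tt  # y loading
            # Deflate: remove what this component explains from X and y.
            E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
            f = [f[i] - q * t[i] for i in range(n)]
            self.W.append(w)
            self.P.append(p)
            self.q.append(q)
        return self

    def predict_one(self, x):
        """Project a new sample through the same deflation sequence."""
        e = [x[j] - self.x_mean[j] for j in range(len(x))]
        y_hat = self.y_mean
        for w, p, q in zip(self.W, self.P, self.q):
            t = sum(ej * wj for ej, wj in zip(e, w))
            y_hat += q * t
            e = [ej - t * pj for ej, pj in zip(e, p)]
        return y_hat

# Toy features and an exactly linear target (hypothetical data).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [2 * a - b + 1 for a, b in X]
model = PLS1(n_components=2).fit(X, y)
print(round(model.predict_one([0, 0]), 6))
```

With as many components as independent features, PLS reproduces ordinary least squares; its advantage on small, highly correlated feature sets comes from keeping fewer components than features.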

Artificial intelligence in the life sciences, vol. 3, Article 100065.
Citations: 1
Combining molecular and cell painting image data for mechanism of action prediction
Pub Date : 2023-02-17 DOI: 10.1016/j.ailsci.2023.100060
Guangyan Tian , Philip J Harrison , Akshai P Sreenivasan , Jordi Carreras-Puigvert , Ola Spjuth

The mechanism of action (MoA) of a compound describes the biological interaction through which it produces a pharmacological effect. Multiple data sources can be used to predict MoA, including compound structural information and various assays, such as those based on cell morphology, transcriptomics, and metabolomics. In the present study, we explored the benefits and potential additive/synergistic effects of combining structural information, in the form of Morgan fingerprints, with morphological information, in the form of five-channel Cell Painting image data. For a set of 10 well-represented MoA classes, we compared deep learning models trained on each dataset separately with a model trained on both datasets simultaneously. On a held-out test set, we obtained a macro-averaged F1 score of 0.58 when training on the structural data only, 0.81 when training on the image data only, and 0.92 when training on both together. These results indicate clear additive/synergistic effects and highlight the benefit of integrating multiple data sources for MoA prediction.
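
Macro-averaging, as used for the F1 scores above, gives every MoA class equal weight regardless of how many compounds it contains. A minimal pure-Python sketch, assuming string class labels (the example labels are hypothetical), computes per-class F1 as 2TP / (2TP + FP + FN) over the union of observed labels and takes the unweighted mean; this is intended to match the behaviour of scikit-learn's `f1_score(..., average="macro")`.

```python
# Macro-averaged F1 over the union of labels seen in y_true and y_pred.

def macro_f1(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); guard against an empty class.
        scores.append(2 * tp / denom if denom else 0.0)
    # Unweighted mean: a rare class counts as much as a frequent one.
    return sum(scores) / len(scores)

# Hypothetical MoA labels for five test compounds.
y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c"]
print(macro_f1(y_true, y_pred))
```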

Artificial intelligence in the life sciences, vol. 3, Article 100060.
Citations: 0
AI4DR: Development and implementation of an annotation system for high-throughput dose-response experiments
Pub Date : 2023-02-06 DOI: 10.1016/j.ailsci.2023.100063
Marc Bianciotto , Lionel Colliandre , Kun Mi , Isabelle Schreiber , Cécile Delorme , Stéphanie Vougier , Hervé Minoux

A common strategy for identifying novel chemical matter in drug discovery is to perform a High Throughput Screening (HTS). However, the large amount of data generated at the dose-response (DR) step of an HTS campaign requires careful analysis to detect artifacts and correct erroneous data points before the experiments are validated. This step, which requires reviewing each DR experiment, can be time-consuming and prone to human error or inconsistency. AI4DR is a system developed to classify DR curves using a Convolutional Neural Network (CNN) applied to normalized images of the curves. AI4DR annotates thousands of curves into 14 categories within minutes to assist HTS biologists in their analyses. Several categories are associated with active and inactive compounds; others correspond to features of interest such as noise, a weaker effect at high doses, or a suspiciously weak or strong slope at the inflection point of the DR curves of actives. The classifier was trained on an algorithmically generated dataset curated and refined by experts, tested on real screening campaigns, and improved using thousands of expert annotations. The solution is deployed using an MLflow model server interfaced with the Genedata Screener data analysis software used by end users. AI4DR improves the consistency, robustness, and speed of HTS data analysis and reduces the human effort required, helping to identify new medicines for patients faster.
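
The curve features mentioned above (plateaus, inflection point, slope) are conventionally described with a four-parameter logistic (Hill) model. The sketch below is not AI4DR, which classifies curve images with a CNN; assuming the standard 4PL form in log10 concentration, it only shows how the slope at the inflection point follows from the Hill coefficient, dy/dx = (top - bottom) * hill * ln(10) / 4, and checks that formula numerically. The thresholds in `flag_slope` are hypothetical.

```python
import math

# Standard 4PL dose-response model (an assumption; AI4DR itself uses a CNN).

def four_pl(log_conc, bottom, top, log_ec50, hill):
    """Response at log10 concentration; the inflection point is at log_ec50."""
    return bottom + (top - bottom) / (1.0 + 10 ** ((log_ec50 - log_conc) * hill))

def slope_at_inflection(bottom, top, hill):
    """Analytical dy/d(log10 conc) at log_conc == log_ec50."""
    return (top - bottom) * hill * math.log(10) / 4.0

def numeric_slope(bottom, top, log_ec50, hill, h=1e-5):
    """Central finite difference at the inflection point, to check the formula."""
    return (four_pl(log_ec50 + h, bottom, top, log_ec50, hill)
            - four_pl(log_ec50 - h, bottom, top, log_ec50, hill)) / (2 * h)

def flag_slope(hill, low=0.5, high=2.0):
    """Hypothetical review rule: flag unusually shallow or steep Hill slopes."""
    if abs(hill) < low:
        return "shallow"
    if abs(hill) > high:
        return "steep"
    return "ok"

# Example curve: 0-100 % effect, EC50 = 1 uM (log10 = -6), Hill slope 1.
print(four_pl(-6.0, 0.0, 100.0, -6.0, 1.0))
print(slope_at_inflection(0.0, 100.0, 1.0), numeric_slope(0.0, 100.0, -6.0, 1.0))
```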

Artificial intelligence in the life sciences, vol. 3, Article 100063.
Citations: 0
Exploring chemical space — Generative models and their evaluation
Pub Date : 2023-02-04 DOI: 10.1016/j.ailsci.2023.100064
Martin Vogt

Recent advances in artificial intelligence, specifically in deep learning, have invigorated research into novel ways of exploring chemical space. Compared with more traditional methods that rely on chemical fragments and combinatorial recombination, deep generative models generate molecules in a non-transparent way that defies easy rationalization. However, this opaque nature also promises to explore uncharted chemical space in novel ways that do not rely directly on structural similarity. These aspects, together with the complexity of training such models, make assessing the novelty, uniqueness, and distribution of generated molecules a central concern. This perspective gives an overview of current methodologies for chemical space exploration, with an emphasis on deep neural network approaches. Key aspects of generative models include the choice of molecular representation, the targeted chemical space, and the methodology for assessing and validating chemical space coverage.
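
The novelty and uniqueness criteria mentioned above are commonly computed as simple ratios over canonicalized structures (benchmark suites such as GuacaMol and MOSES use closely related definitions that differ in detail). A minimal sketch, assuming molecules are SMILES strings and that the caller supplies a real canonicalizer, for example RDKit's `Chem.MolFromSmiles` followed by `Chem.MolToSmiles`; the identity default and the toy canonicalizer in the example are placeholders.

```python
# Validity / uniqueness / novelty of generated molecules (illustrative sketch).

def generation_metrics(generated, training, canonicalize=lambda s: s):
    """`canonicalize` maps a SMILES string to a canonical form, or None if it
    cannot be parsed. The identity default is a placeholder that treats every
    string as valid and already canonical."""
    canon = [canonicalize(s) for s in generated]
    valid = [c for c in canon if c is not None]
    unique = set(valid)
    train = {c for c in (canonicalize(s) for s in training) if c is not None}
    novel = unique - train  # unique valid molecules absent from training data
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }

# Hypothetical toy canonicalizer: "OCC" and "CCO" are both ethanol; "X" is invalid.
def toy_canon(s):
    return None if s == "X" else {"OCC": "CCO"}.get(s, s)

print(generation_metrics(["CCO", "OCC", "X", "CCN"], ["CCO"], toy_canon))
```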

Artificial intelligence in the life sciences, vol. 3, Article 100064.
Citations: 2
Machine learning for longitudinal mortality risk prediction in patients with malignant neoplasm in São Paulo, Brazil
Pub Date : 2023-02-03 DOI: 10.1016/j.ailsci.2023.100061
GFS Silva , LS Duarte , MM Shirassu , SV Peres , MA de Moraes , A Chiavegatto Filho

Artificial intelligence has become an important diagnostic and prognostic tool in recent years, as machine learning algorithms have been shown to improve clinical decision-making. Some of the most important applications of these algorithms will be in developing regions with restricted data collection, but their performance under these conditions remains largely unknown. We analyzed longitudinal data from São Paulo, Brazil, to develop machine learning algorithms that predict the risk of death in patients with cancer. We tested different algorithms using nine separate model structures. In terms of the area under the ROC curve (AUC-ROC), we obtained 0.946 for the general model, 0.945 for the model of the five main cancers, 0.899 for bronchial and lung cancer, 0.947 for breast cancer, 0.866 for stomach cancer, 0.872 for colon cancer, 0.923 for rectal cancer, 0.955 for prostate cancer, and 0.917 for cervical cancer. Our results indicate the potential of models built only on routinely collected data for predicting mortality risk in cancer patients in developing regions.
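
The AUC-ROC values reported above can be read as the probability that a model assigns a higher risk score to a randomly chosen patient who died than to a randomly chosen patient who survived. A small pure-Python sketch of that pairwise definition follows; it assumes label 1 means death, the labels and scores are made up, and its quadratic cost makes it suitable for illustration only (library routines such as scikit-learn's `roc_auc_score` compute the same quantity efficiently).

```python
# Pairwise (Mann-Whitney) form of AUC-ROC; label 1 = death is an assumption.

def auc_roc(labels, scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks for four patients.
labels = [1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.85]
print(auc_roc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking, so values such as 0.946 mean the model ranks almost every death above almost every survivor.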

Artificial intelligence in the life sciences, vol. 3, Article 100061.
Citations: 1
Deep graph learning in molecular docking: Advances and opportunities
Pub Date : 2023-02-03 DOI: 10.1016/j.ailsci.2023.100062
Norberto Sánchez-Cruz

Molecular docking is one of the main computational tools for structure-based drug discovery. Because molecules are naturally represented as graphs (a set of nodes/atoms connected by edges/bonds), deep graph learning has been successfully applied to multiple tasks in this area. This work presents an overview of the deep graph learning methods developed in this research field, as well as opportunities for future development.
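
The graph view of molecules described above can be made concrete with a toy example. The sketch below builds the heavy-atom graph of ethanol from a bond list and applies one sum-aggregation message-passing update, h_i' = h_i + sum of neighbour features h_j, which is the basic operation that graph neural networks build on. The one-hot atom features are an assumption for illustration; a learned layer would additionally apply trainable weight matrices and a nonlinearity, which are omitted here.

```python
# Molecule-as-graph toy example: ethanol heavy atoms C-C-O (illustrative only).
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

def adjacency(n_atoms, bonds):
    """Undirected adjacency list built from a bond list."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj

def message_passing_step(features, adj):
    """One unweighted update: each atom adds its neighbours' feature vectors."""
    return [
        [h + sum(features[j][k] for j in adj[i]) for k, h in enumerate(features[i])]
        for i in range(len(features))
    ]

# Assumed one-hot atom features: [is_carbon, is_oxygen].
features = [[1.0 if a == "C" else 0.0, 1.0 if a == "O" else 0.0] for a in atoms]
adj = adjacency(len(atoms), bonds)
updated = message_passing_step(features, adj)
print(updated)
```

After one step, the middle carbon's vector records that it is bonded to both a carbon and an oxygen; stacking more steps spreads information further along the bond graph.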

Artificial intelligence in the life sciences, vol. 3, Article 100062.
Citations: 2
Using ontologies for life science text-based resource organization
Pub Date : 2023-01-27 DOI: 10.1016/j.ailsci.2023.100059
Giulia Panzarella , Pierangelo Veltri , Stefano Alcaro

Ontologies are used to support access to a multitude of databases covering domain-relevant information. Heterogeneous sources with different semantics can be accessed through structured texts and descriptions organized in a hierarchical concept definition. We are interested in Life Sciences (LS) ontologies that include components from molecular biology, bioinformatics, physics, chemistry, medicine, and other related areas. An ontology comprises: (i) term connections, (ii) the identification of core concepts, (iii) data management, and (iv) knowledge classification and integration to collect key information. An ontology can be very useful for navigating LS terms. This paper explores some available biomedical ontologies and frameworks. It describes the most common ontology development environments (ODEs), namely Protégé, TopBraid Composer, OntoStudio, Fluent Editor, VocBench, Swoop, and OBO-Edit, used to create ontologies from textual scientific resources for LS plans. It also compares ontology methodologies in terms of usability, scalability, stability, integration, documentation, and originality.
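
The hierarchical concept definitions described above are typically navigated by following is-a (subclass) links. A minimal sketch, assuming a toy in-memory hierarchy: the terms and edges below are hypothetical and not taken from any specific ontology, and ontologies built in tools such as Protégé or OBO-Edit would normally be loaded from OWL or OBO files rather than written by hand.

```python
from collections import deque

# Hypothetical is-a edges: child term -> list of parent terms.
IS_A = {
    "kinase inhibitor": ["enzyme inhibitor"],
    "enzyme inhibitor": ["drug role"],
    "antineoplastic agent": ["drug role"],
    "imatinib": ["kinase inhibitor", "antineoplastic agent"],
}

def ancestors(term, is_a=IS_A):
    """All terms reachable from `term` via is-a edges (breadth-first search)."""
    seen = set()
    queue = deque(is_a.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, []))
    return seen

def is_subsumed(term, concept, is_a=IS_A):
    """True if `concept` is an ancestor of `term` in the hierarchy."""
    return concept in ancestors(term, is_a)

print(sorted(ancestors("imatinib")))
```

Because a term may have several parents, the hierarchy is a directed acyclic graph rather than a tree, which is why the traversal tracks already-visited terms.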

Citations: 2