Artificial intelligence in the life sciences: latest articles

Bayesian optimization for ternary complex prediction (BOTCP)
Pub Date : 2023-04-19 DOI: 10.1016/j.ailsci.2023.100072
Arjun Rao , Tin M. Tunjic , Michael Brunsteiner , Michael Müller, Hosein Fooladi, Chiara Gasbarri, Noah Weber

Proximity-inducing compounds (PICs) are an emergent drug technology through which a protein of interest (POI), often a drug target, is brought into the vicinity of a second protein which modifies the POI’s function, abundance or localisation, giving rise to a therapeutic effect. Among the best-known examples of such compounds are heterobifunctional molecules known as proteolysis targeting chimeras (PROTACs). PROTACs reduce the abundance of the target protein by establishing proximity to an E3 ligase which labels the protein for degradation via the ubiquitin-proteasomal pathway. Design of PROTACs in silico requires the computational prediction of the ternary complex consisting of the POI, the PROTAC molecule, and the E3 ligase.

We present a novel machine learning-based method for predicting PROTAC-mediated ternary complex structures using Bayesian optimization. We show how a fitness score combining an estimation of protein-protein interactions with PROTAC conformation energy calculations enables the sample-efficient exploration of candidate structures. Furthermore, our method introduces two novel scores for filtering and reranking which take PROTAC stability (AutoDock Vina-based PROTAC stability score) and protein interaction restraints (the TCP-AIR score) into account. We evaluate our method using DockQ scores on a number of available ternary complex structures (including previously unevaluated cases) and demonstrate that even with a clustering that requires members to have a high similarity, i.e., with smaller clusters, we can assign high ranks to those clusters that contain poses close to the experimentally determined native structure of the ternary complexes. We also demonstrate the resultant improved yield of near-native poses in these clusters.
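
The abstract above evaluates poses with DockQ scores. As a minimal sketch, assuming the commonly published DockQ definition (the mean of Fnat and two scaled RMSD terms, with scaling constants 1.5 Å for the interface RMSD and 8.5 Å for the ligand RMSD) and the usual quality cut-offs of 0.23, 0.49 and 0.80, the score can be computed as follows. The function names are hypothetical, and the three inputs are assumed to have been measured already by a structure-comparison tool; this is not the authors' evaluation code.

```python
def scaled_rms(rms: float, d: float) -> float:
    """Map an RMSD in angstroms onto (0, 1]; 1.0 means a perfect match."""
    return 1.0 / (1.0 + (rms / d) ** 2)


def dockq(fnat: float, irms: float, lrms: float) -> float:
    """Mean of Fnat, scaled interface RMSD and scaled ligand RMSD.

    fnat: fraction of native contacts recovered, in [0, 1]
    irms: interface RMSD in angstroms
    lrms: ligand RMSD in angstroms
    The constants 1.5 and 8.5 are assumed from the published definition.
    """
    if not 0.0 <= fnat <= 1.0:
        raise ValueError("fnat must be a fraction in [0, 1]")
    return (fnat + scaled_rms(irms, 1.5) + scaled_rms(lrms, 8.5)) / 3.0


def quality_class(score: float) -> str:
    """Bin a DockQ score using the assumed conventional cut-offs."""
    if score < 0.23:
        return "incorrect"
    if score < 0.49:
        return "acceptable"
    if score < 0.80:
        return "medium"
    return "high"
```

A pose with no native contacts but RMSDs equal to the two scaling constants scores 1/3, which falls in the "acceptable" bin under these cut-offs.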

Citations: 0
Designing microplate layouts using artificial intelligence
Pub Date : 2023-04-14 DOI: 10.1016/j.ailsci.2023.100073
María Andreína Francisco Rodríguez, Jordi Carreras Puigvert, Ola Spjuth

Microplates are indispensable in large-scale biomedical experiments, but the physical location of samples and controls on the microplate can significantly affect the resulting data and quality metric values. We introduce a new method based on constraint programming for designing microplate layouts that reduces unwanted bias and limits the impact of batch effects after error correction and normalisation. We demonstrate that our method applied to dose-response experiments leads to more accurate regression curves and lower errors when estimating IC50/EC50, and for drug screening leads to increased precision, when compared to random layouts. It also reduces the risk of inflated scores from common microplate quality assessment metrics such as Z′ factor and SSMD. We make our method available via a suite of tools (PLAID) including a reference constraint model, a web application, and Python notebooks to evaluate and compare designs when planning microplate experiments.
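
The two quality metrics named above depend only on the means and standard deviations of the control wells, which is why positional bias in where controls sit can inflate them. A minimal sketch using the standard textbook definitions, Z′ = 1 − 3(σ_pos + σ_neg)/|μ_pos − μ_neg| and SSMD = (μ_pos − μ_neg)/√(σ_pos² + σ_neg²), is shown below. The function names are hypothetical, sample standard deviations are assumed, and this is not taken from the PLAID tools.

```python
from math import sqrt
from statistics import mean, stdev


def z_prime(pos, neg):
    """Z' factor from positive- and negative-control well readouts."""
    separation = abs(mean(pos) - mean(neg))
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / separation


def ssmd(pos, neg):
    """Strictly standardised mean difference between the two control groups."""
    pooled = sqrt(stdev(pos) ** 2 + stdev(neg) ** 2)
    return (mean(pos) - mean(neg)) / pooled
```

For example, controls reading `[98, 100, 102]` and `[8, 10, 12]` each have a sample standard deviation of 2, giving Z′ = 1 − 12/90 and SSMD = 90/√8.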

Citations: 0
Deep metric learning for the classification of MALDI-TOF spectral signatures from multiple species of neotropical disease vectors
Pub Date : 2023-04-06 DOI: 10.1016/j.ailsci.2023.100071
Fernando Merchan , Kenji Contreras , Rolando A. Gittens , Jose R. Loaiza , Javier E. Sanchez-Galan

Deep Learning techniques have significant advantages for mass spectral classification, such as parallelized signal correction and feature extraction. Deep Metric Learning models combine Metric Learning, which determines the degree of similarity or difference between mass spectra, with the generalization power of Deep Learning to improve feature extraction even further. The two most popular of these models combine multiple neural networks with identical architectures and are commonly called Siamese (SNN) and Triplet Neural Networks (TNN). Herein, using both SNNs and TNNs, we aimed to taxonomically classify two sets of previously validated mass spectra that corresponded to 30 species of Neotropical arthropods in the Culicidae and Ixodidae families, some of which are disease vectors. SNNs and TNNs were highly effective at correctly classifying 826 spectra from 12 mosquito species and 310 spectra from 18 species of hard ticks, with both algorithms showing minimal average loss during cross-validation. SNNs produced accuracy rates for ticks and mosquitoes of 91.22% and 94.46%, respectively, while accuracy rates of 93% and 99% were obtained with TNNs. Our results indicate that Deep Metric Learning is a practical machine learning tool for quickly and precisely classifying MALDI-TOF-generated mass spectra of Neotropical and public-health-relevant arthropod species.
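
To illustrate what Siamese and Triplet networks are trained to minimise, the sketch below gives one common form of the contrastive loss (for pairs) and the triplet loss (for anchor/positive/negative triples), applied to embedding vectors that a network would produce from spectra. The margin value, the Euclidean distance and the function names are assumptions for illustration; the abstract does not state which loss variants or hyperparameters the authors used.

```python
from math import dist


def contrastive_loss(a, b, same_species, margin=1.0):
    """Pair loss for a Siamese network (one common form, without a 1/2 factor).

    Same-species pairs are pulled together; different-species pairs are
    pushed apart until they are at least `margin` away from each other.
    """
    d = dist(a, b)
    if same_species:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: the positive should be closer to the anchor than the
    negative by at least `margin`; otherwise the shortfall is the loss."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)
```

A triplet whose negative is already far from the anchor contributes zero loss, so training effort concentrates on species whose spectra embed close together.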

Citations: 2
Conformal efficiency as a metric for comparative model assessment befitting federated learning
Pub Date : 2023-04-01 DOI: 10.1016/j.ailsci.2023.100070
Wouter Heyndrickx , Adam Arany , Jaak Simm , Anastasia Pentina , Noé Sturm , Lina Humbeck , Lewis Mervin , Adam Zalewski , Martijn Oldenhof , Peter Schmidtke , Lukas Friedrich , Regis Loeb , Arina Afanasyeva , Ansgar Schuffenhauer , Yves Moreau , Hugo Ceulemans

In a drug discovery setting, pharmaceutical companies own substantial but confidential datasets. The MELLODDY project developed a privacy-preserving federated machine learning solution and deployed it at an unprecedented scale. Each partner built models for their own private assays that benefitted from a shared representation. Established predictive performance metrics such as AUC ROC or AUC PR are constrained to unseen labeled chemical space and cannot gauge performance gains in unlabeled chemical space. Federated learning indirectly extends labeled space, but in a privacy-preserving context, a partner cannot use this label extension for performance assessment. Metrics that estimate uncertainty on a prediction can be calculated even where no label is known. Practically, the chemical space covered with predictions above an uncertainty threshold reflects the applicability domain of a model. After establishing a link to established performance metrics, we propose the efficiency from the conformal prediction framework (‘conformal efficiency’) as a proxy for the applicability domain size. A documented extension of the applicability domain would qualify as a tangible benefit from federated learning. In interim assessments, MELLODDY partners reported a median increase in conformal efficiency of the federated over the single-partner model of 5.5% (with increases up to 9.7%). Subject to distributional conditions, that efficiency increase can be directly interpreted as the expected increase in conformal, i.e., low-uncertainty, predictions. In conclusion, we present the first indication that privacy-preserving federated machine learning across massive drug-discovery datasets from ten pharma partners indeed extends the applicability domain of property prediction models.
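
The point that an uncertainty-based metric needs no test labels can be made concrete with a small sketch. It assumes a class-conditional (Mondrian) inductive conformal predictor for a binary assay, a nonconformity score of one minus the predicted probability of the candidate label, and efficiency defined as the fraction of compounds that receive exactly one label at significance level `epsilon`. These are common choices rather than details confirmed by the abstract, and all names are hypothetical.

```python
def p_value(calibration_scores, test_score):
    """Share of calibration nonconformity scores at least as large as the
    test score, with the usual +1 smoothing."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(prob_active, calib, epsilon):
    """Labels that cannot be rejected at significance level epsilon.

    calib maps each label (1 = active, 0 = inactive) to the nonconformity
    scores of held-out calibration compounds that truly carry that label.
    """
    scores = {1: 1.0 - prob_active, 0: prob_active}
    return {
        label
        for label, score in scores.items()
        if p_value(calib[label], score) > epsilon
    }


def conformal_efficiency(probs, calib, epsilon):
    """Fraction of compounds with a single-label prediction set.

    Only model outputs for the test compounds are needed, not their labels.
    """
    sets = [prediction_set(p, calib, epsilon) for p in probs]
    return sum(1 for s in sets if len(s) == 1) / len(sets)
```

Because `conformal_efficiency` takes only predicted probabilities for the compounds being assessed, it can be evaluated on unlabeled chemical space, which is the property the abstract relies on.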

Citations: 2
Pharmaceutical patent landscaping: A novel approach to understand patents from the drug discovery perspective
Pub Date : 2023-03-31 DOI: 10.1016/j.ailsci.2023.100069
Yojana Gadiya , Philip Gribbon , Martin Hofmann-Apitius , Andrea Zaliani

Patents play a crucial role in the drug discovery process by providing legal protection for discoveries and incentivising investments in research and development. By identifying patterns within patent data resources, researchers can gain insight into the market trends and priorities of the pharmaceutical and biotechnology industries, as well as provide additional perspectives on more fundamental aspects such as the emergence of potential new drug targets. In this paper, we used the patent enrichment tool, PEMT, to extract, integrate, and analyse patent literature for rare diseases (RD) and Alzheimer's disease (AD). This is followed by a systematic review of the underlying patent landscape to decipher trends and applications in patents for these diseases. To do so, we discuss prominent organisations involved in drug discovery research in AD and RD. This allows us to gain an understanding of the importance of AD and RD from specific organisational (pharmaceutical or university) perspectives. Next, we analyse the historical focus of patents in relation to individual therapeutic targets and correlate them with market scenarios, allowing the identification of prominent targets for a disease. Lastly, we identified drug repurposing activities within the two diseases with the help of patents. This resulted in identifying existing repurposed drugs and novel potential therapeutic approaches applicable to the indication areas. The study demonstrates the expanded applicability of patent documents from legal to drug discovery, design, and research, thus providing a valuable resource for future drug discovery efforts. Moreover, this study is an attempt to understand the importance of the data underlying patent documents and to highlight the need to prepare these data for machine learning-based applications.

Citations: 0
Elucidating dynamic cell lineages and gene networks in time-course single cell differentiation
Pub Date : 2023-03-25 DOI: 10.1016/j.ailsci.2023.100068
Mengrui Zhang , Yongkai Chen , Dingyi Yu , Wenxuan Zhong , Jingyi Zhang , Ping Ma

Single cell RNA sequencing (scRNA-seq) technologies provide researchers with an unprecedented opportunity to exploit cell heterogeneity. For example, the sequenced cells belong to various cell lineages, which may have different cell fates in stem and progenitor cells. Those cells may differentiate into various mature cell types in a cell differentiation process. To trace the behavior of cell differentiation, researchers reconstruct cell lineages and predict cell fates by ordering cells chronologically into a trajectory with a pseudo-time. However, in scRNA-seq experiments, there are no cell-to-cell correspondences across time points with which to reconstruct the cell lineages, which creates a significant challenge for cell lineage tracing and cell fate prediction. Therefore, methods that can accurately reconstruct the dynamic cell lineages and predict cell fates are highly desirable.

In this article, we develop an innovative machine-learning framework called Cell Smoothing Transformation (CellST) to elucidate the dynamic cell fate paths and construct gene networks in cell differentiation processes. Unlike the existing methods that construct one single bulk cell trajectory, CellST builds cell trajectories and tracks behaviors for each individual cell. Additionally, CellST can predict cell fates even for less frequent cell types. Based on the individual cell fate trajectories, CellST can further construct dynamic gene networks to model gene-gene relationships along the cell differentiation process and discover critical genes that potentially regulate cells into various mature cell types.

Citations: 0
Data science and data analytics in life science research
Pub Date : 2023-02-27 DOI: 10.1016/j.ailsci.2023.100067
Jürgen Bajorath
Citations: 1
Natural products subsets: Generation and characterization
Pub Date : 2023-02-26 DOI: 10.1016/j.ailsci.2023.100066
Ana L. Chávez-Hernández, José L. Medina-Franco

Natural products are attractive for drug discovery applications because of their distinctive chemical structures, such as an overall large fraction of sp³ carbon atoms, chiral centers (both features associated with structural complexity), large chemical scaffolds, and diversity of functional groups. Furthermore, natural products are used in de novo design and have inspired the development of pseudo-natural products using generative models. Public databases such as the Collection of Open NatUral ProdUcTs and the Universal Natural Product database (UNPD) are rich sources of structures to be used in generative models and other applications. In this work, we report the selection and characterization of the most diverse natural product compounds from the UNPD using the MaxMin algorithm. The subsets generated with 14,994, 7,497, and 4,998 compounds are publicly available at https://github.com/DIFACQUIM/Natural-products-subsets-generation. We anticipate that the subsets will be particularly useful in building generative models based on natural products by research groups, particularly those with limited access to extensive supercomputer resources.
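
The MaxMin idea mentioned above can be sketched in plain Python: starting from a seed compound, repeatedly add the compound whose distance to its nearest already-selected neighbour is largest. The sketch assumes fingerprints represented as sets of on-bit indices and Tanimoto distance as the dissimilarity; the abstract does not state the fingerprint type or implementation the authors used, and the function names here are hypothetical.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def maxmin_pick(fingerprints, n_pick, seed_index=0):
    """Return indices of a diverse subset chosen by the MaxMin heuristic."""
    if not fingerprints or n_pick < 1:
        raise ValueError("need at least one fingerprint and n_pick >= 1")
    picked = [seed_index]
    # min_dist[i] holds the distance from compound i to its nearest picked compound
    min_dist = [tanimoto_distance(fp, fingerprints[seed_index]) for fp in fingerprints]
    target = min(n_pick, len(fingerprints))
    while len(picked) < target:
        candidates = [i for i in range(len(fingerprints)) if i not in picked]
        # choose the candidate that is farthest from everything picked so far
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, fp in enumerate(fingerprints):
            min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fingerprints[nxt]))
    return picked
```

Keeping a running `min_dist` list avoids recomputing all pairwise distances at each step, so each pick costs one pass over the library, which matters for databases of the size described.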

{"title":"Natural products subsets: Generation and characterization","authors":"Ana L. Chávez-Hernández,&nbsp;José L. Medina-Franco","doi":"10.1016/j.ailsci.2023.100066","DOIUrl":"10.1016/j.ailsci.2023.100066","url":null,"abstract":"<div><p>Natural products are attractive for drug discovery applications because of their distinctive chemical structures, such as an overall large fraction of sp<sup>3</sup> carbon atoms, chiral centers (both features associated with structural complexity), large chemical scaffolds, and diversity of functional groups. Furthermore, natural products are used in <em>de novo</em> design and have inspired the development of pseudo-natural products using generative models. Public databases such as the Collection of Open NatUral ProdUcTs and the Universal Natural Product database (UNPD) are rich sources of structures to be used in generative models and other applications. In this work, we report the selection and characterization of the most diverse compounds of natural products from the UNPD using the MaxMin algorithm. The subsets generated with 14,994, 7,497, and 4,998 compounds are publicly available at <span>https://github.com/DIFACQUIM/Natural-products-subsets-generation</span><svg><path></path></svg>. 
We anticipate that the subsets will be particularly useful in building generative models based on natural products by research groups, particularly those with limited access to extensive supercomputer resources.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43292936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
An improved 3D quantitative structure-activity relationships (QSAR) of molecules with CNN-based partial least squares model
Pub Date : 2023-02-24 DOI: 10.1016/j.ailsci.2023.100065
Xuxiang Huo, Jun Xu, Mingyuan Xu, Hongming Chen

Ligand-based virtual screening plays an important role in cases where protein structures are not available. Among ligand-based methods, accurate and fast prediction of protein-ligand binding affinity is crucial for reducing computational cost and exploring the chemical search space efficiently. Here we propose a CNN-based method, termed L3D-PLS, for building quantitative structure-activity relationships without target structures. In L3D-PLS, a CNN module extracts the key interaction features from the grids around aligned ligands, and a partial least squares (PLS) model fits the binding affinity to the features extracted by the pre-trained CNN module. On 30 publicly available pre-aligned molecular datasets, L3D-PLS outperformed the traditional CoMFA method. These results highlight that L3D-PLS can be useful for lead optimization based on small datasets, which are common in drug discovery campaigns.
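
To make the PLS step concrete, here is a minimal sketch of a one-component PLS regression (PLS1) in plain Python. It is not the L3D-PLS implementation: the CNN feature extractor is omitted entirely, the feature matrix `X` stands in for whatever features such a module would produce, and the toy numbers and the function names `fit_pls1_one_component` and `predict` are assumptions for illustration.

```python
from math import sqrt


def fit_pls1_one_component(X, y):
    """Fit a single-latent-variable PLS1 model (illustrative sketch only)."""
    n, p = len(X), len(X[0])
    # Centre features and response
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: feature-space direction with maximal covariance with y
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("features are uncorrelated with the response")
    w = [v / norm for v in w]
    # Latent scores: projection of each sample onto w
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centred response on the scores
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def predict(model, x):
    """Predict the response for one feature vector x."""
    x_mean, y_mean, w, q = model
    score = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
    return y_mean + q * score


# Toy data: the response depends only on the first (made-up) feature
X = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
y = [2.0, 4.0, 6.0]
model = fit_pls1_one_component(X, y)
prediction = predict(model, [4.0, 0.0])
print(prediction)  # → 8.0
```

PLS suits small datasets with many correlated features because it compresses the features into a few latent scores before regressing, which is plausibly why it pairs well with high-dimensional CNN outputs. A real model would use several components chosen by cross-validation.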

Citations: 1
Combining molecular and cell painting image data for mechanism of action prediction
Pub Date : 2023-02-17 DOI: 10.1016/j.ailsci.2023.100060
Guangyan Tian, Philip J Harrison, Akshai P Sreenivasan, Jordi Carreras-Puigvert, Ola Spjuth

The mechanism of action (MoA) of a compound describes the biological interaction through which it produces a pharmacological effect. Multiple data sources can be used to predict MoA, including compound structural information and various assays, such as those based on cell morphology, transcriptomics and metabolomics. In the present study we explored the benefits and potential additive/synergistic effects of combining structural information, in the form of Morgan fingerprints, with morphological information, in the form of five-channel Cell Painting image data. For a set of 10 well-represented MoA classes, we compared the performance of deep learning models trained on the two datasets separately with that of a model trained on both datasets simultaneously. On a held-out test set we obtained a macro-averaged F1 score of 0.58 when training on only the structural data, 0.81 when training on only the image data, and 0.92 when training on both together. These results indicate clear additive/synergistic effects and highlight the benefit of integrating multiple data sources for MoA prediction.
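
The scores above are macro-averaged F1 values. As a reminder of what that metric measures, the sketch below computes it from scratch: F1 is calculated per class and the per-class values are averaged with equal weight, so small classes count as much as large ones. The class labels and predictions are invented for the example and have no connection to the study's data; `macro_f1` is a hypothetical helper name.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Made-up labels for three hypothetical MoA classes
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))  # → 0.8667
```

Here classes "a" and "b" each score 0.8 and class "c" scores 1.0, giving a macro average of about 0.867.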

Citations: 0