arXiv - QuanBio - Quantitative Methods: Latest Papers

Efficient Approximate Methods for Design of Experiments for Copolymer Engineering
Pub Date : 2024-08-04 DOI: arxiv-2408.02166
Swagatam Mukhopadhyay
We develop a set of algorithms to solve a broad class of Design of Experiment (DoE) problems efficiently. Specifically, we consider problems in which one must choose a subset of polymers to test in experiments such that the learning of the polymeric design rules is optimal. This subset must be selected from a larger set of polymers permissible under arbitrary experimental design constraints. We demonstrate the performance of our algorithms by solving several pragmatic nucleic acid therapeutics engineering scenarios, where limitations in synthesis of chemically diverse nucleic acids or feasibility of measurements in experimental setups appear as constraints. Our approach focuses on identifying optimal experimental designs from a given set of experiments, which is in contrast to traditional, generative DoE methods like BIBD. Finally, we discuss how these algorithms are broadly applicable to well-established optimal DoE criteria like D-optimality.
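To make the idea of picking a design from a fixed candidate pool concrete, here is a minimal sketch of greedy subset selection under the D-optimality criterion (maximizing the determinant of the information matrix). This is a generic textbook-style heuristic, not the paper's algorithm; the feature vectors, the `ridge` regularizer, and all function names are illustrative assumptions.

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def information_matrix(rows, ridge=1e-6):
    """X^T X + ridge * I for the selected feature rows (ridge is an assumed regularizer)."""
    p = len(rows[0])
    m = [[ridge if i == j else 0.0 for j in range(p)] for i in range(p)]
    for x in rows:
        for i in range(p):
            for j in range(p):
                m[i][j] += x[i] * x[j]
    return m


def greedy_d_optimal(candidates, k, ridge=1e-6):
    """Greedily pick k candidate indices that maximize det(X^T X + ridge * I)."""
    chosen = []
    remaining = list(range(len(candidates)))
    for _ in range(k):
        best = max(
            remaining,
            key=lambda i: determinant(
                information_matrix([candidates[j] for j in chosen + [i]], ridge)
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical candidate pool: three near-duplicate designs plus two distinct ones.
candidates = [
    [1.0, 0.0],
    [1.0, 0.01],
    [0.99, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]
selected = greedy_d_optimal(candidates, k=2)
```

Experimental constraints could be modeled in this sketch by filtering `candidates` before selection; the greedy step then avoids spending the budget on near-duplicate designs because they add little to the determinant.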
Citations: 0
MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance
Pub Date : 2024-08-03 DOI: arxiv-2408.01869
Jihye Choi, Nils Palumbo, Prasad Chalasani, Matthew M. Engelhard, Somesh Jha, Anivarya Kumar, David Page
In the era of Large Language Models (LLMs), given their remarkable text understanding and generation abilities, there is an unprecedented opportunity to develop new, LLM-based methods for trustworthy medical knowledge synthesis, extraction and summarization. This paper focuses on the problem of Pharmacovigilance (PhV), where the significance and challenges lie in identifying Adverse Drug Events (ADEs) from diverse text sources, such as medical literature, clinical notes, and drug labels. Unfortunately, this task is hindered by factors including variations in the terminologies of drugs and outcomes, and ADE descriptions often being buried in large amounts of narrative text. We present MALADE, the first effective collaborative multi-agent system powered by LLM with Retrieval Augmented Generation for ADE extraction from drug label data. This technique involves augmenting a query to an LLM with relevant information extracted from text resources, and instructing the LLM to compose a response consistent with the augmented data. MALADE is a general LLM-agnostic architecture, and its unique capabilities are: (1) leveraging a variety of external sources, such as medical literature, drug labels, and FDA tools (e.g., OpenFDA drug information API), (2) extracting drug-outcome association in a structured format along with the strength of the association, and (3) providing explanations for established associations. Instantiated with GPT-4 Turbo or GPT-4o, and FDA drug label data, MALADE demonstrates its efficacy with an Area Under ROC Curve of 0.90 against the OMOP Ground Truth table of ADEs. Our implementation leverages the Langroid multi-agent LLM framework and can be found at https://github.com/jihyechoi77/malade.
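The retrieval-augmentation step described above (augment a query with relevant text, then instruct the model to stay consistent with it) can be sketched roughly as follows. This is not MALADE or Langroid code: the word-overlap retriever, the prompt wording, and the placeholder passages are all assumptions made for illustration, and no LLM is actually called.

```python
import re


def tokenize(text):
    """Lowercase word set; a crude stand-in for a real text encoder."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, top_k=2):
    """Rank passages by word overlap with the query (assumed toy retriever)."""
    query_tokens = tokenize(query)
    ranked = sorted(
        passages, key=lambda p: len(query_tokens & tokenize(p)), reverse=True
    )
    return ranked[:top_k]


def build_augmented_prompt(query, passages, top_k=2):
    """Prepend retrieved context and an instruction to answer only from it."""
    context = retrieve(query, passages, top_k)
    lines = ["Answer using only the context below.", "", "Context:"]
    lines += [f"- {p}" for p in context]
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)


# Placeholder passages, not real drug-label text.
passages = [
    "Label section A: drug X adverse reactions include outcome Y.",
    "Label section B: storage and handling instructions for drug X.",
    "Unrelated note about a different product.",
]
prompt = build_augmented_prompt(
    "Is outcome Y an adverse reaction of drug X?", passages, top_k=1
)
```

In a real pipeline the resulting `prompt` would be sent to whichever LLM is in use, and the retriever would typically be an embedding-based or API-backed search rather than word overlap.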
Citations: 0
GraphAge: Unleashing the power of Graph Neural Network to Decode Epigenetic Aging
Pub Date : 2024-08-02 DOI: arxiv-2408.00984
Saleh Sakib Ahmed, Nahian Shabab, Md. Abul Hassan Samee, M. Sohel Rahman
DNA methylation is a crucial epigenetic marker used in various clocks to predict epigenetic age. However, many existing clocks fail to account for crucial information about CpG sites and their interrelationships, such as co-methylation patterns. We present a novel approach to represent methylation data as a graph, using methylation values and relevant information about CpG sites as nodes, and relationships like co-methylation, same gene, and same chromosome as edges. We then use a Graph Neural Network (GNN) to predict age. Thus our model, GraphAge, leverages both structural and positional information for prediction as well as better interpretation. Although we had to train in a constrained compute setting, GraphAge still showed competitive performance with a Mean Absolute Error (MAE) of 3.207 and a Mean Squared Error (MSE) of 25.277, slightly outperforming the current state of the art. Perhaps more importantly, we utilized GNN explainer for interpretation purposes and were able to unearth interesting insights (e.g., key CpG sites, pathways, and their relationships through Methylation Regulated Networks in the context of aging), which were not possible to 'decode' without leveraging the unique capability of GraphAge to 'encode' various structural relationships. GraphAge has the potential to consume and utilize all relevant information (if available) about an individual that relates to the complex process of aging. So, in that sense, it is one of its kind and can be seen as the first benchmark for a multimodal model that can incorporate all this information in order to close the gap in our understanding of the true nature of aging.
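As an illustration of the graph representation described above, the following sketch builds typed edges between CpG sites from three relations: same gene, same chromosome, and co-methylation. It is not the GraphAge implementation; the site names, annotations, methylation values, and the rule "absolute Pearson correlation above a threshold counts as co-methylation" are assumptions chosen only to make the example runnable.

```python
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def build_cpg_graph(sites, co_methylation_threshold=0.8):
    """Return a list of (site_a, site_b, relation) edges between CpG sites."""
    edges = []
    for a, b in combinations(sorted(sites), 2):
        site_a, site_b = sites[a], sites[b]
        if site_a["gene"] == site_b["gene"]:
            edges.append((a, b, "same_gene"))
        if site_a["chromosome"] == site_b["chromosome"]:
            edges.append((a, b, "same_chromosome"))
        # Assumed co-methylation rule: strongly correlated values across samples.
        if abs(pearson(site_a["beta"], site_b["beta"])) >= co_methylation_threshold:
            edges.append((a, b, "co_methylation"))
    return edges


# Made-up sites with methylation values across four samples.
sites = {
    "cpg_1": {"gene": "GENE_A", "chromosome": "chr1", "beta": [0.10, 0.20, 0.30, 0.40]},
    "cpg_2": {"gene": "GENE_A", "chromosome": "chr1", "beta": [0.15, 0.25, 0.35, 0.45]},
    "cpg_3": {"gene": "GENE_B", "chromosome": "chr2", "beta": [0.90, 0.10, 0.80, 0.20]},
}
edges = build_cpg_graph(sites)
```

An edge list of this shape, together with per-node features, is the usual input to a GNN library; the relation label could be used as an edge type or edge feature.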
Citations: 0
UMMAN: Unsupervised Multi-graph Merge Adversarial Network for Disease Prediction Based on Intestinal Flora
Pub Date : 2024-07-31 DOI: arxiv-2407.21714
Dingkun Liu, Hongjie Zhou, Yilu Qu, Huimei Zhang, Yongdong Xu
The abundance of intestinal flora is closely related to human diseases, but diseases are not caused by a single gut microbe. Instead, they result from the complex interplay of numerous microbial entities. This intricate and implicit connection among gut microbes poses a significant challenge for disease prediction using abundance information from OTU data. Recently, several methods have shown potential in predicting corresponding diseases. However, these methods fail to learn the inner association among gut microbes from different hosts, leading to unsatisfactory performance. In this paper, we present a novel architecture, Unsupervised Multi-graph Merge Adversarial Network (UMMAN). UMMAN can obtain the embeddings of nodes in the Multi-Graph in an unsupervised scenario, so that it helps learn the multiplex association. Our method is the first to combine Graph Neural Network with the task of intestinal flora disease prediction. We employ complex relation-types to construct the Original-Graph and disrupt the relationships among nodes to generate corresponding Shuffled-Graph. We introduce the Node Feature Global Integration (NFGI) module to represent the global features of the graph. Furthermore, we design a joint loss comprising adversarial loss and hybrid attention loss to ensure that the real graph embedding aligns closely with the Original-Graph and diverges from the Shuffled-Graph. Comprehensive experiments on five classical OTU gut microbiome datasets demonstrate the effectiveness and stability of our method. (We will release our code soon.)
Citations: 0
Cooperative SIR dynamics as a model for spontaneous blood clot initiation
Pub Date : 2024-07-31 DOI: arxiv-2408.00039
Philip Greulich
Blood clotting is an important physiological process to suppress bleeding upon injury, but when it occurs inadvertently, it can cause thrombosis, which can lead to life-threatening conditions. Hence, understanding the microscopic mechanistic factors for inadvertent, spontaneous blood clotting, in the absence of a vessel breach, can help in predicting and averting such conditions. Here, we present a minimal model -- reminiscent of the SIR model -- for the initiating stage of spontaneous blood clotting, the collective activation of blood platelets. This model predicts that in the presence of very small initial activation signals, macroscopic activation of the platelet population requires a sufficient degree of heterogeneity of platelet sensitivity. To propagate the activation signal and achieve collective activation of the bulk platelet population, it requires the presence of, possibly only a few, hyper-sensitive platelets, but also a sufficient proportion of platelets with intermediate, yet higher-than-average sensitivity. A comparison with experimental results demonstrates a qualitative agreement for high platelet signalling activity.
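For readers unfamiliar with the SIR model that the abstract refers to, here is a minimal sketch of the classical, homogeneous SIR equations integrated with forward Euler. It does not implement the paper's cooperative or heterogeneous platelet model; the parameter values and the step size are arbitrary assumptions.

```python
def simulate_sir(beta, gamma, s0, i0, r0=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of the classical SIR equations:

    dS/dt = -beta * S * I
    dI/dt =  beta * S * I - gamma * I
    dR/dt =  gamma * I
    """
    s, i, r = s0, i0, r0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        newly_activated = beta * s * i * dt  # S -> I transitions in this step
        newly_removed = gamma * i * dt       # I -> R transitions in this step
        s = s - newly_activated
        i = i + newly_activated - newly_removed
        r = r + newly_removed
        trajectory.append((s, i, r))
    return trajectory


# Arbitrary illustrative parameters: a very small initial active fraction.
trajectory = simulate_sir(beta=0.5, gamma=0.1, s0=0.999, i0=0.001)
final_s, final_i, final_r = trajectory[-1]
```

In this classical form a small seed spreads whenever `beta * s0 > gamma`; the abstract's point is that the platelet setting additionally depends on how sensitivity is distributed across the population, which this homogeneous sketch does not capture.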
Citations: 0
Distribution Learning for Molecular Regression
Pub Date : 2024-07-30 DOI: arxiv-2407.20475
Nima Shoghi, Pooya Shoghi, Anuroop Sriram, Abhishek Das
Using "soft" targets to improve model performance has been shown to be effective in classification settings, but the usage of soft targets for regression is a much less studied topic in machine learning. The existing literature on the usage of soft targets for regression fails to properly assess the method's limitations, and empirical evaluation is quite limited. In this work, we assess the strengths and drawbacks of existing methods when applied to molecular property regression tasks. Our assessment outlines key biases present in existing methods and proposes methods to address them, evaluated through careful ablation studies. We leverage these insights to propose Distributional Mixture of Experts (DMoE): A model-independent, and data-independent method for regression which trains a model to predict probability distributions of its targets. Our proposed loss function combines the cross entropy between predicted and target distributions and the L1 distance between their expected values to produce a loss function that is robust to the outlined biases. We evaluate the performance of DMoE on different molecular property prediction datasets -- Open Catalyst (OC20), MD17, and QM9 -- across different backbone model architectures -- SchNet, GemNet, and Graphormer. Our results demonstrate that the proposed method is a promising alternative to classical regression for molecular property prediction tasks, showing improvements over baselines on all datasets and architectures.
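The loss described above (cross entropy between predicted and target distributions, plus the L1 distance between their expected values) might look roughly like the sketch below for a single scalar target discretized into bins. This is not the authors' DMoE code and contains no mixture-of-experts component; the Gaussian soft target, the bin layout, and the equal weighting via `l1_weight` are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_soft_target(value, bin_centers, sigma):
    """Assumed soft target: a normalized Gaussian bump centered on the scalar label."""
    weights = [math.exp(-0.5 * ((c - value) / sigma) ** 2) for c in bin_centers]
    total = sum(weights)
    return [w / total for w in weights]


def expected_value(probs, bin_centers):
    """Mean of a discrete distribution over the bin centers."""
    return sum(p * c for p, c in zip(probs, bin_centers))


def distributional_loss(logits, target_probs, bin_centers, l1_weight=1.0, eps=1e-12):
    """Cross entropy(target, prediction) + l1_weight * |E[prediction] - E[target]|."""
    pred = softmax(logits)
    cross_entropy = -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred))
    l1 = abs(expected_value(pred, bin_centers) - expected_value(target_probs, bin_centers))
    return cross_entropy + l1_weight * l1


# Illustrative bins and logits; the label 0.5 falls on the fourth bin.
bin_centers = [-1.0, -0.5, 0.0, 0.5, 1.0]
target = gaussian_soft_target(0.5, bin_centers, sigma=0.25)
good = distributional_loss([0.0, 0.0, 1.0, 3.0, 1.0], target, bin_centers)
bad = distributional_loss([3.0, 1.0, 0.0, 0.0, 0.0], target, bin_centers)
```

The cross-entropy term rewards matching the shape of the target distribution, while the L1 term directly penalizes error in the scalar prediction recovered as the expected value; logits that put mass near the label (`good`) score lower than logits that put it far away (`bad`).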
Citations: 0
Plant and insect proteins support optimal bone growth and development; Evidences from a pre-clinical model
Pub Date : 2024-07-30 DOI: arxiv-2407.21087
Gal Becker, Jerome Nicolas Janssen, Rotem Kalev-Altman, Dana Meilich, Astar Shitrit, Svetlana Penn, Ram Reifen, Efrat Monsonego Ornan
By 2050, the global population will exceed 9 billion, demanding a 70% increase in food production. Animal proteins alone may not suffice and contribute to global warming. Alternative proteins such as legumes, algae, and insects are being explored, but their health impacts are largely unknown. For this, three-week-old rats were fed diets containing 20% protein from various sources for six weeks. A casein-based control diet was compared to soy isolate, spirulina powder, chickpea isolate, chickpea flour, and fly larvae powder. Except for spirulina, alternative protein groups showed comparable growth patterns to the casein group. Morphological and mechanical tests of femur bones matched growth patterns. Caecal 16S analysis highlighted the impact on gut microbiota diversity. Chickpea flour showed significantly lower $\alpha$-diversity compared with casein and chickpea isolate groups, while chickpea flour had the greatest distinction in $\beta$-diversity. Alternative protein sources supported optimal growth, but quality and health implications require further exploration.
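The abstract does not say which diversity indices were used, so the sketch below picks two common ones as assumptions: the Shannon index for $\alpha$-diversity (within one sample) and Bray-Curtis dissimilarity for $\beta$-diversity (between two samples). The count vectors are invented for illustration and are not data from the study.

```python
import math


def shannon_alpha_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis_beta_diversity(counts_a, counts_b):
    """Bray-Curtis dissimilarity: 1 - 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b))."""
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 1.0 - 2.0 * shared / (sum(counts_a) + sum(counts_b))


# Invented taxon counts: one evenly spread community and one dominated by a single taxon.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

alpha_even = shannon_alpha_diversity(even_sample)
alpha_skewed = shannon_alpha_diversity(skewed_sample)
beta_between = bray_curtis_beta_diversity(even_sample, skewed_sample)
```

A community dominated by one taxon yields a lower Shannon index than an even one, which is the sense in which a diet group can show "lower $\alpha$-diversity"; Bray-Curtis ranges from 0 for identical samples to 1 for samples that share no taxa.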
Citations: 0
Patterns in soil organic carbon dynamics: integrating microbial activity, chemotaxis and data-driven approaches
Pub Date : 2024-07-30 DOI: arxiv-2407.20625
Angela Monti, Fasma Diele, Deborah Lacitignola, Carmela Marangi
Models of soil organic carbon (SOC) frequently overlook the effects of spatial dimensions and microbiological activities. In this paper, we focus on two reaction-diffusion chemotaxis models for SOC dynamics, both supporting chemotaxis-driven instability and exhibiting a variety of spatial patterns such as stripes, spots and hexagons when the microbial chemotactic sensitivity is above a critical threshold. We use symplectic techniques to numerically approximate chemotaxis-driven spatial patterns and explore the effectiveness of the piecewise dynamic mode decomposition (pDMD) to reconstruct them. Our findings show that pDMD is effective at precisely recreating chemotaxis-driven spatial patterns, therefore broadening the range of application of the method to classes of solutions different than Turing patterns. By validating its efficacy across a wider range of models, this research lays the groundwork for applying pDMD to experimental spatiotemporal data, advancing predictions crucial for soil microbial ecology and agricultural sustainability.
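As background for the dynamic mode decomposition mentioned above, here is a minimal sketch of standard exact DMD in NumPy, applied separately to consecutive time windows. Treating "piecewise" as "one independent DMD per window" is an assumption about pDMD, not a description of the authors' method, and the synthetic snapshot data is invented for the example.

```python
import numpy as np


def dmd(snapshots, rank):
    """Exact DMD: eigenvalues and modes of the best-fit linear map x_{k+1} ~ A x_k."""
    x, y = snapshots[:, :-1], snapshots[:, 1:]
    u, s, vh = np.linalg.svd(x, full_matrices=False)
    u, s, vh = u[:, :rank], s[:rank], vh[:rank, :]
    # Project A onto the leading singular vectors of x.
    a_tilde = u.conj().T @ y @ vh.conj().T @ np.diag(1.0 / s)
    eigenvalues, w = np.linalg.eig(a_tilde)
    modes = y @ vh.conj().T @ np.diag(1.0 / s) @ w
    return eigenvalues, modes


def windowed_dmd(snapshots, window, rank):
    """Run DMD independently on consecutive, non-overlapping time windows."""
    n_times = snapshots.shape[1]
    return [
        dmd(snapshots[:, start:start + window], rank)
        for start in range(0, n_times - window + 1, window)
    ]


# Synthetic data: two spatial patterns decaying at different rates per time step.
space = np.linspace(0.0, 1.0, 50)
times = np.arange(40)
snapshots = (
    np.outer(np.sin(2 * np.pi * space), 0.95 ** times)
    + np.outer(np.cos(4 * np.pi * space), 0.8 ** times)
)
results = windowed_dmd(snapshots, window=20, rank=2)
```

Each window returns its own eigenvalues (per-step growth or decay factors) and spatial modes, so dynamics that change character over time can be fitted by different linear models in different windows.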
Citations: 0
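The soil organic carbon entry above relies on piecewise dynamic mode decomposition (pDMD). As a rough illustration of the underlying idea only, the sketch below fits one linear operator per time window to a sequence of 2-D snapshots by least squares. It assumes a two-component state and a full-rank fit without the SVD truncation that practical DMD codes use; the names `fit_operator`, `piecewise_fit` and `step` are hypothetical and are not taken from the paper.

```python
def step(a, x):
    """Apply the 2x2 matrix `a` to the 2-D state `x`."""
    return (a[0][0] * x[0] + a[0][1] * x[1],
            a[1][0] * x[0] + a[1][1] * x[1])


def fit_operator(snapshots):
    """Least-squares 2x2 matrix A with snapshots[k+1] ~= A applied to snapshots[k].

    Uses A = C G^{-1}, where G = X X^T and C = X' X^T are accumulated
    over consecutive snapshot pairs (X holds states, X' the next states).
    """
    g11 = g12 = g22 = 0.0        # entries of the symmetric matrix G
    c11 = c12 = c21 = c22 = 0.0  # entries of C
    for (x1, x2), (y1, y2) in zip(snapshots[:-1], snapshots[1:]):
        g11 += x1 * x1
        g12 += x1 * x2
        g22 += x2 * x2
        c11 += y1 * x1
        c12 += y1 * x2
        c21 += y2 * x1
        c22 += y2 * x2
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-14:
        raise ValueError("snapshots do not span the 2-D state space")
    # Inverse of the symmetric 2x2 matrix G.
    i11, i12, i22 = g22 / det, -g12 / det, g11 / det
    return [[c11 * i11 + c12 * i12, c11 * i12 + c12 * i22],
            [c21 * i11 + c22 * i12, c21 * i12 + c22 * i22]]


def piecewise_fit(snapshots, window):
    """Fit one operator per consecutive block of `window` time steps.

    A trailing block with too few snapshots raises ValueError in
    fit_operator, so choose `window` to divide the number of steps.
    """
    return [fit_operator(snapshots[s:s + window + 1])
            for s in range(0, len(snapshots) - 1, window)]
```

Fitting a separate operator on each window is what lets a piecewise scheme follow dynamics that a single global linear operator would approximate poorly; the window length here is a free parameter chosen for illustration.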
Graph Residual based Method for Molecular Property Prediction
Pub Date : 2024-07-27 DOI: arxiv-2408.03342
Kanad Sen, Saksham Gupta, Abhishek Raj, Alankar Alankar
Property prediction of materials has been of high interest in recent years in the field of material science. Various physics-based and machine learning models have already been developed that can give good results. However, they are not accurate enough and are inadequate for critical applications. Traditional machine learning models try to predict properties based on features extracted from the molecules, which are not easily available most of the time. In this paper, a recently developed deep learning method, the Graph Neural Network (GNN), has been applied, allowing us to predict properties directly from only the graph-based structures of the molecules. The SMILES (Simplified Molecular Input Line Entry System) representation of the molecules has been used in the present study as the input data format, which has been further converted into a graph database that constitutes the training data. This article gives a detailed description of the novel GRU-based methodology used to map the inputs. Emphasis is placed on both the regression property and the classification-based property of the GNN backbone. A detailed description of the Variational Autoencoder (VAE) and the end-to-end learning method has been given to highlight the multi-class multi-label property prediction of the backbone. The results have been compared with standard benchmark datasets as well as some newly developed datasets. All performance metrics that have been used are clearly defined, along with the reasons for choosing them. Keywords: GNN, VAE, SMILES, multi-label multi-class classification, GRU
Citations: 0
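The molecular property entry above converts SMILES strings into graphs before training. The abstract does not say which toolkit was used, so the following is only a toy sketch of that conversion for a small SMILES subset: single-letter uppercase atoms, the bond symbols `-`, `=` and `#`, parenthesised branches, and single-digit ring closures. Aromatic (lowercase) atoms, two-letter elements such as Cl, bracket atoms, and explicit hydrogens are deliberately unsupported. The function name `smiles_to_graph` is hypothetical.

```python
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
ATOMS = set("BCNOPSFI")  # single-letter organic-subset atoms only


def smiles_to_graph(smiles):
    """Return (atoms, bonds) for a restricted SMILES string.

    atoms is a list of element symbols; bonds is a list of
    (atom_index_a, atom_index_b, bond_order) tuples.
    """
    atoms, bonds = [], []
    branch_stack = []   # atoms to return to when a branch closes
    ring_open = {}      # ring digit -> (atom index, bond order at opening)
    prev = None         # index of the atom the next atom bonds to
    pending = 1         # bond order for the next bond (default single)
    for ch in smiles:
        if ch in ATOMS:
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, pending))
            prev, pending = idx, 1
        elif ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isdigit():
            if ch in ring_open:
                start, order = ring_open.pop(ch)
                bonds.append((start, prev, max(order, pending)))
            else:
                ring_open[ch] = (prev, pending)
            pending = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return atoms, bonds
```

For example, `smiles_to_graph("CC(=O)O")` (acetic acid) yields four heavy atoms and three bonds, one of them a double bond. The resulting node and edge lists are the kind of structure a GNN consumes; real pipelines would typically use a full cheminformatics parser instead.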
Predicting T-Cell Receptor Specificity
Pub Date : 2024-07-27 DOI: arxiv-2407.19349
Tengyao Tu, Wei Zeng, Kun Zhao, Zhenyu Zhang
Researching the specificity of T-cell receptors (TCRs) contributes to the development of immunotherapy and provides new opportunities and strategies for personalized cancer immunotherapy. Therefore, we established a TCR generative specificity detection framework consisting of an antigen selector and a TCR classifier based on the Random Forest algorithm, aiming to efficiently screen out TCRs and target antigens and achieve TCR specificity prediction. Furthermore, we used the k-fold validation method to compare the performance of our model with ordinary deep learning methods. The results show that adding a classifier based on the random forest algorithm to the model is very effective, and our model generally outperforms ordinary deep learning methods. Moreover, we put forward feasible optimization suggestions for the shortcomings and challenges of our model found during model implementation.
Citations: 0
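The TCR entry above compares models with k-fold validation. A minimal sketch of how such a split can be organised is given below, assuming contiguous, unshuffled folds and a caller-supplied scoring callback; the names `k_fold_indices`, `cross_validate` and `score_fold` are hypothetical, and the paper's actual protocol (shuffling, stratification, value of k) is not stated in the abstract.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    The first n_samples % k folds get one extra sample so that
    every sample lands in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    indices = list(range(n_samples))
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(score_fold, n_samples, k):
    """Average score_fold(train, test) over all k folds.

    score_fold is expected to train a model on the train indices
    and return a numeric score measured on the test indices.
    """
    scores = [score_fold(train, test)
              for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

Evaluating every candidate model with the same folds, as `cross_validate` would when called once per model, keeps the comparison between a random-forest classifier and a deep learning baseline on equal footing.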