We develop a set of algorithms to solve a broad class of Design of Experiment (DoE) problems efficiently. Specifically, we consider problems in which one must choose a subset of polymers to test in experiments such that the learning of the polymeric design rules is optimal. This subset must be selected from a larger set of polymers permissible under arbitrary experimental design constraints. We demonstrate the performance of our algorithms by solving several pragmatic nucleic acid therapeutics engineering scenarios, where limitations in synthesis of chemically diverse nucleic acids or feasibility of measurements in experimental setups appear as constraints. Our approach focuses on identifying optimal experimental designs from a given set of experiments, which is in contrast to traditional, generative DoE methods like BIBD. Finally, we discuss how these algorithms are broadly applicable to well-established optimal DoE criteria like D-optimality.
{"title":"Efficient Approximate Methods for Design of Experiments for Copolymer Engineering","authors":"Swagatam Mukhopadhyay","doi":"arxiv-2408.02166","DOIUrl":"https://doi.org/arxiv-2408.02166","journal":{"name":"arXiv - QuanBio - Quantitative Methods"},"publicationDate":"2024-08-04","platform":"Semanticscholar","paperid":"141935360"}
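As a rough illustration of the selection problem described in the abstract above (choosing a subset from a fixed candidate pool rather than generating a design), the sketch below greedily maximizes the D-optimality criterion, the determinant of the information matrix. It is a generic textbook heuristic, not the paper's algorithm; the function names, the small ridge term, and the toy feature vectors are assumptions made for illustration.

```python
def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d


def information_matrix(rows, p, ridge):
    """Return X^T X + ridge * I for the chosen feature rows."""
    m = [[ridge if i == j else 0.0 for j in range(p)] for i in range(p)]
    for x in rows:
        for i in range(p):
            for j in range(p):
                m[i][j] += x[i] * x[j]
    return m


def greedy_d_optimal(candidates, k, ridge=1e-6):
    """Pick k candidate indices, adding the one that most increases det each step."""
    p = len(candidates[0])
    chosen = []
    for _ in range(k):
        best, best_det = None, -1.0
        for idx, x in enumerate(candidates):
            if idx in chosen:
                continue
            rows = [candidates[i] for i in chosen] + [x]
            d = det(information_matrix(rows, p, ridge))
            if d > best_det:  # strict: ties keep the earliest candidate
                best, best_det = idx, d
        chosen.append(best)
    return chosen


# Hypothetical candidate pool: two identical designs and one complementary design.
pool = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
selected = greedy_d_optimal(pool, k=2)
```

In this toy pool the greedy rule skips the duplicate design, because repeating a feature vector adds almost nothing to the determinant. Experimental constraints of the kind the abstract mentions could be handled by filtering the candidate pool before selection.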
Jihye Choi, Nils Palumbo, Prasad Chalasani, Matthew M. Engelhard, Somesh Jha, Anivarya Kumar, David Page
In the era of Large Language Models (LLMs), given their remarkable text understanding and generation abilities, there is an unprecedented opportunity to develop new, LLM-based methods for trustworthy medical knowledge synthesis, extraction and summarization. This paper focuses on the problem of Pharmacovigilance (PhV), where the significance and challenges lie in identifying Adverse Drug Events (ADEs) from diverse text sources, such as medical literature, clinical notes, and drug labels. Unfortunately, this task is hindered by factors including variations in the terminologies of drugs and outcomes, and ADE descriptions often being buried in large amounts of narrative text. We present MALADE, the first effective collaborative multi-agent system powered by LLM with Retrieval Augmented Generation for ADE extraction from drug label data. This technique involves augmenting a query to an LLM with relevant information extracted from text resources, and instructing the LLM to compose a response consistent with the augmented data. MALADE is a general LLM-agnostic architecture, and its unique capabilities are: (1) leveraging a variety of external sources, such as medical literature, drug labels, and FDA tools (e.g., OpenFDA drug information API), (2) extracting drug-outcome association in a structured format along with the strength of the association, and (3) providing explanations for established associations. Instantiated with GPT-4 Turbo or GPT-4o, and FDA drug label data, MALADE demonstrates its efficacy with an Area Under ROC Curve of 0.90 against the OMOP Ground Truth table of ADEs. Our implementation leverages the Langroid multi-agent LLM framework and can be found at https://github.com/jihyechoi77/malade.
{"title":"MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance","authors":"Jihye Choi, Nils Palumbo, Prasad Chalasani, Matthew M. Engelhard, Somesh Jha, Anivarya Kumar, David Page","doi":"arxiv-2408.01869","DOIUrl":"https://doi.org/arxiv-2408.01869","journal":{"name":"arXiv - QuanBio - Quantitative Methods"},"publicationDate":"2024-08-03","platform":"Semanticscholar","paperid":"141935350"}
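The retrieval-augmentation step described in the abstract above (adding relevant extracted text to a query and instructing the model to stay consistent with it) can be sketched in a few lines. This is a minimal sketch with a naive word-overlap retriever; it does not use Langroid or the OpenFDA API, it makes no LLM call, and the function names and the placeholder passages (DrugA, DrugB, OutcomeX) are invented for illustration.

```python
import re


def tokenize(text):
    """Lowercase word set; punctuation is discarded."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, top_k=1):
    """Rank passages by the number of words shared with the query."""
    q = tokenize(query)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, passages, top_k=1):
    """Augment the query with retrieved context and a grounding instruction."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, top_k))
    return (
        "Answer using only the context below; say so if it is insufficient.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


# Placeholder passages, not real drug-label text.
passages = [
    "DrugA label: reports of OutcomeX in trials.",
    "DrugB label: no data on OutcomeX.",
    "Unrelated note about storage.",
]
prompt = build_prompt("Does DrugA cause OutcomeX?", passages)
```

A production retriever would typically use embeddings or a search index instead of word overlap; the prompt-assembly pattern stays the same.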
Saleh Sakib Ahmed, Nahian Shabab, Md. Abul Hassan Samee, M. Sohel Rahman
DNA methylation is a crucial epigenetic marker used in various clocks to predict epigenetic age. However, many existing clocks fail to account for crucial information about CpG sites and their interrelationships, such as co-methylation patterns. We present a novel approach to represent methylation data as a graph, using methylation values and relevant information about CpG sites as nodes, and relationships like co-methylation, same gene, and same chromosome as edges. We then use a Graph Neural Network (GNN) to predict age. Thus, our model, GraphAge, leverages both structural and positional information for prediction as well as better interpretation. Although we had to train in a constrained compute setting, GraphAge still showed competitive performance with a Mean Absolute Error (MAE) of 3.207 and a Mean Squared Error (MSE) of 25.277, slightly outperforming the current state of the art. Perhaps more importantly, we utilized GNN explainer for interpretation purposes and were able to unearth interesting insights (e.g., key CpG sites, pathways, and their relationships through Methylation Regulated Networks in the context of aging), which were not possible to 'decode' without leveraging the unique capability of GraphAge to 'encode' various structural relationships. GraphAge has the potential to consume and utilize all relevant information (if available) about an individual that relates to the complex process of aging. In that sense, it is one of a kind and can be seen as the first benchmark for a multimodal model that can incorporate all this information in order to close the gap in our understanding of the true nature of aging.
{"title":"GraphAge: Unleashing the power of Graph Neural Network to Decode Epigenetic Aging","authors":"Saleh Sakib Ahmed, Nahian Shabab, Md. Abul Hassan Samee, M. Sohel Rahman","doi":"arxiv-2408.00984","DOIUrl":"https://doi.org/arxiv-2408.00984","journal":{"name":"arXiv - QuanBio - Quantitative Methods"},"publicationDate":"2024-08-02","platform":"Semanticscholar","paperid":"141935362"}
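The graph construction described in the GraphAge abstract above (CpG sites as nodes; co-methylation, same-gene, and same-chromosome relationships as edges) might look roughly like the sketch below. The site names, annotations, and beta values are made up, and treating co-methylation as a thresholded Pearson correlation across samples is an assumption; the abstract does not say how the authors define it.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def build_edges(sites, threshold=0.8):
    """Return typed edges (u, v, relation) between every pair of CpG sites."""
    edges = []
    for u, v in combinations(sorted(sites), 2):
        a, b = sites[u], sites[v]
        if a["gene"] == b["gene"]:
            edges.append((u, v, "same_gene"))
        if a["chrom"] == b["chrom"]:
            edges.append((u, v, "same_chromosome"))
        if abs(pearson(a["beta"], b["beta"])) >= threshold:
            edges.append((u, v, "co_methylation"))
    return edges


# Hypothetical sites: "beta" holds methylation values across three samples.
sites = {
    "cg_a": {"gene": "G1", "chrom": "chr1", "beta": [0.1, 0.2, 0.3]},
    "cg_b": {"gene": "G1", "chrom": "chr1", "beta": [0.2, 0.4, 0.6]},
    "cg_c": {"gene": "G2", "chrom": "chr2", "beta": [0.5, 0.1, 0.4]},
}
edges = build_edges(sites)
```

An edge list of this form, together with per-node features, is the usual input to a GNN library; typed relations let the model treat the three kinds of relationship differently.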
The abundance of intestinal flora is closely related to human diseases, but diseases are not caused by a single gut microbe. Instead, they result from the complex interplay of numerous microbial entities. This intricate and implicit connection among gut microbes poses a significant challenge for disease prediction using abundance information from OTU data. Recently, several methods have shown potential in predicting corresponding diseases. However, these methods fail to learn the inner association among gut microbes from different hosts, leading to unsatisfactory performance. In this paper, we present a novel architecture, Unsupervised Multi-graph Merge Adversarial Network (UMMAN). UMMAN can obtain the embeddings of nodes in the Multi-Graph in an unsupervised scenario, which helps it learn the multiplex association. Our method is the first to combine Graph Neural Networks with the task of intestinal flora disease prediction. We employ complex relation-types to construct the Original-Graph and disrupt the relationships among nodes to generate the corresponding Shuffled-Graph. We introduce the Node Feature Global Integration (NFGI) module to represent the global features of the graph. Furthermore, we design a joint loss comprising adversarial loss and hybrid attention loss to ensure that the real graph embedding aligns closely with the Original-Graph and diverges from the Shuffled-Graph. Comprehensive experiments on five classical OTU gut microbiome datasets demonstrate the effectiveness and stability of our method. (We will release our code soon.)
{"title":"UMMAN: Unsupervised Multi-graph Merge Adversarial Network for Disease Prediction Based on Intestinal Flora","authors":"Dingkun Liu, Hongjie Zhou, Yilu Qu, Huimei Zhang, Yongdong Xu","doi":"arxiv-2407.21714","DOIUrl":"https://doi.org/arxiv-2407.21714","journal":{"name":"arXiv - QuanBio - Quantitative Methods"},"publicationDate":"2024-07-31","platform":"Semanticscholar","paperid":"141869854"}
Blood clotting is an important physiological process to suppress bleeding upon injury, but when it occurs inadvertently, it can cause thrombosis, which can lead to life-threatening conditions. Hence, understanding the microscopic mechanistic factors for inadvertent, spontaneous blood clotting, in the absence of a vessel breach, can help in predicting and averting such conditions. Here, we present a minimal model -- reminiscent of the SIR model -- for the initiating stage of spontaneous blood clotting, the collective activation of blood platelets. This model predicts that in the presence of very small initial activation signals, macroscopic activation of the platelet population requires a sufficient degree of heterogeneity of platelet sensitivity. Propagating the activation signal and achieving collective activation of the bulk platelet population requires the presence of hyper-sensitive platelets, possibly only a few, but also a sufficient proportion of platelets with intermediate, yet higher-than-average, sensitivity. A comparison with experimental results demonstrates a qualitative agreement for high platelet signalling activity.
{"title":"Cooperative SIR dynamics as a model for spontaneous blood clot initiation","authors":"Philip Greulich","doi":"arxiv-2408.00039","DOIUrl":"https://doi.org/arxiv-2408.00039","journal":{"name":"arXiv - QuanBio - Quantitative Methods"},"publicationDate":"2024-07-31","platform":"Semanticscholar","paperid":"141887005"}
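For readers unfamiliar with the SIR model that the abstract above refers to, the sketch below integrates the classic three-compartment SIR equations with a forward Euler step. It is the standard epidemiological model only, not the cooperative platelet model of the paper, which the abstract does not specify; the parameter values and the function name are illustrative assumptions.

```python
def simulate_sir(beta, gamma, s0=0.99, i0=0.01, t_end=200.0, dt=0.1):
    """Forward-Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    for _ in range(int(round(t_end / dt))):
        new_active = beta * s * i * dt   # S -> I transitions in this step
        new_removed = gamma * i * dt     # I -> R transitions in this step
        s -= new_active
        i += new_active - new_removed
        r += new_removed
    return s, i, r


# beta/gamma > 1: a small initial signal spreads through most of the population.
s_hi, i_hi, r_hi = simulate_sir(beta=0.3, gamma=0.1)
# beta/gamma < 1: the same initial signal dies out.
s_lo, i_lo, r_lo = simulate_sir(beta=0.05, gamma=0.1)
```

The threshold behaviour seen here (spread only when beta/gamma exceeds one) is the homogeneous baseline; the abstract's claim is that heterogeneous sensitivity changes which populations can reach macroscopic activation from a very small initial signal.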
Nima Shoghi, Pooya Shoghi, Anuroop Sriram, Abhishek Das
Using "soft" targets to improve model performance has been shown to be effective in classification settings, but the usage of soft targets for regression is a much less studied topic in machine learning. The existing literature on the usage of soft targets for regression fails to properly assess the method's limitations, and empirical evaluation is quite limited. In this work, we assess the strengths and drawbacks of existing methods when applied to molecular property regression tasks. Our assessment outlines key biases present in existing methods and proposes methods to address them, evaluated through careful ablation studies. We leverage these insights to propose Distributional Mixture of Experts (DMoE): a model-independent and data-independent method for regression which trains a model to predict probability distributions of its targets. Our proposed loss function combines the cross entropy between predicted and target distributions and the L1 distance between their expected values to produce a loss function that is robust to the outlined biases. We evaluate the performance of DMoE on different molecular property prediction datasets -- Open Catalyst (OC20), MD17, and QM9 -- across different backbone model architectures -- SchNet, GemNet, and Graphormer. Our results demonstrate that the proposed method is a promising alternative to classical regression for molecular property prediction tasks, showing improvements over baselines on all datasets and architectures.
{"title":"Distribution Learning for Molecular Regression","authors":"Nima Shoghi, Pooya Shoghi, Anuroop Sriram, Abhishek Das","doi":"arxiv-2407.20475","DOIUrl":"https://doi.org/arxiv-2407.20475","journal":{"name":"arXiv - QuanBio - Quantitative Methods"},"publicationDate":"2024-07-30","platform":"Semanticscholar","paperid":"141869857"}
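The loss described in the abstract above combines a cross-entropy term between predicted and target distributions with an L1 term between their expected values. The sketch below shows one way such a loss could be written over a fixed set of bins. The Gaussian construction of the soft target, the bin centers, the equal weighting of the two terms, and all function names are assumptions; the abstract does not give these details.

```python
import math


def soft_target(y, centers, sigma):
    """Spread a scalar target over bins with normalized Gaussian weights (assumed scheme)."""
    weights = [math.exp(-0.5 * ((c - y) / sigma) ** 2) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]


def expected_value(probs, centers):
    """Mean of a discrete distribution over the bin centers."""
    return sum(p * c for p, c in zip(probs, centers))


def distributional_loss(pred, target, centers, l1_weight=1.0, eps=1e-12):
    """Cross entropy between distributions plus L1 distance between their means."""
    cross_entropy = -sum(t * math.log(p + eps) for t, p in zip(target, pred))
    l1 = abs(expected_value(pred, centers) - expected_value(target, centers))
    return cross_entropy + l1_weight * l1


centers = [0.0, 1.0, 2.0]
target = soft_target(1.0, centers, sigma=0.5)
matching = distributional_loss(target, target, centers)
shifted = distributional_loss(soft_target(2.0, centers, sigma=0.5), target, centers)
```

A prediction whose distribution matches the target gives the smaller loss, while a prediction centered on the wrong bin is penalized by both terms. At inference time the scalar prediction would be read off as the expected value of the predicted distribution.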
Gal Becker, Jerome Nicolas Janssen, Rotem Kalev-Altman, Dana Meilich, Astar Shitrit, Svetlana Penn, Ram Reifen, Efrat Monsonego Ornan
By 2050, the global population will exceed 9 billion, demanding a 70% increase in food production. Animal proteins alone may not suffice, and they contribute to global warming. Alternative proteins such as legumes, algae, and insects are being explored, but their health impacts are largely unknown. To investigate this, three-week-old rats were fed diets containing 20% protein from various sources for six weeks. A casein-based control diet was compared to soy isolate, spirulina powder, chickpea isolate, chickpea flour, and fly larvae powder. Except for spirulina, alternative protein groups showed comparable growth patterns to the casein group. Morphological and mechanical tests of femur bones matched growth patterns. Caecal 16S analysis highlighted the impact on gut microbiota diversity. Chickpea flour showed significantly lower $\alpha$-diversity compared with the casein and chickpea isolate groups, while chickpea flour had the greatest distinction in $\beta$-diversity. Alternative protein sources supported optimal growth, but quality and health implications require further exploration.
{"title":"Plant and insect proteins support optimal bone growth and development; Evidences from a pre-clinical model","authors":"Gal Becker, Jerome Nicolas Janssen, Rotem Kalev-Altman, Dana Meilich, Astar Shitrit, Svetlana Penn, Ram Reifen, Efrat Monsonego Ornan","doi":"arxiv-2407.21087","DOIUrl":"https://doi.org/arxiv-2407.21087","journal":{"name":"arXiv - QuanBio - Quantitative Methods"},"publicationDate":"2024-07-30","platform":"Semanticscholar","paperid":"141869984"}
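The abstract above reports differences in $\alpha$-diversity (within-sample) and $\beta$-diversity (between-sample) without naming the metrics used. As a hedged illustration only, the sketch below computes two common choices, the Shannon index and the Bray-Curtis dissimilarity, on made-up count vectors; whether the study used these particular metrics is an assumption.

```python
import math


def shannon_index(counts):
    """Alpha diversity: Shannon entropy (natural log) of relative abundances."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Beta diversity: Bray-Curtis dissimilarity between two count vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    return numerator / (sum(a) + sum(b))


# Invented taxon counts for three samples.
even_sample = [5, 5, 5, 5]
skewed_sample = [17, 1, 1, 1]
disjoint_sample = [0, 0, 0, 20]

alpha_even = shannon_index(even_sample)
alpha_skewed = shannon_index(skewed_sample)
beta_same = bray_curtis(even_sample, even_sample)
beta_diff = bray_curtis(skewed_sample, disjoint_sample)
```

An evenly distributed community scores higher on the Shannon index than one dominated by a single taxon, and Bray-Curtis ranges from 0 for identical samples to 1 for samples that share no taxa.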
Models of soil organic carbon (SOC) frequently overlook the effects of spatial dimensions and microbiological activities. In this paper, we focus on two reaction-diffusion chemotaxis models for SOC dynamics, both supporting chemotaxis-driven instability and exhibiting a variety of spatial patterns such as stripes, spots and hexagons when the microbial chemotactic sensitivity is above a critical threshold. We use symplectic techniques to numerically approximate chemotaxis-driven spatial patterns and explore the effectiveness of the piecewise dynamic mode decomposition (pDMD) to reconstruct them. Our findings show that pDMD is effective at precisely recreating chemotaxis-driven spatial patterns, therefore broadening the range of application of the method to classes of solutions other than Turing patterns. By validating its efficacy across a wider range of models, this research lays the groundwork for applying pDMD to experimental spatiotemporal data, advancing predictions crucial for soil microbial ecology and agricultural sustainability.
{"title":"Patterns in soil organic carbon dynamics: integrating microbial activity, chemotaxis and data-driven approaches","authors":"Angela Monti, Fasma Diele, Deborah Lacitignola, Carmela Marangi","doi":"arxiv-2407.20625","DOIUrl":"https://doi.org/arxiv-2407.20625","journal":{"name":"arXiv - QuanBio - Quantitative Methods"},"publicationDate":"2024-07-30","platform":"Semanticscholar","paperid":"141869856"}
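For context on the reconstruction method named in the abstract above, the standard (exact) dynamic mode decomposition can be summarized as follows. The notation is the usual textbook one and is not taken from the paper; reading "piecewise" as applying the same fit separately on consecutive time windows is an assumption.

```latex
\begin{align*}
% Snapshot matrices from states x_1, ..., x_m sampled at a fixed time step
X &= [\, x_1 \;\; x_2 \;\; \cdots \;\; x_{m-1} \,], &
X' &= [\, x_2 \;\; x_3 \;\; \cdots \;\; x_m \,] \\
% Best-fit linear operator; \dagger denotes the Moore-Penrose pseudoinverse
X' &\approx A X, &
A &= X' X^{\dagger} \\
% Rank-r truncated SVD and the reduced operator
X &\approx U_r \Sigma_r V_r^{*}, &
\tilde{A} &= U_r^{*} X' V_r \Sigma_r^{-1} \\
% Eigendecomposition of the reduced operator and the DMD modes
\tilde{A} W &= W \Lambda, &
\Phi &= X' V_r \Sigma_r^{-1} W \\
% Reconstruction of snapshot k from the modes and initial amplitudes
x_k &\approx \Phi \Lambda^{k-1} b, &
b &= \Phi^{\dagger} x_1
\end{align*}
```

Under the windowed reading, each window gets its own modes $\Phi$ and eigenvalues $\Lambda$, which lets a linear fit follow dynamics that change character as a pattern forms.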
Kanad Sen, Saksham Gupta, Abhishek Raj, Alankar Alankar
Property prediction of materials has been of high interest in recent years in the field of materials science. Various physics-based and machine learning models have already been developed that can give good results. However, they are not accurate enough and are inadequate for critical applications. Traditional machine learning models try to predict properties based on features extracted from the molecules, which are not easily available most of the time. In this paper, a recently developed deep learning method, the Graph Neural Network (GNN), has been applied, allowing us to predict properties directly from only the graph-based structures of the molecules. The SMILES (Simplified Molecular Input Line Entry System) representation of the molecules has been used in the present study as the input data format, which has been further converted into a graph database that constitutes the training data. This article describes in detail the novel GRU-based methodology used to map the inputs. Emphasis is placed on both the regression and the classification capabilities of the GNN backbone. A detailed description of the Variational Autoencoder (VAE) and the end-to-end learning method is given to highlight the multi-class multi-label property prediction of the backbone. The results have been compared on standard benchmark datasets as well as some newly developed datasets. All performance metrics used are clearly defined, along with the reasons for choosing them. Keywords: GNN, VAE, SMILES, multi-label multi-class classification, GRU
{"title":"Graph Residual based Method for Molecular Property Prediction","authors":"Kanad Sen, Saksham Gupta, Abhishek Raj, Alankar Alankar","doi":"arxiv-2408.03342","DOIUrl":"https://doi.org/arxiv-2408.03342","url":null,"PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141935363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
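The SMILES-to-graph conversion mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' pipeline, which the abstract does not specify; it is a minimal, assumption-laden parser for a restricted SMILES subset (unbracketed B, C, N, O, P, S, F, I, Cl, Br atoms, the explicit bonds `-`, `=`, `#`, branches, and single-digit ring closures). The function name `smiles_to_graph` and the output layout (a list of atom symbols plus a list of `(i, j, bond_order)` edges) are hypothetical choices made for illustration. Aromatic atoms, charges, and stereochemistry are deliberately out of scope.

```python
from typing import Dict, List, Optional, Tuple

BOND_ORDERS = {"-": 1, "=": 2, "#": 3}
TWO_LETTER_ATOMS = ("Cl", "Br")
ONE_LETTER_ATOMS = "BCNOPSFI"

Edge = Tuple[int, int, int]  # (atom index, atom index, bond order)


def smiles_to_graph(smiles: str) -> Tuple[List[str], List[Edge]]:
    """Convert a simple SMILES string into (atom symbols, bond list).

    Hypothetical helper: handles only a restricted SMILES subset.
    """
    atoms: List[str] = []
    bonds: List[Edge] = []
    prev: Optional[int] = None              # atom the next atom bonds to
    branch_stack: List[Optional[int]] = []  # atoms where branches opened
    ring_open: Dict[str, Tuple[int, int]] = {}  # digit -> (atom, bond order)
    pending_order = 1                       # order of the next bond
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        two = smiles[i:i + 2]
        if ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()       # return to the branch point
        elif ch in BOND_ORDERS:
            pending_order = BOND_ORDERS[ch]
        elif ch.isdigit():
            if prev is None:
                raise ValueError("ring-closure digit before any atom")
            if ch in ring_open:             # second occurrence closes the ring
                other, order = ring_open.pop(ch)
                bonds.append((other, prev, max(order, pending_order)))
            else:                           # first occurrence opens the ring
                ring_open[ch] = (prev, pending_order)
            pending_order = 1
        elif two in TWO_LETTER_ATOMS or ch in ONE_LETTER_ATOMS:
            symbol = two if two in TWO_LETTER_ATOMS else ch
            atoms.append(symbol)
            cur = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, cur, pending_order))
            prev = cur
            pending_order = 1
            i += len(symbol) - 1            # skip second letter of Cl/Br
        else:
            raise ValueError(f"unsupported SMILES token: {ch!r}")
        i += 1
    return atoms, bonds


# Acetic acid: a methyl carbon, a carbonyl carbon, and two oxygens.
atoms, bonds = smiles_to_graph("CC(=O)O")
print(atoms)  # ['C', 'C', 'O', 'O']
print(bonds)  # [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
```

The atom list and edge list are the node and edge sets a graph neural network would consume. In practice a cheminformatics toolkit would typically handle the full SMILES grammar and supply richer node and edge features.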
Research on the specificity of T-cell receptors (TCRs) contributes to the development of immunotherapy and provides new opportunities and strategies for personalized cancer immunotherapy. We therefore established a TCR generative specificity detection framework consisting of an antigen selector and a TCR classifier based on the Random Forest algorithm, aiming to efficiently screen TCRs and target antigens and to predict TCR specificity. Furthermore, we used k-fold validation to compare the performance of our model with that of ordinary deep learning methods. The results show that adding a Random Forest-based classifier to the model is very effective, and our model generally outperforms ordinary deep learning methods. Finally, we put forward feasible optimization suggestions for the shortcomings and challenges of our model identified during implementation.
{"title":"Predicting T-Cell Receptor Specificity","authors":"Tengyao Tu, Wei Zeng, Kun Zhao, Zhenyu Zhang","doi":"arxiv-2407.19349","DOIUrl":"https://doi.org/arxiv-2407.19349","url":null,"PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141869858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
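The k-fold validation used in the abstract above to compare models can be sketched as follows. This is a generic illustration of the splitting scheme only, assuming shuffled, roughly equal folds; the abstract does not state the value of k, the shuffling policy, or the metric, and the helper name `k_fold_indices` is hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Hypothetical helper: every sample lands in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [
            j for f, fold in enumerate(folds) if f != held_out for j in fold
        ]
        yield train_idx, test_idx


# Each candidate model is trained on train_idx and scored on test_idx;
# the k scores are then averaged per model before comparison.
for fold_number, (train_idx, test_idx) in enumerate(k_fold_indices(10, 5)):
    print(fold_number, len(train_idx), len(test_idx))
```

Using the same seed for every model keeps the folds identical across candidates, so differences in the averaged scores reflect the models and not the split.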