Carbon dioxide is a chemically active molecule that plays a vital role in Earth's ecosphere. CO$_2$ affects the acidity of seawater and has multiple negative effects on marine organisms. It is also a fundamental component of the photosynthesis and respiration reactions. There is evidence that a higher CO$_2$ concentration can speed up the photosynthetic reaction in some plants, but can also negatively affect the respiration reaction in aerobic lifeforms. The effects of this chemical and biochemical perturbation on the biosphere and on human health may be more important than is generally acknowledged in the discussion on CO$_2$, which usually focuses on thermal effects only. These considerations stress the importance of rapidly reducing CO$_2$ emissions and, whenever possible, removing the excess from the atmosphere. They also show that geoengineering technologies based on Solar Radiation Management (SRM) alone cannot be sufficient to counteract the negative effects of anthropogenic CO$_2$ emissions.
{"title":"Carbon Dioxide as a Pollutant. The Risks of Rising Atmospheric CO$_2$ Levels on Human Health and on the Stability of the Biosphere","authors":"Ugo Bardi","doi":"arxiv-2408.08344","DOIUrl":"https://doi.org/arxiv-2408.08344","journal":{"name":"arXiv - QuanBio - Other Quantitative Biology"},"publicationDate":"2024-08-15","publicationTypes":"Journal Article"}
Omar Abdelghani Attafi (Department of Biomedical Sciences, University of Padova, Italy); Damiano Clementel (Department of Biomedical Sciences, University of Padova, Italy); Konstantinos Kyritsis (Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece); Emidio Capriotti (Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy); Gavin Farrell (ELIXIR Hub, Hinxton, Cambridge, UK); Styliani-Christina Fragkouli (Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece; Department of Biology, National and Kapodistrian University of Athens, Athens, Greece); Leyla Jael Castro (ZB Med Information Centre for Life Sciences, Cologne, Germany); András Hatos (Department of Oncology, Geneva University Hospitals, Geneva, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; Swiss Cancer Center Léman, Lausanne, Switzerland); Tom Lenaerts (Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles, Vrije Universiteit Brussel, Brussels, Belgium; Machine Learning Group, Université Libre de Bruxelles Street Belgium; Artificial Intelligence Laboratory, Vrije Universiteit Brussels, Brussels, Belgium); Stanislav Mazurenko (Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic; International Clinical Research Centre, St Anne's Hospital, Brno, Czech Republic); Soroush Mozaffari (Department of Biomedical Sciences, University of Padova, Italy); Franco Pradelli (Department of Biomedical Sciences, University of Padova, Italy); Patrick Ruch (HES-SO - HEG Geneva, Geneva, Switzerland; SIB Swiss Institute of Bioinformatics, Geneva, Switzerland); Castrense Savojardo (Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy); Paola Turina (Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy); Federico Zambelli (Dept of Biosciences, University of Milan, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari, Italy); Damiano Piovesan (Department of Biomedical Sciences, University of Padova, Italy); Alexander Miguel Monzon (Department of Information Engineering, University of Padova, Italy); Fotis Psomopoulos (Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece); Silvio C. E. Tosatto (Department of Biomedical Sciences, University of Padova, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy)
Supervised machine learning (ML) is used extensively in biology and deserves closer scrutiny. The DOME recommendations aim to enhance the validation and reproducibility of ML research by establishing standards for key aspects such as data handling and processing, optimization, evaluation, and model interpretability. The recommendations help to ensure that key details are reported transparently by providing a structured set of questions. Here, we introduce the DOME Registry (URL: registry.dome-ml.org), a database that allows scientists to manage and access comprehensive DOME-related information on published ML studies. The registry uses external resources like ORCID, APICURON and the Data Stewardship Wizard to streamline the annotation process and ensure comprehensive documentation. By assigning unique identifiers and DOME scores to publications, the registry fosters a standardized evaluation of ML methods. Future plans include continuing to grow the registry through community curation, improving the DOME score definition and encouraging publishers to adopt DOME standards, promoting transparency and reproducibility of ML in the life sciences.
{"title":"DOME Registry: Implementing community-wide recommendations for reporting supervised machine learning in biology","authors":"Omar Abdelghani Attafi, Damiano Clementel, Konstantinos Kyritsis, Emidio Capriotti, Gavin Farrell, Styliani-Christina Fragkouli, Leyla Jael Castro, András Hatos, Tom Lenaerts, Stanislav Mazurenko, Soroush Mozaffari, Franco Pradelli, Patrick Ruch, Castrense Savojardo, Paola Turina, Federico Zambelli, Damiano Piovesan, Alexander Miguel Monzon, Fotis Psomopoulos, Silvio C. E. Tosatto","doi":"arxiv-2408.07721","DOIUrl":"https://doi.org/arxiv-2408.07721","journal":{"name":"arXiv - QuanBio - Other Quantitative Biology"},"publicationDate":"2024-08-14","publicationTypes":"Journal Article"}
In this commentary I review the claim by Luebbert and Pachter (arXiv:2405.12998v1) that the R-squared value reported in Srinivasan et al. (Science, 287(5454):851-853, 2000), describing the relationship between distance to a food source and mean waggle duration of honeybee dances, was too high to be consistent with the means and standard deviations reported in the latter study. The simulations conducted by Luebbert and Pachter have one serious limitation and two flaws that compromise their findings. The R-squared value reported by Srinivasan et al. is within the expected range, as far as that can be determined given the limitations of the available data.
{"title":"Miscalibration of simulations: A comment on Luebbert and Pachter: 'Miscalibration of the honeybee odometer' arXiv:2405.12998v1","authors":"Geoffrey Willam Stuart","doi":"arxiv-2408.07713","DOIUrl":"https://doi.org/arxiv-2408.07713","journal":{"name":"arXiv - QuanBio - Other Quantitative Biology"},"publicationDate":"2024-08-14","publicationTypes":"Journal Article"}
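The question in this commentary, whether a reported R-squared is plausible given reported means and standard deviations, can be illustrated with a small simulation. The sketch below is not the procedure used by Luebbert and Pachter or by the commentary's author; it only shows the general idea under stated assumptions. Every number in it (distances, mean durations, standard deviations, dances per distance) is hypothetical and is not taken from Srinivasan et al., and the function names are made up for this example.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a simple linear fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulate_r2_range(distances, true_means, sds, n_dances, n_sims=2000, seed=0):
    """Return an approximate central 95% range of R^2 across simulated data sets.

    Assumption of this sketch: each observed mean is drawn from a normal
    distribution centred on the true mean with standard error sd / sqrt(n).
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        sample_means = [
            rng.gauss(mu, sd / n ** 0.5)
            for mu, sd, n in zip(true_means, sds, n_dances)
        ]
        results.append(r_squared(distances, sample_means))
    results.sort()
    return results[int(0.025 * n_sims)], results[int(0.975 * n_sims) - 1]


# Hypothetical inputs, NOT values from Srinivasan et al.
distances = [50, 100, 150, 200, 250, 300]      # metres
true_means = [100, 190, 280, 370, 460, 550]    # mean waggle duration, ms
sds = [60, 60, 60, 60, 60, 60]                 # per-dance standard deviation, ms
n_dances = [20, 20, 20, 20, 20, 20]            # dances observed per distance

lo, hi = simulate_r2_range(distances, true_means, sds, n_dances)
print(f"central 95% range of simulated R^2: {lo:.3f} to {hi:.3f}")
```

A reported R-squared falling inside such a range would be consistent with the assumed means and standard deviations. The result depends strongly on whether the fit uses per-distance means or individual dances, and on the number of dances behind each mean, so these choices need to match the original analysis.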
Victoria Hedley, Rebecca Leary, Anando Sen, Anna Irvin, Emma Heslop, Volker Straub
Over the past 50 years, advances in medical and health research have radically changed the epidemiology of health conditions in neonates, children, and adolescents, and clinical research has, on the whole, moved forward. However, large sections of the pediatric community remain vulnerable and underserved by clinical research. One reason is that most pediatric diseases are also rare diseases (i.e., they fit the EU definition of a rare condition by affecting no more than 5 in 10,000 individuals), and the majority of conditions under this umbrella heading are in fact much rarer, affecting fewer than 1 in 100,000. Rare pediatric diseases pose particular challenges, both in conducting clinical trials and in planning them (and, indeed, in stimulating the preclinical research and knowledge generation necessary to embark on clinical trials in the first place). The pediatric regulation and the orphan regulation (covering rare diseases) were introduced to address the complexities in research and development of medicines specifically for children and for people living with a rare disease, respectively. The regulations have been reasonably effective, particularly in areas where adult and pediatric diseases overlap, driving the development of more pediatric medicines; however, challenges remain, often exacerbated by the rarity of the diseases. These include issues around trial planning, the need for more innovative methodologies in smaller populations, significant delays in trial start-up and recruitment, further recruitment difficulties (due to small populations and the nature of the conditions), lack of endpoints, and scarce data. This chapter discusses some of the major challenges in delivering trials in pediatric rare diseases and assesses current and future solutions to address them.
{"title":"Performing clinical drug trials in children with a rare disease","authors":"Victoria Hedley, Rebecca Leary, Anando Sen, Anna Irvin, Emma Heslop, Volker Straub","doi":"arxiv-2408.07142","DOIUrl":"https://doi.org/arxiv-2408.07142","journal":{"name":"arXiv - QuanBio - Other Quantitative Biology"},"publicationDate":"2024-08-13","publicationTypes":"Journal Article"}
Matthias Dold, Joana Pereira, Bastian Sajonz, Volker A. Coenen, Marcus L. F. Janssen, Michael Tangermann
This work introduces Dareplane, a modular, broadly technology-agnostic, open-source software platform for brain-computer interface research with an application focus on adaptive deep brain stimulation (aDBS). While the search for suitable biomarkers to inform aDBS has provided rich results over the last two decades, the development of control strategies is not progressing at the same pace. One difficulty in investigating control approaches lies in the complex setups required for aDBS experiments. The Dareplane platform supports aDBS setups, and more generally brain-computer interfaces, by providing a modular, technology-agnostic, and easy-to-implement software platform that makes experimental setups more resilient and replicable. The key features of the platform are presented, and the composition of modules into a full experimental setup is discussed in the context of a Python-based orchestration module. The performance of a typical experimental setup on Dareplane for aDBS is evaluated in three benchtop experiments, covering (a) an easy-to-replicate setup using an Arduino microcontroller, (b) a setup with the hardware of an implantable pulse generator, and (c) a setup using an established, CE-certified external neurostimulator. Benchmark results are presented for individual processing steps and for full closed-loop processing. The results show that the microcontroller setup in (a) provides timing comparable to the realistic setups in (b) and (c). The Dareplane platform was successfully used in a total of 19 open-loop DBS sessions with externalized DBS and electrocorticography (ECoG) leads. In addition, the full technical feasibility of the platform in the aDBS context is demonstrated in a first closed-loop session with externalized leads on a patient with Parkinson's disease receiving DBS treatment.
{"title":"A modular open-source software platform for BCI research with application in closed-loop deep brain stimulation","authors":"Matthias Dold, Joana Pereira, Bastian Sajonz, Volker A. Coenen, Marcus L. F. Janssen, Michael Tangermann","doi":"arxiv-2408.01242","DOIUrl":"https://doi.org/arxiv-2408.01242","journal":{"name":"arXiv - QuanBio - Other Quantitative Biology"},"publicationDate":"2024-08-02","publicationTypes":"Journal Article"}
Composition is a powerful principle for systems biology, focused on the interfaces, interconnections, and orchestration of distributed processes. Whereas most systems biology models focus on the structure or dynamics of specific subsystems in controlled conditions, compositional systems biology aims to connect such models into integrative multiscale simulations. This emphasizes the space between models. A compositional perspective asks: What variables should be exposed through a submodel's interface? How do coupled models connect and translate across scales? How can we connect domain-specific models across biological and physical research areas to drive the synthesis of new knowledge? What is required of software that integrates diverse datasets and submodels into unified multiscale simulations? How can the resulting integrative models be accessed, flexibly recombined into new forms, and iteratively refined by a community of researchers? This essay offers a high-level overview of the key components of compositional systems biology, including: 1) a conceptual framework and corresponding graphical framework to represent interfaces, composition patterns, and orchestration patterns; 2) standardized composition schemas that offer consistent formats for composable data types and models, fostering robust infrastructure for a registry of simulation modules that can be flexibly assembled; 3) a foundational set of biological templates (schemas for cellular and molecular interfaces) that can be filled with detailed submodels and datasets and are designed to integrate knowledge that sheds light on the molecular emergence of cells; and 4) scientific collaboration facilitated by user-friendly interfaces that connect researchers with datasets and models, allowing a community of researchers to effectively build integrative multiscale models of cellular systems.
{"title":"Prelude to a Compositional Systems Biology","authors":"Eran Agmon","doi":"arxiv-2408.00942","DOIUrl":"https://doi.org/arxiv-2408.00942","journal":{"name":"arXiv - QuanBio - Other Quantitative Biology"},"publicationDate":"2024-08-01","publicationTypes":"Journal Article"}
This study uses Transformer models and large language models (such as GPT and Claude) to predict the brightness of Aequorea victoria green fluorescent protein (avGFP) and to design mutants with higher brightness. Considering the time and cost associated with traditional experimental screening methods, this study employs machine learning techniques to improve research efficiency. We first read and preprocessed a proprietary dataset containing approximately 140,000 protein sequences, including about 30,000 avGFP sequences. We then constructed and trained a Transformer-based prediction model to screen and design new avGFP mutants that are expected to exhibit higher brightness. Our methodology consists of two primary stages: first, the construction of a scoring model using BERT, and second, the screening and generation of mutants using mutation site statistics and large language models. Through the analysis of predictive results, we designed and screened 10 new avGFP sequences predicted to have high brightness. This study not only demonstrates the potential of deep learning in protein design but also provides new perspectives and methodologies for future research by integrating prior knowledge from large language models.
{"title":"BERT and LLMs-Based avGFP Brightness Prediction and Mutation Design","authors":"X. Guo, W. Che","doi":"arxiv-2407.20534","DOIUrl":"https://doi.org/arxiv-2407.20534","journal":{"name":"arXiv - QuanBio - Other Quantitative Biology"},"publicationDate":"2024-07-30","publicationTypes":"Journal Article"}
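The abstract mentions screening mutants with "mutation site statistics" but does not define them. One plausible reading, offered here only as an assumption, is a count of how often each substitution occurs among variants that score above a brightness threshold. The sketch below illustrates that reading on toy data; the sequences are short illustrative strings, the brightness scores are invented, and the function names are hypothetical rather than taken from the paper.

```python
from collections import Counter


def mutation_sites(wild_type, variant):
    """List substitutions as (1-based position, wild-type residue, new residue)."""
    if len(wild_type) != len(variant):
        raise ValueError("this sketch handles equal-length substitutions only")
    return [
        (i + 1, w, v)
        for i, (w, v) in enumerate(zip(wild_type, variant))
        if w != v
    ]


def site_statistics(wild_type, scored_variants, threshold):
    """Count substitutions among variants whose score exceeds the threshold."""
    counts = Counter()
    for sequence, brightness in scored_variants:
        if brightness > threshold:
            counts.update(mutation_sites(wild_type, sequence))
    return counts


# Toy data: a 10-residue string and made-up brightness scores.
wild_type = "MSKGEELFTG"
scored_variants = [
    ("MSKGEELFTG", 3.7),  # wild type, used as the reference score
    ("MSKGAELFTG", 3.9),
    ("MSKGAELFSG", 4.1),
    ("MTKGEELFTG", 2.0),  # dimmer than wild type, so it is ignored
]

stats = site_statistics(wild_type, scored_variants, threshold=3.7)
for (position, old, new), count in stats.most_common():
    print(f"{old}{position}{new}: seen in {count} bright variant(s)")
```

Substitutions that recur among bright variants could then be proposed as candidates for recombination, with a scoring model used to rank the resulting sequences. That second step is outside the scope of this sketch.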
Belinda Neo, Noel Nannup, Dale Tilbrook, Eleanor Dunlop, John Jacky, Carol Michie, Cindy Prior, Brad Farrant, Carrington C. J. Shepherd, Lucinda J. Black
Background: Low vitamin D intake and high prevalence of vitamin D deficiency (serum 25-hydroxyvitamin D concentration < 50 nmol/L) among Aboriginal and Torres Strait Islander peoples highlight a need for public health strategies to improve vitamin D status. As few foods contain naturally occurring vitamin D, fortification strategies may be needed to improve vitamin D intake and status among Aboriginal and Torres Strait Islander peoples. Objective: We aimed to model vitamin D food fortification scenarios among Aboriginal and Torres Strait Islander peoples. Methods: We used nationally representative food consumption data (n=4,109) and vitamin D food composition data to model four food fortification scenarios. The modelling for Scenario 1 included foods and maximum vitamin D concentrations permitted for fortification in Australia: i) dairy products and alternatives, ii) butter/margarine/oil spreads, iii) formulated beverages, and iv) selected ready-to-eat breakfast cereals. The modelling for Scenarios 2a-c included some vitamin D concentrations higher than permitted in Australia; Scenario 2c included bread, which is not permitted for vitamin D fortification in Australia. Scenario 2a: i) dairy products and alternatives, ii) butter/margarine/oil spreads, iii) formulated beverages. Scenario 2b: as per Scenario 2a plus selected ready-to-eat breakfast cereals. Scenario 2c: as per Scenario 2b plus bread. Results: Vitamin D fortification of a range of staple foods could potentially increase vitamin D intake among Aboriginal and Torres Strait Islander peoples by ~3-6 µg/day. Scenario 2c showed the highest potential median vitamin D intake increase to ~8 µg/day. Across all modelled scenarios, none of the participants had vitamin D intake above the Australian upper level of intake of 80 µg/day.
{"title":"Modelling vitamin D food fortification among Aboriginal and Torres Strait Islander peoples in Australia","authors":"Belinda Neo, Noel Nannup, Dale Tilbrook, Eleanor Dunlop, John Jacky, Carol Michie, Cindy Prior, Brad Farrant, Carrington C. J. Shepherd, Lucinda J. Black","doi":"arxiv-2407.20116","DOIUrl":"https://doi.org/arxiv-2407.20116","url":null,"abstract":"Background: Low vitamin D intake and high prevalence of vitamin D deficiency (serum 25-hydroxyvitamin D concentration < 50 nmol/L) among Aboriginal and Torres Strait Islander peoples highlight a need for public health strategies to improve vitamin D status. As few foods contain naturally occurring vitamin D, fortification strategies may be needed to improve vitamin D intake and status among Aboriginal and Torres Strait Islander peoples. Objective: We aimed to model vitamin D food fortification scenarios among Aboriginal and Torres Strait Islander peoples. Methods: We used nationally representative food consumption data (n=4,109) and vitamin D food composition data to model four food fortification scenarios. The modelling for Scenario 1 included foods and maximum vitamin D concentrations permitted for fortification in Australia: i) dairy products and alternatives, ii) butter/margarine/oil spreads, iii) formulated beverages, and iv) selected ready-to-eat breakfast cereals. The modelling for Scenarios 2a-c included some vitamin D concentrations higher than permitted in Australia; Scenario 2c included bread, which is not permitted for vitamin D fortification in Australia. Scenario 2a: i) dairy products and alternatives, ii) butter/margarine/oil spreads, iii) formulated beverages. Scenario 2b: as per Scenario 2a plus selected ready-to-eat breakfast cereals. Scenario 2c: as per Scenario 2b plus bread. Results: Vitamin D fortification of a range of staple foods could potentially increase vitamin D intake among Aboriginal and Torres Strait Islander peoples by ~3-6 µg/day. Scenario 2c showed the highest potential median vitamin D intake increase to ~8 µg/day. Across all modelled scenarios, none of the participants had vitamin D intake above the Australian upper level of intake of 80 µg/day.","PeriodicalId":501219,"journal":{"name":"arXiv - QuanBio - Other Quantitative Biology","volume":"78 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141872379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Palatability of food is driven by multiple factors such as taste, smell, texture, and freshness, and can vary widely across species. There are classic examples of local adaptations leading to speciation, driven by food availability. Urbanization across the world is causing a rapid decline of biodiversity, while also driving local adaptations in some species. Free-ranging dogs are an interesting example of adaptation to a human-dominated environment across varied habitats. They have co-existed with humans for centuries and are a perfect model system for studying local adaptations. We attempted to understand a specific aspect of their scavenging behaviour in India: citrus aversion. Pet dogs are known to avoid citrus fruits and food contaminated by them. In India, lemons are used widely in the cuisine and discarded in the garbage. Hence, free-ranging dogs, which are typically scavengers of human leftovers, are likely to encounter lemons and lemon-contaminated food on a regular basis. We carried out a population-level experiment to test the response of free-ranging dogs to chicken contaminated with various parts of lemon. The dogs avoided chicken contaminated with lemon juice the most. Further, when provided with chicken dipped in three different concentrations of lemon juice, the lowest concentration was most preferred. A survey confirmed that the local people use lemon in their diet extensively and also discard it with the leftovers. People avoided giving citrus-contaminated food to their pets but did not exercise the same caution with free-ranging dogs. This study revealed that free-ranging dogs in West Bengal, India, are well adapted to scavenging among citrus-contaminated garbage and have their own strategies to avoid the contamination as far as possible, while maximizing their preferred food intake.
{"title":"When Life Gives You Lemons, Squeeze Your Way Through: Understanding Citrus Avoidance Behaviour by Free-Ranging Dogs in India","authors":"Tuhin Subhra Pal, Srijaya Nandi, Rohan Sarkar, Anindita Bhadra","doi":"arxiv-2407.17601","DOIUrl":"https://doi.org/arxiv-2407.17601","url":null,"abstract":"Palatability of food is driven by multiple factors like taste, smell, texture, freshness, etc. and can be very variable across species. There are classic examples of local adaptations leading to speciation, driven by food availability. Urbanization across the world is causing rapid decline of biodiversity, while also driving local adaptations in some species. Free-ranging dogs are an interesting example of adaptation to a human-dominated environment across varied habitats. They have co-existed with humans for centuries and are a perfect model system for studying local adaptations. We attempted to understand a specific aspect of their scavenging behaviour in India: citrus aversion. Pet dogs are known to avoid citrus fruits and food contaminated by them. In India, lemons are used widely in the cuisine, and discarded in the garbage. Hence, free-ranging dogs, that typically are scavengers of human leftovers, are likely to encounter lemons and lemon-contaminated food on a regular basis. We carried out a population level experiment to test response of free-ranging dogs to chicken contaminated with various parts of lemon. The dogs avoided chicken contaminated with lemon juice the most. Further, when provided with chicken dipped in three different concentrations of lemon juice, the lowest concentration was most preferred. A survey confirmed that the local people use lemon in their diet extensively and also discard these with the leftovers. People avoided giving citrus contaminated food to their pets but did not follow the same caution for free-ranging dogs. This study revealed that free-ranging dogs in West Bengal, India, are well adapted to scavenging among citrus-contaminated garbage and have their own strategies to avoid the contamination as far as possible, while maximizing their preferred food intake.","PeriodicalId":501219,"journal":{"name":"arXiv - QuanBio - Other Quantitative Biology","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141774445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How does the genome encode the form of the organism? What is the nature of this genomic code? Common metaphors, such as a blueprint or program, fail to capture the complex, indirect, and evolutionarily dynamic relationship between the genome and organismal form, or the constructive, interactive processes that produce it. Such metaphors are also not readily formalised, either to treat empirical data or to simulate genomic encoding of form in silico. Here, we propose a new analogy, inspired by recent work in machine learning and neuroscience: that the genome encodes a generative model of the organism. In this scheme, by analogy with variational autoencoders, the genome does not encode either organismal form or developmental processes directly, but comprises a compressed space of latent variables. These latent variables are the DNA sequences that specify the biochemical properties of encoded proteins and the relative affinities between trans-acting regulatory factors and their target sequence elements. Collectively, these comprise a connectionist network, with weights that get encoded by the learning algorithm of evolution and decoded through the processes of development. The latent variables collectively shape an energy landscape that constrains the self-organising processes of development so as to reliably produce a new individual of a certain type, providing a direct analogy to Waddington's famous epigenetic landscape. The generative model analogy accounts for the complex, distributed genetic architecture of most traits and the emergent robustness and evolvability of developmental processes. It also provides a new way to explain the independent selectability of specific traits, drawing on the idea of multiplexed disentangled representations observed in artificial and neural systems, and lends itself to formalisation.
{"title":"The Genomic Code: The genome instantiates a generative model of the organism","authors":"Kevin J. Mitchell, Nick Cheney","doi":"arxiv-2407.15908","DOIUrl":"https://doi.org/arxiv-2407.15908","url":null,"abstract":"How does the genome encode the form of the organism? What is the nature of this genomic code? Common metaphors, such as a blueprint or program, fail to capture the complex, indirect, and evolutionarily dynamic relationship between the genome and organismal form, or the constructive, interactive processes that produce it. Such metaphors are also not readily formalised, either to treat empirical data or to simulate genomic encoding of form in silico. Here, we propose a new analogy, inspired by recent work in machine learning and neuroscience: that the genome encodes a generative model of the organism. In this scheme, by analogy with variational autoencoders, the genome does not encode either organismal form or developmental processes directly, but comprises a compressed space of latent variables. These latent variables are the DNA sequences that specify the biochemical properties of encoded proteins and the relative affinities between trans-acting regulatory factors and their target sequence elements. Collectively, these comprise a connectionist network, with weights that get encoded by the learning algorithm of evolution and decoded through the processes of development. The latent variables collectively shape an energy landscape that constrains the self-organising processes of development so as to reliably produce a new individual of a certain type, providing a direct analogy to Waddington's famous epigenetic landscape. The generative model analogy accounts for the complex, distributed genetic architecture of most traits and the emergent robustness and evolvability of developmental processes. It also provides a new way to explain the independent selectability of specific traits, drawing on the idea of multiplexed disentangled representations observed in artificial and neural systems and lends itself to formalisation.","PeriodicalId":501219,"journal":{"name":"arXiv - QuanBio - Other Quantitative Biology","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141774448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}