Pyomo: Accidentally outrunning the bear
Miranda Mundt, William E Hart, Emma S Johnson, Bethany Nicholson, John D Siirola
Patterns 6(7): 101311 | Pub Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101311
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416079/pdf/

Pyomo is an open-source optimization modeling software that has undergone significant evolution since its inception in 2008. Pyomo has evolved to enhance flexibility, solver integration, and community engagement. Modern collaborative tools for open-source software have facilitated the development of new Pyomo functionality and improved our development process through automated testing and performance-tracking pipelines. However, Pyomo faces challenges typical of research software, including resource limitations and knowledge retention. The Pyomo team's commitment to better development practices and community engagement reflects a proactive approach to these issues. We describe Pyomo's development journey, highlighting both successes and failures, in the hopes that other open-source research software packages may benefit from our experiences.
A conversation with research software engineers at the International Brain Laboratory
Mayo Faulkner, Miles Wells
Patterns 6(7): 101315 | Pub Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101315
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416082/pdf/

Open-source software is the lifeblood of many modern research projects, allowing researchers to push boundaries, build collaborations, and work transparently. The International Brain Laboratory (IBL), a group of more than twenty labs working together to understand the neuroscience of decision-making, uses open-source software and other open science practices extensively to advance its research. Here, we interview two of the IBL's research software engineers to learn more about their career paths and how they view open-source development.
The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning
Jose Guadalupe Hernandez, Anil Kumar Saini, Attri Ghosh, Jason H Moore
Patterns 6(7): 101314 | Pub Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101314
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416094/pdf/

The tree-based pipeline optimization tool (TPOT) is one of the earliest automated machine learning (ML) frameworks developed for optimizing ML pipelines, with an emphasis on addressing the complexities of biomedical research. TPOT uses genetic programming to explore a diverse space of pipeline structures and hyperparameter configurations in search of optimal pipelines. Here, we provide a comparative overview of the conceptual similarities and implementation differences between the previous and latest versions of TPOT, focusing on two key aspects: (1) the representation of ML pipelines and (2) the underlying algorithm driving pipeline optimization. We also highlight TPOT's application across various medical and healthcare domains, including disease diagnosis, adverse outcome forecasting, and genetic analysis. Additionally, we propose future directions for enhancing TPOT by integrating contemporary ML techniques and recent advancements in evolutionary computation.
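The evolutionary loop the TPOT abstract describes (a population of candidate pipelines, scored and iteratively refined) can be sketched in miniature. This is an illustrative toy, not TPOT's implementation: the operator lists, score table, and function names are invented for the sketch, whereas real TPOT evolves scikit-learn pipeline trees and scores them by cross-validation.

```python
import random

# Toy search space standing in for TPOT's pipeline operators.
PREPROCESSORS = ["none", "scale", "pca"]
MODELS = ["tree", "knn", "logistic"]

# Stand-in for cross-validated accuracy of a candidate pipeline.
SCORES = {("scale", "logistic"): 0.92, ("pca", "knn"): 0.88}

def fitness(pipeline):
    return SCORES.get(pipeline, 0.70)

def mutate(rng, pipeline):
    # Randomly swap one component of the pipeline, as a minimal
    # analog of genetic-programming mutation.
    prep, model = pipeline
    if rng.random() < 0.5:
        prep = rng.choice(PREPROCESSORS)
    else:
        model = rng.choice(MODELS)
    return (prep, model)

def evolve(generations=10, population_size=20, seed=0):
    rng = random.Random(seed)
    # Seed the population with every pipeline skeleton, then random fill.
    population = [(p, m) for p in PREPROCESSORS for m in MODELS]
    while len(population) < population_size:
        population.append((rng.choice(PREPROCESSORS), rng.choice(MODELS)))
    for _ in range(generations):
        # Keep the fitter half (elitism), refill by mutating parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        population = parents + [mutate(rng, rng.choice(parents)) for _ in parents]
    return max(population, key=fitness)

best = evolve()  # converges on the highest-scoring toy pipeline
```

Because the best candidate is always retained among the parents, the loop reliably settles on the highest-scoring combination in the toy table.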
Open-source software for data science
Andrew L Hufton
Patterns 6(7): 101324 | Pub Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101324
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416077/pdf/
The future of research software is the future of research
Neil P Chue Hong, Selina Aragon, Simon Hettrick, Caroline Jay
Patterns 6(7): 101322 | Pub Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101322
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416084/pdf/

The use of software is near-ubiquitous in research, yet it is still underrecognized despite changes in policy and practice. Notwithstanding many successful initiatives to improve the culture around research software, the authors argue that it is essential that the development of research software anticipates changes in the research landscape and continues to support the many different people who use it.
Open-source models for development of data and metadata standards
Ariel Rokem, Vani Mandava, Nicoleta Cristea, Anshul Tambay, Kristofer Bouchard, Carolina Berys-Gonzalez, Andy Connolly
Patterns 6(7): 101316 | Pub Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101316
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416081/pdf/

Machine learning and artificial intelligence promise to accelerate research and understanding across many scientific disciplines. Harnessing the power of these techniques requires aggregating scientific data. In tandem, the importance of open data for reproducibility and scientific transparency is gaining recognition, and data are increasingly available through digital repositories. Leveraging efforts from disparate data collection sources, however, requires interoperable and adaptable standards for data description and storage. Through the synthesis of experiences in astronomy, high-energy physics, earth science, and neuroscience, we contend that the open-source software (OSS) model provides significant benefits for standard creation and adaptation. We highlight resultant issues, such as balancing flexibility vs. stability and utilizing new computing paradigms and technologies, that must be considered from both the user and developer perspectives to ensure pathways for recognition and sustainability. We recommend supporting and recognizing the development and maintenance of OSS data standards and software consistent with widely adopted scientific tools.
Bioconductor: Planning a third decade of comprehensive support for genomic data science
Vincent J Carey
Patterns 6(7): 101319 | Pub Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101319
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416078/pdf/

This opinion piece discusses the Bioconductor project for open-source bioinformatics and the engineering concepts underlying its effectiveness to date. Since the inception of Bioconductor in 2002 with 15 software packages devoted to analysis of DNA microarrays, it has grown into an ecosystem of ∼3,000 packages contributed by more than 1,000 developers. Aspects of the history and commitments are reviewed here to contribute to thinking about the design and orchestration of future open-source software projects.
ASReview LAB v.2: Open-source text screening with multiple agents and a crowd of experts
Jonathan de Bruin, Peter Lombaers, Casper Kaandorp, Jelle Teijema, Timo van der Kuil, Berke Yazan, Angie Dong, Rens van de Schoot
Patterns 6(7): 101318 | Pub Date: 2025-07-03 | eCollection Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101318
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416088/pdf/

ASReview LAB v.2 introduces an advancement in AI-assisted systematic reviewing by enabling collaborative screening with multiple experts ("a crowd of oracles") using a shared AI model. The platform supports multiple AI agents within the same project, allowing users to switch between fast general-purpose models and domain-specific, semantic, or multilingual transformer models. Leveraging the SYNERGY benchmark dataset, performance has improved significantly, showing a 24.1% reduction in loss compared to version 1 through model improvements and hyperparameter tuning. ASReview LAB v.2 follows user-centric design principles and offers reproducible, transparent workflows. It logs key configuration and annotation data while balancing full model traceability with efficient storage. Future developments include automated model switching based on performance metrics, noise-robust learning, and ensemble-based decision-making.
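The oracle-in-the-loop screening cycle behind tools like ASReview can be sketched schematically: a model ranks the unlabeled pool, an expert "oracle" labels the top-ranked record, and the model is updated with that label before the next query. The toy term-overlap model and all names below are illustrative assumptions, not ASReview's actual API or models.

```python
def score(record, relevant_terms):
    # Toy relevance model: overlap with terms from records labeled relevant.
    return len(set(record["terms"]) & relevant_terms)

def screen(records, oracle, budget):
    relevant_terms = set()
    labeled, pool = [], list(records)
    for _ in range(min(budget, len(pool))):
        # Re-rank the remaining pool and query the most promising record.
        pool.sort(key=lambda r: score(r, relevant_terms), reverse=True)
        record = pool.pop(0)
        label = oracle(record)            # expert decision
        labeled.append((record["id"], label))
        if label:                         # model update step
            relevant_terms |= set(record["terms"])
    return labeled

records = [
    {"id": 1, "terms": ["screening", "ai"]},
    {"id": 2, "terms": ["screening", "review"]},
    {"id": 3, "terms": ["geology"]},
]
labels = screen(records, lambda r: "screening" in r["terms"], budget=3)
```

In this tiny run, labeling record 1 as relevant pushes the similar record 2 to the front of the queue, which is the mechanism that lets active-learning screeners surface relevant records early.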
OpenML: Insights from 10 years and more than a thousand papers
Bernd Bischl, Giuseppe Casalicchio, Taniya Das, Matthias Feurer, Sebastian Fischer, Pieter Gijsbers, Subhaditya Mukherjee, Andreas C Müller, László Németh, Luis Oala, Lennart Purucker, Sahithya Ravi, Jan N van Rijn, Prabhant Singh, Joaquin Vanschoren, Jos van der Velde, Marcel Wever
Patterns 6(7): 101317 | Pub Date: 2025-07-03 | eCollection Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101317
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416095/pdf/

OpenML is an open-source platform that democratizes machine-learning evaluation by enabling anyone to share datasets in uniform standards, define precise machine-learning tasks, and automatically share detailed workflows and model evaluations. More than just a platform, OpenML fosters a collaborative ecosystem where scientists create new tools, launch initiatives, and establish standards to advance machine learning. Over the past decade, OpenML has inspired over 1,500 publications across diverse fields, from scientists releasing new datasets and benchmarking new models to educators teaching reproducible science. Looking back, we detail and describe the platform's impact by looking at usage and citations. We share lessons from a decade of building, maintaining, and expanding OpenML, highlighting how rich metadata, collaborative benchmarking, and open interfaces have enhanced research and interoperability. Looking ahead, we cover ongoing efforts to expand OpenML's capabilities and integrate with other platforms, informing a broader vision for open-science infrastructure for machine learning.
A novel quantum algorithm for efficient attractor search in gene regulatory networks
Mirko Rossini, Felix M Weidner, Joachim Ankerhold, Hans A Kestler
Patterns 6(9): 101295 | Pub Date: 2025-07-03 | eCollection Date: 2025-09-12 | DOI: 10.1016/j.patter.2025.101295
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12485557/pdf/

Describing gene interactions in cells is challenging due to their complexity and the limited microscopic detail available. Boolean networks offer a powerful, coarse-grained approach to modeling these dynamics using binary agents and their interactions. In this context, attractors (stable states of the system) are associated with biological phenotypes, making their identification biologically important. However, traditional computing struggles with the exponential growth of the state space in such models. Here, we present a novel quantum search algorithm for identifying attractors in synchronous Boolean networks, specifically designed for use on quantum computers. The algorithm iteratively suppresses known attractor basins, increasing the probability of detecting new ones. Unlike classical methods, it guarantees the discovery of a new attractor in each run. Early tests demonstrate strong resilience to noise on current NISQ (noisy intermediate-scale quantum) devices, marking a promising advance toward practical quantum-enhanced biological modeling.
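The exponential cost the abstract refers to is easy to see in the classical baseline: enumerating all 2^n states of a synchronous Boolean network and following each trajectory until it closes a cycle. The sketch below is that brute-force baseline, not the paper's quantum algorithm, and the 3-gene network is a made-up toy example.

```python
from itertools import product

def step(state, rules):
    # Synchronous update: every gene applies its rule to the old state.
    return tuple(rule(state) for rule in rules)

def attractors(n, rules):
    found = set()
    for start_state in product((0, 1), repeat=n):   # all 2**n states
        seen, state = {}, start_state
        while state not in seen:
            seen[state] = len(seen)
            state = step(state, rules)
        # The walk has re-entered a visited state: that closes a cycle.
        first = seen[state]
        cycle = tuple(sorted(s for s, i in seen.items() if i >= first))
        found.add(cycle)
    return found

# Toy 3-gene rules: z represses x, x activates y, y activates z.
rules = [
    lambda s: 1 - s[2],   # x' = NOT z
    lambda s: s[0],       # y' = x
    lambda s: s[1],       # z' = y
]
atts = attractors(3, rules)  # this negative-feedback toy has a 2-cycle and a 6-cycle
```

The outer loop over `product((0, 1), repeat=n)` is exactly the term that blows up for realistic networks, which is the search the quantum algorithm addresses with amplitude amplification over the state space.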