A systematic survey of natural language processing for the Greek language
Juli Bakagianni, Kanella Pouli, Maria Gavriilidou, John Pavlopoulos
Pub Date: 2025-07-21 | eCollection Date: 2025-11-14 | DOI: 10.1016/j.patter.2025.101313 | Patterns 6(11): 101313
Comprehensive monolingual natural language processing (NLP) surveys are essential for assessing language-specific challenges, resource availability, and research gaps. However, existing surveys often lack standardized methodologies, leading to selection bias and fragmented coverage of NLP tasks and resources. This study introduces a generalizable framework for systematic monolingual NLP surveys. Our approach integrates a structured search protocol to minimize bias, an NLP task taxonomy for classification, and language resource taxonomies to identify potential benchmarks and highlight opportunities for improving resource availability. We apply this framework to Greek NLP (2012-2023), providing an in-depth analysis of its current state, task-specific progress, and resource gaps. The survey results are publicly available and are regularly updated to provide an evergreen resource. This systematic survey of Greek NLP serves as a case study, demonstrating the effectiveness of our framework and its potential for broader application to other languages that are under-resourced for NLP.
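As a schematic illustration of the framework's bookkeeping (our sketch, with hypothetical papers, tasks, and resource names rather than data from the survey), publications can be tagged against a task taxonomy and cross-checked for benchmark availability:

```python
# Hypothetical illustration: group surveyed papers by taxonomy task and flag
# tasks that lack an associated benchmark resource.
from collections import defaultdict

papers = [  # made-up entries for illustration only
    {"title": "Greek NER with transformers", "task": "named entity recognition",
     "benchmark": "hypothetical-ner-corpus"},
    {"title": "Sentiment on Greek tweets", "task": "sentiment analysis",
     "benchmark": None},
]

by_task = defaultdict(list)
for paper in papers:
    by_task[paper["task"]].append(paper)

for task, items in sorted(by_task.items()):
    has_benchmark = any(p["benchmark"] for p in items)
    print(f"{task}: {len(items)} paper(s); benchmark available: {has_benchmark}")
```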
{"title":"A systematic survey of natural language processing for the Greek language.","authors":"Juli Bakagianni, Kanella Pouli, Maria Gavriilidou, John Pavlopoulos","doi":"10.1016/j.patter.2025.101313","DOIUrl":"10.1016/j.patter.2025.101313","url":null,"abstract":"<p><p>Comprehensive monolingual natural language processing (NLP) surveys are essential for assessing language-specific challenges, resource availability, and research gaps. However, existing surveys often lack standardized methodologies, leading to selection bias and fragmented coverage of NLP tasks and resources. This study introduces a generalizable framework for systematic monolingual NLP surveys. Our approach integrates a structured search protocol to minimize bias, an NLP task taxonomy for classification, and language resource taxonomies to identify potential benchmarks and highlight opportunities for improving resource availability. We apply this framework to Greek NLP (2012-2023), providing an in-depth analysis of its current state, task-specific progress, and resource gaps. The survey results are publicly available and are regularly updated to provide an evergreen resource. This systematic survey of Greek NLP serves as a case study, demonstrating the effectiveness of our framework and its potential for broader application to other not-so-well-resourced languages as regards NLP.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"6 11","pages":"101313"},"PeriodicalIF":7.4,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12715428/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145805594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pyomo: Accidentally outrunning the bear
Miranda Mundt, William E Hart, Emma S Johnson, Bethany Nicholson, John D Siirola
Pub Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101311 | Patterns 6(7): 101311
Pyomo is open-source optimization modeling software that has evolved significantly since its inception in 2008, enhancing flexibility, solver integration, and community engagement. Modern collaborative tools for open-source software have facilitated the development of new Pyomo functionality and improved our development process through automated testing and performance-tracking pipelines. However, Pyomo faces challenges typical of research software, including resource limitations and knowledge retention. The Pyomo team's commitment to better development practices and community engagement reflects a proactive approach to these issues. We describe Pyomo's development journey, highlighting both successes and failures, in the hopes that other open-source research software packages may benefit from our experiences.
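For readers unfamiliar with the library, a minimal Pyomo model (our example, not taken from the article; it assumes the GLPK solver is installed, though any LP solver registered with Pyomo can be substituted) looks like this:

```python
# Minimal illustrative Pyomo model: maximize 3x + 2y subject to x + y <= 4, x, y >= 0.
from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                           NonNegativeReals, SolverFactory, maximize, value)

model = ConcreteModel()
model.x = Var(domain=NonNegativeReals)
model.y = Var(domain=NonNegativeReals)
model.profit = Objective(expr=3 * model.x + 2 * model.y, sense=maximize)
model.capacity = Constraint(expr=model.x + model.y <= 4)

SolverFactory("glpk").solve(model)          # assumes glpk is on the PATH
print(value(model.x), value(model.y), value(model.profit))
```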
{"title":"Pyomo: Accidentally outrunning the bear.","authors":"Miranda Mundt, William E Hart, Emma S Johnson, Bethany Nicholson, John D Siirola","doi":"10.1016/j.patter.2025.101311","DOIUrl":"10.1016/j.patter.2025.101311","url":null,"abstract":"<p><p>Pyomo is an open-source optimization modeling software that has undergone significant evolution since its inception in 2008. Pyomo has evolved to enhance flexibility, solver integration, and community engagement. Modern collaborative tools for open-source software have facilitated the development of new Pyomo functionality and improved our development process through automated testing and performance-tracking pipelines. However, Pyomo faces challenges typical of research software, including resource limitations and knowledge retention. The Pyomo team's commitment to better development practices and community engagement reflects a proactive approach to these issues. We describe Pyomo's development journey, highlighting both successes and failures, in the hopes that other open-source research software packages may benefit from our experiences.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"6 7","pages":"101311"},"PeriodicalIF":7.4,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416079/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145030368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A conversation with research software engineers at the International Brain Laboratory
Mayo Faulkner, Miles Wells
Pub Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101315 | Patterns 6(7): 101315
Open-source software is the lifeblood of many modern research projects, allowing researchers to push boundaries, build collaborations, and work transparently. The International Brain Laboratory (IBL), a group of more than twenty labs working together to understand the neuroscience of decision-making, uses open-source software and other open science practices extensively to advance its research. Here, we interview two of the IBL's research software engineers to learn more about their career paths and how they view open-source development.
{"title":"A conversation with research software engineers at the International Brain Laboratory.","authors":"Mayo Faulkner, Miles Wells","doi":"10.1016/j.patter.2025.101315","DOIUrl":"https://doi.org/10.1016/j.patter.2025.101315","url":null,"abstract":"<p><p>Open-source software is the lifeblood of many modern research projects, allowing researchers to push boundaries, build collaborations, and work transparently. The International Brain Laboratory (IBL), a group of more than twenty labs working together to understand the neuroscience of decision-making, uses open-source software and other open science practices extensively to advance its research. Here, we interview two of the IBL's research software engineers to learn more about their career paths and how they view open-source development.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"6 7","pages":"101315"},"PeriodicalIF":7.4,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416082/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145030987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning
Jose Guadalupe Hernandez, Anil Kumar Saini, Attri Ghosh, Jason H Moore
Pub Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101314 | Patterns 6(7): 101314
The tree-based pipeline optimization tool (TPOT) is one of the earliest automated machine learning (ML) frameworks developed for optimizing ML pipelines, with an emphasis on addressing the complexities of biomedical research. TPOT uses genetic programming to explore a diverse space of pipeline structures and hyperparameter configurations in search of optimal pipelines. Here, we provide a comparative overview of the conceptual similarities and implementation differences between the previous and latest versions of TPOT, focusing on two key aspects: (1) the representation of ML pipelines and (2) the underlying algorithm driving pipeline optimization. We also highlight TPOT's application across various medical and healthcare domains, including disease diagnosis, adverse outcome forecasting, and genetic analysis. Additionally, we propose future directions for enhancing TPOT by integrating contemporary ML techniques and recent advancements in evolutionary computation.
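As a sketch of the workflow, the classic TPOTClassifier interface from earlier TPOT releases can be used as follows (the latest version described here exposes a revised API, so treat this as an illustration of the search-evaluate-export loop rather than of the new interface):

```python
# Illustrative TPOT workflow (classic TPOTClassifier interface): genetic programming
# evolves scikit-learn pipelines; the best pipeline found is exported as a script.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tpot = TPOTClassifier(generations=5, population_size=20,
                      cv=5, random_state=0, verbosity=2)
tpot.fit(X_train, y_train)          # evolve pipeline structures and hyperparameters
print(tpot.score(X_test, y_test))   # hold-out score of the best pipeline
tpot.export("best_pipeline.py")     # write the winning pipeline as reusable code
```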
{"title":"The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning.","authors":"Jose Guadalupe Hernandez, Anil Kumar Saini, Attri Ghosh, Jason H Moore","doi":"10.1016/j.patter.2025.101314","DOIUrl":"10.1016/j.patter.2025.101314","url":null,"abstract":"<p><p>The tree-based pipeline optimization tool (TPOT) is one of the earliest automated machine learning (ML) frameworks developed for optimizing ML pipelines, with an emphasis on addressing the complexities of biomedical research. TPOT uses genetic programming to explore a diverse space of pipeline structures and hyperparameter configurations in search of optimal pipelines. Here, we provide a comparative overview of the conceptual similarities and implementation differences between the previous and latest versions of TPOT, focusing on two key aspects: (1) the representation of ML pipelines and (2) the underlying algorithm driving pipeline optimization. We also highlight TPOT's application across various medical and healthcare domains, including disease diagnosis, adverse outcome forecasting, and genetic analysis. Additionally, we propose future directions for enhancing TPOT by integrating contemporary ML techniques and recent advancements in evolutionary computation.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"6 7","pages":"101314"},"PeriodicalIF":7.4,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416094/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145030608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Open-source software for data science
Andrew L Hufton
Pub Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101324 | Patterns 6(7): 101324
{"title":"Open-source software for data science.","authors":"Andrew L Hufton","doi":"10.1016/j.patter.2025.101324","DOIUrl":"https://doi.org/10.1016/j.patter.2025.101324","url":null,"abstract":"","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"6 7","pages":"101324"},"PeriodicalIF":7.4,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416077/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145030382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The future of research software is the future of research
Neil P Chue Hong, Selina Aragon, Simon Hettrick, Caroline Jay
Pub Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101322 | Patterns 6(7): 101322
The use of software is near-ubiquitous in research, yet it is still underrecognized despite changes in policy and practice. Notwithstanding many successful initiatives to improve the culture around research software, the authors argue that it is essential that the development of research software anticipates changes in the research landscape and continues to support the many different people who use it.
{"title":"The future of research software is the future of research.","authors":"Neil P Chue Hong, Selina Aragon, Simon Hettrick, Caroline Jay","doi":"10.1016/j.patter.2025.101322","DOIUrl":"10.1016/j.patter.2025.101322","url":null,"abstract":"<p><p>The use of software is near-ubiquitous in research, yet it is still underrecognized despite changes in policy and practice. Notwithstanding many successful initiatives to improve the culture around research software, the authors argue that it is essential that the development of research software anticipates changes in the research landscape and continues to support the many different people who use it.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"6 7","pages":"101322"},"PeriodicalIF":7.4,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416084/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145030659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Open-source models for development of data and metadata standards
Ariel Rokem, Vani Mandava, Nicoleta Cristea, Anshul Tambay, Kristofer Bouchard, Carolina Berys-Gonzalez, Andy Connolly
Pub Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101316 | Patterns 6(7): 101316
Machine learning and artificial intelligence promise to accelerate research and understanding across many scientific disciplines. Harnessing the power of these techniques requires aggregating scientific data. In tandem, the importance of open data for reproducibility and scientific transparency is gaining recognition, and data are increasingly available through digital repositories. Leveraging efforts from disparate data collection sources, however, requires interoperable and adaptable standards for data description and storage. Through the synthesis of experiences in astronomy, high-energy physics, earth science, and neuroscience, we contend that the open-source software (OSS) model provides significant benefits for standard creation and adaptation. We highlight resultant issues, such as balancing flexibility vs. stability and utilizing new computing paradigms and technologies, that must be considered from both the user and developer perspectives to ensure pathways for recognition and sustainability. We recommend supporting and recognizing the development and maintenance of OSS data standards and software consistent with widely adopted scientific tools.
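As a toy illustration of what a machine-readable data-description standard can look like in practice (our example, far simpler than any community standard discussed in the article), a metadata record can be checked against a JSON Schema:

```python
# Illustrative only: a tiny JSON Schema for dataset metadata and a record
# validated against it with the jsonschema package.
from jsonschema import validate

schema = {
    "type": "object",
    "required": ["name", "creator", "license", "measurement"],
    "properties": {
        "name": {"type": "string"},
        "creator": {"type": "array", "items": {"type": "string"}},
        "license": {"type": "string"},
        "measurement": {"type": "string"},
    },
}

record = {
    "name": "example-survey-2025",
    "creator": ["A. Researcher"],
    "license": "CC-BY-4.0",
    "measurement": "galaxy photometry",
}

validate(instance=record, schema=schema)   # raises ValidationError if non-conforming
print("record conforms to the schema")
```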
{"title":"Open-source models for development of data and metadata standards.","authors":"Ariel Rokem, Vani Mandava, Nicoleta Cristea, Anshul Tambay, Kristofer Bouchard, Carolina Berys-Gonzalez, Andy Connolly","doi":"10.1016/j.patter.2025.101316","DOIUrl":"10.1016/j.patter.2025.101316","url":null,"abstract":"<p><p>Machine learning and artificial intelligence promise to accelerate research and understanding across many scientific disciplines. Harnessing the power of these techniques requires aggregating scientific data. In tandem, the importance of open data for reproducibility and scientific transparency is gaining recognition, and data are increasingly available through digital repositories. Leveraging efforts from disparate data collection sources, however, requires interoperable and adaptable standards for data description and storage. Through the synthesis of experiences in astronomy, high-energy physics, earth science, and neuroscience, we contend that the open-source software (OSS) model provides significant benefits for standard creation and adaptation. We highlight resultant issues, such as balancing flexibility vs. stability and utilizing new computing paradigms and technologies, that must be considered from both the user and developer perspectives to ensure pathways for recognition and sustainability. We recommend supporting and recognizing the development and maintenance of OSS data standards and software consistent with widely adopted scientific tools.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"6 7","pages":"101316"},"PeriodicalIF":7.4,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416081/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145030908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioconductor: Planning a third decade of comprehensive support for genomic data science
Vincent J Carey
Pub Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101319 | Patterns 6(7): 101319
This opinion piece discusses the Bioconductor project for open-source bioinformatics and the engineering concepts underlying its effectiveness to date. Since the inception of Bioconductor in 2002 with 15 software packages devoted to analysis of DNA microarrays, it has grown into an ecosystem of ∼3,000 packages contributed by more than 1,000 developers. Aspects of the history and commitments are reviewed here to contribute to thinking about the design and orchestration of future open-source software projects.
{"title":"Bioconductor: Planning a third decade of comprehensive support for genomic data science.","authors":"Vincent J Carey","doi":"10.1016/j.patter.2025.101319","DOIUrl":"10.1016/j.patter.2025.101319","url":null,"abstract":"<p><p>This opinion piece discusses the Bioconductor project for open-source bioinformatics and the engineering concepts underlying its effectiveness to date. Since the inception of Bioconductor in 2002 with 15 software packages devoted to analysis of DNA microarrays, it has grown into an ecosystem of ∼3,000 packages contributed by more than 1,000 developers. Aspects of the history and commitments are reviewed here to contribute to thinking about the design and orchestration of future open-source software projects.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"6 7","pages":"101319"},"PeriodicalIF":7.4,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416078/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145030941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ASReview LAB v.2: Open-source text screening with multiple agents and a crowd of experts
Jonathan de Bruin, Peter Lombaers, Casper Kaandorp, Jelle Teijema, Timo van der Kuil, Berke Yazan, Angie Dong, Rens van de Schoot
Pub Date: 2025-07-03 | eCollection Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101318 | Patterns 6(7): 101318
ASReview LAB v.2 introduces an advancement in AI-assisted systematic reviewing by enabling collaborative screening with multiple experts ("a crowd of oracles") using a shared AI model. The platform supports multiple AI agents within the same project, allowing users to switch between fast general-purpose models and domain-specific, semantic, or multilingual transformer models. Leveraging the SYNERGY benchmark dataset, performance has improved significantly, showing a 24.1% reduction in loss compared to version 1 through model improvements and hyperparameter tuning. ASReview LAB v.2 follows user-centric design principles and offers reproducible, transparent workflows. It logs key configuration and annotation data while balancing full model traceability with efficient storage. Future developments include automated model switching based on performance metrics, noise-robust learning, and ensemble-based decision-making.
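Although the exact ASReview v.2 API is not shown here, the active-learning screening loop that such tools automate can be sketched generically (our illustration with made-up records; this is not ASReview code):

```python
# Conceptual sketch of active-learning screening: train on expert-labeled records,
# rank the remaining pool by predicted relevance, and present the top record next.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

titles = [
    "deep learning for abstract screening",
    "qualitative interview study of nurses",
    "transformer models for citation triage",
    "unrelated clinical case report",
]
labels = {0: 1, 1: 0}                      # record index -> relevance label from an expert

X = TfidfVectorizer().fit_transform(titles)
for _ in range(2):                         # a couple of screening iterations
    train_idx = list(labels)
    model = MultinomialNB().fit(X[train_idx], [labels[i] for i in train_idx])
    pool = [i for i in range(len(titles)) if i not in labels]
    scores = model.predict_proba(X[pool])[:, 1]
    pick = pool[int(np.argmax(scores))]    # most likely relevant unlabeled record
    labels[pick] = 1 if "transformer" in titles[pick] else 0  # stand-in for an expert decision
    print(f"screen record {pick}: {titles[pick]!r} -> label {labels[pick]}")
```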
{"title":"ASReview LAB v.2: Open-source text screening with multiple agents and a crowd of experts.","authors":"Jonathan de Bruin, Peter Lombaers, Casper Kaandorp, Jelle Teijema, Timo van der Kuil, Berke Yazan, Angie Dong, Rens van de Schoot","doi":"10.1016/j.patter.2025.101318","DOIUrl":"10.1016/j.patter.2025.101318","url":null,"abstract":"<p><p>ASReview LAB v.2 introduces an advancement in AI-assisted systematic reviewing by enabling collaborative screening with multiple experts (\"a crowd of oracles\") using a shared AI model. The platform supports multiple AI agents within the same project, allowing users to switch between fast general-purpose models and domain-specific, semantic, or multilingual transformer models. Leveraging the SYNERGY benchmark dataset, performance has improved significantly, showing a 24.1% reduction in loss compared to version 1 through model improvements and hyperparameter tuning. ASReview LAB v.2 follows user-centric design principles and offers reproducible, transparent workflows. It logs key configuration and annotation data while balancing full model traceability with efficient storage. Future developments include automated model switching based on performance metrics, noise-robust learning, and ensemble-based decision-making.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"6 7","pages":"101318"},"PeriodicalIF":7.4,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416088/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145030906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
OpenML: Insights from 10 years and more than a thousand papers
Bernd Bischl, Giuseppe Casalicchio, Taniya Das, Matthias Feurer, Sebastian Fischer, Pieter Gijsbers, Subhaditya Mukherjee, Andreas C Müller, László Németh, Luis Oala, Lennart Purucker, Sahithya Ravi, Jan N van Rijn, Prabhant Singh, Joaquin Vanschoren, Jos van der Velde, Marcel Wever
Pub Date: 2025-07-03 | eCollection Date: 2025-07-11 | DOI: 10.1016/j.patter.2025.101317 | Patterns 6(7): 101317
OpenML is an open-source platform that democratizes machine-learning evaluation by enabling anyone to share datasets in uniform standards, define precise machine-learning tasks, and automatically share detailed workflows and model evaluations. More than just a platform, OpenML fosters a collaborative ecosystem where scientists create new tools, launch initiatives, and establish standards to advance machine learning. Over the past decade, OpenML has inspired over 1,500 publications across diverse fields, from scientists releasing new datasets and benchmarking new models to educators teaching reproducible science. Looking back, we detail and describe the platform's impact by looking at usage and citations. We share lessons from a decade of building, maintaining, and expanding OpenML, highlighting how rich metadata, collaborative benchmarking, and open interfaces have enhanced research and interoperability. Looking ahead, we cover ongoing efforts to expand OpenML's capabilities and integrate with other platforms, informing a broader vision for open-science infrastructure for machine learning.
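A minimal sketch of reading from the platform with the openml Python client (the IDs below are illustrative examples, not drawn from the article; no API key is needed for read-only access):

```python
# Fetch a dataset and a predefined evaluation task from OpenML.
import openml

dataset = openml.datasets.get_dataset(61)   # example ID; 61 is commonly the iris dataset
X, y, categorical, names = dataset.get_data(target=dataset.default_target_attribute)
print(dataset.name, X.shape)

task = openml.tasks.get_task(59)            # example ID for a classification task on the same data
print(task)                                 # the task fixes target, splits, and evaluation protocol
```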
{"title":"OpenML: Insights from 10 years and more than a thousand papers.","authors":"Bernd Bischl, Giuseppe Casalicchio, Taniya Das, Matthias Feurer, Sebastian Fischer, Pieter Gijsbers, Subhaditya Mukherjee, Andreas C Müller, László Németh, Luis Oala, Lennart Purucker, Sahithya Ravi, Jan N van Rijn, Prabhant Singh, Joaquin Vanschoren, Jos van der Velde, Marcel Wever","doi":"10.1016/j.patter.2025.101317","DOIUrl":"10.1016/j.patter.2025.101317","url":null,"abstract":"<p><p>OpenML is an open-source platform that democratizes machine-learning evaluation by enabling anyone to share datasets in uniform standards, define precise machine-learning tasks, and automatically share detailed workflows and model evaluations. More than just a platform, OpenML fosters a collaborative ecosystem where scientists create new tools, launch initiatives, and establish standards to advance machine learning. Over the past decade, OpenML has inspired over 1,500 publications across diverse fields, from scientists releasing new datasets and benchmarking new models to educators teaching reproducible science. Looking back, we detail and describe the platform's impact by looking at usage and citations. We share lessons from a decade of building, maintaining, and expanding OpenML, highlighting how rich metadata, collaborative benchmarking, and open interfaces have enhanced research and interoperability. Looking ahead, we cover ongoing efforts to expand OpenML's capabilities and integrate with other platforms, informing a broader vision for open-science infrastructure for machine learning.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"6 7","pages":"101317"},"PeriodicalIF":7.4,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416095/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145030901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}