DoE.MIParray: An R Package for Algorithmic Creation of Orthogonal Arrays. U. Grömping. Journal of Open Research Software, 2020-10-07. doi:10.5334/jors.286

The R package DoE.MIParray uses mixed integer optimization to create well-balanced arrays for experimental designs. Its use requires at least one of the commercial optimizers Gurobi or Mosek. Investing some effort into the creation of a suitable array is justified because experimental runs are often very expensive, so their information content should be maximized. DoE.MIParray is particularly useful for creating relatively small mixed-level designs. Balance is optimized with the quality criterion “generalized minimum aberration” (GMA), which aims at minimizing the confounding of low-order effects in factorial models without assuming a specific model. For relevant cases, DoE.MIParray exploits a lower bound on its objective function, which makes it possible to drastically reduce the computational burden of the mixed integer optimization.
Exploring and Comparing Unsupervised Clustering Algorithms. M. Lavielle, Philip D. Waggoner. Journal of Open Research Software, 2020-10-07. doi:10.5334/jors.269

One of the most widely used approaches for exploring and understanding non-random structure in data in a largely assumption-free manner is clustering. In this paper, we detail two original Shiny apps written in R, openly developed on GitHub and archived at Zenodo, for exploring and comparing major unsupervised clustering algorithms: k-means and Gaussian mixture models fitted via Expectation-Maximization. The first app uses simulated data and the second uses Fisher’s Iris data set to visually and numerically compare the clustering algorithms on data familiar to many applied researchers. In addition to being valuable tools for comparing these clustering techniques, the open-source architecture of our Shiny apps allows for wide engagement and extension by the broader open-science community, such as the inclusion of different data sets and algorithms.
pyfMRIqc: A Software Package for Raw fMRI Data Quality Assurance. B. Williams, Michael Q. Lindner. Journal of Open Research Software, 2020-10-07. doi:10.5334/jors.280

pyfMRIqc is a tool for checking the quality of raw functional magnetic resonance imaging (fMRI) data. It produces a range of output files that can be used to identify fMRI data quality issues such as artefacts, motion, and signal loss. The tool creates a number of 3D and 4D NIFTI files that can be used for in-depth quality assurance. Additionally, 2D images are created for each NIFTI file for a quick overview. These images and other information (e.g. signal-to-noise ratio, scan parameters) are combined in a user-friendly HTML output file. pyfMRIqc is written entirely in Python and is available under a GNU GPL3 license on GitHub (https://drmichaellindner.github.io/pyfMRIqc/). It can be run from the command line and can therefore be included in a processing pipeline or used to quality-check a series of datasets via batch scripting. Quality assurance of a single dataset can also be performed via dialog boxes.
PyDDA: A Pythonic Direct Data Assimilation Framework for Wind Retrievals. R. Jackson, S. Collis, T. Lang, C. Potvin, T. Munson. Journal of Open Research Software, 2020-10-07. doi:10.5334/jors.264

This software assimilates data from an arbitrary number of weather radars together with other spatial wind fields (e.g. numerical weather forecasting model data) in order to retrieve high-resolution three-dimensional wind fields. PyDDA uses NumPy and SciPy’s optimization techniques combined with the Python Atmospheric Radiation Measurement (ARM) Radar Toolkit (Py-ART) to create wind fields using the three-dimensional variational technique (3DVAR). PyDDA is hosted and distributed on GitHub at https://github.com/openradar/PyDDA. PyDDA has the potential to be used by the atmospheric science community to develop high-resolution wind retrievals from radar networks. These retrievals can be used for the evaluation of numerical weather forecasting models and for plume modelling. This paper shows how wind fields from two NEXt Generation RADar (NEXRAD) WSR-88D radars and the High-Resolution Rapid Refresh model can be assimilated with PyDDA to create a high-resolution wind field inside Hurricane Florence.
esy-osmfilter – A Python Library to Efficiently Extract OpenStreetMap Data. A. Pluta, Ontje Lünsdorf. Journal of Open Research Software, 2020-09-01. doi:10.5334/jors.317

OpenStreetMap is the largest freely accessible geographic database of the world. The processing steps needed to extract information from this database, namely reading, converting and filtering, can be very demanding in terms of computational time and disk space. esy-osmfilter is a Python library designed to read and filter OpenStreetMap data while optimizing both disk space and computational time. It uses parallelized prefiltering of the OSM pbf files to quickly reduce the original data size, and it can store the prefiltered data to the hard drive. In the main filtering process, these prefiltered data can be reused repeatedly to identify different items with the help of more specialized main filters. Finally, the output can be exported to the GeoJSON format.
Experiences with a Flexible User Research Process to Build Data Change Tools. Drew Paine, D. Ghoshal, L. Ramakrishnan. Journal of Open Research Software, 2020-09-01. doi:10.5334/jors.284

Scientific software development processes are understood to be distinct from commercial software development practices due to the uncertain and evolving state of scientific knowledge. Sustaining these software products is a recognized challenge, but the usability and usefulness of such tools to their scientific end users is under-examined. User research is a well-established set of techniques (e.g., interviews, mockups, usability tests) applied in commercial software projects to develop foundational, generative, and evaluative insights about products and the people who use them. Currently these approaches are not commonly applied or discussed in scientific software development work. The use of user research techniques in scientific environments can be challenging due to the nascent, fluid problem spaces of scientific work, the varying scope of projects and their user communities, and funding and economic constraints on projects. In this paper, we reflect on our experiences undertaking a multi-method user research process in the Deduce project, which is investigating data change to develop metrics, methods, and tools that help scientists make decisions around data change. There is a lack of common terminology, since the concept of systematically measuring and managing data change is under-explored in scientific environments. To bridge this gap, we conducted user research focused on user practices, needs, and motivations to help us design and develop metrics and tools for data change. This paper contributes reflections and the lessons we have learned from our experiences, and offers key takeaways for scientific software project teams to effectively and flexibly incorporate similar processes into their projects.
CURSAT ver. 2.1: A Simple, Resampling-Based Program to Generate Pseudoreplicates of Data and Calculate Rarefaction Curves. G. Gentile. Journal of Open Research Software, 2020-08-21. doi:10.5334/jors.260

CURSAT ver. 2.1 is an open-source program written in QB64 BASIC, compilable into an executable file, that produces n pseudoreplicates of an empirical data set. Resampling both with and without replacement is supported, and the number (n) of pseudoreplicates is set by the user. Pseudoreplicates can be exported to a file that can be opened in a spreadsheet, so they are permanently stored and available for the calculation of statistics of interest and their associated variance. The software also uses the n pseudoreplicate data sets to reconstruct n accumulation matrices, appended to an output file. Accumulation is applicable whenever repeated sample-based data must be evaluated for exhaustiveness, and many situations involve repeated sampling from the same set of observations. For example, if the data consist of species occurrences, the software can be used for biodiversity estimation by a wide spectrum of specialists such as ecologists, zoologists, botanists, biogeographers, and conservationists. Accumulation can be performed irrespective of whether the input data set contains abundance (quantitative) or incidence (binary) data. Accumulation matrices can be imported into statistical packages to estimate the distributions of successively pooled samples and to depict accumulation and rarefaction curves with associated variance. CURSAT ver. 2.1 is released in two editions: Edition #1 is recommended for analysis, whereas Edition #2 additionally generates a log file reporting the flow of the internal resampling and accumulation routines and is primarily intended for educational purposes and quality checking. Funding statement: The software was developed with no specific funds.
{"title":"Comfort Simulator: A Software Tool to Model Thermoregulation and Perception of Comfort","authors":"J. Hussan, P. Hunter","doi":"10.5334/jors.288","DOIUrl":"https://doi.org/10.5334/jors.288","url":null,"abstract":"","PeriodicalId":37323,"journal":{"name":"Journal of Open Research Software","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48012091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Janus: A Python Package for Agent-Based Modeling of Land Use and Land Cover Change","authors":"K. Kaiser, A. Flores, C. Vernon","doi":"10.5334/jors.306","DOIUrl":"https://doi.org/10.5334/jors.306","url":null,"abstract":"","PeriodicalId":37323,"journal":{"name":"Journal of Open Research Software","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47029367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"bayest: An R Package for Effect-Size Targeted Bayesian Two-Sample t-Tests","authors":"Riko Kelter","doi":"10.5334/jors.290","DOIUrl":"https://doi.org/10.5334/jors.290","url":null,"abstract":"","PeriodicalId":37323,"journal":{"name":"Journal of Open Research Software","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47629527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}