
Latest publications from Journal of Open Source Software

PDF Entity Annotation Tool (PEAT).
Pub Date : 2025-04-08 DOI: 10.21105/joss.05336
Christopher G Stahl, Kristan J Markey, Brian C Jewell, Dahnish Shams, Michele M Taylor, A Amina Wilkins, Sean Watford, Andy Shapiro, Michelle Angrish

While text mining approaches, including the use of Artificial Intelligence (AI) and other machine-based methods, continue to expand at a rapid pace, the tools used by researchers to create the labeled datasets required for training, modeling, and evaluation remain rudimentary. Labeled datasets contain the target attributes the machine is going to learn; for example, training an algorithm to distinguish between images of cars and trucks would generally require a set of images with a quantitative description of the underlying features of each vehicle type. Development of labeled textual data that can be used to build natural language machine learning models for scientific literature is not currently integrated into the existing manual workflows used by domain experts. Published literature is rich with important information, such as different types of embedded text, plots, and tables, all of which can be used as inputs to train machine learning (ML)/natural language processing (NLP) models when extracted and prepared in machine-readable formats. Currently, both normalized data extraction of use to domain experts and extraction to support development of ML/NLP models are labor-intensive and cumbersome manual processes. Automatic extraction of data and information is difficult from formats such as PDFs, which are optimized for layout and human readability rather than machine readability. The PDF (Portable Document Format) Entity Annotation Tool (PEAT) was developed to allow users to annotate publications in their current print format while also capturing those annotations in a machine-readable format. One of the main issues with traditional annotation tools is that they require transforming the PDF into plain text to facilitate the annotation process. While doing so lessens the technical challenges of annotating data, the user loses all of the structure and provenance inherent in the underlying PDF. Textual data extraction from PDFs can also be an error-prone process. Challenges include identifying sequential blocks of text and handling the multitude of document formats (multiple columns, font encodings, etc.). As a result of these challenges, using existing tools to develop NLP/ML models directly from PDFs is difficult because the generated outputs are not interoperable. We created a system that allows annotations to be completed on the original PDF document structure, with no plain-text extraction. The result is an application that allows for easier and more accurate annotation. In addition, by including a feature that lets the user easily create a schema, we have developed a system that can be used to annotate text for different domain-centric schemas of relevance to subject matter experts. Different knowledge domains require distinct schemas and annotation tags to support machine learning.
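To make the idea of a machine-readable annotation with preserved provenance concrete, here is a minimal sketch of what such a record could look like. All field names, the schema name, and the tag vocabulary are hypothetical illustrations, not PEAT's actual output format; the point is only that the annotated span stays tied to its location in the original PDF (page and bounding box) and to a user-defined, domain-centric tag schema.

```python
import json

# Hypothetical annotation record: the span is anchored to the source PDF
# (page number plus bounding box in points), so document structure and
# provenance survive; field names are illustrative, not PEAT's format.
annotation = {
    "document": "example_paper.pdf",
    "page": 3,
    "bbox": {"x0": 72.0, "y0": 410.5, "x1": 301.2, "y1": 422.0},
    "text": "benzo[a]pyrene",
    "tag": "chemical",          # tag drawn from a user-defined schema
    "schema": "toxicology-v1",
}

# A domain-centric schema is, at its simplest, a controlled set of tags
# that subject matter experts define for their field.
schema = {"toxicology-v1": ["chemical", "species", "dose", "endpoint"]}

# Serializing to JSON gives the machine-readable form an ML/NLP pipeline
# could consume directly.
record = json.dumps(annotation)
assert json.loads(record)["tag"] in schema[annotation["schema"]]
```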

Citations: 0
LightLogR: Reproducible analysis of personal light exposure data.
Pub Date : 2025-03-13 DOI: 10.21105/joss.07601
Johannes Zauner, Steffen Hartmeyer, Manuel Spitschan

Light plays an important role in human health and well-being, which necessitates the study of the effects of personal light exposure in real-world settings, measured by means of wearable devices. A growing number of studies incorporate these kinds of data to assess associations between light and health outcomes. Yet with few or missing standards, guidelines, and frameworks, it is challenging to set up measurements, analyse the data, and compare outcomes between studies. Overall, time series data from wearable light loggers are significantly more complex than the controlled stimuli used in laboratory studies. In this paper, we introduce LightLogR, a novel resource to facilitate these research efforts. The package for the R statistical software is open source and permissively MIT-licenced. As part of a developing software ecosystem, LightLogR is built with common challenges of current and future datasets in mind. The package standardises many tasks for importing and processing personal light exposure data. It allows for quick as well as detailed insights into the datasets through summary and visualisation tools. Furthermore, LightLogR incorporates major metrics commonly used in the field (61 metrics across 17 metric families), all while embracing an inherently hierarchical, participant-based data structure.
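As an illustration of the kind of summary such metric families compute, the sketch below implements one widely used personal-light-exposure metric, time above threshold (the duration spent above a given illuminance). This is a standalone Python toy, not a call into LightLogR (which is an R package), and the function name is our own.

```python
# Minimal sketch of a common light-exposure metric: total time spent above
# a given illuminance, from regularly sampled lux values. Illustrative only;
# LightLogR implements 61 metrics across 17 families with far more care.

def time_above_threshold(lux_samples, threshold_lux, epoch_seconds):
    """Total time (seconds) with illuminance above `threshold_lux`,
    given samples taken every `epoch_seconds` seconds."""
    return sum(epoch_seconds for lux in lux_samples if lux > threshold_lux)

# One hour of 1-minute epochs: 30 min dim (50 lx), 30 min bright (1200 lx).
samples = [50.0] * 30 + [1200.0] * 30
assert time_above_threshold(samples, threshold_lux=1000, epoch_seconds=60) == 1800
```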

Citations: 0
BART-Survival: A Bayesian machine learning approach to survival analyses in Python.
Pub Date : 2025-01-28 DOI: 10.21105/joss.07213
Jacob Tiegs, Julia Raykin, Ilia Rochlin

BART-Survival is a Python package that allows time-to-event (survival) analyses in discrete time using the non-parametric machine learning algorithm Bayesian Additive Regression Trees (BART). BART-Survival combines the performance of the BART algorithm with the complementary data and model formatting required to complete survival analyses. The library contains a convenient application programming interface (API) that keeps survival analyses simple while retaining capabilities for added complexity when desired. The package is intended for analysts exploring the use of flexible non-parametric alternatives to traditional (semi-)parametric survival analyses.
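Discrete-time survival methods generally require reshaping time-to-event data into a "person-period" format: each subject's follow-up is expanded into one row per discrete interval, with the event indicator set only in the interval where the event occurs. The sketch below shows that standard reshaping on toy data; it illustrates the general technique, not BART-Survival's internal API.

```python
# Person-period expansion for discrete-time survival analysis.
# Each subject (id, observed_time, event) becomes one row per interval;
# the event flag is 1 only in the terminal interval of an uncensored subject.

def to_person_period(subjects):
    """subjects: list of (id, observed_time, event) with integer times.
    Returns rows of (id, interval, event_in_interval)."""
    rows = []
    for sid, t, event in subjects:
        for interval in range(1, t + 1):
            rows.append((sid, interval, 1 if (event and interval == t) else 0))
    return rows

data = [("a", 3, 1),   # event observed at time 3
        ("b", 2, 0)]   # censored at time 2
rows = to_person_period(data)
assert rows == [("a", 1, 0), ("a", 2, 0), ("a", 3, 1),
                ("b", 1, 0), ("b", 2, 0)]
```

A discrete-time model is then fit to the expanded rows, treating each interval's event flag as a binary outcome.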

Citations: 0
bayes_traj: A Python package for Bayesian trajectory analysis.
Pub Date : 2025-01-01 Epub Date: 2025-04-15 DOI: 10.21105/joss.07323
James C Ross, Tingting Zhao
Citations: 0
IMPPY3D: Image Processing in Python for 3D Image Stacks.
Pub Date : 2025-01-01 DOI: 10.21105/joss.07405
Newell H Moser, Alexander K Landauer, Orion L Kafka

Image Processing in Python for 3D image stacks, or IMPPY3D, is a free and open-source software (FOSS) repository that simplifies post-processing and 3D shape characterization for grayscale image stacks, otherwise known as volumetric images, 3D images, or voxel models. While IMPPY3D, pronounced impee-three-dee, was originally created for post-processing image stacks generated from X-ray computed tomography (XCT) measurements, it can be applied generally to post-processing 2D and 3D images. IMPPY3D includes tools for segmenting volumetric images and characterizing the 3D shape of features or regions of interest. These functionalities have proven useful in 3D shape analysis of powder particles, porous polymers, concrete aggregates, internal pores/defects, and more (see the Research Applications section). IMPPY3D consists of a combination of original Python scripts, Cython extensions, and convenience wrappers for popular third-party libraries like SciKit-Image (Walt et al., 2014), OpenCV (Bradski, 2000), and PyVista (Sullivan & Kaszynski, 2019). Highlighted capabilities of IMPPY3D include: varying image processing parameters interactively, applying numerous 2D/3D image filters (e.g., blurring/sharpening, denoising, erosion/dilation), segmenting and labeling continuous 3D objects, precisely rotating and re-slicing an image stack in 3D, generating rotated bounding boxes fitted to voxelized features, converting image stacks into 3D voxel models, exporting 3D models as Visualization Toolkit (VTK) files for ParaView (Ayachit, 2015), and converting voxel models into smooth mesh-based models. Additional information and example scripts can be found in the ReadMe files within the IMPPY3D GitHub repository (Moser, Landauer, et al., 2024). As a visualized example, Figure 1 demonstrates the high-level steps to characterize powder particles using IMPPY3D. This workflow is also similar to how pores can be visualized and characterized in metal-based additive manufacturing. Additional research applications for IMPPY3D are discussed in a later section.
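Two of the core steps the abstract names, segmenting a grayscale voxel stack by threshold and labeling continuous 3D objects, can be sketched in pure Python as a 6-connected flood fill. IMPPY3D wraps optimized libraries (SciKit-Image, OpenCV) for this; the toy implementation below only illustrates the underlying idea on a tiny array.

```python
from collections import deque

# Threshold a grayscale voxel stack, then label 6-connected bright regions.
# A pure-Python illustration of segmentation + connected-component labeling;
# not IMPPY3D's implementation, which delegates to optimized libraries.

def label_3d(stack, threshold):
    """stack: nested list [z][y][x] of grayscale values.
    Returns (labels, n) where labels has one integer id per bright region."""
    nz, ny, nx = len(stack), len(stack[0]), len(stack[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    n = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if stack[z][y][x] > threshold and labels[z][y][x] == 0:
                    n += 1                      # start a new component
                    labels[z][y][x] = n
                    queue = deque([(z, y, x)])
                    while queue:                # breadth-first flood fill
                        cz, cy, cx = queue.popleft()
                        for dz, dy, dx in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                            az, ay, ax = cz + dz, cy + dy, cx + dx
                            if (0 <= az < nz and 0 <= ay < ny and 0 <= ax < nx
                                    and stack[az][ay][ax] > threshold
                                    and labels[az][ay][ax] == 0):
                                labels[az][ay][ax] = n
                                queue.append((az, ay, ax))
    return labels, n

# Two bright voxels that do not touch -> two distinct "particles".
stack = [[[0, 255], [0, 0]],
         [[0, 0], [255, 0]]]
labels, n = label_3d(stack, threshold=128)
assert n == 2
```

Once regions are labeled, per-region shape descriptors (volumes, bounding boxes, etc.) follow by iterating over voxels of each label.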

Citations: 0
A MATLAB-based Instrument Control (MIC) package for fluorescence imaging.
Pub Date : 2025-01-01 Epub Date: 2025-01-28 DOI: 10.21105/joss.07275
Sajjad A Khan, Sandeep Pallikkuth, David J Schodt, Marjolein B M Meddens, Hanieh Mazloom-Farsibaf, Michael J Wester, Sheng Liu, Ellyse Taylor, Mohamadreza Fazel, Farzin Farzam, Keith A Lidke

MATLAB Instrument Control (MIC) is a software package designed to facilitate data collection for custom-built microscopes. Utilizing object-oriented programming, MIC provides a class for each low-level instrument. These classes inherit from a common MIC abstract class, ensuring a uniform interface across different instruments. Key components such as lasers, stages, power meters, and cameras are grouped under abstract subclasses, which standardize interfaces and simplify the development of control classes for new instruments. Both simple and complex systems can be built from these lower-level tools. Since the interoperation is developed by the end user, the modes or sequence of operations can be flexibly designed with interactive or automated data collection and integrated analysis. MATLAB provides the ability to create GUIs, so MIC allows both rapid prototyping and the building of custom, high-level user interfaces that can be used for production instruments.
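The class hierarchy described above can be sketched as follows, rendered in Python rather than MATLAB purely for illustration: every instrument class inherits a common abstract interface, and instrument categories such as lasers get an abstract subclass that standardizes their controls. Class and method names here are hypothetical, not MIC's own.

```python
from abc import ABC, abstractmethod

# Sketch of the described design: a common abstract base, category-level
# abstract subclasses, and concrete per-instrument control classes.

class Instrument(ABC):
    """Common interface every instrument class must provide."""
    @abstractmethod
    def setup(self): ...
    @abstractmethod
    def shutdown(self): ...

class Laser(Instrument):
    """Abstract subclass standardizing the laser interface."""
    @abstractmethod
    def set_power(self, milliwatts): ...

class DemoLaser(Laser):
    """Concrete control class for one (imaginary) laser model."""
    def __init__(self):
        self.power = 0.0
        self.on = False
    def setup(self):
        self.on = True
    def shutdown(self):
        self.on = False
    def set_power(self, milliwatts):
        self.power = milliwatts

laser = DemoLaser()
laser.setup()
laser.set_power(5.0)
assert laser.on and laser.power == 5.0
```

Because higher-level code only depends on the abstract interfaces, a new laser model needs only a new concrete subclass, not changes to the acquisition logic.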

Citations: 0
fastfrechet: An R package for fast implementation of Fréchet regression with distributional responses.
Pub Date : 2025-01-01 Epub Date: 2025-05-01 DOI: 10.21105/joss.07925
Alexander Coulter, Rebecca Lee, Irina Gaynanova

Distribution-as-response regression problems are gaining wider attention, especially within biomedical settings where observation-rich, patient-specific data sets are available, such as feature densities in CT scans (Petersen et al., 2021), actigraphy (Ghosal et al., 2023), and continuous glucose monitoring (Coulter et al., 2024; Matabuena et al., 2021). To accommodate the complex structure of such problems, Petersen & Müller (2019) proposed a regression framework called Fréchet regression, which allows non-Euclidean responses, including distributional responses. This regression framework was further extended for variable selection by Tucker et al. (2023), and Coulter et al. (2024) developed a fast variable selection algorithm for the specific setting of univariate distributional responses equipped with the 2-Wasserstein metric (2-Wasserstein space). We present fastfrechet, an R package providing fast implementation of these Fréchet regression and variable selection methods in 2-Wasserstein space, with resampling tools for automatic variable selection. fastfrechet makes distribution-based Fréchet regression with resampling-supplemented variable selection readily available and highly scalable to large data sets, such as the UK Biobank (Doherty et al., 2017).
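For univariate distributions, the 2-Wasserstein distance is the L2 distance between quantile functions, and for two empirical samples of equal size it reduces to matching sorted values. The sketch below is a standalone Python illustration of the metric the package is built around, not a call into fastfrechet (which is an R package).

```python
import math

# 2-Wasserstein distance between two equal-size univariate samples:
# sort both samples and take the root-mean-square difference of the
# matched order statistics (the empirical quantile functions).

def wasserstein2(sample_a, sample_b):
    a, b = sorted(sample_a), sorted(sample_b)
    assert len(a) == len(b), "sketch assumes equal sample sizes"
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Shifting a distribution by a constant c moves it a W2 distance of exactly c.
assert abs(wasserstein2([1, 2, 3], [3, 4, 5]) - 2.0) < 1e-12
```

Fréchet regression in this space then works with such distances between distributional responses rather than Euclidean distances between scalar responses.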

Citations: 0
MCSimMod: An R Package for Working with Ordinary Differential Equation Models Encoded in the MCSim Model Specification Language.
Pub Date : 2025-01-01 Epub Date: 2025-08-15 DOI: 10.21105/joss.08492
Dustin F Kapraun, Todd J Zurlinden, Ryan D Friese, Andrew J Shapiro
Citations: 0
harmonize-wq: Standardize, clean and wrangle Water Quality Portal data into more analytic-ready formats.
Pub Date : 2024-10-22 DOI: 10.21105/joss.07305
Justin Bousquin, Cristina A Mullin
Citations: 0
primerForge: a Python program for identifying primer pairs capable of distinguishing groups of genomes from each other.
Pub Date : 2024-09-16 DOI: 10.21105/joss.06850
Joseph S Wirth, Lee S Katz, Grant M Williams, Jessica C Chen

In both molecular epidemiology and microbial ecology, it is useful to be able to categorize specific strains of microorganisms as belonging to either an ingroup or an outgroup within a given population, e.g., to distinguish a pathogenic strain of interest from its non-virulent relatives. An "ingroup" refers to a group of microbes that are the primary focus of study or interest. Conversely, an "outgroup" consists of microbes that are closely related to, but have evolved separately from, the ingroup. While whole genome sequencing and downstream phylogenetic analyses can be employed to do this, these techniques are often slow and can be resource intensive. Additionally, a laboratory would have to sequence the whole genome of each new sample to use these tools to determine whether or not it is part of the ingroup or outgroup. Alternatively, polymerase chain reaction (PCR) can be used to amplify regions of genetic material that are specific to the strain(s) of interest. PCR is faster, less expensive, and more accessible than whole genome sequencing, so a PCR-based approach can accelerate the detection of specific strains of microbes and facilitate diagnoses and/or population studies.

Citations: 0
Journal of open source software