Data in Brief: latest articles, page 3

Glial cell-specific proteomic data from the substantia nigra of a rat 6-OHDA and fluorocitrate model of astrocyte death and microglial activation

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-15 DOI: 10.1016/j.dib.2026.112473

Justyna Kadłuczka, Tatsiana Chubukova, Przemysław Mielczarek, Agata Maziak, Adam Roman, Emilija Napieralska, Katarzyna Z. Kuter

The dataset presents proteomic results (timsTOF Pro 2, Bruker) obtained with an original method developed for isolating astrocytes and microglia from the same adult rat brain sample. Mechano-enzymatic dissociation and FACS sorting yielded pure, separate cellular fractions from the substantia nigra. The results come from an animal model of early Parkinson’s disease in which selective degeneration of nigrostriatal dopaminergic neurons induced by 6-OHDA is combined with 7-day-long astrocyte dysfunction and death induced by fluorocitrate. Astrocyte and neuron death both induce microglial activation, but to varying degrees and through different mechanisms. Previous tissue-level studies could not assign changes in common mechanisms (for example, energy metabolism) to a specific cell type, while in vitro studies lack the functional dimension. This dataset allows mechanisms to be identified within each cell type isolated from a multidimensional environment, while the functional and tissue-specific context is maintained. It can be used to compare astrocyte death-induced and neuron death-induced microglial activation. Raw data are available via ProteomeXchange with identifiers PXD066353 and PXD067265.

Volume 65, Article 112473

Citations: 0

A multi-angle reflectance dataset of wheat and peach trees with unmanned aerial vehicle imagery

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-15 DOI: 10.1016/j.dib.2026.112480

Yuhan Guo, Xihan Mu, Xiang Lyu, Dasheng Fan, Chengzhuo Lei, Ruiqiang Wu, Donghui Xie, Guangjian Yan

Medium- to high-resolution satellite data, such as Sentinel-2 and Landsat-8, have significantly enhanced the accuracy of vegetation monitoring. However, canopy reflectance and vegetation indices are affected by the bidirectional reflectance distribution function (BRDF) effect, introducing uncertainties in vegetation phenology monitoring and parameter retrieval. Herein, a multi-angle reflectance dataset was developed using unmanned aerial vehicles (UAVs) to investigate the angular effect over the crop canopy across various growth stages. Multi-angle measurements were acquired over two 10 m × 10 m plots of wheat and peach trees in Yukou Town, Pinggu District, Beijing, China, during their respective key growth stages (29 Feb to 21 May in 2024 for wheat; 11 Apr to 12 Jun in 2024 for peach trees). Wheat measurements were obtained using a DJI Matrice 350 RTK equipped with a Cubert ULTRIS X20 Plus hyperspectral imager, while peach tree data were captured with an Agrowing 61MP Sextuple multispectral imager. UAV flights used a spherical-helical trajectory to maximize solar-view geometry coverage, yielding over 1,800 angular observations under clear-sky conditions. All UAV imagery was post-processed and stored in text and Excel formats. The dataset provides high-quality, multi-angle spectral measurements for two representative crops during multiple growth stages, enabling in-depth investigations into BRDF characteristics, angular sensitivity of spectral metrics, and temporal spectral dynamics. Furthermore, the data support the validation of physical models for vegetation remote sensing, vegetation parameter inversion, and the training of machine-learning models for more remote sensing applications.
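The spherical-helical trajectory mentioned in the abstract can be pictured with a small geometric sketch. The parameterization below is a generic spherical helix, not the authors' flight plan: the function name `spherical_helix` and every parameter value (radius, number of points, turns, maximum zenith angle) are hypothetical and chosen only for illustration.

```python
import math

def spherical_helix(center, radius_m, n_points, turns, max_zenith_deg=60.0):
    """Return (x, y, z, view_zenith_deg, view_azimuth_deg) waypoints on a spherical helix.

    All argument values are illustrative assumptions, not the flight plan
    used to acquire the dataset.
    """
    cx, cy, cz = center
    waypoints = []
    for i in range(n_points):
        t = i / (n_points - 1)                      # 0..1 along the path
        zenith = math.radians(max_zenith_deg) * t   # grows from nadir view outwards
        azimuth = 2.0 * math.pi * turns * t         # winds around the target
        x = cx + radius_m * math.sin(zenith) * math.cos(azimuth)
        y = cy + radius_m * math.sin(zenith) * math.sin(azimuth)
        z = cz + radius_m * math.cos(zenith)
        waypoints.append(
            (x, y, z, math.degrees(zenith), math.degrees(azimuth) % 360.0)
        )
    return waypoints

# Hypothetical example: 200 waypoints, 30 m from the plot centre, 8 turns.
path = spherical_helix(center=(0.0, 0.0, 0.0), radius_m=30.0, n_points=200, turns=8)
```

Every waypoint stays at the same distance from the plot centre while the view zenith and azimuth change together, which is how a single flight can cover many viewing geometries.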

Volume 65, Article 112480

Citations: 0

Multi-analytical dataset on Lekhaniya Mahakashaya: HRLC-MS/MS Orbitrap profiling, HPTLC fingerprinting with marker estimation, and FTIR spectroscopy

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-14 DOI: 10.1016/j.dib.2026.112464

Narayan Singh, Anjali Upadhyay, Debajyoti Chakraborty, Girimalla Patil, Pramod Yadav, Galib R, Pradeep Kumar Prajapati

This dataset provides a comprehensive, multidimensional phytochemical characterization of Lekhaniya Mahakashaya (LMK), a classical Ayurvedic formulation used for the treatment of obesity and metabolic disorders. Three complementary analytical platforms were employed: High-Resolution Liquid Chromatography-Mass Spectrometry/Mass Spectrometry (HRLC-MS/MS) Orbitrap, High-Performance Thin Layer Chromatography (HPTLC), and Fourier-Transform Infrared (FTIR) spectroscopy. For HRLC-MS/MS analysis, hydroalcoholic extracts of LMK were prepared and analysed in both positive and negative ionisation modes using an Orbitrap mass spectrometer. The dataset includes 2034 identified compounds (1712 in positive ion mode and 322 in negative ion mode) with detailed retention times, molecular weights, and fragmentation patterns, suitable for compound annotation, metabolite networking, and cheminformatics-based correlation studies. HPTLC fingerprinting was performed using methanolic extracts (2–10 µL) on silica gel 60 F₂₅₄ plates, which yielded 7–8 reproducible peaks across the Rf range 0.12–0.89 under 254 nm, 366 nm, and 540 nm, confirming LMK’s polyherbal complexity. Marker-based quantification using validated HPTLC protocols gave berberine at 0.24 % w/w and curcumin at 0.31 % w/w; calibration curves are included for reproducibility. FTIR spectroscopic data encompass 19 absorption peaks (3278–468 cm⁻¹) representing hydroxyl, aliphatic, unsaturated, sulfur-, nitrogen-, and halogen-containing functional groups, which highlight LMK’s diverse phytochemical matrix. The dataset is structured as a reference resource for pharmacological exploration, quality control, and phytochemical standardisation of LMK and associated Ayurvedic formulations. It can also be used for molecular docking validation, network pharmacology mapping, metabolomics comparisons, and future drug discovery. To promote transparency, encourage computational or experimental reuse, and support integrative research on traditional medicine, all raw chromatograms, spectrum files, and processed data tables are made available in widely accessible formats.

Volume 65, Article 112464

Citations: 0

Dataset on the performance of a photovoltaic solar water pump in coffee plantations using response surface methodology (RSM)

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-14 DOI: 10.1016/j.dib.2026.112467

Nopparat Suriyachai, Torpong Kreetachat, Saksit Imman

This dataset presents experimental data on the performance of a photovoltaic (PV) solar-powered water pumping system installed in a coffee plantation in Chiang Mai province, Thailand. The system performance was evaluated through controlled experiments using response surface methodology (RSM). Three independent variables were systematically varied: solar irradiance (300–900 W/m²), panel inclination (15–35°), and panel surface temperature (30–60°C). A total of 15 experimental runs were conducted, and the pumping efficiency (%) was recorded under each condition. Statistical analyses, including analysis of variance (ANOVA) and regression modeling, were applied to evaluate the effects of the individual variables and their interactions on system performance. The dataset includes raw and processed measurements, regression coefficients, and response surface parameters, enabling replication and further analysis. Perturbation plots, 3D surface plots, and contour plots provide detailed visualizations of the relationships between environmental factors and system efficiency. The optimal operating conditions were identified at a solar irradiance of 600 W/m², a panel inclination of 25°, and a panel surface temperature of 45°C, corresponding to a predicted maximum efficiency of 76.3–77.0%.

This dataset can be reused for designing optimized solar water pumping systems, validating predictive models, and comparing system performance under different environmental conditions or geographic locations. It also serves as a reference for researchers in renewable energy system optimization and agricultural water management. The data provide high-resolution, experimentally validated information on the combined effects of solar irradiance, panel inclination, and panel surface temperature on PV water pumping efficiency. Unlike previous studies, the dataset includes detailed quantitative analysis specific to coffee-growing regions in Northern Thailand, along with regression models and visualizations that can guide both experimental replication and predictive modeling under similar climatic and agricultural conditions.
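The abstract does not name the experimental design. Three factors with 15 runs would be consistent with a Box-Behnken design with three centre points, so the sketch below assumes that layout purely for illustration. The factor ranges are taken from the abstract; the dictionary keys and the helper names `box_behnken` and `decode` are hypothetical.

```python
from itertools import combinations, product

# Factor ranges as stated in the abstract: (low, high).
FACTORS = {
    "irradiance_W_m2": (300.0, 900.0),
    "inclination_deg": (15.0, 35.0),
    "panel_temp_C": (30.0, 60.0),
}

def box_behnken(n_factors, n_center):
    """Coded (-1, 0, +1) runs: each factor pair at +/-1 with the rest at 0."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(tuple(run))
    # Replicated centre points (assumed count, not confirmed by the source).
    runs.extend([(0,) * n_factors] * n_center)
    return runs

def decode(run):
    """Map coded levels to physical units using the stated ranges."""
    settings = {}
    for level, (name, (low, high)) in zip(run, FACTORS.items()):
        midpoint = (low + high) / 2.0
        half_range = (high - low) / 2.0
        settings[name] = midpoint + level * half_range
    return settings

design = [decode(run) for run in box_behnken(n_factors=3, n_center=3)]
```

Under this assumption the centre point decodes to 600 W/m², 25° and 45°C, the midpoints of the stated ranges, which are also the settings the abstract reports as optimal.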

Volume 65, Article 112467

Citations: 0

The Cadenza lyric intelligibility prediction (CLIP) dataset

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-14 DOI: 10.1016/j.dib.2026.112466

Gerardo Roa-Dabike, Trevor J. Cox, Jon P. Barker, Bruno M. Fazenda, Simone Graetzer, Rebecca R. Vos, Michael A. Akeroyd, Jennifer Firth, William M. Whitmer, Scott Bannister, Alinka Greasley

This paper presents CLIP, a dataset of 11,072 popular Western music signals sourced from independent artists, accompanied by ground truth lyrics and lyric intelligibility scores from listening tests. The dataset is designed to facilitate music information retrieval (MIR) research using machine learning. It was created to allow the development of algorithms to predict lyric intelligibility for the Cadenza ICASSP 2026 Signal Processing Grand Challenge. Currently, it is the only publicly available large-scale dataset for such a task. The music was sourced from the Free Music Archive (FMA) dataset and is unlikely to be familiar to listeners. We excluded tracks whose license did not allow derivative works and those that did not have English singing. Ground truth transcriptions were generated by seven native English speakers, resulting in 3700 excerpts of 5 to 10 words each from 1452 different songs. A hearing loss simulation was also applied to the stereo audio, resulting in 11,100 music signals with no, mild or moderate hearing loss, so that a more diverse range of hearing is represented in the dataset. Human transcriptions were then collected via an online listening experiment. Participants self-reported as having normal hearing and being native English speakers. They listened to each music signal twice before transcribing each line. Final intelligibility scores were the ratio of matching words between the listening test responses and the ground truth transcriptions. The final dataset consists of audio, ground truth lyrics, intelligibility scores and associated metadata.
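The score described above, a ratio of matching words between a listener response and the ground truth, can be sketched as follows. The abstract does not specify the text normalisation or matching rules, so the tokenisation, the order-independent matching, and the example lyric here are assumptions made for illustration only.

```python
import re
from collections import Counter

def normalise(text):
    """Lower-case and split into word tokens (assumed normalisation)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def intelligibility(ground_truth, response):
    """Fraction of ground-truth words that also appear in the response."""
    truth = Counter(normalise(ground_truth))
    heard = Counter(normalise(response))
    total = sum(truth.values())
    if total == 0:
        return 0.0
    # A repeated word only counts as often as the listener wrote it.
    matched = sum(min(count, heard[word]) for word, count in truth.items())
    return matched / total

# Invented example line, not taken from the dataset.
score = intelligibility(
    "the river runs under the bridge",
    "a river runs over the bridge",
)
```

In this invented example four of the six ground-truth words are matched, so the score is 4/6. A stricter variant could require words to match in order, which the abstract neither confirms nor rules out.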

Volume 65, Article 112466

Citations: 0

Long-term management reporting credibility data from an accounting experiment

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-14 DOI: 10.1016/j.dib.2026.112470

Eric Gooden, Nicole Holden

The data report the results of an experiment used to examine whether short-horizon investors differ from long-horizon investors in their sensitivity to truthful disclosure regarding negative earnings news. We used an experimental methodology to isolate the impact of Investment Horizon and Forthcomingness on investors' long-term management credibility assessments. Specifically, participants assumed the role of either a long-horizon current investor (who already owned shares) or a short-horizon prospective investor (who was contemplating an investment decision) in a fictional company. Participants were then given identical information regarding the firm and asked to make an initial credibility assessment. Subsequently, participants either received forthcoming disclosure from company management regarding negative earnings news (Forthcomingness) or received no disclosure. Short-horizon investors then made an investment decision regarding a position in the firm, and all participants received negative earnings news regarding the firm. Participants returned two weeks later to make final credibility assessments of company management as part of the post-experimental questionnaire.

Volume 65, Article 112470

Citations: 0

Survey data on digital competence assessment among pre-service teachers in Vietnam

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-14 DOI: 10.1016/j.dib.2026.112465

Ha-Nam Nguyen, Hoai-Nam Nguyen, Thi-Thu Ngo

In the context of digital transformation in education, digital competence is an essential requirement for future teachers. This study surveyed and analyzed the digital competence structure of 1439 pre-service teachers in different regions of Vietnam. We used a self-assessment questionnaire based on the Digital Kids Asia-Pacific (DKAP) framework, with reference to the TPACK (Technological Pedagogical Content Knowledge), DigComp (Digital Competence Framework for Citizens), and DigCompEdu frameworks. The dataset provides detailed information on each participant’s self-evaluated digital proficiency in five categories, along with demographic variables such as gender and subject specialization. The main value of the dataset lies in its potential to inform teacher training programs and educational policy by offering evidence on strengths and weaknesses in future teachers’ digital competence.

Citations: 0

A Sentinel-1 SAR imagery dataset for airstrips detection and segmentation in the Brazilian Amazon Rainforest

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-14 DOI: 10.1016/j.dib.2026.112472

Leandro da Silva Gomes , Gustavo Henrique de Queiroz Stabile , Tahisa Neitzel Kuck , Felipe Augusto Pereira de Figueiredo , Elcio Hideiti Shiguemori , Dimas Irion Alves

The Brazilian Amazon Rainforest is of great ecological and economic importance and is considered one of the most biodiverse regions on the planet. The region faces numerous challenges from illegal human activities that threaten its sustainability and well-being, and these activities are often supported by the construction of unauthorized airstrips. Because persistent cloud cover often hinders monitoring with optical satellites, Synthetic Aperture Radar (SAR) imagery provides a crucial alternative for surveillance of the region. This dataset was therefore developed to support the training and evaluation of machine learning techniques, including deep learning models, for detecting and segmenting airstrips in the Brazilian Amazon Rainforest using SAR imagery. The dataset comprises images from the Sentinel-1 satellite, acquired primarily between 2021 and 2024, covering 1040 locations of known airstrips sourced from the MapBiomas project (published in 2023, based on 2021 reference data). For the change detection task, historical “before” images were selected from the period between 2014 and 2021 to capture the pre-construction state. The data is structured to support three distinct machine learning tasks: object detection (e.g., YOLOv8), semantic segmentation (e.g., U-Net), and change detection. Specific images and annotations are provided for each task. Geospatial files (Shapefile, GeoPackage) are also included to facilitate the integration and visualization of the dataset in a GIS environment. The data is valuable for researchers in remote sensing, computer vision, environmental monitoring, and security and defense, enabling the development of automated systems to monitor irregular activities in remote forest regions. The dataset is available at a Mendeley Data repository: https://data.mendeley.com/datasets/x7rn78ymtn/1
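
The abstract does not say how detections on this dataset are scored. As a minimal sketch, assuming predicted and annotated airstrips are compared as axis-aligned pixel boxes, intersection over union (IoU) is one common way to measure their overlap. The `Box` layout and the example coordinates below are hypothetical and are not taken from the dataset's annotation files.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def box_area(box: Box) -> float:
    # Clamp each side at zero so an "inverted" box has zero area.
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in the range [0, 1]."""
    overlap = Box(
        max(a.x_min, b.x_min),
        max(a.y_min, b.y_min),
        min(a.x_max, b.x_max),
        min(a.y_max, b.y_max),
    )
    inter_area = box_area(overlap)
    union_area = box_area(a) + box_area(b) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


# Made-up example: an annotated box and a prediction shifted halfway along x.
annotated = Box(0, 0, 10, 10)
predicted = Box(5, 0, 15, 10)
overlap_score = iou(annotated, predicted)
```

A detection is then typically counted as correct when its IoU with an annotated box exceeds a chosen threshold; the threshold is a design choice and is not specified by the source.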

Citations: 0

Visualizing archaeobotanical data: A comprehensive photographic record of desiccated plant remains from an early modern context at Santi Quattro Coronati, Rome

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-13 DOI: 10.1016/j.dib.2026.112468

Claudia Moricca , Rachele Nicolini , Lucrezia Masci , Lia Barelli , Simona Morretta , Raffaele Pugliese , Laura Sadori

The “Santi Quattro Coronati – archaeobotanical plates” dataset presents a comprehensive photographic collection of carpological remains recovered from a pit in the complex of Santi Quattro Coronati (Rome, Italy). The deposit, dated between the late 15th and the mid-16th century, yielded a diverse assemblage of desiccated plant remains. The dataset is novel in that it provides complete photographic documentation of all identified taxa from a single Early Modern archaeological context, a chronological phase that remains underrepresented in Italian archaeobotanical research.

The photographic documentation focuses on a representative sample of each taxon identified in the archaeobotanical analysis, with particular attention to the best-preserved specimens. When multiple plant parts of the same taxon were present, all were included. The dataset also includes fragile and rarely illustrated plant parts, such as cereal rachis fragments, tunics and basal plates of onion and garlic, grapevine tendrils and legume seed coats. These are often excluded from reference atlases because of their low archaeological survivability and the consequent scarcity of well-preserved comparative specimens.

High-resolution images were acquired using a Leica MC205C stereomicroscope equipped with a Leica IC80HD camera and the Leica Application Suite v.4.5.0 software. Illumination was provided by the Leica LED5000 HDI™ dome system, ensuring constant, diffuse light conditions. A stack of images was captured for each specimen and processed with Helicon Focus v.7.0.1 Pro through focus stacking to obtain a single fully focused image. Depending on specimen size and complexity, between 9 and 127 photographs were used per perspective. Larger samples, unsuitable for microscopic observation, were photographed using a Canon digital camera under controlled illumination. Post-processing was performed with GIMP, applying standard tools for background cleaning and masking. Each final plate includes a scale bar for size reference.

The dataset is organized alphabetically by plant family and taxon. For each taxon, one or more plates are provided, displaying specimens from one to three perspectives to represent their 3D morphology. Nomenclature follows the taxonomy used in the original publication of the assemblage and has been updated according to the most recent checklist of the Italian vascular flora. A metadata .xls file is provided to facilitate consultation, reuse, comparison and integration with other archaeobotanical datasets.

This dataset offers a well-documented comparative visual reference for species/genus identification and for assessing the preservation state and morphological integrity of desiccated archaeobotanical remains. By offering detailed photographic records of New World plant taxa previously identified in this context, the study enhances accessibility and understanding of these materials through visual reference. Despite being limited to a single context, the dataset represents a best practice in archaeobotany, encouraging other researchers to share complete photographic documentation of the carpological assemblages they study, thereby supporting open science and the progressive construction of an extended visual reference collection. The dataset is primarily intended for archaeobotanists and environmental archaeologists studying Early Modern contexts, but it can also serve researchers working on desiccated plant remains from other chronologies and locations.
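
The focus stacking step mentioned in the abstract can be illustrated with a toy example. The sketch below is not Helicon Focus's algorithm, which the source does not describe. It is a minimal illustration of the general idea, assuming grayscale images stored as nested lists: for every pixel, keep the value from whichever image in the stack is locally sharpest, with sharpness measured here by the absolute 4-neighbour Laplacian. All names and the 3x3 sample data are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def sharpness(img: Image, r: int, c: int) -> float:
    """Absolute 4-neighbour Laplacian at (r, c); borders reuse the edge pixel."""
    rows, cols = len(img), len(img[0])
    up = img[max(r - 1, 0)][c]
    down = img[min(r + 1, rows - 1)][c]
    left = img[r][max(c - 1, 0)]
    right = img[r][min(c + 1, cols - 1)]
    return abs(up + down + left + right - 4 * img[r][c])


def focus_stack(stack: List[Image]) -> Image:
    """Merge same-sized images by taking each pixel from the sharpest source."""
    rows, cols = len(stack[0]), len(stack[0][0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # On ties, max() keeps the earliest image in the stack.
            best = max(stack, key=lambda img: sharpness(img, r, c))
            result[r][c] = best[r][c]
    return result


# Toy stack: a featureless frame and a frame with one crisp central detail.
blurry = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
sharp = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
merged = focus_stack([blurry, sharp])
```

Production tools usually also align the frames and smooth the per-pixel selection to avoid seams; this sketch omits both steps.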

Citations: 0

A dataset of acoustics emissions recordings of woodboring insects in wood and cultural objects, context images and remarks

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-13 DOI: 10.1016/j.dib.2026.112461

Tom Marti , Cécile Costa , Emmanuel de Salis , Laura Brambilla , Stefano Carrino

This dataset presents acoustic emission (AE) recordings collected from woodboring insect-infested and non-infested wood samples and cultural heritage objects. Data were acquired from April to July 2025 at four institutions: Haute École Arc (HE-Arc), Switzerland; Canadian Museum of History (CMH), Canada; National Gallery of Canada (NGC), Canada; and Musée National de l'Automobile (MNA), France.

The recordings were captured using Vallen VS900-M sensors with AEP5 preamplifiers set to 34 dB gain and an AMSY-6 4-channel chassis, employing continuous acoustic emission monitoring at a 2 MHz sampling rate. Each experiment used three sensors positioned on test objects and one reference sensor facing upward to record ambient noise conditions. The dataset comprises approximately 440.9 hours of recordings distributed across the four collection sites.

The dataset includes four main components: raw Vallen AE database files (.tradb format), processed statistical data exported as CSV files, contextual images documenting setups and sensor placements, and a Python script for statistical data processing. Each experiment is documented with duration, material specifications, coupling methods (Renaissance wax, cyclododecane, or mechanical fastening), environmental conditions, and infestation labels.

The dataset's structure enables multiple research applications. The time-series statistical features and binary classification labels (infested/non-infested) provide a foundation for supervised machine learning model development. The diverse experimental conditions across four geographic locations, varying coupling methods, and different ambient environments offer opportunities to evaluate model generalization and robustness. Reference sensor recordings captured simultaneously with each experiment allow for ambient noise characterization studies and development of noise filtering methodologies. The combination of raw acoustic data and contextual documentation makes this dataset suitable for comparative studies of different signal processing approaches and feature extraction techniques in acoustic emission analysis for heritage conservation applications.
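
One way to probe the generalization across collection sites mentioned above is a leave-one-site-out split: train on three institutions and test on the fourth. The sketch below is a minimal illustration under assumptions. The record fields `site` and `infested` are hypothetical names, not columns confirmed by the dataset's CSV files, and the sample records are invented.

```python
from typing import Dict, Iterator, List, Tuple

Record = Dict[str, object]


def leave_one_site_out(
    records: List[Record],
) -> Iterator[Tuple[str, List[Record], List[Record]]]:
    """Yield (held_out_site, train, test), holding out each site once."""
    sites = sorted({str(rec["site"]) for rec in records})
    for held_out in sites:
        train = [rec for rec in records if rec["site"] != held_out]
        test = [rec for rec in records if rec["site"] == held_out]
        yield held_out, train, test


# Invented records; the site codes follow the abbreviations in the abstract.
records: List[Record] = [
    {"site": "HE-Arc", "infested": True},
    {"site": "HE-Arc", "infested": False},
    {"site": "CMH", "infested": True},
    {"site": "NGC", "infested": False},
    {"site": "MNA", "infested": True},
]

splits = list(leave_one_site_out(records))
```

Splitting by site rather than at random keeps recordings made with the same setup and ambient conditions out of both the training and test sets at once, which gives a more conservative estimate of robustness.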

Citations: 0