Data in Brief: Latest Articles

UAV-LiDAR dataset of Pangandaran coastal tourism hotspots for tsunami and climate risk valuation and exposure mapping
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-04-01 Epub Date: 2026-01-24 DOI: 10.1016/j.dib.2026.112505
Mega Laksmini Syamsuddin, Umar Abdurrahman, Ajeng Riska Puspita, Sunarto, Qurnia Wulan Sari, Fadli Syamsudin, Indrawan Fadhil Pratyaksa, Iqbal Maulana Cipta, Ivonne Milichristi Radjawane, Hansan Park
This article presents a high-resolution UAV–LiDAR dataset acquired over the main coastal tourism hotspots of Pangandaran, West Java, Indonesia (WGS84 / UTM Zone 49S). The survey was conducted using a DJI Matrice 300 RTK equipped with a CHCNAV AA450 LiDAR system at altitudes of 77–83 m AGL, following grid-based flight lines with 80% forward and 70% side overlap. The final point cloud, delivered in LAS format, exhibits a mean density of approximately 865 pts/m², with dominant values of 600–800 pts/m² across roads, roofs, and open terrain, and localized peaks exceeding 3,000 pts/m² in areas of flight-line overlap. Ground control was established using three static base stations, with 14 calibration control points and 8 independent validation check points. Accuracy assessment yields RMSE values of 0.072 m (Easting), 0.062 m (Northing), and 0.138 m (Elevation), with corresponding mean biases of 0.017 m, 0.017 m, and 0.044 m, confirming centimeter-level positional precision suitable for detailed coastal mapping. The dataset includes DSM and DTM derivatives, block-based tiles, metadata, and processing reports, supporting its use in tsunami exposure assessment, climate-risk valuation, urban coastal planning, and remote-sensing education. As one of the first openly accessible UAV–LiDAR datasets for an Indonesian coastal tourism hotspot, it provides a reproducible, high-density 3D resource for research, hazard analysis, and sustainable coastal development.
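The accuracy figures above (per-axis RMSE and mean bias at independent check points) follow from standard definitions. The sketch below shows one way to compute them; the residual values are hypothetical placeholders, not the survey's check-point data, and the function names are illustrative. The last call assumes the reported bias is a signed mean taken over the same check points as the RMSE.

```python
import math

def rmse(residuals):
    """Root-mean-square error of residuals (measured minus reference)."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def mean_bias(residuals):
    """Signed mean of the residuals; a nonzero value indicates a systematic offset."""
    return sum(residuals) / len(residuals)

def random_component(rmse_value, bias_value):
    """Spread left after removing the bias, from RMSE^2 = bias^2 + std^2."""
    return math.sqrt(rmse_value ** 2 - bias_value ** 2)

# Hypothetical elevation residuals (m) at 8 check points; NOT the survey's data.
dz = [0.10, -0.05, 0.20, 0.15, -0.12, 0.08, 0.18, -0.19]
print(f"RMSE = {rmse(dz):.3f} m, mean bias = {mean_bias(dz):+.3f} m")

# Applied to the reported elevation figures (RMSE 0.138 m, bias 0.044 m),
# assuming both come from the same check points and the bias is a signed mean.
print(f"non-systematic spread = {random_component(0.138, 0.044):.3f} m")
```

Separating the bias from the spread in this way indicates how much of the elevation error could, in principle, be removed by a constant vertical shift.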
Data in Brief, vol. 65, Article 112505.
Citations: 0
A dynamic Malaysian sign language dataset for sign language recognition and translation
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-04-01 Epub Date: 2026-01-29 DOI: 10.1016/j.dib.2026.112511
Yuan Ting Chong, Yessane Shrrie Nagendhra Rao, Rehman Ullah Khan, Chee Siong Teh, Mohamad Hardyman Barawi, Mohd Shahrizal Sunar, Joan Jo Jo Sim
Sign languages around the world are unique and diverse. Each reflects the cultural nuances of its place of origin, which gives it its distinctive nature. Thus, despite the positive outcomes of sign language recognition and translation research conducted worldwide, each system still has notable limitations, mainly caused by limited data. Sign language recognition and translation research in Malaysia in particular has been set back by the limited size and nature of the available datasets that keep pace with current technological developments. The datasets currently available for Malaysian Sign Language (BIM – Bahasa Isyarat Malaysia) are small and limited to fingerspelling of alphanumeric characters and several dynamic words and short phrases. However, given the continuous nature of sign language communication, these data are not enough to properly train machine learning models to recognize and translate continuous real-world signs. To address this issue, we introduce a dynamic BIM dataset comprising video, gloss, and translation data for alphanumeric characters, dynamic words and short phrases, and continuous sentences. The dataset is split into two versions. The first version, BIM-SSD-V1, comprises 4,858 parallel video (RGB frames), gloss, and translation data, while the second version, BIM-SSD-V2, comprises 3,143 parallel video (RGB frames), keypoints, and gloss data for recognition purposes, and 4,900 parallel gloss and translation data for translation purposes. The raw videos are also available in the dataset. The dataset was developed and compiled with the help of the Deaf and Hard-of-Hearing community. This process also included the development of a Sign Language Module (translations for the video and gloss data) to assist in the development of the dataset. The image and video data were collected using smartphones, and the corresponding gloss annotations were prepared with the help of a BIM expert. The data collection process was designed to reflect everyday communication scenarios by incorporating varied sentence constructions, repeated signing instances, and recordings under different backgrounds and contextual conditions to introduce data-level variability relevant to real-world use. Four participants took part in the data collection process, and there are four samples for every character, word, phrase, or sentence in the Sign Language Module. The dataset is mainly intended for researchers who wish to conduct sign language recognition and translation research using the Sign-to-Gloss-to-Text framework. However, it is not limited to that framework and can be used with other sign language recognition and translation research frameworks.
Data in Brief, vol. 65, Article 112511.
Citations: 0
Legal case documents: A comprehensive dataset for Arabic natural language processing research and applications
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-04-01 Epub Date: 2026-01-02 DOI: 10.1016/j.dib.2025.112429
Soha Zarbah, Arwa Wali, Dimah Alahmadi
The legal sector remains distinctive because of the complex language structure and specialized terminology of legal data. This complexity carries considerable contextual information, which calls for natural language processing (NLP). The availability of high-quality, well-structured legal datasets is essential for advancing NLP research and applications in the legal field. However, a gap exists in Arabic legal NLP owing to insufficient research and datasets. To address this gap, we propose an Arabic legal case dataset containing cases, case summaries, relevant keywords, and case categories. The legal case data were obtained from the Board of Grievances website in Saudi Arabia and include 3170 cases distributed across 47 classes. Case length varies significantly, from about 100 to nearly 30,000 words and from one to 80 pages per case. The dataset therefore supports various NLP applications, including text categorization, data extraction, sentiment analysis, and summarization, thereby improving task efficiency and decision accuracy in the legal profession.
Data in Brief, vol. 65, Article 112429.
Citations: 0
Genome data mining of a novel Stutzerimonas marianensis strain LB-0542 isolated from pelagic Sargassum seaweed waste for plastic-degrading and plant growth-promoting traits
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-04-01 Epub Date: 2026-01-08 DOI: 10.1016/j.dib.2026.112454
Bidyut R. Mohapatra, Linel S. Moralez, Kiya E. James
This study reports the whole-genome sequence data and functional annotations of a novel Stutzerimonas marianensis strain LB-0542 isolated from decomposing pelagic Sargassum biomass stranded on Long Beach, Barbados. The genomic DNA was sequenced on the Illumina NextSeq2000 platform, and the genome was assembled with the SPAdes Genome Assembler (ver 3.15.5). The assembled genome has a size of 4,520,813 bp, a coverage of 110X, a GC content of 63.2 %, an L50 of 2 and an N50 of 1,079,143 bp. The genome consists of 12 contigs, 0 CRISPR, 3 rRNA, 56 tRNA and 4166 CDSs (coding sequences) with a coding ratio of 89.4 %. The genome annotation results for the COG (cluster of orthologous genes) and subsystem features indicate that metabolism and amino acids and derivatives, respectively, are the dominant categories. Analysis of the genome for Carbohydrate-Active Enzymes (CAZymes) identified 230 genes encoding four functional classes of CAZymes [glycoside hydrolases (75 genes), glycosyltransferases (95 genes), carbohydrate esterases (9 genes) and carbohydrate-binding modules (51 genes)]. Functional annotation of the genome for plastic degradation revealed 34 genes that could catalyse the degradation of 14 types of plastics: polyethylene glycol [PEG (29 %)], polylactic acid [PLA (11 %)], poly(3-hydroxybutyrate-co-3-hydroxyvalerate) [PHBV (9 %)], polyhydroxyalkanoates [PHA (9 %)], polyethylene [PE (6 %)], polycaprolactone [PCL (6 %)], polyethersulfone [PES (6 %)], polyethylene terephthalate [PET (6 %)], poly(butylene adipate-co-terephthalate) [PBAT (3 %)], polystyrene [PS (3 %)], polybutylene succinate [PBSA (3 %)], poly(3-hydroxyvalerate) [P3HV (3 %)], polyvinyl alcohol [PVA (3 %)] and natural rubber [NR (3 %)]. Genome mining for plant growth-promoting traits identified 3175 genes associated with colonizing the plant system (26 %), competitive exclusion (21 %), stress control (21 %), biofertilization (14 %), phytohormone and plant signal production (10 %), bioremediation (7 %) and plant immune response stimulation (1 %). These genome mining results indicate the biotechnological and ecological significance of the novel strain LB-0542 for sustainable biocatalytic processing of Sargassum and plastic-containing waste. The genome sequence data are available in DDBJ/EMBL/GenBank under the accession number BAAIAE000000000.
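The assembly statistics quoted above (N50 and L50) have simple definitions: with contigs sorted from longest to shortest, L50 is the smallest number of contigs whose lengths sum to at least half the assembly, and N50 is the length of the last contig in that set. The following is a minimal sketch of that calculation; the contig lengths are hypothetical, since the abstract does not list the individual contig sizes, and the function name is illustrative.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in base pairs."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            # 'length' is the shortest contig needed to reach half the assembly.
            return length, count
    raise ValueError("contig_lengths must not be empty")

# Hypothetical contig lengths (bp); NOT the LB-0542 assembly.
example = [50, 40, 30, 20, 10]
n50, l50 = n50_l50(example)
print(f"N50 = {n50} bp, L50 = {l50}")
```

An L50 of 2, as reported for this 12-contig assembly, means the two longest contigs alone account for at least half of the 4,520,813 bp.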
Data in Brief, vol. 65, Article 112454.
Citations: 0
BraTioUS: A multicenter dataset of baseline intraoperative brain tumor ultrasound images
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-04-01 Epub Date: 2026-01-15 DOI: 10.1016/j.dib.2026.112478
Olga Esteban-Sinovas, Rosario Sarabia, Ignacio Arrese, Vikas Singh, Prakash Shett, Aliasgar Moiyadi, Ilyess Zemmoura, Massimiliano Del Bene, Arianna Barbotti, Francesco DiMeco, Timothy Richard West, Brian Vala Nahed, Giuseppe Roberto Giammalva, Santiago Cepeda
The BraTioUS (Brain Tumor Intraoperative Ultrasound) dataset [1] is a large-scale, multicenter, publicly available collection of intraoperative ultrasound (ioUS) images acquired during glioma surgeries. Created through an international collaboration among six hospitals across five countries, BraTioUS comprises 1669 B-mode 2D ioUS images from 142 glioma patients, collected between 2018 and 2023 using various ultrasound systems and acquisition protocols. It also includes an expert-supervised tumor segmentation mask for every ioUS image.
BraTioUS addresses several limitations found in existing public datasets, such as lack of diversity in acquisition hardware, imaging protocols, and glioma types. The primary objective of this dataset is to be publicly available and accessible for the training and validation of machine learning models aimed at improving the interpretation and use of ioUS. The dataset’s scale, quality, and heterogeneity make it a valuable resource for training and validating AI tools aimed at improving intraoperative decision-making and patient outcomes in glioma surgery.
Data in Brief, vol. 65, Article 112478.
Citations: 0
Multi-analytical dataset on Lekhaniya Mahakashaya: HRLC-MS/MS Orbitrap profiling, HPTLC fingerprinting with marker estimation, and FTIR spectroscopy
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-04-01 Epub Date: 2026-01-14 DOI: 10.1016/j.dib.2026.112464
Narayan Singh, Anjali Upadhyay, Debajyoti Chakraborty, Girimalla Patil, Pramod Yadav, Galib R, Pradeep Kumar Prajapati
This dataset provides a comprehensive, multidimensional phytochemical characterization of Lekhaniya Mahakashaya (LMK), a classical Ayurvedic formulation used for the treatment of obesity and metabolic disorders. Three complementary analytical platforms were employed: High-Resolution Liquid Chromatography-Mass Spectrometry/Mass Spectrometry (HRLC-MS/MS) Orbitrap, High-Performance Thin Layer Chromatography (HPTLC), and Fourier-Transform Infrared (FTIR) spectroscopy. For HRLC-MS/MS analysis, hydroalcoholic extracts of LMK were prepared and analysed in both positive and negative ionisation modes using an Orbitrap mass spectrometer. The dataset includes 2034 metabolomics-identified compounds: 1712 in positive ion mode and 322 in negative ion mode, with detailed retention times, molecular weights, and fragmentation patterns, suitable for compound annotation, metabolite networking, and cheminformatics-based correlation studies. HPTLC fingerprinting was performed using methanolic extracts (2–10 µL) on silica gel 60 F₂₅₄ plates, which yielded 7–8 reproducible peaks across the Rf range 0.12–0.89 under 254 nm, 366 nm, and 540 nm, confirming LMK’s polyherbal complexity. Marker-based quantification, performed using validated HPTLC protocols, revealed berberine (0.24 % w/w) and curcumin (0.31 % w/w); calibration curves are included for reproducibility. The FTIR spectroscopic data encompass 19 absorption peaks (3278–468 cm⁻¹), representing hydroxyl, aliphatic, unsaturated, sulfur-, nitrogen-, and halogen-containing functional groups, which highlights LMK’s diverse phytochemical matrix. The dataset is structured as a reference resource for pharmacological exploration, quality control, and phytochemical standardisation of LMK and associated Ayurvedic formulations. Additionally, it can be used for molecular docking validation, network pharmacology mapping, metabolomics comparisons, and future drug discovery. To promote transparency, encourage computational or experimental reuse, and support integrative research on traditional medicine, all raw chromatograms, spectrum files, and processed data tables are made available in widely accessible formats.
Data in Brief, vol. 65, Article 112464.
Citations: 0
VDzSL: A synthetic video dataset for Algerian sign language using 3D avatars
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-04-01 Epub Date: 2026-02-07 DOI: 10.1016/j.dib.2026.112551
Younes Ouargani, Noussaima El Khattabi
Video datasets are crucial for advancing communication technologies for deaf and hard-of-hearing individuals. Yet extensive datasets are unavailable for most sign languages because of the substantial work required to capture, clean, organize, and publish them. This paper introduces the Video Dataset for Algerian Sign Language (VDzSL), the largest video dataset for Algerian Sign Language. To ensure demographic diversity, VDzSL uses four different avatars to animate the signs and records them from five distinct camera angles, employing polar coordinates to keep the views consistent while capturing varying horizontal and vertical perspectives. With 415 signs, our dataset covers 99.5% of the signs included in 3DZSignDB’s SiGML dataset and 26.6% of the official ALGSL dictionary provided by the Algerian Ministry of Solidarity. It contains 8300 video files totaling 3 h, 11 min, and 43 s of synthetic video, provided at a resolution of 498×498 pixels and an average frame rate of 27 frames per second across the entire dataset. The dataset is primarily aimed at training, testing, and benchmarking machine learning models, facilitating transfer learning and comparative analyses, and developing learning tools and accessibility applications.
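The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes one clip per combination of sign, avatar, and camera angle, which the abstract implies but does not state outright. The `camera_position` helper is purely illustrative: the abstract mentions polar coordinates but gives no radius or angle values, so the function name, its parameters, and the axis convention are assumptions.

```python
import math

# Figures quoted in the abstract.
SIGNS = 415
AVATARS = 4
CAMERA_ANGLES = 5
TOTAL_SECONDS = 3 * 3600 + 11 * 60 + 43  # 3 h 11 min 43 s
AVG_FPS = 27


def expected_video_count(signs, avatars, angles):
    """Clip count, assuming one clip per (sign, avatar, angle) combination."""
    return signs * avatars * angles


def camera_position(radius, azimuth_deg, elevation_deg):
    """Hypothetical helper: place a camera on a sphere around the avatar.

    Azimuth sets the horizontal perspective and elevation the vertical one,
    while a fixed radius keeps the framing consistent between angles.
    """
    azimuth = math.radians(azimuth_deg)
    elevation = math.radians(elevation_deg)
    x = radius * math.cos(elevation) * math.cos(azimuth)
    y = radius * math.cos(elevation) * math.sin(azimuth)
    z = radius * math.sin(elevation)
    return x, y, z


video_count = expected_video_count(SIGNS, AVATARS, CAMERA_ANGLES)
mean_clip_seconds = TOTAL_SECONDS / video_count
mean_clip_frames = mean_clip_seconds * AVG_FPS

print(video_count)                  # 8300, matching the stated file count
print(round(mean_clip_seconds, 2))  # 1.39 seconds per clip on average
print(round(mean_clip_frames))      # 37 frames per clip on average
```

Under that assumption the product reproduces the stated 8300 files, and the total duration implies short clips of about one and a half seconds each.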
Citations: 0
A dataset of smart home devices sold on Spanish e-commerce platforms
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-04-01 Epub Date: 2026-01-29 DOI: 10.1016/j.dib.2026.112516
Jhovany Quintana-Vera, Ana I. González-Tablas, Mohammed Rashed
The use of smart home devices is on the rise: the number of users is estimated to grow from the current ∼361 million to over 785 million, a 117% increase in just 4 years. It is therefore essential to have a dataset that details the different aspects of the devices available on the market. In this paper, we introduce our dataset, Spanish MArket Smart Home devices (SMASH), which we collected via structured data extraction from four major Spanish e-commerce platforms. Containing 5218 devices across 652 brands, the dataset provides an overview of smart home devices sold within Spain, the fourth largest economy in the European Union. The dataset is versatile, as it includes details such as name, price, brand, model, rating, number of reviews, platform, and category. It can be used as a primary source in research on consumer behaviour and microeconomics. Additionally, the details could be used to create new datasets, such as the privacy policies of brands and of the mobile applications (apps) used with the devices. The dataset is publicly accessible under the CC-BY-NC-4.0-ES license. We note, however, that SMASH is limited to products sold within Spain and collected within a specific time window (start date: 2023-12; end date: 2024-08); users should consider these scope and temporal constraints when generalizing findings.
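As a quick check of the growth figure and an illustration of the listed attributes, the sketch below recomputes the percentage increase and models one device record. The class name, field names, and field types are assumptions chosen to mirror the attributes named in the abstract; the published files may use different column names and encodings.

```python
from dataclasses import dataclass
from typing import Optional


def percent_increase(old, new):
    """Relative growth from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


@dataclass
class SmartHomeDevice:
    """Hypothetical record layout based on the attributes in the abstract."""
    name: str
    price: Optional[float]   # assumed numeric; currency handling not specified
    brand: str
    model: Optional[str]
    rating: Optional[float]
    num_reviews: int
    platform: str
    category: str


def count_brands(devices):
    """Number of distinct brands in a collection of device records."""
    return len({device.brand for device in devices})


growth = percent_increase(361e6, 785e6)
print(round(growth))  # 117, consistent with the abstract
```

Optional fields reflect the likelihood that some listings lack a price, model, or rating, which is an assumption rather than a documented property of SMASH.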
Citations: 0
Dataset of scattered images using noncoherent light under varying diffusion conditions and projected patterns
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-04-01 Epub Date: 2026-02-02 DOI: 10.1016/j.dib.2026.112541
Roger Chiu-Coutino , Miguel S. Soriano-Garcia , Carlos Israel Medel-Ruiz , S.M. Afanador-Delgado , Edgar Villafaña-Rauda , Roger Chiu
This data article presents an experimental dataset of scattered images obtained with a low-cost, open-source, Raspberry Pi-based optical system. Each data sample includes two grayscale images at 256 × 256 resolution: (i) the scattered image and (ii) the original projected pattern, which serves as ground truth. The system projects diverse patterns using various optical diffusers with different scattering coefficients and physical thicknesses. The dataset includes geometric shapes, digits, and textures to increase variability and generalization. This variety allows the analysis of distinct scattering regimes and the evaluation of image recovery models under varying optical complexities. The dataset supports deep learning research focused on inverse problems in optics and is particularly useful for training and benchmarking image restoration models in scattering environments.
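The paired structure described above can be sketched as a small container that holds a scattered image together with its ground-truth pattern. This is a minimal illustration, assuming 8-bit pixels stored as raw bytes; the abstract states only the 256 × 256 grayscale format, so the bit depth, the class name, and the error metric are assumptions.

```python
from dataclasses import dataclass

SIDE = 256
PIXELS = SIDE * SIDE  # 65536 pixels per image


@dataclass(frozen=True)
class ScatterSample:
    """One sample: a scattered image and the pattern that produced it."""
    scattered: bytes      # row-major 8-bit grayscale (assumed encoding)
    ground_truth: bytes   # original projected pattern

    def __post_init__(self):
        for label, image in (("scattered", self.scattered),
                             ("ground_truth", self.ground_truth)):
            if len(image) != PIXELS:
                raise ValueError(
                    f"{label} must hold {PIXELS} pixels, got {len(image)}")


def mean_absolute_error(sample):
    """Average per-pixel difference between the two images of a sample."""
    total = sum(abs(a - b)
                for a, b in zip(sample.scattered, sample.ground_truth))
    return total / PIXELS


# Synthetic stand-ins: a uniform gray image against an all-black pattern.
flat_gray = bytes([128]) * PIXELS
black = bytes(PIXELS)
sample = ScatterSample(scattered=flat_gray, ground_truth=black)
print(mean_absolute_error(sample))  # 128.0
```

A restoration model would be scored the same way, by replacing the scattered image with the model's output and comparing it against the ground truth.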
Citations: 0
Dataset on the performance of a photovoltaic solar water pump in coffee plantations using response surface methodology (RSM)
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-04-01 Epub Date: 2026-01-14 DOI: 10.1016/j.dib.2026.112467
Nopparat Suriyachai, Torpong Kreetachat, Saksit Imman
This dataset presents experimental data on the performance of a photovoltaic (PV) solar-powered water pumping system installed in a coffee plantation in Chiang Mai province, Thailand. The system performance was evaluated through controlled experiments using response surface methodology (RSM). Three independent variables were systematically varied: solar irradiance (300–900 W/m²), panel inclination (15–35°), and panel surface temperature (30–60°C). A total of 15 experimental runs were conducted, and the pumping efficiency (%) was recorded under each condition. Statistical analyses, including analysis of variance (ANOVA) and regression modeling, were applied to evaluate the effects of the individual variables and their interactions on system performance. The dataset includes raw and processed measurements, regression coefficients, and response surface parameters, enabling replication and further analysis. Perturbation plots, 3D surface plots, and contour plots provide detailed visualizations of the relationships between environmental factors and system efficiency. The optimal operating conditions were identified at a solar irradiance of 600 W/m², a panel inclination of 25°, and a panel surface temperature of 45°C, corresponding to a predicted maximum efficiency of 76.3–77.0%.
This dataset can be reused for designing optimized solar water pumping systems, validating predictive models, and comparing system performance under different environmental conditions or geographic locations. It also serves as a reference for researchers in renewable energy system optimization and agricultural water management. The data provide high-resolution, experimentally validated information on the combined effects of solar irradiance, panel inclination, and panel surface temperature on PV water pumping efficiency. Unlike previous studies, the dataset includes detailed quantitative analysis specific to coffee-growing regions in Northern Thailand, along with regression models and visualizations that can guide both experimental replication and predictive modeling under similar climatic and agricultural conditions.
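The abstract does not name the experimental design, but three factors and 15 runs are consistent with a Box-Behnken design using three centre points. The sketch below generates such a design under that assumption and maps the coded levels onto the stated factor ranges. The function names, the dictionary keys, and the choice of design are illustrative and not taken from the article.

```python
from itertools import combinations, product

# Factor ranges quoted in the abstract: (low, high).
FACTORS = {
    "irradiance_w_m2": (300.0, 900.0),
    "inclination_deg": (15.0, 35.0),
    "panel_temp_c": (30.0, 60.0),
}


def box_behnken(n_factors, n_center):
    """Coded Box-Behnken runs: each pair of factors at +/-1, the rest at 0."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            point = [0] * n_factors
            point[i], point[j] = a, b
            runs.append(tuple(point))
    runs.extend([(0,) * n_factors] * n_center)  # replicated centre points
    return runs


def decode(coded_run):
    """Map coded levels (-1, 0, +1) to physical units for each factor."""
    values = {}
    for code, (name, (low, high)) in zip(coded_run, FACTORS.items()):
        center = (low + high) / 2
        half_range = (high - low) / 2
        values[name] = center + code * half_range
    return values


design = box_behnken(n_factors=3, n_center=3)
print(len(design))         # 15 runs, matching the abstract
print(decode(design[-1]))  # centre point: 600 W/m², 25°, 45°C
```

The centre of the stated ranges (600 W/m², 25°, 45°C) coincides with the operating conditions the abstract reports as optimal. A second-order regression model would then be fitted to the measured pumping efficiency at these 15 settings.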
Citations: 0