
Data in Brief: Latest Articles

Introducing OpenTextile-NIR: Near-infrared hyperspectral imaging and photography dataset for optical identification of textiles
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-02-09 DOI: 10.1016/j.dib.2026.112559
Tuomas Sormunen, Ella Mahlamäki, Satu-Marja Mäkelä, Mikko Mäkelä
This dataset presents the first open-access collection of near-infrared hyperspectral imaging (NIR-HSI) data for the optical identification of textiles, with a focus on supporting research in sensor-based textile sorting and recycling. The dataset comprises hyperspectral images, RGB photographs, and detailed metadata, including fibre composition and colour, for 71 post-industrial textile samples collected in Finland. Over 11 million spectra are included in the hyperspectral images, with more than 6 million annotated, providing a robust foundation for machine learning and data analysis. In addition, we provide a single representative NIR spectrum and RGB value for each sample to accommodate classic spectroscopic analysis.
Used garments were sourced from a partner company specializing in end-of-life textile management, with ground truth information on fibre composition obtained from suppliers. Small pieces of each garment were measured using a Specim SWIR 3 hyperspectral camera and photographed with a high-resolution mobile phone camera (Samsung Galaxy A52). The dataset is organized into folders containing raw and processed data, including ENVI-format hyperspectral images, RGB images, and CSV files with mean spectra, mean RGB values, and sample metadata. An example Python script is provided to facilitate data access and processing.
Potential reuse scenarios include classification of textiles by material or colour, prediction of natural fibre content, image segmentation, algorithm development for spectral classification, and use as a reference spectral library. The dataset’s comprehensive structure and open availability address the limitations of previous research, which often relied on small or non-public datasets, and are intended to accelerate advances in optical identification technologies for textile recycling.
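Since the hyperspectral images are distributed in ENVI format, a reader may want to inspect the plain-text header before loading a full cube. The following is a minimal sketch, not the example script shipped with the dataset: the function name `parse_envi_header` and every value in `EXAMPLE_HEADER` are hypothetical placeholders, and only the general `key = value` layout of ENVI headers is assumed.

```python
import re


def parse_envi_header(text):
    """Parse the 'key = value' pairs of an ENVI header into a dict."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "ENVI":
        raise ValueError("not an ENVI header")
    body = "\n".join(lines[1:])
    # A value is either a brace-delimited block (which may span several
    # lines) or the remainder of the current line.
    pattern = re.compile(
        r"^\s*([^=\n]+?)\s*=\s*(\{.*?\}|[^\n]*)",
        re.MULTILINE | re.DOTALL,
    )
    fields = {}
    for key, value in pattern.findall(body):
        value = value.strip()
        if value.startswith("{"):
            # Brace blocks become lists of comma-separated items.
            value = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        fields[key.lower()] = value
    return fields


# Hypothetical header: the numbers below are illustrative only and are
# NOT taken from the OpenTextile-NIR files.
EXAMPLE_HEADER = """ENVI
samples = 384
lines = 2
bands = 3
interleave = bil
data type = 4
wavelength = {
 1000.0, 1500.0,
 2000.0
}
"""

header = parse_envi_header(EXAMPLE_HEADER)
n_pixels = int(header["samples"]) * int(header["lines"])
wavelengths = [float(w) for w in header["wavelength"]]
print(n_pixels, wavelengths)
```

In practice a dedicated reader would normally be used for the binary cube itself; the point of the sketch is only that the header gives the cube dimensions and the band wavelengths needed to interpret each spectrum.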
Citations: 0
Gut microbiomes of wild and domesticated mammals and birds in Slovenia, Europe: 16S rRNA sequencing data
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-02-09 DOI: 10.1016/j.dib.2026.112564
Zlender Tanja, Rupnik Maja
From a One Health perspective, the gut microbiota of animals acts as a major driver of microbial exchange between animals and the environment. Animals continuously release gut microbes into their surroundings, shaping environmental and human microbial communities and potentially dispersing pathogens. Characterizing gut microbiota across diverse animal hosts is therefore critical for understanding the patterns of microbial spread through ecosystems and their impact on animal, human and environmental health.
Here, we introduce a large, taxonomically diverse dataset of fecal microbiomes from 715 individual animals representing over 50 mammalian and avian species. We collected samples from both wild and domestic animals with an emphasis on capturing microbial diversity across a wide range of taxa and ecological contexts. The samples were subjected to 16S rRNA gene sequencing, targeting the V3–V4 hypervariable region. Bioinformatic analysis was performed using Usearch to generate zero-radius operational taxonomic units (ZOTUs).
This dataset was generated primarily for the development of microbial source tracking (MST) assays used for identifying the sources of fecal pollution in contaminated water. However, it provides a valuable resource for broader microbiome research. It enables comparative studies across host species, trophic guilds, and environmental contexts such as domestication.
Citations: 0
VDzSL: A synthetic video dataset for Algerian sign language using 3D avatars
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-02-07 DOI: 10.1016/j.dib.2026.112551
Younes Ouargani, Noussaima El Khattabi
Video datasets are crucial for advancing communication technologies for deaf and hard-of-hearing individuals. Nevertheless, extensive datasets are unavailable for most sign languages because of the substantial work required to capture, clean, organize, and publish them. This paper introduces the Video Dataset for Algerian Sign Language (VDzSL), the largest video dataset for Algerian Sign Language. To ensure demographic diversity, VDzSL uses four different avatars to animate the signs and records them from five distinct camera angles, employing polar coordinates to ensure consistency while capturing varying horizontal and vertical perspectives. With 415 signs, our dataset covers 99.5% of the signs included in 3DZSignDB’s SiGML dataset and 26.6% of the official ALGSL dictionary provided by the Algerian Ministry of Solidarity. Our dataset contains 8300 video files totaling 3 h, 11 min, and 43 s of synthetic video, provided at a 498×498 pixel resolution and an average frame rate of 27 frames per second across the entire dataset. The dataset is primarily aimed at training, testing, and benchmarking machine learning models, facilitating transfer learning and comparative analyses, as well as developing learning tools and accessibility applications.
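The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that every sign is rendered once for each avatar and camera-angle combination, which the abstract implies but does not state outright; the derived mean clip length and frame count are estimates from the quoted totals, not values reported by the authors.

```python
# Values quoted in the abstract.
signs, avatars, camera_angles = 415, 4, 5
reported_files = 8300
average_fps = 27

# Assumption: one video per (sign, avatar, camera angle) combination.
expected_files = signs * avatars * camera_angles

# Total duration of 3 h, 11 min, 43 s expressed in seconds.
total_seconds = 3 * 3600 + 11 * 60 + 43

# Derived estimates (not reported in the abstract).
mean_clip_seconds = total_seconds / reported_files
approx_total_frames = total_seconds * average_fps

print(expected_files == reported_files)
print(round(mean_clip_seconds, 2), approx_total_frames)
```

Under that assumption the product of signs, avatars, and angles matches the reported file count exactly, and the clips average a little under 1.4 seconds each.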
Citations: 0
Experimental data collected during quasi-static cyclic loading of mass timber lateral force-resisting system tested in a three-story building structure
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-02-07 DOI: 10.1016/j.dib.2026.112558
Nicholas T. Thielsen, Arijit Sinha, Andre R. Barbosa, Barbara G. Simpson, Daniel Cheney
The Emmerson Lab Launch Initiative (ELLI) was a multi-experiment testing program designed to evaluate the structural performance of mass timber lateral force-resisting systems (LFRS) under cyclic quasi-static loading protocols. The testing program was also designed to showcase, to their fullest potential, the testing capabilities of the A.A. “Red” Emmerson Advanced Wood Products Laboratory at Oregon State University in Corvallis, Oregon, USA (operational since 2020). This paper describes the static and visual data associated with one ELLI experiment comprising two experimental events. The paper also describes the value of the data and its collection process. In both experimental events, the gravity system of the test building sub-assemblage consisted of mass ply panel (MPP) diaphragms and out-of-plane walls, laminated veneer lumber (LVL) beams and columns, steel bolted gravity connections, and screwed connections. In the first experimental event, the test building structure did not include a dedicated LFRS separate from its gravity system. In the second experimental event, the test building structure included a self-centering rocking wall as the LFRS, consisting of veneer laminated timber (VLT) panels coupled by U-shaped flexural plate (UFP) energy dissipators, with vertical post-tensioned (PT) rods and a horizontal tying system. The data associated with the two experimental events were used to evaluate performance-based seismic design objectives and to validate the direct displacement-based method used in the lateral design of the structural system, both of which are described in Cyclic Testing of Three-Story Mass Timber Building Structure with Self-Centering Rocking Walls Coupled by UFP Dissipators. The dataset includes construction drawings and instrumentation plans of the test building structure as references for data collection and analysis. 
The dataset also includes displacement, strain, force, photograph, and time-lapse video data that collectively represent the structural behavior of the test building structure and its key components, and they are accessible through Mendeley Data with DOI: 10.17632/v6cs4t4zxc.1. This article complements the associated thesis and research publications by providing a peer-reviewed, standardized, and citable documentation of the experimental dataset, focused exclusively on data generation, structure, processing, limitations, and reuse, without duplicating analytical interpretation or conclusion.
Citations: 0
The complete genome sequencing data of Priestia aryabhattai SPCL1 isolated from a heavy metal leachate-contaminated soil in Queretaro, México
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-02-07 DOI: 10.1016/j.dib.2026.112561
Mario Eduardo Clemente Albores, María De Los Ángeles Hernández Tépach, Paola Itzel Herrera de la Torre, Mayra Paola Mena Navarro, María Carlota García Gutiérrez, Karla Isabel Lira De León, David Gustavo García Gutiérrez, Aldo Amaro Reyes, Miguel Angel Ramos López, José Alberto Rodríguez Morales, Erika Álvarez Hidalgo, Sergio de Jesús Romero Gómez, Juan Campos Guillén
We provide the genome sequence of Priestia aryabhattai SPCL1, a bacterial strain isolated from heavy metal leachate-contaminated soil in Querétaro, México. The whole genome was sequenced on the Illumina NovaSeq platform, and the resulting data, including the assembly and annotation, were analyzed on the BV-BRC platform. The genome comprises 41 contigs and approximately 5.6 million base pairs, with a GC content of 37.58 mol % and 6131 protein-coding sequences. In addition, 6 contigs of 146,177 bp (36.77 mol % G + C), 126,627 bp (33.27 mol % G + C), 16,881 bp (34.13 mol % G + C), 9835 bp (34.67 mol % G + C), 7402 bp (36.54 mol % G + C) and 4590 bp (35.38 mol % G + C) were assembled as plasmids. This analysis of genomic data represents a valuable resource for increasing knowledge of this bacterial species and for possible applications of its biological functions. The genome data were deposited at the National Center for Biotechnology Information (NCBI) under BioProject ID PRJNA1377581, BioSample ID SAMN53794006, and genome accession number JBSVDB000000000.
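The six plasmid contigs are reported individually, so their combined size and overall G + C content have to be derived. The sketch below does this from the quoted lengths and percentages only; the length-weighted mean is an illustrative summary computed here, not a figure reported by the authors, and the abstract does not say whether these six contigs are counted among the 41 contigs of the genome.

```python
# (length in bp, G + C content in mol %) for each plasmid contig,
# copied from the abstract.
plasmid_contigs = [
    (146_177, 36.77),
    (126_627, 33.27),
    (16_881, 34.13),
    (9_835, 34.67),
    (7_402, 36.54),
    (4_590, 35.38),
]

# Combined size of the plasmid contigs.
total_plasmid_bp = sum(length for length, _ in plasmid_contigs)

# Length-weighted mean G + C content: each contig contributes in
# proportion to its size, so the two large contigs dominate.
weighted_gc = (
    sum(length * gc for length, gc in plasmid_contigs) / total_plasmid_bp
)

print(total_plasmid_bp, round(weighted_gc, 2))
```

A simple unweighted average of the six percentages would give the small contigs the same influence as the 146 kb one, which is why the weighting by length is used here.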
Citations: 0
Whole genome sequencing data analysis identified a cefotaxime-resistant Empedobacter brevis GBW-1 isolate from ground beef encoding a novel metallo-beta-lactamase variant, blaEBR-6
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-02-06 DOI: 10.1016/j.dib.2026.112547
Daniel Jones, Praful Aggarwal, Jamison Trewyn, Poojhaa Shanmugam, Kyle Leistikow, Troy Skwor
While investigating foodstuffs for ESBL-producing Aeromonas species on ampicillin dextrin agar with vancomycin and cefotaxime, a multidrug-resistant Empedobacter brevis strain, GBW-1, was identified from ground beef. Phylogenetic analysis supports the interconnectedness of environment, humans and food driving this species' evolutionary development. Antimicrobial susceptibility testing demonstrated resistance to gentamicin, carbapenems and third-generation cephalosporins. Whole genome sequencing of this strain yielded a 3.74 Mb genome with 32.8% GC content and 3780 coding genes. Among these genes, at least three antimicrobial resistance (AMR) genes were identified from the dataset: qacG, the vanT gene within the vanG cluster, and a novel variant of the metallo-β-lactamase, blaEBR-6. This homologue, EBR-6, was compared against previously known EBR variants and was found to be closest to EBR-3, with 84.98% amino acid identity. In silico molecular docking experiments predicted that these mutations change the binding to meropenem. Furthermore, nearly 100 annotated regions associated with mobile genetic elements, including tra operons, were identified on the genome. Together, this dataset provides genomic, phenotypic, and in silico data that may be reused to monitor the evolution of EBR from a One Health perspective.
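An amino acid identity figure such as the one quoted for EBR-6 and EBR-3 is typically a count of matching residues over the compared positions of a pairwise alignment. The following is a minimal sketch of that calculation under stated assumptions: the function `percent_identity` is a hypothetical helper, the toy sequences are invented and are not EBR sequences, and the choice to ignore gapped columns is one common convention among several. The abstract does not say which alignment tool or denominator the authors used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if a != gap and b != gap
    ]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Invented toy alignment (NOT real EBR-6 / EBR-3 sequences):
# 6 ungapped columns, 5 of which match.
toy_identity = percent_identity("MKT-LLA", "MKSALLA")
print(round(toy_identity, 2))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so identity values are only comparable when the denominator is the same.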
Citations: 0
A dataset of sugarcane crop yield, production environment, meteorological records, and satellite images of commercial fields in the northeast of São Paulo State, Brazil
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-02-06 DOI: 10.1016/j.dib.2026.112549
Luiz Antonio Falaguasta Barbosa, Hernani Mazier Junior, Ivan Rizzo Guilherme, Daniel Carlos Guimarães Pedronette
Brazil is the world’s largest producer of sugarcane (Saccharum officinarum), accounting for approximately 40% of global production, with the state of São Paulo responsible for more than half of the national output due to its high level of mechanization. Despite its economic importance, publicly available datasets integrating information on sugarcane yield and production environment remain scarce.
This is the first freely available dataset that comprises crop yield, meteorological, and production environment data, with a large number of observations derived from multiple plots, harvest cycles, and time steps, and that identifies the exact locations of 12 commercial fields in the northeast of São Paulo State, Brazil. It is combined with images downloaded from the Sentinel-2 satellite, based on plot shapefiles, and with other meteorological data at the exact locations and during the same periods of sugarcane cultivation.
Crop yield and production environment data were shared by a sugar and alcohol plant operating in the region. They were collected at farms in the northeast of São Paulo State, Brazil, with measurements taken at the plot level on two plots at each of six farms. The number of harvests covered differs from plot to plot. For the period between the planting and harvest dates, complementary data were generated by downloading the Sentinel-2 RGB bands as single-band images and combining them into a single image. The same process is applied to the meteorological dataset: the closest meteorological station is selected to obtain data for the same days between the planting and harvest dates.
Given the unavailability of integrated sugarcane datasets, this resource provides a valuable foundation for studies on crop yield prediction, analysis of production environments, and the development and evaluation of data-driven models in precision agriculture.
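The station-matching step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the coordinates, station identifiers, field names (`lat`, `lon`, `day`, `rain_mm`), and the choice of great-circle distance with inclusive date bounds are all assumptions made for the example.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def nearest_station(plot, stations):
    """Return the station with the smallest distance to the plot."""
    return min(
        stations,
        key=lambda s: haversine_km(plot["lat"], plot["lon"], s["lat"], s["lon"]),
    )


def records_in_cycle(records, planted, harvested):
    """Keep daily records dated between planting and harvest (inclusive, by assumption)."""
    return [r for r in records if planted <= r["day"] <= harvested]


# Hypothetical plot, stations, and daily records, for illustration only.
plot = {"lat": -21.17, "lon": -47.81}
stations = [
    {"id": "ST_A", "lat": -21.20, "lon": -47.80},
    {"id": "ST_B", "lat": -22.90, "lon": -47.06},
]
records = [
    {"day": date(2020, 3, 1), "rain_mm": 4.2},
    {"day": date(2020, 9, 15), "rain_mm": 0.0},
    {"day": date(2021, 8, 1), "rain_mm": 1.1},
]

station = nearest_station(plot, stations)
cycle = records_in_cycle(records, date(2020, 4, 1), date(2021, 6, 30))
```

Here `station` is the closer of the two hypothetical stations and `cycle` holds only the record that falls inside the growing period. The dataset itself may identify stations and dates differently, so the field names would need adapting to the published files.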
Data in Brief, vol. 65, Article 112549
Citations: 0
An annotated dataset of images of Chinese giant salamanders
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-02-06 DOI: 10.1016/j.dib.2026.112552
Xinyao Yang , Junyi Chen , Didi Lu , Nanqing Sun , Mokai Xie , Haotian Qian
The Chinese giant salamander is classified as a Class II protected species in China and is recognized as critically endangered by the International Union for Conservation of Nature (IUCN). Due to their unique behavioral patterns, wild Chinese giant salamanders are primarily nocturnal and inhabit areas characterized by complex terrain, which results in limited detection coverage and significant challenges in observation. Consequently, images of wild Chinese giant salamanders are exceedingly rare, and the scarcity of existing data impedes the advancement and application of deep learning-based object detection models. This study constructs and releases a specialized dataset for Chinese giant salamanders, comprising 1386 images and a total of 1397 annotated bounding boxes. All images represent diverse field scenarios and are meticulously annotated in accordance with YOLO (You Only Look Once) labeling specifications. Annotation files are provided in both PASCAL VOC (Visual Object Classes) and COCO (Common Objects in Context) formats to ensure compatibility with leading detection frameworks, including YOLO v8 and YOLO v11. This dataset aims to offer high-quality, multi-scenario annotated data for research in computer vision and conservation biology, facilitating the training and evaluation of models for intelligent monitoring and species conservation of the Chinese giant salamander, thereby promoting the development of visual recognition technologies for endangered species.
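For readers moving between the annotation formats named in the abstract, the following sketch shows how one YOLO-style label line relates to PASCAL VOC and COCO bounding boxes. It is an illustration under stated assumptions, not code shipped with the dataset: the function names are hypothetical, the label line and image size are invented, and the conversion ignores the one-pixel offset that some VOC tools apply.

```python
def yolo_to_voc(line, img_w, img_h):
    """Convert 'class xc yc w h' (normalised to 0..1) into (xmin, ymin, xmax, ymax) pixels."""
    cls, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    box = (
        round(xc - w / 2),
        round(yc - h / 2),
        round(xc + w / 2),
        round(yc + h / 2),
    )
    return int(cls), box


def voc_to_coco(box):
    """Convert (xmin, ymin, xmax, ymax) into COCO's [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


# Hypothetical label line for a 640x480 image, for illustration only.
class_id, voc_box = yolo_to_voc("0 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
coco_box = voc_to_coco(voc_box)
```

With these inputs the box is centred in the image, giving a VOC box of `(240, 120, 400, 360)` and a COCO box of `[240, 120, 160, 240]`. The actual class indices and file layout should be taken from the dataset's own documentation.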
Data in Brief, vol. 65, Article 112552
Citations: 0
A transcriptome sequence dataset characterizing eggs, nymphs and adults of Oxycarenus hyalinipennis, the cotton seed bug
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-02-05 DOI: 10.1016/j.dib.2026.112532
Sam D. Heraghty, Aijun Zhang, Daniel Kuhar, Dawn E. Gundersen-Rindal, Michael E. Sparks
The cotton seed bug, Oxycarenus hyalinipennis, is an agricultural pest that has recently been detected in the United States and has the potential to cause extensive economic damage to the cotton production industry. Currently, there are no transcriptomic resources for this species. The data reported here will help guide future efforts to create additional reference resources and facilitate the development of population control strategies. These data could also be used to identify protein-coding genes in a cotton seed bug genome assembly. A total of 13,384 differentially expressed genes were identified, which collectively encoded 40,871 distinct transcripts, of which 18,842 could be annotated with a reference protein in the NCBI NR database, 13,233 with Pfam protein families and 8,089 with Gene Ontology (GO) terms. These transcripts could, for example, be targeted for future functional genomics work.
Data in Brief, vol. 65, Article 112532
Citations: 0
Dataset reporting the differential effect of HNRNPA1 isoforms on alternative splicing
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-02-05 DOI: 10.1016/j.dib.2026.112544
Jade-Emmanuelle Deshaies , Valérie Triassi , Martine Tétreault , Christine Vande Velde
Heterogeneous nuclear ribonucleoprotein A1 (HNRNPA1) encodes two main protein-coding variants: hnRNP A1 and hnRNP A1B. The isoforms differ by the exclusion or inclusion of exon 8 (sometimes referred to as exon 7B), which extends the length of the intrinsically disordered region (IDR). HnRNP A1 is implicated in most major steps of nascent RNA transcript processing, with RNA splicing being the most studied function. While hnRNP A1 has been studied extensively, little is known about the relevance of the longer isoform, hnRNP A1B. In fact, with respect to alternative splicing, only two groups have reported a functional analysis of both isoforms, revealing that both isoforms modulate alternative splicing, albeit with different efficiencies. To better understand the contribution of each isoform to alternative splicing, we analyzed the transcriptomes of mouse erythroleukemia cells either lacking HNRNPA1 (CB3) or uniquely expressing one isoform [hnRNP A1 (CB3 A1) or hnRNP A1B (CB3 A1B)] via stable constitutive expression of murine cDNAs. Our data indicate that differential isoform expression modulates the splicing of both shared and isoform-specific gene sets. These genes are involved in a wide variety of molecular functions and biological processes. Finally, and intriguingly, analysis of the genes with the largest differences in inclusion levels revealed enrichment for genes implicated in several neurodegenerative and neurodevelopmental diseases, as well as intellectual disability, myopathy and cancer.
Data in Brief, vol. 65, Article 112544
Citations: 0