Pub Date: 2026-04-01 | Epub Date: 2026-01-21 | DOI: 10.1016/j.dib.2026.112497
Niels Souverijns, Dirk Lauwaet, Quentin Lejeune, Chahan M. Kropf, Kam Lam Yeung, Shruti Nath, Carl F. Schleussner
Cities worldwide are increasingly facing the challenges of heat stress, a problem expected to worsen with ongoing climate change. The lack of detailed, city-specific data hinders effective response measures and limits the adaptive capacity of urban populations. In this data descriptor, we introduce a comprehensive database providing climate and heat stress information for 142 cities globally, covering the present and extending projections up to 2100 across three distinct climate scenarios, including two overshoot scenarios. This dataset includes 34 heat stress indicators at a spatial resolution of 100 meters, offering a unique database to identify vulnerable areas and deepen the understanding of urban heat risks. The data is presented through an accessible, user-friendly dashboard, enabling policymakers, researchers, and city planners, as well as non-experts, to easily visualise and interpret the findings, supporting more informed decision-making and urban adaptation strategies.
Title: 100 m climate and heat stress data up to 2100 for 142 cities around the globe (Data in Brief, vol. 65, Article 112497)
Pub Date: 2026-04-01 | Epub Date: 2026-01-12 | DOI: 10.1016/j.dib.2026.112458
Md. Darun Nayeem, Zarin Rafa, Tasnuva Tasnim Nova, Yasin Rahman, Abdul Mumeet Pathan, Md. Masudul Islam
This article describes a publicly available multimodal Bangla sentiment dataset designed to support research in speech processing, sentiment analysis, and low-resource language modeling. The dataset comprises two synchronized modalities: sentiment-annotated Bangla text and corresponding speech recordings. It contains 1,000 manually curated Bangla sentences evenly distributed across positive and negative sentiment classes, alongside 4,000 aligned audio recordings produced by four native speakers. Each sentence is recorded independently by all speakers to ensure speaker diversity while maintaining consistent textual content. The text component reflects natural, everyday Bangla language usage and is structured to facilitate sentiment classification and linguistic analysis. The audio recordings were collected under controlled yet realistic acoustic conditions using multiple recording devices, introducing natural variability relevant for real-world speech applications. All samples underwent manual quality verification to ensure accurate text–audio alignment and to remove noisy or duplicated recordings. The dataset is suitable for a wide range of applications, including multimodal sentiment classification, sentiment-aware speech recognition, audio–text alignment, and benchmarking of multimodal learning approaches for low-resource languages. Its modular structure allows straightforward extension with additional speakers, dialects, or sentiment categories. By providing aligned textual and speech data for Bangla, this dataset contributes a valuable resource to the research community and supports broader efforts toward linguistic diversity in artificial intelligence.
Title: BanglaMUSE: A multimodal Bangla sentiment dataset of text–audio pairs for speech and sentiment analysis (Data in Brief, vol. 65, Article 112458)
Pub Date: 2026-04-01 | Epub Date: 2026-01-14 | DOI: 10.1016/j.dib.2026.112465
Ha-Nam Nguyen, Hoai-Nam Nguyen, Thi-Thu Ngo
In the context of digital transformation in education, digital competence is an essential requirement for future teachers. This study surveyed and analyzed the digital competence structure of 1439 pre-service teachers in different regions of Viet Nam. We utilized a self-assessment questionnaire based on the Digital Kids Asia-Pacific (DKAP) framework, with references to the TPACK (Technological Pedagogical Content Knowledge), DigComp (Digital Competence Framework for Citizens), and DigCompEdu frameworks. The dataset provides detailed information on each participant’s self-evaluated digital proficiency in five categories, along with demographic variables such as gender and subject specialization. The value of this data file lies in its potential to inform teacher training programs and educational policy by offering evidence on the strengths and weaknesses of future teachers’ digital competence.
Title: Survey data on digital competence assessment among pre-service teachers in Vietnam (Data in Brief, vol. 65, Article 112465)
Pub Date: 2026-04-01 | Epub Date: 2026-01-21 | DOI: 10.1016/j.dib.2026.112486
Italo Aldo Campodonico-Avendano, Silvia Erba, Panayiotis Papadopoulos, Salvatore Carlucci, Antonio Luparelli, Amedeo Ingrosso, Greta Tresoldi, Muhammad Salman Shahid, Frederic Wurtz, Benoit Delinchant, Per Martin Leinan, Stefano Cera, Peter Riederer, Runar Solli, Amin Moazami, Mohammadreza Aghaei
Indoor Environmental Quality directly affects public health, productivity, and well-being, while also playing a vital role in developing climate-neutral, energy-efficient, and resilient buildings. This paper presents a comprehensive dataset of indoor environmental parameters that affect thermal comfort, indoor air quality, and visual comfort, created under the European Union’s Horizon 2020 project Collective Intelligence for Energy Flexibility. The dataset comprises high-resolution measurements of carbon dioxide, pollutants, volatile organic compounds, air temperature, relative humidity, and illuminance on a horizontal plane, collected over a two-year period at 1-minute intervals. Data were gathered from 14 pilot buildings in four European countries representing different climates (Cyprus, France, Italy, and Norway), covering diverse building types such as schools, medical centres, sports arenas, residential complexes, universities, and elder care facilities, and representing about 40 % of common European building categories. Sensors were installed in specific thermal zones within each building to monitor environmental conditions. All data are organized by building and zone and supplemented with standardized Brick metadata to ensure interoperability. This comprehensive dataset, with its broad geographic coverage, variety of building types, long-term high-frequency measurements, and multimodal data, provides a valuable resource for comparative IEQ research, cross-domain modelling, and integrated assessments of comfort, ventilation, and daylighting across different climates and operational settings. It is available upon request under a non-disclosure agreement provided by the consortium.
Title: COLLECTiEF dataset: A high-resolution indoor environmental dataset from European buildings across diverse climates supporting thermal, air-quality, and visual-comfort assessments (Data in Brief, vol. 65, Article 112486)
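A common first step with such 1-minute interval measurements is temporal aggregation before cross-building comparison. A minimal sketch in Python with pandas (synthetic CO2 values; the series name and index layout are illustrative assumptions, not the dataset's actual schema):

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute CO2 readings for a single zone (two hours of data)
idx = pd.date_range("2024-01-01 00:00", periods=120, freq="min")
co2 = pd.Series(np.linspace(400.0, 520.0, 120), index=idx, name="co2_ppm")

# Aggregate to hourly means, a typical resolution for IEQ reporting
hourly = co2.resample("60min").mean()
```

The same resample pattern extends to daily ranges or occupancy-hour subsets when comparing zones or buildings.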
Insects feeding on xylem sap, such as adult Aphrophoridae spittlebugs, are vectors of the plant-pathogenic, xylem-limited bacterium Xylella fastidiosa (Xf), a causal agent of a number of severe diseases, including Olive Quick Decline Syndrome (OQDS), which has decimated olive trees in the Mediterranean region. The Aphrophoridae life cycle and behaviour feature a weak stage, known as the juvenile stage, in which the insects live solitarily on stems covered in a self-produced foamy fluid (froth) that protects them from dehydration and temperature stress. Juvenile vectors are therefore ideal targets for control interventions aimed at reducing transmission by adults. This paper presents what is, to the best of our knowledge, the first image dataset framing spittlebug froth samples in the field for the purpose of automated Aphrophoridae nymph identification. Images were captured using different devices, including a consumer-grade RGB-D sensor, a digital reflex camera, and a smartphone camera. The dataset comprises 365 colour images focusing on spittlebug foam. Of these, 211 images were captured in April 2024 during a two-day campaign and manually annotated with semantic labels, yielding PNG binary masks that precisely distinguish spittlebug foam pixels from the background. To further enhance usability, labels are also provided in YOLO (You Only Look Once) format as text files, both for segmentation and object detection. The remaining 154 images were collected during a separate two-day campaign in 2025. These images are unannotated and are intended for further testing purposes. Overall, the dataset enables the development of both semantic segmentation models and object detectors for automated froth detection in natural images, thus facilitating the early identification of potentially harmful insects in sustainable pest management and control systems.
Title: Towards sustainable management of Xylella fastidiosa vectors: An annotated image dataset for automated in-field detection of Aphrophoridae foam
Authors: Michele Elia, Angelo Cardellicchio, Michele Paradiso, Giuseppe Veronico, Arianna Rana, Antonio Petitti, Vito Renò, Simone Pascuzzi, Annalisa Milella
DOI: 10.1016/j.dib.2026.112477 (Data in Brief, vol. 65, Article 112477)
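YOLO-format segmentation labels of the kind described above are plain-text files: each line holds a class index followed by normalized polygon vertex coordinates. A minimal parsing sketch in Python (this follows the standard YOLO segmentation convention; the example label line, class index, and image size are illustrative, not taken from the dataset):

```python
import numpy as np

def parse_yolo_segmentation(line, img_w, img_h):
    """Parse one line of a YOLO segmentation label file:
    '<class> x1 y1 x2 y2 ...', with coordinates normalized to [0, 1]."""
    parts = line.split()
    cls = int(parts[0])
    coords = np.array(parts[1:], dtype=float).reshape(-1, 2)
    return cls, coords * np.array([img_w, img_h])  # de-normalize to pixels

# Illustrative label line: class 0, a 3-vertex polygon, on a 640x480 image
cls, poly = parse_yolo_segmentation("0 0.10 0.20 0.50 0.20 0.30 0.60", 640, 480)
# poly is a (3, 2) array of pixel coordinates
```

Object-detection labels differ only in holding a normalized bounding box (centre x, centre y, width, height) instead of a polygon.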
Pub Date: 2026-04-01 | Epub Date: 2026-01-27 | DOI: 10.1016/j.dib.2026.112503
Alexandria B. Boehm, Marlene K. Wolfe, Amanda L. Bidwell, Alessandro Zulli, Bradley J. White, Bridgette Shelden, Dorothea Duong
This data article provides human pathogen nucleic-acid concentrations in wastewater solids from 147 treatment plants across 40 states in the United States. Concentrations were measured up to 7 times a week at each plant. The data run from 1 July 2024 through 15 September 2025 and represent an extension and expansion of the measurements provided in a previous data article. Nucleic-acid concentrations were measured using droplet digital (reverse-transcription) polymerase chain reaction (ddRT-PCR). This article provides concentrations of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), influenza A and B viruses, respiratory syncytial virus, human metapneumovirus, enterovirus D68, parvovirus B19, norovirus genotype II, rotavirus, Candida auris, hepatitis A virus, human adenovirus group F, mpox virus clades Ib and II, H1, H3, and H5 influenza A viruses, measles virus, and pepper mild mottle virus nucleic acids in wastewater solids. These data can be used to study infectious disease epidemiology.
Title: Pathogen nucleic acids data in wastewater solids from 147 treatment plants in the United States: 2024–2025 (Data in Brief, vol. 65, Article 112503)
Pub Date: 2026-04-01 | Epub Date: 2026-01-27 | DOI: 10.1016/j.dib.2026.112504
Mingrui Li, Can Wu, Wentao Mao, Elif E. Firat, Robert S. Laramee
As the number of investment transactions grows, so does the importance of visual analysis for studying financial data. Although modern stock market platforms and research tools offer a range of stock data and visual analysis software, retail investment data is difficult to find due to privacy and security concerns. This poses barriers to researchers and analysts interested in portfolio management, analysis, and visualization. This paper introduces StockVis, the first open and anonymized dataset of investment transactions from an individual investor. This freely accessible dataset can be used to study investment portfolio analysis, thereby improving strategic decision-making in portfolio management. StockVis features a comprehensive set of investment transactions focused on the U.S. stock market, encompassing the transaction records of a single anonymous investor over 3–4 years, complemented by derived metadata on the stocks of interest. We provide an overview of the dataset, detailing its features and the anonymization process, and present some case studies and illustrative exemplar images as a foundation for further study. We are confident that the accessibility of this open data will significantly contribute to the research community, fostering enhanced exploration in the field of investment.
Title: StockData: An open investment transaction dataset (Data in Brief, vol. 65, Article 112504)
This study introduces a novel multilingual dataset designed to distinguish auto-tuned musical compositions from authentic recordings, addressing a significant gap in existing resources. The dataset encompasses songs in English, Mandarin, and Japanese, ensuring a diverse representation of linguistic contexts. The data collection process began with aggregating diverse datasets from the Music Information Retrieval domain, incorporating tracks from the three specified languages to capture a wide range of musical styles and recording qualities. Each audio file was subsequently standardized into 10-second intervals at a sample rate of 16 kHz to facilitate manageable analysis. For the creation of auto-tuned samples, pitch correction was implemented using the probabilistic YIN (PYIN) algorithm for accurate pitch detection, followed by transposition via the pitch-synchronous overlap-add (PSOLA) technique. To emulate realistic auto-tuning scenarios, pitch correction was randomly applied to portions of each 10-second segment, ensuring variability and realism in the dataset, which makes it suitable for training robust detection models. Additionally, time-domain labels indicating the exact locations of pitch correction within each segment were generated, providing precise annotations crucial for developing accurate detection algorithms. The resulting multilingual dataset comprises a comprehensive collection of both auto-tuned and authentic musical segments across English, Mandarin, and Japanese, each annotated with detailed information about pitch correction applications. This rich annotation allows for nuanced analysis and supports various research applications, while the dataset's structure and thorough documentation of its creation process make it a valuable resource for researchers in music analysis, machine learning, and audio signal processing.
Title: ATDD: Multi-lingual dataset for auto-tune detection in music recordings
Authors: Mahyar Gohari, Paolo Bestagini, Sergio Benini, Nicola Adami
DOI: 10.1016/j.dib.2025.112446 (Data in Brief, vol. 65, Article 112446)
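The 10-second, 16 kHz standardization step described above can be sketched as follows (a minimal illustration with a silent stand-in signal; the real pipeline's resampling, file handling, and pitch-correction steps are omitted):

```python
import numpy as np

SR = 16_000           # target sample rate used in the dataset
SEG_LEN = 10 * SR     # 10-second segments (160,000 samples)

def segment_audio(samples):
    """Split a mono signal into non-overlapping 10 s segments,
    dropping any trailing remainder shorter than 10 s."""
    n_full = len(samples) // SEG_LEN
    return samples[: n_full * SEG_LEN].reshape(n_full, SEG_LEN)

audio = np.zeros(25 * SR)        # 25 s stand-in signal
segments = segment_audio(audio)  # 2 full segments; the last 5 s are dropped
```

Fixed-length segments like these are what the time-domain pitch-correction labels would index into, one label span per segment.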
Pub Date: 2026-04-01 | Epub Date: 2026-01-15 | DOI: 10.1016/j.dib.2026.112473
Justyna Kadłuczka, Tatsiana Chubukova, Przemysław Mielczarek, Agata Maziak, Adam Roman, Emilija Napieralska, Katarzyna Z. Kuter
The dataset contains proteomic results (timsTOF Pro 2, Bruker) obtained using an originally developed method for isolating astrocytes and microglia from the same adult rat brain sample. Mechano-enzymatic dissociation and FACS sorting retrieved pure, separate cellular fractions from the substantia nigra. Results come from an animal model of early Parkinson’s disease, in which selective degeneration of nigrostriatal dopaminergic neurons was induced by 6-OHDA and combined with 7-day astrocyte dysfunction and death induced by fluorocitrate. Astrocyte and neuron death both induce microglial activation, but to varying degrees and through different mechanisms. Previous studies did not allow changes in common mechanisms (such as energy metabolism) to be assigned to a specific cell type in tissue, while in vitro studies lack a functional dimension. This research enables the identification of clear information on mechanisms within each cell type, originating from a multidimensional environment, while maintaining the functional and tissue-specific context. Comparison of astrocyte death-induced vs neuron death-induced microglia activation processes can be analysed using this dataset. Raw data are available via ProteomeXchange with identifiers PXD066353 and PXD067265.
Title: "Glial cell-specific proteomic data from the substantia nigra of a rat 6-OHDA and fluorocitrate model of astrocyte death and microglial activation". Data in Brief, Volume 65, Article 112473. DOI: 10.1016/j.dib.2026.112473.
Pub Date: 2026-04-01 · Epub Date: 2026-01-16 · DOI: 10.1016/j.dib.2026.112483
Anindita Das, Vinitha Hannah Subburaj, Yong Yang, Craig W. Bednarz
The dataset consists of drone images of cotton fields, created to support precision agriculture and machine learning-based weed detection research. Its main goal is to enable the development of object detection models for crop-weed differentiation while providing a benchmark for model evaluation. The release serves two purposes: it supports the advancement of automated agricultural monitoring and sustainable farming practices, and it adds to the growing body of research on AI solutions for agricultural productivity and environmental management.
Title: "A UAV image dataset for object detection with annotations generated using LabelImg and Roboflow". Data in Brief, Volume 65, Article 112483. DOI: 10.1016/j.dib.2026.112483.
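The abstract does not specify the dataset's annotation format, but LabelImg and Roboflow commonly export YOLO-style text labels (one `class cx cy w h` line per object, with coordinates normalized to the image size). As a purely hypothetical sketch under that assumption, converting one such label line into a pixel-space bounding box could look like this (the class id and image dimensions below are illustrative, not taken from the dataset):

```python
def yolo_to_pixels(line: str, img_w: int, img_h: int):
    """Parse a YOLO label line 'class cx cy w h' (normalized, center-based)
    into (class_id, x_min, y_min, x_max, y_max) in pixel coordinates."""
    cls, cx, cy, w, h = line.split()
    # Scale normalized values to pixel units
    cx, cy = float(cx) * img_w, float(cy) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    # Convert center/size representation to corner coordinates
    return int(cls), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

# Example: a centered box covering half of a 640x480 image in each dimension
box = yolo_to_pixels("0 0.5 0.5 0.5 0.5", 640, 480)
# → (0, 160.0, 120.0, 480.0, 360.0)
```

A corner-based pixel box like this is what most evaluation tools (e.g. IoU computations) expect as input, which is why the conversion is usually the first preprocessing step when working with YOLO-format labels.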