Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2024.111251
Sandra Lucía Hernández Zetina , Ana Belén Anquela Julián , Ángel Esteban Martín Furones , Carlos Martinez Montes , Santos Fernández Noguerol
The dataset offers comprehensive information for analysing cities and neighbourhoods that are potentially unsafe for women. The information was collected for four cities: Toluca (Mexico), Valencia (Spain), Dublin (Ireland) and San Francisco (USA). The collection includes quantitative and qualitative variables obtained and processed from open data, georeferenced posts from a social media platform, and points located through participatory mapping sessions.
The data is provided in raw format, organized by country and city, and categorized by the data source used during processing. It can be opened with most data analysis software and does not depend on specific licenses. The format includes both geometric information and associated attributes, allowing reuse and analysis in different environments.
Additionally, the release of this data allows models to be tailored to specific local contexts and advances open data access in support of Sustainable Development Goal 5 (SDG 5), especially indicator 5.2.2. In general, this indicator suffers from a lack of sufficient data for accurate measurement, which limits the ability to assess and address gender-based violence. By providing an open and flexible resource, the dataset not only facilitates comparative research and informed policymaking but also supports the international commitment to transparency and helps fill existing gaps in information on violence and insecurity.
{"title":"Integration of data sets for modelling gender violence and perception of insecurity","authors":"Sandra Lucía Hernández Zetina , Ana Belén Anquela Julián , Ángel Esteban Martín Furones , Carlos Martinez Montes , Santos Fernández Noguerol","doi":"10.1016/j.dib.2024.111251","journal":{"name":"Data in Brief","volume":"58","pages":"Article 111251"},"publicationDate":"2025-02-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11750520/pdf/"}
Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2024.111225
René Báez-Santana , Miguel Aybar-Mejía , Máximo A. Domínguez-Garabitos , Víctor S. Ocaña-Guevara
The electric power industry has an impact on fossil fuel consumption, which must be considered in decarbonization strategies. Energy systems optimization modelling can be applied to evaluate policy scenarios in the power sector to accelerate energy transitions. These modelling tools need data to simulate different scenarios in the power system and so inform the design of energy policies. For this reason, technical and economic data must be collected and processed to guarantee quality input for the modelling tools. This article presents a dataset for an optimization model of the generation mix and the energy demand in the power system of the Dominican Republic. The model is intended to determine the capacity value of variable renewable energy (VRE), i.e., wind and solar, which can serve as an incentive for these technologies. While the data corresponds to the Dominican Republic's power system, the method of collecting and processing data can be implemented in other countries. The data were collected from open-access databases of the independent system operator, the power sector regulator, and utilities, as well as from websites and databases of international organizations.
{"title":"Techno-economic dataset for energy market and capacity payment co-optimization in the Dominican Republicʼs power market","authors":"René Báez-Santana , Miguel Aybar-Mejía , Máximo A. Domínguez-Garabitos , Víctor S. Ocaña-Guevara","doi":"10.1016/j.dib.2024.111225","journal":{"name":"Data in Brief","volume":"58","pages":"Article 111225"},"publicationDate":"2025-02-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11750510/pdf/"}
Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2024.111268
Nikolaos Nagkoulis , Christos Adam , Ioannis Mamoutos , Stelios Katsanevakis , Antonios D. Mazaris
Incorporating ecological connectivity into spatial conservation planning is increasingly recognized as a key strategy to facilitate species movements, especially under changing environmental conditions. However, obtaining connectivity data is challenging, especially in the marine realm. Sea currents are essential for exploring marine structural connectivity, but transforming sea current data into spatial connectivity matrices involves complex and resource-intensive processing steps to ensure accuracy and usability. Here, a graph-based methodology has been developed to transform current data into formats suitable for delineating ecological corridors, and it has been applied to the Black Sea. The dataset produced can be integrated into spatial conservation prioritization tools to incorporate connectivity in the analysis. The approach involved converting current centroids into points and projecting current directions and magnitudes onto a nearest-neighbour graph connecting these points. Using open-source data from the Copernicus Black Sea Physics Reanalysis dataset from 1993 to 2023, a high-resolution dataset of graph objects (edge lists) and shapefiles (points and edges) for the Black Sea has been created. Analyses were conducted in R, and the algorithm developed to produce the data is accessible on Zenodo. The resulting datasets are compatible with multiple software platforms (e.g., R, Python, and QGIS). A total of 17 datasets are provided for 1993 to 2023: twelve monthly, four seasonal, and one yearly aggregation, supporting diverse spatial and temporal analysis needs. Overall, the datasets can be used to analyse connectivity patterns across the entire Black Sea or in specific regions, and they are particularly useful for ecological modelling and environmental protection purposes.
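The projection step described above can be illustrated with a small sketch. This is not the authors' R algorithm (which is on Zenodo); it is a minimal Python illustration under stated assumptions: points are planar coordinates, each point carries one current vector (u, v), neighbours are found by brute force, and an edge is kept only when the current has a positive component along it. The function name `build_edge_list` and the parameter `k` are hypothetical.

```python
import math


def build_edge_list(points, currents, k=2):
    """Project current vectors onto a k-nearest-neighbour graph.

    points   -- list of (x, y) coordinates, assumed distinct
    currents -- list of (u, v) current components, one per point
    Returns a list of (source, target, weight) tuples, where weight is the
    component of the source point's current along the edge direction.
    Edges with a non-positive component are dropped (an assumption of
    this sketch, not a rule stated in the abstract).
    """
    edges = []
    for i, (xi, yi) in enumerate(points):
        # Brute-force search for the k nearest neighbours of point i.
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi),
        )[:k]
        u, v = currents[i]
        for j in neighbours:
            dx, dy = points[j][0] - xi, points[j][1] - yi
            dist = math.hypot(dx, dy)
            # Scalar projection of the current onto the unit edge vector.
            along = (u * dx + v * dy) / dist
            if along > 0:
                edges.append((i, j, along))
    return edges


# Toy example with three points and made-up current vectors.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
currents = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
edges = build_edge_list(points, currents, k=2)
```

The resulting edge list has the same general shape as the "graph objects (edge lists)" mentioned in the abstract, so it could be written to CSV and loaded in R, Python, or QGIS. The actual dataset may use different weighting or filtering rules.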
{"title":"An ecological connectivity dataset for Black Sea obtained from sea currents","authors":"Nikolaos Nagkoulis , Christos Adam , Ioannis Mamoutos , Stelios Katsanevakis , Antonios D. Mazaris","doi":"10.1016/j.dib.2024.111268","journal":{"name":"Data in Brief","volume":"58","pages":"Article 111268"},"publicationDate":"2025-02-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11763243/pdf/"}
Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2024.111264
Georgios Charvalis , Michalis Koureas , Chloe Brimicombe , Chara Bogogiannidou , Fani Kalala , Varbara Mouchtouri , Christos Hadjichristodoulou , for HIGH Horizons Study Group
In this paper we present a dataset that contains daily mean, maximum and minimum values of 12 heat stress indices averaged over Greek communes from January 1998 to December 2022. The heat indices contained in the dataset include Apparent Temperature (AT), Heat Index (HI), Humidity Index (Humidex), Normal Effective Temperature (NET), Wet Bulb Globe Temperature (simple version WBGT), Wet Bulb Globe Temperature (thermofeelWBGT), Wet Bulb Temperature (WBT), Wind Chill Temperature (WCT), Mean Radiant Temperature (MRT), and Universal Thermal Climate Index (UTCI) with two variations (UTCI indoor and UTCI outdoor).
To develop the dataset, we used hourly climate variables from the ERA5 and ERA5-Land datasets, produced by the European Centre for Medium-Range Weather Forecasts (ECMWF) and accessible through the Copernicus Climate Change Service (C3S) Climate Data Store (CDS) Application Program Interface (API) client. We used freely available Python scripts and resources (HiTiSEA repository, thermofeel library) to calculate 12 heat stress indices for Greece at an enhanced spatial resolution of 0.1° × 0.1°. To facilitate geospatial analysis over the Greek communes, boundary data in shapefile format were obtained from the Hellenic Statistical Authority (ELSTAT). A built-in QGIS function was used to geospatially aggregate the NetCDF files of the daily mean, maximum and minimum values of the 12 indices over 326 Greek communes for 9131 days.
The high spatial and temporal resolution of the data makes the dataset appropriate for analysis and comparison of climate change impacts, heatwave patterns, and the development of climate adaptation strategies at a regional scale in Greece. Additionally, it can be used as the basis of a system to inform and devise targeted interventions and policies aimed at mitigating the effects of extreme heat events. The attribution of heat stress indices at the commune level (also referred to as municipalities or municipal units), which is the lowest level of government within the organizational structure in Greece, enhances the usefulness of the data for statistical analysis against other parameters, such as epidemiological or socio-economic data, which are often available at this level. Finally, the dataset can support educational purposes, providing a practical example of climate data analysis and geospatial statistics applications.
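The figure of 9131 days and the idea of aggregating gridded values to communes can be checked with a short sketch. The abstract does not name the QGIS function used, so `aggregate_by_zone` below is a hypothetical plain-Python stand-in that takes an unweighted mean of grid-cell values per zone; the cell values and zone labels are made up for illustration.

```python
from datetime import date

# Number of daily records from 1 January 1998 to 31 December 2022, inclusive.
n_days = (date(2022, 12, 31) - date(1998, 1, 1)).days + 1


def aggregate_by_zone(cell_values, cell_zones):
    """Return the unweighted mean of cell values for each zone label.

    cell_values -- one index value per grid cell
    cell_zones  -- the zone (e.g. commune) label of each grid cell
    This assumes every cell belongs to exactly one zone, which is a
    simplification of real polygon-on-raster zonal statistics.
    """
    sums, counts = {}, {}
    for value, zone in zip(cell_values, cell_zones):
        sums[zone] = sums.get(zone, 0.0) + value
        counts[zone] = counts.get(zone, 0) + 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


# Toy example: three grid cells falling into two hypothetical communes.
zone_means = aggregate_by_zone([30.0, 32.0, 28.0], ["A", "A", "B"])
```

The date arithmetic accounts for the six leap years in the period (2000, 2004, 2008, 2012, 2016 and 2020), which is why the count is 25 × 365 + 6.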
{"title":"Daily time series of 12 human thermal stress indices in Greece, aggregated at commune level (1998–2022)","authors":"Georgios Charvalis , Michalis Koureas , Chloe Brimicombe , Chara Bogogiannidou , Fani Kalala , Varbara Mouchtouri , Christos Hadjichristodoulou , for HIGH Horizons Study Group","doi":"10.1016/j.dib.2024.111264","journal":{"name":"Data in Brief","volume":"58","pages":"Article 111264"},"publicationDate":"2025-02-01","publicationTypes":"Journal Article"}
This article describes data from an online survey of the Swiss public in the two biggest language regions of Switzerland (German and French). The survey was conducted in February 2023. Participants were recruited through a professional panel provider, and quotas were used for age, gender and language region. The final sample contained 485 respondents. In the first part of the survey, respondents provided basic sociodemographic information. In the second part, their sustainability perceptions regarding four different weed management practices (full-surface spraying, hoeing machine, spot spraying and precise spraying) were investigated. Respondents were then randomly assigned to one of five experimental groups, in which information on a hoeing robot and a milking robot was presented using five different information sources (male/female farmer, male/female scientist, no source). Technology perception was assessed using several questions and aspects (e.g. perception of economic, environmental and social sustainability). Finally, respondents answered questions assessing their perception of farmers, food technology neophobia, chemophobia and the importance of naturalness. The survey can be used and adapted to different contents, aiming to investigate public perception of smart farming technologies and the influence of information sources on technology perception.
{"title":"Data on Swiss public's acceptance and sustainability perceptions of food produced with chemical, digital and mechanical weed control measures and the influence of information source on technology perception in agriculture","authors":"Jeanine Ammann , Nadja El Benni , Sandie Masson , Rita Saleh","doi":"10.1016/j.dib.2024.111212","journal":{"name":"Data in Brief","volume":"58","pages":"Article 111212"},"publicationDate":"2025-02-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11730271/pdf/"}
Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2024.111241
Mohammad Manzurul Islam, Md. Jubayer Ahmed, Mahmud Bin Shafi, Aritra Das, Md. Rakibul Hasan, Abdullah Al Rafi, Mohammad Rifat Ahmmad Rashid, Nishat Tasnim Niloy, Md. Sawkat Ali, Abdullahi Chowdhury, Ahmed Abdal Shafi Rasel
In agriculture, particularly in machine learning applications, quality datasets are essential for advancing research and development. To address the challenges of identifying different mango leaf types and recognizing the diverse and unique characteristics of mango varieties in Bangladesh, a comprehensive and publicly accessible dataset titled “BDMANGO” has been created. The dataset features six mango varieties: Amrapali, Banana, Chaunsa, Fazli, Haribhanga, and Himsagar, which were collected from different locations. The images were captured using the rear cameras of a Google Pixel 6a and an iPhone XR and were stored at a resolution of 640 × 480 pixels. Both sides of each mango leaf were photographed against a white background to accurately reflect real-world scenarios in mango cultivation fields. The white background was specifically chosen to reduce noise in the image samples, allowing for accurate feature extraction by machine learning algorithms. This is intended to ensure the trained model's efficacy in identifying a specific mango leaf when implemented alongside any segmentation algorithm. Additionally, image augmentation techniques such as rotation, horizontal flip, vertical flip, width shift, height shift, shear range, and zooming were applied to expand the dataset from 837 original images to a total of 6696 images (837 original images and 5859 augmented images). This expansion significantly enhances the dataset's utility for training, testing, and validating machine learning models designed for classifying mango leaf varieties, thereby supporting research efforts in this domain.
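The image counts above imply seven augmented images per original, which a few lines of Python can confirm. The sketch also shows two of the named transforms (horizontal and vertical flip) on a tiny nested-list "image". The helper names are hypothetical, and the abstract does not state which tool was used or whether each technique produced exactly one image per original, so this is an illustration only.

```python
# Counts reported in the abstract.
n_original = 837
n_total = 6696

n_augmented = n_total - n_original          # augmented images only
augmented_per_original = n_augmented // n_original


def horizontal_flip(image):
    """Mirror each row of a row-major image (list of pixel rows)."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Reverse the order of rows in a row-major image."""
    return image[::-1]


# Toy 2 x 3 "image" with made-up pixel values.
image = [[1, 2, 3],
         [4, 5, 6]]
flipped_h = horizontal_flip(image)
flipped_v = vertical_flip(image)
```

Seven augmented copies per original matches the seven techniques listed in the abstract, but that correspondence is an inference from the totals rather than something the source states.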
{"title":"BDMANGO: An image dataset for identifying the variety of mango based on the mango leaves","authors":"Mohammad Manzurul Islam, Md. Jubayer Ahmed, Mahmud Bin Shafi, Aritra Das, Md. Rakibul Hasan, Abdullah Al Rafi, Mohammad Rifat Ahmmad Rashid, Nishat Tasnim Niloy, Md. Sawkat Ali, Abdullahi Chowdhury, Ahmed Abdal Shafi Rasel","doi":"10.1016/j.dib.2024.111241","journal":{"name":"Data in Brief","volume":"58","pages":"Article 111241"},"publicationDate":"2025-02-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11748707/pdf/"}
Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2024.111237
Jakty Kusuma , Analianasari , Anung Wahyudi , Muhammad Khalid Abdullah , Ahmad Zainul Hasan , Imam Asrowardi , Fitriani , Muhammad Tahir
Cloves (Syzygium aromaticum), a tree in the Myrtaceae family, are indigenous to the Maluku Islands in Indonesia and are widely utilized as a spice. Essential oils are commonly extracted from clove leaves, flower buds, and stalks. However, due to supply constraints, other clove species, notably Syzygium obtusifolium, are sometimes used as substitutes, leading to lower-grade essential oils. Here, we employed a non-targeted mass-spectrometry-based metabolomics approach to characterize the metabolic profiles of leaves from ten clove varieties, including S. obtusifolium. We identified and quantified 427 metabolites across various metabolic pathways. The metabolomics data for all samples are publicly available at the Figshare repository under 10.6084/m9.figshare.27212016. The data can be accessed directly at https://figshare.com/s/f4a40b7903b6a946b203.
{"title":"Diversity of the non-targeted metabolomic data across various varieties of Cloves (Syzygium spp.)","authors":"Jakty Kusuma , Analianasari , Anung Wahyudi , Muhammad Khalid Abdullah , Ahmad Zainul Hasan , Imam Asrowardi , Fitriani , Muhammad Tahir","doi":"10.1016/j.dib.2024.111237","journal":{"name":"Data in Brief","volume":"58","pages":"Article 111237"},"publicationDate":"2025-02-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11748708/pdf/"}
Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2024.111238
Renat Shigapov , Thomas Schmidt , Jan Kamlah , Irene Schumm , Jochen Streb , Sibylle Lehmann-Hasemeyer
The MaschinenBauIndustrie Knowledge Graph (MBI-KG) is a structured and semantically enriched dataset extracted from the 1937 publication “Die Maschinen-Industrie im Deutschen Reich” (The Machinery Industry in the German Reich), published by the “Wirtschaftsgruppe Maschinenbau” and edited by Herbert Patschan. This historical source offers data on German companies within the mechanical engineering industry during the pre-World War II era.
The book was digitized, and Optical Character Recognition (OCR) was applied to extract text. The unstructured extracted data was then structured and semantically enriched to enable data integration and reuse. The semantically enriched data was uploaded into open-source knowledge-graph software. The resulting knowledge graph includes detailed information about companies, individuals, and administrative entities relevant to the German mechanical engineering industry. The data is accessible through various means, including a SPARQL endpoint, an API, advanced search functionalities, a reconciliation API, and bulk files. Each entity in the knowledge graph can be exported in multiple formats, such as CSV, RDF (ttl), JSON, and NDJSON, ensuring compatibility with diverse research tools and platforms.
This dataset can be reused in various research domains, including economic history, data science, and digital humanities. By providing machine-readable, structured data from a crucial historical period, the MBI-KG facilitates novel analyses and insights into the economic and industrial landscape of early 20th-century Germany. The dataset's interoperability with other data sources and its alignment with FAIR principles further enhance its value for interdisciplinary research and long-term preservation.
{"title":"MBI-KG: A knowledge graph of structured and linked economic research data extracted from the 1937 book “Die Maschinen-Industrie im Deutschen Reich”","authors":"Renat Shigapov , Thomas Schmidt , Jan Kamlah , Irene Schumm , Jochen Streb , Sibylle Lehmann-Hasemeyer","doi":"10.1016/j.dib.2024.111238","DOIUrl":"10.1016/j.dib.2024.111238","url":null,"PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111238"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11742587/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143001712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper details the data collection process, dataset, and reuse potential of the Balkan Peace Index (BPI), a model designed to evaluate levels of peacefulness in the Western Balkans. Data were gathered in phases: first, a team of local experts conducted on-the-ground data collection, interviews, and focus groups, and drew on external international databases describing different notions of peace. These data were then processed and classified on a predefined scale by another team of experts using the Decision EXpert model. The BPI model incorporates both quantitative and qualitative data, reflecting the local context. The full dataset is stored in the Mendeley Data repository and can be reused for further research, policy-making, and sensitivity analysis. This open-access resource aims to provide actionable insights for improving peace levels and preventing potential deterioration in the region.
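To make the idea of classifying inputs on a predefined qualitative scale more concrete, here is a minimal sketch of a decision-table aggregation in the general style of Decision EXpert models, where an expert-defined table maps combinations of child attribute levels to a parent level. The attribute names (`security`, `institutions`), the three-level scale, and every rule below are invented for illustration and are not taken from the BPI model.

```python
# Ordered qualitative scale, worst to best (an assumption for this sketch).
SCALE = ("low", "medium", "high")

# Hypothetical expert-defined decision table:
# (security level, institutions level) -> aggregated peace level.
RULES = {
    ("low", "low"): "low",
    ("low", "medium"): "low",
    ("low", "high"): "medium",
    ("medium", "low"): "low",
    ("medium", "medium"): "medium",
    ("medium", "high"): "medium",
    ("high", "low"): "medium",
    ("high", "medium"): "high",
    ("high", "high"): "high",
}


def evaluate(security, institutions):
    """Look up the aggregated level for one pair of qualitative inputs."""
    for value in (security, institutions):
        if value not in SCALE:
            raise ValueError(f"unknown level: {value!r}")
    return RULES[(security, institutions)]


print(evaluate("high", "medium"))
print(evaluate("low", "high"))
```

A sensitivity analysis of the kind mentioned above could, under these assumptions, be approximated by changing one input level at a time and recording whether the aggregated level changes.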
{"title":"Balkan Peace Index decision EXpert model and data","authors":"Nemanja Džuverović , Sandro Radovanović , Goran Tepšić , Đorđe Krivokapić","doi":"10.1016/j.dib.2024.111181","DOIUrl":"10.1016/j.dib.2024.111181","url":null,"PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111181"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11730566/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142982987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-02-01 DOI: 10.1016/j.dib.2024.111250
Michel Visalli, Ronan Symoneaux, Cécile Mursic, Margaux Touret, Flore Lourtioux, Kipédène Coulibaly, Benjamin Mahieu
This dataset was created to investigate the impact of data collection modes and pre-processing techniques on the quality of free comment data related to consumers' sensory perceptions. A total of 200 consumers were recruited and divided into two groups of 100. Each group evaluated six madeleine samples (five distinct samples and one replicate) in a sensory analysis laboratory, using different free comment data collection modes. Consumers in the first group provided only words or short expressions, while those in the second group used complete sentences. Additionally, participants reported their liking for each sample.
The collected data provided valuable insights into the effectiveness of the free comment method in sensory evaluation of food products. They emphasized the importance of data pre-processing and demonstrated how the chosen techniques can impact the quality of the results. The dataset is based on real-world consumer data, showcasing how individuals naturally express their subjective perceptions. It features descriptions that reflect authentic consumer language, including informal expressions, incorrect phrasing, spelling errors, and unstructured sentences. This raw textual data has been annotated and translated into English. The dataset can therefore be repurposed to assess and compare the performance of different text mining, natural language processing and sentiment analysis algorithms in both French and English, as well as to drive innovations in AI tools for sensory and consumer research.
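The kind of pre-processing discussed above can be sketched as follows: lowercasing, tokenizing, removing stop words, and counting how many comments mention each term. This is a minimal illustration and not the authors' pipeline; the stop-word list and the example comments are invented for the sketch.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real study would use a fuller one.
STOPWORDS = {"a", "the", "is", "it", "and", "very", "of", "bit"}


def preprocess(comment):
    """Lowercase, keep runs of letters (accents included), drop stop words."""
    tokens = re.findall(r"[^\W\d_]+", comment.lower())
    return [t for t in tokens if t not in STOPWORDS]


def term_counts(comments):
    """Count, for each term, the number of comments that mention it."""
    counts = Counter()
    for comment in comments:
        counts.update(set(preprocess(comment)))  # one count per comment
    return counts


# Invented example comments in the two collection modes (words vs sentences).
comments = [
    "Soft, buttery and sweet",
    "It is very soft and a bit dry.",
    "buttery!!",
]

counts = term_counts(comments)
print(counts["soft"], counts["buttery"], counts["dry"])
```

Swapping the stop-word list, adding spelling correction, or lemmatizing before counting changes the resulting term table, which is the kind of effect on result quality that the dataset is meant to help benchmark.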
{"title":"A dataset of annotated free comments on the sensory perception of madeleines for benchmarking text mining techniques","authors":"Michel Visalli , Ronan Symoneaux , Cécile Mursic , Margaux Touret , Flore Lourtioux , Kipédène Coulibaly , Benjamin Mahieu","doi":"10.1016/j.dib.2024.111250","DOIUrl":"10.1016/j.dib.2024.111250","url":null,"PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111250"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11742558/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143001833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}