Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data

Pub Date : 2023-10-24 DOI: 10.3390/data8110161

Leonid V. Egorov, Viktor V. Aleksanov, Sergei K. Alekseev, Alexander B. Ruchin, Oleg N. Artaev, Mikhail N. Esin, Sergei V. Lukiyanov, Evgeniy A. Lobachev, Gennadiy B. Semishin

(1) Background: Carabidae is one of the most diverse families of Coleoptera. Many species of Carabidae are sensitive to anthropogenic impacts and serve as indicators of the state of their environment. Some species of large beetles are on the verge of extinction. The aim of this research is to describe the Carabidae fauna of the Republic of Mordovia (central part of European Russia); (2) Methods: The research was carried out in April–September of 1979, 1987, 2000, 2001, 2005, and 2007–2022. Collections were performed using a variety of methods (light trapping, soil traps, window traps, etc.). For each observation, the coordinates of the sampling location, abundance, and dates were recorded; (3) Results: The dataset contains data on 251 species of Carabidae from 12 subfamilies and 4576 occurrences. A total of 66,378 specimens of Carabidae were studied. Another 29 species are additionally known from other publications. Also, twenty-two species were excluded from the fauna of the region because they had previously been identified in error; (4) Conclusions: The biodiversity of Carabidae in the Republic of Mordovia includes 280 species from 12 subfamilies. Four species (Agonum scitulum, Lebia scapularis, Bembidion humerale, and Bembidion tenellum) were identified for the first time in the Republic of Mordovia.


Citations: 0

DataPLAN: A Web-Based Data Management Plan Generator for the Plant Sciences

Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data

Pub Date : 2023-10-24 DOI: 10.3390/data8110159

Xiao-Ran Zhou, Sebastian Beier, Dominik Brilhaus, Cristina Martins Rodrigues, Timo Mühlhaus, Dirk von Suchodoletz, Richard M. Twyman, Björn Usadel, Angela Kranz

Research data management (RDM) combines a set of practices for the organization, storage and preservation of data from research projects. The RDM strategy of a project is usually formalized as a data management plan (DMP)—a document that sets out procedures to ensure data findability, accessibility, interoperability and reusability (FAIR-ness). Many aspects of RDM are standardized across disciplines so that data and metadata are reusable, but the components of DMPs in the plant sciences are often disconnected. The inability to reuse plant-specific DMP content across projects and funding sources requires additional time and effort to write unique DMPs for different settings. To address this issue, we developed DataPLAN—an open-source tool incorporating prewritten DMP content for the plant sciences that can be used online or offline to prepare multiple DMPs. The current version of DataPLAN supports Horizon 2020 and Horizon Europe projects, as well as projects funded by the German Research Foundation (DFG). Furthermore, DataPLAN offers the option for users to customize their own templates. Additional templates to accommodate other funding schemes will be added in the future. DataPLAN reduces the workload needed to create or update DMPs in the plant sciences by presenting standardized RDM practices optimized for different funding contexts.


Citations: 0

Panel Regression Modelling for COVID-19 Infections and Deaths in Tamil Nadu, India

Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data

Pub Date : 2023-10-23 DOI: 10.3390/data8100158

Rajarathinam Arunachalam

The impacts of the coronavirus disease 2019 (COVID-19) pandemic have been extremely severe, with both economic and health crises experienced worldwide. Based on the panel regression model, this study examined the trends and correlations in the number of COVID-19-related deaths and the number of COVID-19-infected cases in all 37 regions of the Tamil Nadu state in India, in August 2020. The fixed effects model had the greatest R² value of 78% and exhibited significant results. The slope coefficient was also highly significant, showing a considerable variation in the relationship between new COVID-19 cases and deaths. Additionally, for every unit increase in COVID-19-infected cases, the death rate increased by 0.02%.
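For readers unfamiliar with the fixed effects model mentioned above, the following is a minimal sketch of the within (demeaning) estimator with a single regressor. It is not the author's code: the function name `within_estimator`, the district labels, and all numbers are invented for illustration, and the toy panel is synthetic rather than drawn from the Tamil Nadu data.

```python
import numpy as np

def within_estimator(y, x, groups):
    """Fixed-effects (within) estimate of the slope of y on x.

    Each group's own mean is subtracted from y and x, which removes any
    group-specific intercept before a plain least-squares fit.
    Returns (slope, within R^2).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    y_dm = y.copy()
    x_dm = x.copy()
    for g in np.unique(groups):
        mask = groups == g
        y_dm[mask] -= y[mask].mean()
        x_dm[mask] -= x[mask].mean()
    slope = float(x_dm @ y_dm / (x_dm @ x_dm))
    resid = y_dm - slope * x_dm
    r2_within = 1.0 - float(resid @ resid) / float(y_dm @ y_dm)
    return slope, r2_within

# Synthetic panel: two hypothetical districts, three days each.
cases = [10, 20, 30, 100, 110, 120]
deaths = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
district = ["A", "A", "A", "B", "B", "B"]

slope, r2 = within_estimator(deaths, cases, district)
print(f"slope={slope:.3f}, within R^2={r2:.3f}")
```

Demeaning by group is equivalent to including one dummy variable per region, which is why the approach scales to panels such as the 37 regions described in the abstract without estimating 37 intercepts explicitly.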


Citations: 0

Industrial Environment Multi-Sensor Dataset for Vehicle Indoor Tracking with Wi-Fi, Inertial and Odometry Data

Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data

Pub Date : 2023-10-23 DOI: 10.3390/data8100157

Ivo Silva, Cristiano Pendão, Joaquín Torres-Sospedra, Adriano Moreira

This paper describes a dataset collected in an industrial setting using a mobile unit resembling an industrial vehicle equipped with several sensors. Wi-Fi interfaces collect signals from available Access Points (APs), while motion sensors collect data regarding the mobile unit’s movement (orientation and displacement). The distinctive features of this dataset include synchronous data collection from multiple sensors, such as Wi-Fi data acquired from multiple interfaces (including a radio map), orientation provided by two low-cost Inertial Measurement Unit (IMU) sensors, and displacement (travelled distance) measured by an absolute encoder attached to the mobile unit’s wheel. Accurate ground-truth information was determined using a computer vision approach that recorded timestamps as the mobile unit passed through reference locations. We assessed the quality of the proposed dataset by applying baseline methods for dead reckoning and Wi-Fi fingerprinting. The average positioning error for simple dead reckoning, without using any other absolute positioning technique, is 8.25 m and 11.66 m for IMU1 and IMU2, respectively. The average positioning error for simple Wi-Fi fingerprinting is 2.19 m when combining the RSSI information from five Wi-Fi interfaces. This dataset contributes to the fields of Industry 4.0 and mobile sensing, providing researchers with a resource to develop, test, and evaluate indoor tracking solutions for industrial vehicles.
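The dead-reckoning baseline described above can be illustrated with a short sketch. It assumes each sample pairs a heading (as an IMU would provide) with a travelled distance (as the wheel encoder would provide). The function names, the sample values, and the planar coordinate frame are assumptions made for illustration and are not taken from the dataset or the paper's implementation.

```python
import math

def dead_reckon(start_xy, steps):
    """Integrate (heading_rad, distance_m) steps into a 2D path.

    Heading is measured counter-clockwise from the x axis.
    Returns the list of positions, including the start.
    """
    x, y = start_xy
    path = [(x, y)]
    for heading, dist in steps:
        x += dist * math.cos(heading)
        y += dist * math.sin(heading)
        path.append((x, y))
    return path

def mean_position_error(estimated, truth):
    """Average Euclidean distance between paired estimated and true positions."""
    errors = [
        math.hypot(ex - tx, ey - ty)
        for (ex, ey), (tx, ty) in zip(estimated, truth)
    ]
    return sum(errors) / len(errors)

# Hypothetical samples: two 1 m steps along x, then a 2 m step along y.
steps = [(0.0, 1.0), (0.0, 1.0), (math.pi / 2, 2.0)]
path = dead_reckon((0.0, 0.0), steps)

# Hypothetical ground truth at the same instants, offset by 0.5 m in y.
truth = [(px, py + 0.5) for px, py in path]
print(path[-1], mean_position_error(path, truth))
```

Because every step is added to the previous position, heading errors accumulate along the path, which is consistent with dead reckoning alone showing larger average errors than Wi-Fi fingerprinting in the abstract.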


Citations: 0

A Data-Driven Exploration of a New Islamic Fatwas Dataset for Arabic NLP Tasks

Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data

Pub Date : 2023-10-19 DOI: 10.3390/data8100155

Ohoud Alyemny, Hend Al-Khalifa, Abdulrahman Mirza

Islamic content is a broad and diverse domain that encompasses various sources, topics, and perspectives. However, there is a lack of comprehensive and reliable datasets that can facilitate conducting studies on Islamic content. In this paper, we present fatwaset, the first public Arabic dataset of Islamic fatwas. It contains Islamic fatwas that we collected from various trusted and authenticated sources in the Islamic fatwa domain, such as agencies, religious scholars, and websites. Fatwaset is a rich resource as it does not only contain fatwas but also includes a considerable set of their surrounding metadata. It can be used for many natural language processing (NLP) tasks, such as language modeling, question answering, author attribution, topic identification, text classification, and text summarization. It can also support other domains that are related to Islamic culture, such as philosophy and language art. We describe the methodology and criteria we used to select the content, as well as the challenges and limitations we faced. Additionally, we perform an Exploratory Data Analysis (EDA), which investigates the dataset from different perspectives. The results of the EDA reveal important information that greatly benefits researchers in this area.


Citations: 0

Cybersecurity Risk Assessments within Critical Infrastructure Social Networks

Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data

Pub Date : 2023-10-19 DOI: 10.3390/data8100156

Alimbubi Aktayeva, Yerkhan Makatov, Akku Kubigenova Tulegenovna, Aibek Dautov, Rozamgul Niyazova, Maxud Zhamankarin, Sergey Khan

Cybersecurity social networking is a new scientific and engineering discipline that was interdisciplinary in its early days but is now transdisciplinary. Reviewing and analyzing the principal tasks related to information collection, monitoring of social networks, assessment methods, and preventing and combating cybersecurity threats is therefore essential and still unresolved. There is a need to design methods, models, and software systems aimed at estimating risks related to the cyberspace of social networks and the support of their activities. This study considers a risk to be the combination of the consequences of a given event (or incident) with its likelihood of occurrence, while risk assessment is the overall process of identifying, estimating, and evaluating risk. The findings of the study show that the technique of cognitive modeling for risk assessment is part of a comprehensive cybersecurity approach included in the requirements of basic IT standards, including IT security risk management. The study presents a comprehensive approach to cybersecurity in social networks that allows for consideration of all the elements that constitute cybersecurity as a complex, interconnected system. The ultimate goal of this approach to cybersecurity is the organization of an uninterrupted scheme of protection against any impacts related to physical, hardware, software, network, and human objects or resources of the critical infrastructure of social networks, as well as the integration of various levels and means of protection.
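The definition of risk as a combination of consequence and likelihood can be made concrete with a small sketch. The abstract does not say how the two are combined, so the product used here is an assumption (a common convention), not the study's cognitive-modeling technique. The event names, probabilities, and impact scores are invented for illustration.

```python
def risk_score(likelihood, consequence):
    """Combine likelihood (0..1) and consequence (>= 0) into one score.

    The product is an assumed combination rule, chosen for simplicity.
    """
    if not 0.0 <= likelihood <= 1.0:
        raise ValueError("likelihood must be in [0, 1]")
    if consequence < 0:
        raise ValueError("consequence must be non-negative")
    return likelihood * consequence

def rank_risks(events):
    """Sort (name, likelihood, consequence) tuples from highest to lowest risk."""
    return sorted(events, key=lambda e: risk_score(e[1], e[2]), reverse=True)

# Hypothetical incidents on a social network, with made-up estimates.
events = [
    ("data scraping", 0.6, 3),
    ("service outage", 0.1, 9),
    ("account takeover", 0.3, 8),
]

for name, p, impact in rank_risks(events):
    print(f"{name}: {risk_score(p, impact):.2f}")
```

A ranking of this kind covers only the estimation and evaluation steps; identifying which events to score in the first place is the part the abstract attributes to the broader assessment process.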


Citations: 1

A Dataset of Non-Indigenous and Native Fish of the Volga and Kama Rivers (European Russia)

Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data

Pub Date : 2023-10-18 DOI: 10.3390/data8100154

Dmitry P. Karabanov, Dmitry D. Pavlov, Yury Y. Dgebuadze, Mikhail I. Bazarov, Elena A. Borovikova, Yuriy V. Gerasimov, Yulia V. Kodukhova, Pavel B. Mikheev, Eduard V. Nikitin, Tatyana L. Opaleva, Yuri A. Severov, Rimma Z. Sabitova, Alexey K. Smirnov, Yury I. Solomatin, Igor A. Stolbunov, Alexander I. Tsvetkov, Stanislav A. Vlasenko, Irina S. Voroshilova, Wenjun Zhong, Xiaowei Zhang, Alexey A. Kotov

Fish in the Volga-Kama River System (the largest river system in Europe) are important as a crucial food source for local populations; fish have the highest trophic level among hydrobionts. The purpose of this research is to describe the diversity of non-indigenous and native fish in the Volga and Kama Rivers, in the European part of Russia. This dataset encompasses data from June 2001 to September 2021 and comprises 1888 records (36,376 individual observations) for littoral and pelagic habitats from 143 sampling sites, representing 52 species from 42 genera in 22 families. The dataset follows the Darwin Core standard format and has been fully released in the Global Biodiversity Information Facility (GBIF) under the CC-BY 4.0 International license. The data were validated against several international databases, such as FishBase, Eschmeyer’s Catalog of Fishes, and the Barcode of Life Data System, and with the SAS.Planet geoinformation system. Newly established populations have been found for several species belonging to the following Actinopteri families: Alosidae, Anguillidae, Cichlidae, Ehiravidae, Gobiidae, Odontobutidae, Syngnathidae, and Xenocyprididae. Therefore, this dataset can be used for species distribution analyses of particular taxa, which are especially important for non-indigenous species.
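Since the dataset is distributed in Darwin Core format, a per-family summary like the one quoted above (records, individuals, species) could be computed along the following lines. This is only a sketch: `scientificName`, `family`, `individualCount`, and `eventDate` are standard Darwin Core terms, but whether the published file uses exactly these columns is an assumption, and the rows below are invented placeholders rather than real occurrences.

```python
import csv
import io
from collections import defaultdict

# Invented rows for illustration only; these are not records from the dataset.
SAMPLE_CSV = """scientificName,family,individualCount,eventDate
Examplus primus,Familia-A,12,2001-06-15
Examplus primus,Familia-A,3,2010-08-02
Examplus secundus,Familia-A,7,2015-07-20
Alterus tertius,Familia-B,1,2021-09-05
"""

def summarize_by_family(csv_text):
    """Count records, individuals, and distinct species per family."""
    records = defaultdict(int)
    individuals = defaultdict(int)
    species = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        fam = row["family"]
        records[fam] += 1
        individuals[fam] += int(row["individualCount"])
        species[fam].add(row["scientificName"])
    return {
        fam: {
            "records": records[fam],
            "individuals": individuals[fam],
            "species": len(species[fam]),
        }
        for fam in records
    }

summary = summarize_by_family(SAMPLE_CSV)
for fam, stats in sorted(summary.items()):
    print(fam, stats)
```

The distinction between records and individuals in the sketch mirrors the abstract's own wording, where 1888 records correspond to 36,376 individual observations.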


Citations: 0

USC-DCT: A Collection of Diverse Classification Tasks

Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data

Pub Date : 2023-10-12 DOI: 10.3390/data8100153

Adam M. Jones, Gozde Sahin, Zachary W. Murdock, Yunhao Ge, Ao Xu, Yuecheng Li, Di Wu, Shuo Ni, Po-Hsuan Huang, Kiran Lekkala, Laurent Itti

Machine learning is a crucial tool for both academic and real-world applications. Classification problems are often used as the preferred showcase in this space, which has led to a wide variety of datasets being collected and utilized for a myriad of applications. Unfortunately, there is very little standardization in how these datasets are collected, processed, and disseminated. As new learning paradigms like lifelong or meta-learning become more popular, the demand for merging tasks for at-scale evaluation of algorithms has also increased. This paper provides a methodology for processing and cleaning datasets that can be applied to existing or new classification tasks, and it implements these practices in a collection of diverse classification tasks called USC-DCT. Constructed using 107 classification tasks collected from the internet, this collection provides a transparent and standardized pipeline that can be useful for many different applications and frameworks. While there are currently 107 tasks, USC-DCT is designed to enable future growth. Additional discussion covers applications in machine learning paradigms such as transfer, lifelong, or meta-learning, how revisions to the collection will be handled, and further tips for curating and using classification tasks at this scale.


Citations: 0

Dataset of Contamination (2009–2022) Legacy Contaminants (PCB and DDT) in Zooplankton of Lake Maggiore (CIPAIS, International Commission for the Protection of Italian-Swiss Waters)

Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data

Pub Date : 2023-10-12 DOI: 10.3390/data8100152

Roberta Bettinetti, Roberta Piscia, Marina Manca, Silvana Galassi, Silvia Quadroni, Carlo Dossi, Rossella Perna, Emanuela Boggio, Ginevra Boldrocchi, Michela Mazzoni, Benedetta Villa

In this paper, we describe a 13-year (2009–2022) dataset of legacy POP concentrations (DDTtot and sumPCB14; from 2016, isomer and congener concentrations are also reported) in the planktonic crustaceans of Lake Maggiore (≥450 µm size fraction). The data were collected within a monitoring program aimed at assessing the presence of pollutants in the lake biota, including zooplankton organisms directly preyed upon by fish. The data report both the concentrations of DDTtot and sumPCB14 in the zooplankton and the standing stock density and biomass of the population in each season. The dataset allows changes in concentration to be detected over the long term and within a year, thus providing evidence for the seasonal and multi-year variations in the presence of these pollutants in the lake. The data also provide a basis for further studies aimed at modeling the pathways and fate of persistent organic pollutants, for which the amount of toxicants stored in the zooplankton compartment linked to fish is a crucial estimate.



Citations: 0

Tracking a Decade of Hydrogeological Emergencies in Italian Municipalities

Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data

Pub Date : 2023-10-11 DOI: 10.3390/data8100151

Alessio Gatto, Stefano Clò, Federico Martellozzo, Samuele Segoni

This dataset collects tabular and geographical information about all hydrogeological disasters (landslides and floods) that occurred in Italy from 2013 to 2022 and were severe enough to require the declaration of a national-level emergency. The severity and spatiotemporal extent of each emergency are characterized in terms of duration and timing, funds requested by local administrations, funds approved by the national government, and municipalities and provinces hit by the event (further subdivided between those included in the emergency and those not, depending on whether relevant impacts were ascertained). Italian exposure to hydrogeological risk is portrayed strikingly: in the covered period, 123 emergencies affected Italy, all regions were struck at least once, and some provinces were struck more than 10 times. Damage declared by local institutions adds up to EUR 11,000,000,000, while national recovery funds add up to EUR 1,000,000,000. The dataset may foster further research on risk assessment, econometric analysis, public policy support, and decision-making implementation. Moreover, it provides systematic evidence helpful in raising awareness about hydrogeological risks affecting Italy.



Citations: 2

Latest articles in Data