Data: Latest Articles

DataPLAN: A Web-Based Data Management Plan Generator for the Plant Sciences
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-10-24 DOI: 10.3390/data8110159
Xiao-Ran Zhou, Sebastian Beier, Dominik Brilhaus, Cristina Martins Rodrigues, Timo Mühlhaus, Dirk von Suchodoletz, Richard M. Twyman, Björn Usadel, Angela Kranz
Research data management (RDM) combines a set of practices for the organization, storage and preservation of data from research projects. The RDM strategy of a project is usually formalized as a data management plan (DMP)—a document that sets out procedures to ensure data findability, accessibility, interoperability and reusability (FAIR-ness). Many aspects of RDM are standardized across disciplines so that data and metadata are reusable, but the components of DMPs in the plant sciences are often disconnected. The inability to reuse plant-specific DMP content across projects and funding sources requires additional time and effort to write unique DMPs for different settings. To address this issue, we developed DataPLAN—an open-source tool incorporating prewritten DMP content for the plant sciences that can be used online or offline to prepare multiple DMPs. The current version of DataPLAN supports Horizon 2020 and Horizon Europe projects, as well as projects funded by the German Research Foundation (DFG). Furthermore, DataPLAN offers the option for users to customize their own templates. Additional templates to accommodate other funding schemes will be added in the future. DataPLAN reduces the workload needed to create or update DMPs in the plant sciences by presenting standardized RDM practices optimized for different funding contexts.
Citations: 0
Panel Regression Modelling for COVID-19 Infections and Deaths in Tamil Nadu, India
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-10-23 DOI: 10.3390/data8100158
Rajarathinam Arunachalam
The impacts of the coronavirus disease 2019 (COVID-19) pandemic have been extremely severe, with both economic and health crises experienced worldwide. Using a panel regression model, this study examined the trends and correlations in the number of COVID-19-related deaths and the number of COVID-19-infected cases in all 37 regions of the state of Tamil Nadu, India, in August 2020. The fixed effects model had the highest R² value (78%) and exhibited significant results. The slope coefficient was also highly significant, showing a considerable variation in the relationship between new COVID-19 cases and deaths. Additionally, for every unit increase in COVID-19-infected cases, the death rate increased by 0.02%.
Citations: 0
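The fixed effects panel model named in the abstract above can be illustrated with a within (demeaning) estimator. The following is a minimal sketch only: the function name `within_estimator`, the region labels, and all numbers are hypothetical and synthetic, and the paper's actual estimation procedure and software are not stated in the abstract.

```python
from collections import defaultdict


def within_estimator(regions, cases, deaths):
    """Fixed-effects (within) slope of deaths on cases.

    Each region's own mean is subtracted from its observations, which
    removes region-specific intercepts before fitting one common slope.
    """
    # Group the (cases, deaths) pairs by region.
    groups = defaultdict(list)
    for region, x, y in zip(regions, cases, deaths):
        groups[region].append((x, y))

    sxy = 0.0  # sum of demeaned cross-products
    sxx = 0.0  # sum of demeaned squared cases
    for observations in groups.values():
        mean_x = sum(x for x, _ in observations) / len(observations)
        mean_y = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx


# Synthetic illustration only (NOT data from the study): two regions with
# different baseline death counts but the same cases-to-deaths slope.
regions = ["A", "A", "A", "B", "B", "B"]
cases = [100, 200, 300, 100, 200, 300]
deaths = [3, 5, 7, 12, 14, 16]
slope = within_estimator(regions, cases, deaths)
print(slope)
```

Demeaning by region is what distinguishes this from a pooled regression: the large gap between the two regions' baseline death counts does not influence the slope.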
Industrial Environment Multi-Sensor Dataset for Vehicle Indoor Tracking with Wi-Fi, Inertial and Odometry Data
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-10-23 DOI: 10.3390/data8100157
Ivo Silva, Cristiano Pendão, Joaquín Torres-Sospedra, Adriano Moreira
This paper describes a dataset collected in an industrial setting using a mobile unit resembling an industrial vehicle equipped with several sensors. Wi-Fi interfaces collect signals from available Access Points (APs), while motion sensors collect data regarding the mobile unit’s movement (orientation and displacement). The distinctive features of this dataset include synchronous data collection from multiple sensors, such as Wi-Fi data acquired from multiple interfaces (including a radio map), orientation provided by two low-cost Inertial Measurement Unit (IMU) sensors, and displacement (travelled distance) measured by an absolute encoder attached to the mobile unit’s wheel. Accurate ground-truth information was determined using a computer vision approach that recorded timestamps as the mobile unit passed through reference locations. We assessed the quality of the proposed dataset by applying baseline methods for dead reckoning and Wi-Fi fingerprinting. The average positioning error for simple dead reckoning, without using any other absolute positioning technique, is 8.25 m and 11.66 m for IMU1 and IMU2, respectively. The average positioning error for simple Wi-Fi fingerprinting is 2.19 m when combining the RSSI information from five Wi-Fi interfaces. This dataset contributes to the fields of Industry 4.0 and mobile sensing, providing researchers with a resource to develop, test, and evaluate indoor tracking solutions for industrial vehicles.
Citations: 0
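The two baseline methods mentioned in the abstract above, dead reckoning and Wi-Fi fingerprinting, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the 2-D coordinate convention, the radio map layout, and all values are hypothetical.

```python
import math


def dead_reckon(start_xy, steps):
    """Integrate (heading_rad, distance_m) steps into a list of 2-D positions.

    In the dataset's terms, heading would come from an IMU and distance
    from the wheel encoder (an assumption about how the data are combined).
    """
    x, y = start_xy
    path = [(x, y)]
    for heading, distance in steps:
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
        path.append((x, y))
    return path


def knn_fingerprint(radio_map, rssi, k=1):
    """Estimate a position as the mean location of the k radio-map entries
    whose RSSI vectors are closest (Euclidean distance) to the query."""
    nearest = sorted(radio_map, key=lambda entry: math.dist(entry[0], rssi))[:k]
    xs = [position[0] for _, position in nearest]
    ys = [position[1] for _, position in nearest]
    return (sum(xs) / len(nearest), sum(ys) / len(nearest))


def mean_positioning_error(estimated, truth):
    """Average Euclidean distance between estimated and ground-truth points."""
    return sum(math.dist(e, t) for e, t in zip(estimated, truth)) / len(truth)


# Hypothetical values for illustration only.
path = dead_reckon((0.0, 0.0), [(0.0, 1.0), (math.pi / 2, 2.0)])
radio_map = [((-40.0, -70.0), (0.0, 0.0)), ((-70.0, -40.0), (10.0, 0.0))]
estimate = knn_fingerprint(radio_map, (-45.0, -65.0), k=1)
error = mean_positioning_error([estimate], [(0.0, 0.0)])
print(path[-1], estimate, error)
```

Dead reckoning accumulates heading and distance errors at every step, which is consistent with the abstract reporting a larger average error for it than for fingerprinting against a radio map.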
A Data-Driven Exploration of a New Islamic Fatwas Dataset for Arabic NLP Tasks
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-10-19 DOI: 10.3390/data8100155
Ohoud Alyemny, Hend Al-Khalifa, Abdulrahman Mirza
Islamic content is a broad and diverse domain that encompasses various sources, topics, and perspectives. However, there is a lack of comprehensive and reliable datasets that can facilitate conducting studies on Islamic content. In this paper, we present fatwaset, the first public Arabic dataset of Islamic fatwas. It contains Islamic fatwas that we collected from various trusted and authenticated sources in the Islamic fatwa domain, such as agencies, religious scholars, and websites. Fatwaset is a rich resource as it does not only contain fatwas but also includes a considerable set of their surrounding metadata. It can be used for many natural language processing (NLP) tasks, such as language modeling, question answering, author attribution, topic identification, text classification, and text summarization. It can also support other domains that are related to Islamic culture, such as philosophy and language art. We describe the methodology and criteria we used to select the content, as well as the challenges and limitations we faced. Additionally, we perform an Exploratory Data Analysis (EDA), which investigates the dataset from different perspectives. The results of the EDA reveal important information that greatly benefits researchers in this area.
Citations: 0
Cybersecurity Risk Assessments within Critical Infrastructure Social Networks
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-10-19 DOI: 10.3390/data8100156
Alimbubi Aktayeva, Yerkhan Makatov, Akku Kubigenova Tulegenovna, Aibek Dautov, Rozamgul Niyazova, Maxud Zhamankarin, Sergey Khan
Cybersecurity social networking is a new scientific and engineering discipline that was interdisciplinary in its early days, but is now transdisciplinary. Reviewing and analyzing the principal tasks related to information collection, social network monitoring, assessment methods, and preventing and combating cybersecurity threats therefore remains essential and unresolved. There is a need to design methods, models, and software suites for estimating risks related to the cyberspace of social networks and for supporting their activities. This study treats risk as the combination of the consequences of a given event (or incident) and its likelihood of occurrence, while risk assessment is the overall process of identifying, estimating, and evaluating risk. The findings show that cognitive modeling for risk assessment is part of a comprehensive cybersecurity approach included in the requirements of basic IT standards, including IT security risk management. The study presents a comprehensive approach to cybersecurity in social networks that allows all the elements that constitute cybersecurity to be considered as a complex, interconnected system. The ultimate goal of this approach is to organize an uninterrupted scheme of protection against any impacts on the physical, hardware, software, network, and human objects or resources of the critical infrastructure of social networks, and to integrate the various levels and means of protection.
Citations: 1
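The abstract above defines risk as a combination of an event's consequences and its likelihood of occurrence. One common way to make that concrete is to multiply the two; the sketch below assumes that product rule, which the abstract does not state, and uses hypothetical event names and values. It does not represent the study's cognitive modeling technique.

```python
def risk_score(likelihood, consequence):
    """Combine likelihood (0..1) and consequence (severity units) into a score.

    The product rule is an assumption for illustration; the source only
    says risk combines consequences with likelihood of occurrence.
    """
    if not 0.0 <= likelihood <= 1.0:
        raise ValueError("likelihood must be between 0 and 1")
    return likelihood * consequence


def rank_risks(events):
    """Return events sorted from highest to lowest risk score."""
    return sorted(
        events,
        key=lambda e: risk_score(e["likelihood"], e["consequence"]),
        reverse=True,
    )


# Hypothetical events and values, for illustration only.
events = [
    {"name": "service outage", "likelihood": 0.1, "consequence": 10},
    {"name": "phishing campaign", "likelihood": 0.6, "consequence": 3},
    {"name": "account takeover", "likelihood": 0.2, "consequence": 8},
]
ranked = [e["name"] for e in rank_risks(events)]
print(ranked)
```

A ranking like this covers only the estimation step; identification and evaluation, the other two parts of risk assessment named in the abstract, are outside the sketch.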
A Dataset of Non-Indigenous and Native Fish of the Volga and Kama Rivers (European Russia)
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-10-18 DOI: 10.3390/data8100154
Dmitry P. Karabanov, Dmitry D. Pavlov, Yury Y. Dgebuadze, Mikhail I. Bazarov, Elena A. Borovikova, Yuriy V. Gerasimov, Yulia V. Kodukhova, Pavel B. Mikheev, Eduard V. Nikitin, Tatyana L. Opaleva, Yuri A. Severov, Rimma Z. Sabitova, Alexey K. Smirnov, Yury I. Solomatin, Igor A. Stolbunov, Alexander I. Tsvetkov, Stanislav A. Vlasenko, Irina S. Voroshilova, Wenjun Zhong, Xiaowei Zhang, Alexey A. Kotov
Fish in the Volga-Kama River System (the largest river system in Europe) are a crucial food source for local populations; fish have the highest trophic level among hydrobionts. The purpose of this research is to describe the diversity of non-indigenous and native fish in the Volga and Kama Rivers, in the European part of Russia. This dataset encompasses data from June 2001 to September 2021 and comprises 1888 records (36,376 individual observations) for littoral and pelagic habitats from 143 sampling sites, representing 52 species from 42 genera in 22 families. The dataset follows the Darwin Core standard format and has been fully released in the Global Biodiversity Information Facility (GBIF) under a CC-BY 4.0 International license. The data are validated against several international databases, such as FishBase, Eschmeyer’s Catalog of Fishes, and the Barcode of Life Data System, and with the SAS.Planet geoinformation system. Newly established populations have been found for several species belonging to the following Actinopteri families: Alosidae, Anguillidae, Cichlidae, Ehiravidae, Gobiidae, Odontobutidae, Syngnathidae, and Xenocyprididae. Therefore, this dataset can be used for species distribution analysis of particular taxa, which is especially important for non-indigenous species.
Citations: 0
USC-DCT: A Collection of Diverse Classification Tasks
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-10-12 DOI: 10.3390/data8100153
Adam M. Jones, Gozde Sahin, Zachary W. Murdock, Yunhao Ge, Ao Xu, Yuecheng Li, Di Wu, Shuo Ni, Po-Hsuan Huang, Kiran Lekkala, Laurent Itti
Machine learning is a crucial tool for both academic and real-world applications. Classification problems are often used as the preferred showcase in this space, which has led to a wide variety of datasets being collected and utilized for a myriad of applications. Unfortunately, there is very little standardization in how these datasets are collected, processed, and disseminated. As new learning paradigms like lifelong or meta-learning become more popular, the demand for merging tasks for at-scale evaluation of algorithms has also increased. This paper provides a methodology for processing and cleaning datasets that can be applied to existing or new classification tasks as well as implements these practices in a collection of diverse classification tasks called USC-DCT. Constructed using 107 classification tasks collected from the internet, this collection provides a transparent and standardized pipeline that can be useful for many different applications and frameworks. While there are currently 107 tasks, USC-DCT is designed to enable future growth. Additional discussion provides explanations of applications in machine learning paradigms such as transfer, lifelong, or meta-learning, how revisions to the collection will be handled, and further tips for curating and using classification tasks at this scale.
Citations: 0
Dataset of Contamination (2009–2022) Legacy Contaminants (PCB and DDT) in Zooplankton of Lake Maggiore (CIPAIS, International Commission for the Protection of Italian-Swiss Waters)
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-10-12 DOI: 10.3390/data8100152
Roberta Bettinetti, Roberta Piscia, Marina Manca, Silvana Galassi, Silvia Quadroni, Carlo Dossi, Rossella Perna, Emanuela Boggio, Ginevra Boldrocchi, Michela Mazzoni, Benedetta Villa
In this paper, we describe a 13-year (2009–2022) dataset of legacy POP concentrations (DDTtot and sumPCB14; from 2016, isomer and congener concentrations are also reported) in the planktonic crustaceans of Lake Maggiore (≥450 µm size fraction). The data were collected within a monitoring program designed to assess the presence of pollutants in the lake biota, including zooplankton organisms directly preyed upon by fish. The data report both the concentrations of DDTtot and sumPCB14 in the zooplankton and the standing stock density and biomass of the population in each season. The dataset allows changes in concentration to be detected over the long term and within a year, thus providing evidence for the seasonal and multi-year variations in the presence of these pollutants in the lake. The data also provide a basis for further studies aimed at modeling the pathways and fate of persistent organic pollutants, for which the amount of toxicants stored in the zooplankton compartment linked to fish is a crucial estimate.
Citations: 0
Tracking a Decade of Hydrogeological Emergencies in Italian Municipalities
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-10-11 DOI: 10.3390/data8100151
Alessio Gatto, Stefano Clò, Federico Martellozzo, Samuele Segoni
This dataset collects tabular and geographical information about all hydrogeological disasters (landslides and floods) that occurred in Italy from 2013 to 2022 and whose impacts were severe enough to require the declaration of a national-level emergency. The severity and spatiotemporal extent of each emergency are characterized in terms of duration and timing, funds requested by local administrations, funds approved by the national government, and the municipalities and provinces hit by the event (further subdivided into those included in the emergency and those not, depending on whether relevant impacts were ascertained). Italy's exposure to hydrogeological risk is portrayed strikingly: in the covered period, 123 emergencies affected Italy, all regions were struck at least once, and some provinces were struck more than 10 times. Damage declared by local institutions adds up to EUR 11,000,000,000, while national recovery funds add up to EUR 1,000,000,000. The dataset may foster further research on risk assessment, econometric analysis, public policy support, and decision-making implementation. Moreover, it provides systematic evidence helpful in raising awareness about hydrogeological risks affecting Italy.
Citations: 2
Power-Flow Simulations for Integrating Renewable Distributed Generation from Biogas, Photovoltaic, and Small Wind Sources on an Underground Distribution Feeder
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-10-07 DOI: 10.3390/data8100150
Welson Bassi, Igor Cordeiro, Ildo Luis Sauer
The rapid expansion of distributed generation leads to the integration of an increasing number of energy generation sources. However, integrating these sources into electrical distribution networks presents specific challenges in ensuring that the networks can effectively accommodate the associated distributed energy and power. Thus, it is crucial to evaluate the electrical effects of power along the conductors, components, and loads. Power-flow analysis is a well-established numerical methodology for assessing parameters and quantities within power systems during steady-state operation. The University of São Paulo’s Cidade Universitária “Armando de Salles Oliveira” (CUASO) campus in São Paulo, Brazil, features an underground power distribution system. The Institute of Energy and Environment (IEE) leads the integration of several distributed generation (DG) sources, including a biogas plant, photovoltaic installations, and a small wind turbine, into one of the CUASO’s feeders, referred to as “USP-105”. Load-flow simulations were conducted using the PowerWorld™ Simulator v.23, considering the interconnection of these sources. This dataset provides comprehensive information and the computational files used in the simulations. It serves as a valuable resource for reanalysis, didactic purposes, and the dissemination of technical insights related to DG implementation.
Citations: 0