
Scientific Data: Latest Articles

A kinematic dataset of locomotion with gait and sit-to-stand movements of young adults.
IF 5.8 | Zone 2, multidisciplinary journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2024-11-09 | DOI: 10.1038/s41597-024-04020-6
Simon Hanisch, Loreen Pogrzeba, Evelyn Muschter, Shu-Chen Li, Thorsten Strufe

Kinematic data is a valuable source of movement information that provides insights into the health status, mental state, and motor skills of individuals. Additionally, kinematic data can serve as biometric data, enabling the identification of personal characteristics such as height, weight, and sex. In CeTI-Locomotion, four types of walking tasks and the 5-times sit-to-stand test (5RSTST) were recorded from 50 young adults wearing motion capture (mocap) suits equipped with inertial measurement units (IMUs). Our dataset is unique in that it allows the study of both intra- and inter-participant variability with high-quality kinematic motion data for different motion tasks. Along with the raw kinematic data, we provide the source code for phase segmentation and the processed data, which has been segmented into a total of 4672 individual motion repetitions. To validate the data, we conducted visual inspection as well as machine-learning-based identity and action recognition tests, achieving 97% and 84% accuracy, respectively. The data can serve as a normative reference of gait and sit-to-stand movements in healthy young adults and as training data for biometric recognition.

Scientific Data 11(1), 1209. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11550319/pdf/
Citations: 0
Neural Dynamics of Creative Movements During the Rehearsal and Performance of "LiveWire".
IF 5.8 | Zone 2, multidisciplinary journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2024-11-09 | DOI: 10.1038/s41597-024-04010-8
Maxine Annel Pacheco-Ramírez, Mauricio A Ramírez-Moreno, Komal Kukkar, Nishant Rao, Derek Huber, Anthony K Brandt, Andy Noble, Dionne Noble, Bryan Ealey, Jose L Contreras-Vidal

This report contains a description of physiological and motion data, recorded simultaneously and in synchrony using the hyperscanning method from two professional dancers using wireless mobile brain-body imaging (MoBI) technology during rehearsals and public performances of "LiveWire" - a new composition comprised of five choreographed music and dance sections inspired by neuroscience principles. Brain and ocular activity were measured using 28-channel scalp electroencephalography (EEG), and 4-channel electrooculography (EOG), respectively; and head motion was recorded using an inertial measurement unit (IMU) placed on the forehead of each dancer. Video recordings were obtained for each session to allow for tagging of physiological and motion signals and for behavioral analysis. Data recordings were collected from 10 sessions over a 4-month period, in which the dancers rehearsed or performed (in front of an audience) choreographed expressive movements. A detailed explanation of the experimental set-up, the steps carried out for data collection, and an explanation on the usage are provided in this report.

Scientific Data 11(1), 1208. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11550816/pdf/
Citations: 0
An OpenStreetMap derived building classification dataset for the United States.
IF 5.8 | Zone 2, multidisciplinary journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2024-11-09 | DOI: 10.1038/s41597-024-04046-w
Henrique F de Arruda, Sandro M Reia, Shiyang Ruan, Kuldip S Atwal, Hamdi Kavak, Taylor Anderson, Dieter Pfoser

Building classification is crucial for population estimation, traffic planning, urban planning, and emergency response applications. Although essential, such data is often not readily available. To alleviate this problem, this work presents a comprehensive dataset of residential/non-residential building classifications covering the entire United States. We developed a dataset of building types based on building footprints and the available OpenStreetMap information. The dataset is validated using authoritative ground truth data for select counties in the U.S., which shows high precision for non-residential building classification and high recall for residential buildings. In addition to the building classifications, this dataset includes detailed information on the OpenStreetMap data used in the classification process. The resulting dataset classifies 67,705,475 buildings. We hope that this data is of value to the scientific community, including urban and transportation planners.
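
To make the validation terms concrete, the following is a minimal sketch of how per-class precision and recall are computed from ground-truth and predicted labels. The function name `precision_recall`, the labels `"res"` / `"non"`, and the six toy buildings are illustrative assumptions, not data or code from the paper.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (made up): one non-residential building is mislabelled residential.
truth = ["res", "res", "res", "non", "non", "res"]
pred = ["res", "res", "res", "res", "non", "res"]

non_precision, non_recall = precision_recall(truth, pred, "non")
res_precision, res_recall = precision_recall(truth, pred, "res")
print(non_precision, non_recall)
print(res_precision, res_recall)
```

In this toy case every building predicted as non-residential really is non-residential (high precision), while no residential building is missed (high recall), which is the pattern the abstract describes.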

Scientific Data 11(1), 1210. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11550320/pdf/
Citations: 0
Chemical Profiles of Particulate Matter Emitted from Anthropogenic Sources in Selected Regions of China.
IF 5.8 | Zone 2, multidisciplinary journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2024-11-08 | DOI: 10.1038/s41597-024-04058-6
Lixin Zheng, Di Wu, Xiu Chen, Yang Li, Anyuan Cheng, Jinrun Yi, Qing Li

Particulate matter (PM) emissions from anthropogenic sources contribute substantially to air pollution. The unequal adverse health effects caused by source-emitted PM emphasize the need to consider differences in PM-bound chemicals, rather than focusing solely on the mass concentration of PM, when designing air pollution control strategies. Here, we present a dataset of the chemical compositions of real-world PM emissions from typical anthropogenic sources in China, including the industrial (power, industrial boilers, iron & steel, cement, and other industrial processes), residential (coal/biomass burning and cooking), and transportation sectors (on-road vehicles, ships, and non-exhaust emissions). The data was obtained under the same strict quality-control conditions for field measurements and chemical analysis, minimizing the uncertainty caused by different study approaches. The concentrations of PM-bound chemical components, including toxic elements and PAHs, differ substantially among emission sectors. This dataset provides experimental data that can inform emission inventories, air quality simulation models, and health risk estimation. The results offer insight into source-specific PM and support the tailoring of effective control strategies.

Scientific Data 11(1), 1206. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11549090/pdf/
Citations: 0
CartoMark: a benchmark dataset for map pattern recognition and map content retrieval with machine intelligence.
IF 5.8 | Zone 2, multidisciplinary journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2024-11-08 | DOI: 10.1038/s41597-024-04057-7
Xiran Zhou, Yi Wen, Zhenfeng Shao, Wenwen Li, Kaiyuan Li, Honghao Li, Xiao Xie, Zhigang Yan

Maps are a fundamental medium for visualizing and representing the real world in a simple and philosophical way. With the rise of big data, a proportion of maps is now generated from multiple sources, significantly enriching the dimensions and perspectives for understanding the characteristics of the real world. However, a majority of these map datasets remain undiscovered, unacquired, and ineffectively used. This stems from a lack of well-labelled benchmark datasets, which are important for applying deep learning techniques to identify complicated map content. To address this issue, we develop a large-scale benchmark comprising well-labelled datasets that support state-of-the-art machine intelligence technologies for map text annotation recognition, map scene classification, map super-resolution reconstruction, and map style transfer. Furthermore, these well-labelled datasets would facilitate map feature detection, map pattern recognition, and map content retrieval. We hope our efforts will provide well-labelled data resources for advancing the ability to recognize and discover valuable map content.

Scientific Data 11(1), 1205. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11549302/pdf/
Citations: 0
A multi-species benchmark for training and validating mass spectrometry proteomics machine learning models.
IF 5.8 | Zone 2, multidisciplinary journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2024-11-08 | DOI: 10.1038/s41597-024-04068-4
Bo Wen, William Stafford Noble

Training machine learning models for tasks such as de novo sequencing or spectral clustering requires large collections of confidently identified spectra. Here we describe a dataset of 2.8 million high-confidence peptide-spectrum matches derived from nine different species. The dataset is based on a previously described benchmark but has been re-processed to ensure consistent data quality and enforce separation of training and test peptides.
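
One way to picture "separation of training and test peptides" is a split made at the peptide level rather than the spectrum level, so that every peptide-spectrum match (PSM) for a given peptide lands on the same side. The sketch below is an illustration under that assumption only; it is not the authors' procedure, and the function `split_by_peptide`, the `"peptide"` / `"scan"` keys, and the example sequences are hypothetical.

```python
import hashlib


def split_by_peptide(psms, test_fraction=0.2):
    """Assign each PSM to train or test based on a hash of its peptide sequence.

    Because the assignment depends only on the sequence, all PSMs of one
    peptide end up in the same partition.
    """
    train, test = [], []
    for psm in psms:
        digest = hashlib.sha256(psm["peptide"].encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
        (test if bucket < test_fraction else train).append(psm)
    return train, test


# Hypothetical PSMs: the same peptide can be matched by several spectra.
psms = [
    {"peptide": "PEPTIDEK", "scan": 1},
    {"peptide": "PEPTIDEK", "scan": 2},
    {"peptide": "ELVISLIVESK", "scan": 3},
    {"peptide": "SAMPLER", "scan": 4},
    {"peptide": "SAMPLER", "scan": 5},
]
train, test = split_by_peptide(psms)
shared = {p["peptide"] for p in train} & {p["peptide"] for p in test}
print(len(train), len(test), shared)
```

Hashing keeps the assignment deterministic across runs and machines; the realized test share only approximates `test_fraction`, especially for small inputs.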

Scientific Data 11(1), 1207. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11549408/pdf/
Citations: 0
A Large-Scale Geographically Explicit Synthetic Population with Social Networks for the United States.
IF 8.3 | Zone 2, multidisciplinary journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2024-11-07 | DOI: 10.1038/s41597-024-03970-1
Na Jiang, Fuzhen Yin, Boyu Wang, Andrew T Crooks

Within the geo-simulation research domain, micro-simulation and agent-based modeling often require the creation of synthetic populations. Creating such data is time-consuming, and the resulting populations often lack social networks, which are crucial for studying human interactions (e.g., disease spread, disaster response) and which also affect decision-making. We address these challenges by introducing a Python-based method that uses open data, including 2020 U.S. Census data, to generate a large-scale, realistic, geographically explicit synthetic population for America's 50 states and Washington D.C., along with stylized social networks (e.g., home, work, and school). The resulting synthetic population can be utilized within various geo-simulation approaches (e.g., agent-based modeling), exploring the emergence of complex phenomena through human interactions and further fostering the study of urban digital twins.

Scientific Data 11(1), 1204. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11543939/pdf/
Citations: 0
A mass spectrometry-based peptidomic dataset of the spermosphere in common bean (Phaseolus vulgaris L.) seeds.
IF 8.3 | Zone 2, multidisciplinary journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2024-11-07 | DOI: 10.1038/s41597-024-04044-y
Chandrodhay Saccaram, Céline Brosse, Boris Collet, Delphine Sourdeval, Tracy François, Benoît Bernay, Massimiliano Corso, Loïc Rajjou

The spermosphere, a dynamic microenvironment surrounding germinating seeds, is shaped by the complex interactions between natural compounds exuded by seeds and seed-associated microbial communities. While peptides exuded by plants are known to influence microbiota diversity, little is known about those specifically exuded by seeds. In this study, we characterised the peptidome profile of the spermosphere for the first time using seeds from eight genotypes of common bean (Phaseolus vulgaris) grown in two contrasting production regions. An untargeted LC-MS/MS peptidomic analysis revealed 3,258 peptides derived from 414 precursor proteins of common bean in the spermosphere. This comprehensive peptidomic dataset provides valuable insights into the characteristics of peptides exuded by common bean seeds in the spermosphere. It can be used to identify peptides with potential antimicrobial or other biological activities, advancing our understanding of the functional roles of seed-exuded peptides in the spermosphere.

Scientific Data 11(1), 1202. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11543924/pdf/
Citations: 0
Simultaneous single-nucleus RNA sequencing and single-nucleus ATAC sequencing of neuroblastoma cell lines.
IF 8.3 | Zone 2, multidisciplinary journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2024-11-07 | DOI: 10.1038/s41597-024-04061-x
Richard A Guyer, Jessica L Mueller, Nicole Picard, Allan M Goldstein

Neuroblastoma is the most common extracranial solid tumor in children, and a leading cause of childhood cancer deaths. All neuroblastomas arise from neural crest-derived sympathetic neuronal progenitors, but numerous mutations, the most common of which is MYCN amplification, give rise to these lesions. Epigenetic aberrations also play a role in oncogenesis and tumor progression. To better understand biologic diversity of neuroblastomas, we performed joint single-nucleus ATAC sequencing and single-nucleus RNA sequencing on six neuroblastoma cell lines, three of which are MYCN amplified. After standard filtering for high-quality nuclei, we obtained chromatin accessibility and transcript abundance data from 41,733 neuroblastoma tumor cells. Preliminary analysis reveals significant diversity in chromatin landscape and gene expression across neuroblastoma cell lines. This dataset is a valuable resource for studying the transcriptional and epigenetic mechanisms of this deadly childhood disease.

Scientific Data 11(1), 1203. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11543984/pdf/
Citations: 0
Identifying genomic data use with the Data Citation Explorer.
IF 5.8 Zone 2 (multidisciplinary journals) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-11-06 DOI: 10.1038/s41597-024-04049-7
Neil Byers, Charles Parker, Chris Beecroft, T B K Reddy, Hugh Salamon, George Garrity, Kjiersten Fagnan

Increases in sequencing capacity, combined with rapid accumulation of publications and associated data resources, have increased the complexity of maintaining associations between literature and genomic data. As the volume of literature and data has exceeded the capacity of manual curation, automated approaches to maintaining and confirming associations among these resources have become necessary. Here we present the Data Citation Explorer (DCE), which discovers literature incorporating genomic data that was not formally cited. This service provides advantages over manual curation methods, including consistent resource coverage, metadata enrichment, documentation of new use cases, and identification of conflicting metadata. The service reduces labor costs associated with manual review, improves the quality of genome metadata maintained by the U.S. Department of Energy Joint Genome Institute (JGI), and increases the number of known publications that incorporate its data products. The DCE facilitates an understanding of JGI impact, improves credit attribution for data generators, and can encourage data sharing by allowing scientists to see how reuse amplifies the impact of their original studies.

Scientific Data, vol. 11, no. 1, article 1200. Published 2024-11-06. Journal Article. DOI: 10.1038/s41597-024-04049-7. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11541499/pdf/
Citations: 0