Pub Date: 2024-11-09 | DOI: 10.1038/s41597-024-04020-6
Simon Hanisch, Loreen Pogrzeba, Evelyn Muschter, Shu-Chen Li, Thorsten Strufe
Kinematic data is a valuable source of movement information that provides insights into the health status, mental state, and motor skills of individuals. Kinematic data can also serve as biometric data, enabling the inference of personal characteristics such as height, weight, and sex. In CeTI-Locomotion, four types of walking tasks and the five-times sit-to-stand test (5RSTST) were recorded from 50 young adults wearing motion capture (mocap) suits equipped with inertial measurement units (IMUs). The dataset is unique in that it supports the study of both intra- and inter-participant variability with high-quality kinematic data across different motion tasks. Along with the raw kinematic data, we provide the source code for phase segmentation and the processed data, which has been segmented into a total of 4,672 individual motion repetitions. To validate the data, we conducted visual inspection as well as machine-learning-based identity and action recognition tests, which achieved 97% and 84% accuracy, respectively. The data can serve as a normative reference for gait and sit-to-stand movements in healthy young adults and as training data for biometric recognition.
Title: A kinematic dataset of locomotion with gait and sit-to-stand movements of young adults. Scientific Data 11, 1209 (2024). PMCID: PMC11550319
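The dataset ships with the authors' own phase-segmentation code, which is not reproduced here. To give a rough sense of what splitting a trial into repetitions involves, the sketch below segments a sit-to-stand trial using two thresholds (hysteresis) on a vertical pelvis-height trace. The choice of signal, the threshold values, and the function name `segment_repetitions` are assumptions made for illustration, not the method used to produce the 4,672 repetitions.

```python
from typing import List, Sequence, Tuple


def segment_repetitions(
    height: Sequence[float], low: float, high: float
) -> List[Tuple[int, int]]:
    """Split a vertical pelvis-height trace into sit-stand-sit repetitions.

    Hypothetical sketch: the signal and thresholds are assumptions.
    Returns (start, end) sample indices, one pair per completed repetition.
    Using two thresholds keeps small wobbles around a single threshold
    from being counted as extra repetitions.
    """
    reps: List[Tuple[int, int]] = []
    seated = True   # assume the trial starts in the seated posture
    last_low = 0    # last seated sample before the next rise
    for i, h in enumerate(height):
        if seated:
            if h <= low:
                last_low = i    # still seated: move the candidate start forward
            elif h > high:
                seated = False  # crossed the standing threshold
        elif h < low:
            reps.append((last_low, i))  # back below the seated threshold: repetition done
            last_low = i
            seated = True
    return reps


# Synthetic trace with five rise-and-sit cycles (hypothetical values in metres).
trace = [0.5, 0.9] * 5 + [0.5]
print(len(segment_repetitions(trace, low=0.6, high=0.8)))  # 5
```

A trial that ends while the participant is still standing yields no final pair, so incomplete repetitions are dropped rather than guessed.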
Pub Date: 2024-11-09 | DOI: 10.1038/s41597-024-04010-8
Maxine Annel Pacheco-Ramírez, Mauricio A Ramírez-Moreno, Komal Kukkar, Nishant Rao, Derek Huber, Anthony K Brandt, Andy Noble, Dionne Noble, Bryan Ealey, Jose L Contreras-Vidal
This report describes physiological and motion data recorded simultaneously and in synchrony from two professional dancers. The recordings used the hyperscanning method and wireless mobile brain-body imaging (MoBI) technology and were made during rehearsals and public performances of "LiveWire", a new composition of five choreographed music and dance sections inspired by neuroscience principles. Brain and ocular activity were measured using 28-channel scalp electroencephalography (EEG) and 4-channel electrooculography (EOG), respectively, and head motion was recorded using an inertial measurement unit (IMU) placed on each dancer's forehead. Video recordings were obtained for each session to allow tagging of the physiological and motion signals and behavioral analysis. Data were collected from 10 sessions over a 4-month period, in which the dancers rehearsed choreographed expressive movements or performed them in front of an audience. The report provides a detailed explanation of the experimental set-up, the steps carried out for data collection, and guidance on using the data.
Title: Neural Dynamics of Creative Movements During the Rehearsal and Performance of "LiveWire". Scientific Data 11, 1208 (2024). PMCID: PMC11550816
Pub Date: 2024-11-09 | DOI: 10.1038/s41597-024-04046-w
Henrique F de Arruda, Sandro M Reia, Shiyang Ruan, Kuldip S Atwal, Hamdi Kavak, Taylor Anderson, Dieter Pfoser
Building classification is crucial for population estimation, traffic planning, urban planning, and emergency response applications. Although essential, such data is often not readily available. To address this gap, this work presents a comprehensive dataset of residential/non-residential building classifications covering the entire United States, derived from building footprints and the available OpenStreetMap information. The dataset is validated against authoritative ground-truth data for selected U.S. counties, and the validation shows high precision for non-residential buildings and high recall for residential buildings. In addition to the building classifications, the dataset includes detailed information on the OpenStreetMap data used in the classification process. In total, the dataset classifies 67,705,475 buildings. We hope that it will be of value to the scientific community, including urban and transportation planners.
Title: An OpenStreetMap derived building classification dataset for the United States. Scientific Data 11, 1210 (2024). PMCID: PMC11550320
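The validation result pairs high precision for one class with high recall for the other. The sketch below shows how both numbers follow from the same confusion counts when predicted labels are compared with ground truth. The label strings `"res"` and `"non"` and the toy data are hypothetical and are not drawn from the dataset.

```python
from typing import Dict, Sequence


def precision_recall(
    truth: Sequence[str], pred: Sequence[str], positive: str
) -> Dict[str, float]:
    """Precision and recall for one class, treating `positive` as the positive label."""
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels (hypothetical): one non-residential building is predicted as residential.
truth = ["res", "res", "res", "non", "non"]
pred = ["res", "res", "res", "res", "non"]
print(precision_recall(truth, pred, "non"))  # {'precision': 1.0, 'recall': 0.5}
print(precision_recall(truth, pred, "res"))  # {'precision': 0.75, 'recall': 1.0}
```

In this toy case, a single non-residential building labelled residential lowers non-residential recall and residential precision. Non-residential precision and residential recall stay perfect, which matches the qualitative shape of the reported validation.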
Pub Date: 2024-11-08 | DOI: 10.1038/s41597-024-04058-6
Lixin Zheng, Di Wu, Xiu Chen, Yang Li, Anyuan Cheng, Jinrun Yi, Qing Li
Particulate matter (PM) emissions from anthropogenic sources contribute substantially to air pollution. Because source-emitted PM causes unequal adverse health effects, air pollution control strategies need to account for differences in PM-bound chemicals rather than focusing solely on PM mass concentration. Here, we present a dataset of the chemical compositions of real-world PM emissions from typical anthropogenic sources in China. It covers the industrial (power, industrial boilers, iron and steel, cement, and other industrial processes), residential (coal/biomass burning and cooking), and transportation (on-road vehicles, ships, and non-exhaust emissions) sectors. All data were obtained under the same strict quality-control conditions for field measurements and chemical analysis, minimizing the uncertainty introduced by differing study approaches. The concentrations of PM-bound chemical components, including toxic elements and polycyclic aromatic hydrocarbons (PAHs), differ substantially among emission sectors. The dataset provides experimental inputs for emission inventories, air quality simulation models, and health risk estimation. It can also improve understanding of source-specific PM and help tailor effective control strategies.
Title: Chemical Profiles of Particulate Matter Emitted from Anthropogenic Sources in Selected Regions of China. Scientific Data 11, 1206 (2024). PMCID: PMC11549090
Pub Date: 2024-11-08 | DOI: 10.1038/s41597-024-04057-7
Xiran Zhou, Yi Wen, Zhenfeng Shao, Wenwen Li, Kaiyuan Li, Honghao Li, Xiao Xie, Zhigang Yan
Maps are a fundamental medium for visualizing and representing the real world in a simple and philosophical way. With the rise of big data, a share of maps are now generated from multiple sources, significantly enriching the dimensions and perspectives for understanding the characteristics of the real world. However, most of these map datasets remain undiscovered, unacquired, and ineffectively used. This is largely due to a lack of well-labelled benchmark datasets, which are essential for applying deep learning techniques to identify complex map content. To address this issue, we developed CartoMark, a large-scale benchmark of well-labelled datasets that supports state-of-the-art machine intelligence techniques for map text annotation recognition, map scene classification, map super-resolution reconstruction, and map style transfer. These datasets can also facilitate map feature detection, map pattern recognition, and map content retrieval. We hope these efforts provide well-labelled data resources that advance the ability to recognize and discover valuable map content.
Title: CartoMark: a benchmark dataset for map pattern recognition and map content retrieval with machine intelligence. Scientific Data 11, 1205 (2024). PMCID: PMC11549302
Pub Date: 2024-11-08 | DOI: 10.1038/s41597-024-04068-4
Bo Wen, William Stafford Noble
Training machine learning models for tasks such as de novo sequencing or spectral clustering requires large collections of confidently identified spectra. Here we describe a dataset of 2.8 million high-confidence peptide-spectrum matches derived from nine different species. The dataset is based on a previously described benchmark but has been re-processed to ensure consistent data quality and enforce separation of training and test peptides.
Title: A multi-species benchmark for training and validating mass spectrometry proteomics machine learning models. Scientific Data 11, 1207 (2024). PMCID: PMC11549408
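Enforcing separation of training and test peptides means that every peptide-spectrum match (PSM) for a given peptide falls on the same side of the split. As a result, a model is never evaluated on a sequence it saw during training. The sketch below implements a grouped split under two assumptions: PSMs are `(spectrum_id, peptide)` pairs, and a hash of the peptide sequence assigns each peptide to a side. The function name `split_by_peptide` is hypothetical, and this is not the procedure used to build the benchmark.

```python
import hashlib
from typing import List, Tuple

PSM = Tuple[str, str]  # (spectrum_id, peptide_sequence); assumed record shape


def split_by_peptide(
    psms: List[PSM], test_fraction: float = 0.2
) -> Tuple[List[PSM], List[PSM]]:
    """Assign PSMs to train/test so that no peptide appears on both sides."""
    train: List[PSM] = []
    test: List[PSM] = []
    for spectrum_id, peptide in psms:
        # Hash the peptide (not the spectrum) so every PSM of a peptide lands together.
        digest = hashlib.sha256(peptide.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") / 2**32  # deterministic value in [0, 1)
        target = test if bucket < test_fraction else train
        target.append((spectrum_id, peptide))
    return train, test


psms = [("s1", "PEPTIDEK"), ("s2", "PEPTIDEK"), ("s3", "LLGNVLVK"), ("s4", "ACDEFGHIK")]
train, test = split_by_peptide(psms, test_fraction=0.5)
print({p for _, p in train} & {p for _, p in test})  # set()
```

Hashing makes the assignment deterministic and independent of input order. `test_fraction` controls the approximate share of distinct peptides in the test set, not the share of spectra.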
Pub Date: 2024-11-07 | DOI: 10.1038/s41597-024-03970-1
Na Jiang, Fuzhen Yin, Boyu Wang, Andrew T Crooks
Within the geo-simulation research domain, micro-simulation and agent-based modeling often require synthetic populations. Creating such populations is time-consuming, and the results often lack social networks. Those networks are crucial for studying human interactions (e.g., disease spread, disaster response) and also shape decision-making. We address these challenges with a Python-based method that uses open data, including the 2020 U.S. Census, to generate a large-scale, realistic, geographically explicit synthetic population for the 50 U.S. states and Washington, D.C. The population comes with stylized social networks (e.g., home, work, and school). It can be used in various geo-simulation approaches (e.g., agent-based modeling) to explore how complex phenomena emerge from human interactions and to further the study of urban digital twins.
Title: A Large-Scale Geographically Explicit Synthetic Population with Social Networks for the United States. Scientific Data 11, 1204 (2024). PMCID: PMC11543939
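The stylized social networks connect people who share a home, workplace, or school. The sketch below shows one simple way to build such layers: link every pair of agents that share a location of the same type. The agent record fields (`id`, `household`, `workplace`, `school`) and the function name `build_layers` are assumptions. The published method may cap group sizes or sample ties differently.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Assumed agent shape, e.g. {"id": "a1", "household": "h1", "workplace": None, "school": "s1"}
Agent = Dict[str, Optional[str]]


def build_layers(
    agents: List[Agent], layers: Sequence[str] = ("household", "workplace", "school")
) -> Dict[str, Set[Tuple[str, ...]]]:
    """Return one undirected edge set per layer, linking agents who share a location."""
    edges: Dict[str, Set[Tuple[str, ...]]] = {}
    for layer in layers:
        groups: Dict[str, List[str]] = defaultdict(list)
        for agent in agents:
            place = agent.get(layer)
            if place is not None:  # agents without e.g. a workplace are skipped for that layer
                groups[place].append(agent["id"])
        # Fully connect each group; sorting ids stores each undirected edge once.
        edges[layer] = {
            tuple(sorted(pair))
            for members in groups.values()
            for pair in combinations(members, 2)
        }
    return edges


agents = [
    {"id": "a", "household": "h1", "workplace": "w1", "school": None},
    {"id": "b", "household": "h1", "workplace": None, "school": "s1"},
    {"id": "c", "household": "h2", "workplace": "w1", "school": None},
    {"id": "d", "household": "h2", "workplace": None, "school": "s1"},
]
print(sorted(build_layers(agents)["household"]))  # [('a', 'b'), ('c', 'd')]
```

Fully connecting a group grows quadratically with its size. For large workplaces or schools, a production pipeline would likely sample a limited number of ties per agent instead.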
Pub Date: 2024-11-07 | DOI: 10.1038/s41597-024-04044-y
Chandrodhay Saccaram, Céline Brosse, Boris Collet, Delphine Sourdeval, Tracy François, Benoît Bernay, Massimiliano Corso, Loïc Rajjou
The spermosphere, a dynamic microenvironment surrounding germinating seeds, is shaped by the complex interactions between natural compounds exuded by seeds and seed-associated microbial communities. While peptides exuded by plants are known to influence microbiota diversity, little is known about those specifically exuded by seeds. In this study, we characterised the peptidome profile of the spermosphere for the first time, using seeds from eight genotypes of common bean (Phaseolus vulgaris) grown in two contrasting production regions. An untargeted LC-MS/MS peptidomic analysis revealed 3,258 peptides derived from 414 precursor proteins of common bean in the spermosphere. This comprehensive peptidomic dataset provides valuable insights into the characteristics of peptides exuded by common bean seeds in the spermosphere. It can be used to identify peptides with potential antimicrobial or other biological activities, advancing our understanding of the functional roles of seed-exuded peptides in the spermosphere.
Title: A mass spectrometry-based peptidomic dataset of the spermosphere in common bean (Phaseolus vulgaris L.) seeds. Scientific Data 11, 1202 (2024). PMCID: PMC11543924
Pub Date: 2024-11-07 | DOI: 10.1038/s41597-024-04061-x
Richard A Guyer, Jessica L Mueller, Nicole Picard, Allan M Goldstein
Neuroblastoma is the most common extracranial solid tumor in children and a leading cause of childhood cancer deaths. All neuroblastomas arise from neural crest-derived sympathetic neuronal progenitors, but they are driven by numerous different mutations, the most common of which is MYCN amplification. Epigenetic aberrations also play a role in oncogenesis and tumor progression. To better understand the biological diversity of neuroblastomas, we performed joint single-nucleus ATAC sequencing and single-nucleus RNA sequencing on six neuroblastoma cell lines, three of which are MYCN-amplified. After standard filtering for high-quality nuclei, we obtained chromatin accessibility and transcript abundance data from 41,733 neuroblastoma tumor cells. Preliminary analysis reveals substantial diversity in chromatin landscape and gene expression across the cell lines. This dataset is a valuable resource for studying the transcriptional and epigenetic mechanisms of this deadly childhood disease.
Title: Simultaneous single-nucleus RNA sequencing and single-nucleus ATAC sequencing of neuroblastoma cell lines. Scientific Data 11, 1203 (2024). PMCID: PMC11543984
Pub Date: 2024-11-06 | DOI: 10.1038/s41597-024-04049-7
Neil Byers, Charles Parker, Chris Beecroft, T B K Reddy, Hugh Salamon, George Garrity, Kjiersten Fagnan
Increases in sequencing capacity, combined with the rapid accumulation of publications and associated data resources, have made it harder to maintain associations between literature and genomic data. As the volume of literature and data has exceeded the capacity of manual curation, automated approaches to maintaining and confirming these associations have become necessary. Here we present the Data Citation Explorer (DCE), which discovers literature that uses genomic data without formally citing it. Compared with manual curation, the service offers consistent resource coverage, metadata enrichment, documentation of new use cases, and identification of conflicting metadata. It reduces the labor costs of manual review, improves the quality of genome metadata maintained by the U.S. Department of Energy Joint Genome Institute (JGI), and increases the number of known publications that incorporate JGI data products. The DCE facilitates an understanding of JGI's impact, improves credit attribution for data generators, and can encourage data sharing by allowing scientists to see how reuse amplifies the impact of their original studies.
Title: Identifying genomic data use with the Data Citation Explorer. Scientific Data 11, 1200 (2024). PMCID: PMC11541499
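The abstract does not say how the DCE searches the literature. One common building block for this kind of service is to scan article text for dataset identifiers and flag those the article does not formally cite. The sketch below rests on several assumptions. The identifier format (`Gp` followed by exactly seven digits) is hypothetical. So are the sets of known and formally cited IDs and the function name `uncited_data_mentions`. None of this describes the DCE's implementation.

```python
import re
from typing import Iterable, Set

# Hypothetical identifier format: "Gp" followed by exactly seven digits.
ID_PATTERN = re.compile(r"\bGp\d{7}\b")


def uncited_data_mentions(text: str, known_ids: Set[str], cited_ids: Iterable[str]) -> Set[str]:
    """Return known dataset IDs mentioned in `text` but absent from its formal citations."""
    mentioned = set(ID_PATTERN.findall(text))
    # Keep only IDs that exist in the curated catalogue, then drop those already cited.
    return (mentioned & known_ids) - set(cited_ids)


text = "Reads from Gp0012345 and Gp0099999 were reassembled; see also Gp00123."
known = {"Gp0012345", "Gp0099999", "Gp0000001"}
print(uncited_data_mentions(text, known, ["Gp0099999"]))  # {'Gp0012345'}
```

Intersecting with a catalogue of known IDs filters out strings that merely look like identifiers. A real system would also need to handle identifiers in tables, supplements, and PDFs, which plain-text matching does not cover.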