Pub Date: 2026-03-16 DOI: 10.1038/s41597-026-07063-z
Lingbo Liu, Tracy Onega, Erika L Moen, Anna N A Tosteson, Rebecca E Smith, Qianfei Wang, Lauren Cowan, Fahui Wang
Telehealth can reduce travel barriers to oncology care, yet its impact depends on both digital connectivity and the geography of care. We present an open, reusable dataset that characterizes two critical components of telehealth infrastructure for cancer care, namely accessible oncologists and sufficient, affordable internet, at the ZIP Code Tabulation Area (ZCTA) level across the United States. The resource integrates population-weighted fixed broadband measures and 5G coverage, internet subscription as an affordability proxy, geocoded oncologist practice sites with full-time-equivalent (FTE) capacity, and a national origin-destination matrix of road travel times. From these inputs, we compute spatial accessibility to in-person care using the two-step floating catchment area (2SFCA) method and to telehealth-enabled care using the two-step virtual catchment area (2SVCA) method, at travel-time thresholds from 45 to 120 minutes. To support transparency, we release the source and intermediate indicators, the final accessibility scores, and a replicable 2SFCA/2SVCA workflow. Anticipated uses include benchmarking infrastructure across states and metropolitan areas, analyzing disparities by rurality and area deprivation, simulating subsidies, and rapidly replicating the approach for other diseases or provider types.
Title: Telehealth Infrastructure for Cancer Care in the United States. Journal: Scientific Data.
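The two accessibility measures named in this abstract can be sketched as follows. This is a minimal illustration, not the released workflow: it assumes a binary catchment (a practice site counts only if its road travel time is within the threshold) and the commonly cited 2SVCA form, in which a hypothetical per-ZCTA factor `broadband` (between 0 and 1) discounts each area's effective demand in step 1 and scales its final score in step 2. The dataset's own weighting of connectivity and affordability may differ, and the function and argument names are illustrative only.

```python
import numpy as np


def two_step_fca(pop, supply, travel_time, threshold):
    """Classic 2SFCA with a binary catchment.

    pop:         population of each ZCTA, shape (n_zcta,)
    supply:      oncologist FTE at each practice site, shape (n_site,)
    travel_time: road travel time in minutes, shape (n_zcta, n_site)
    """
    within = (travel_time <= threshold).astype(float)
    # Step 1: provider-to-population ratio within each site's catchment.
    demand = pop @ within
    ratio = np.divide(supply, demand, out=np.zeros(len(supply)), where=demand > 0)
    # Step 2: each ZCTA sums the ratios of every site it can reach.
    return within @ ratio


def two_step_vca(pop, supply, travel_time, threshold, broadband):
    """2SVCA sketch: `broadband` (0..1 per ZCTA) is a hypothetical
    connectivity/affordability factor applied to demand and to the score."""
    within = (travel_time <= threshold).astype(float)
    # Step 1: only the "connected" share of each population competes for virtual supply.
    demand = (pop * broadband) @ within
    ratio = np.divide(supply, demand, out=np.zeros(len(supply)), where=demand > 0)
    # Step 2: a ZCTA's virtual access is scaled by its own connectivity.
    return broadband * (within @ ratio)


pop = np.array([1000.0, 2000.0])
supply = np.array([2.0])                  # one site with 2 FTE
travel_time = np.array([[30.0], [60.0]])  # minutes from each ZCTA to the site
print(two_step_fca(pop, supply, travel_time, 45))
print(two_step_vca(pop, supply, travel_time, 120, np.array([1.0, 0.5])))
```

With a 45-minute threshold only the first ZCTA falls in the catchment, so it receives the full ratio of 2 FTE per 1,000 people and the second ZCTA scores zero; raising the threshold to 120 minutes spreads the same supply over both populations.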
Pub Date: 2026-03-16 DOI: 10.1038/s41597-026-06963-4
Luke Korthals, Ingmar Visser, Šimon Kucharský
Analysis of eye-tracking data often requires accurate classification of eye movement events. Human experts and classification algorithms often confuse episodes of fixations (fixating stationary targets) and smooth pursuits (fixating moving targets) because their feature characteristics overlap. To foster the development of better classification algorithms, we created a benchmark dataset that does not rely on human annotation as the gold standard. It comprises almost four hours of eye movement recordings. Ten participants fixated different targets designed to induce saccades, fixations, and smooth pursuits. Plausible benchmark labels were established by designing stimuli that prevent fixations and smooth pursuits from co-occurring, and by separating both from saccades by their velocity. We release the raw data together with a companion Python package that offers a convenient way to preprocess them and assign the plausible benchmark labels. We encourage researchers to use these resources for feature engineering and to train, validate, and benchmark their algorithms.
Title: Eye movement benchmark data for smooth-pursuit classification. Journal: Scientific Data, volume 13 (1). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12992799/pdf/
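The labelling logic described in this abstract (velocity separates saccades; the stimulus condition decides between fixation and smooth pursuit, since the two never co-occur) can be sketched as below. This is not the companion package's API: the function name, the 30 °/s saccade threshold, and the assumption that gaze positions are already in degrees of visual angle are all hypothetical.

```python
import numpy as np


def label_samples(x_deg, y_deg, fs_hz, target_moving, saccade_thresh=30.0):
    """Assign a per-sample label from gaze velocity and stimulus condition.

    x_deg, y_deg:   gaze position in degrees of visual angle (assumed)
    fs_hz:          sampling rate in Hz
    target_moving:  True if the trial's stimulus moves (pursuit trial)
    saccade_thresh: hypothetical velocity threshold in degrees per second
    """
    # Sample-to-sample velocity (deg/s) via central differences.
    vx = np.gradient(x_deg) * fs_hz
    vy = np.gradient(y_deg) * fs_hz
    speed = np.hypot(vx, vy)
    # Because stimuli never mix stationary and moving targets within a trial,
    # every non-saccade sample inherits the trial's condition.
    slow_label = "pursuit" if target_moving else "fixation"
    labels = np.where(speed > saccade_thresh, "saccade", slow_label)
    return labels, speed


# A fixation trial with one 5-degree gaze jump, sampled at 100 Hz.
labels, speed = label_samples(np.array([0, 0, 0, 5, 5, 5.0]), np.zeros(6), 100.0, target_moving=False)
print(list(labels))
```

Central differences spread the jump over the two samples that straddle it, so both are labelled as saccade while the surrounding samples remain fixation.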
Oracle bone inscriptions, the earliest known form of Chinese writing, hold immense historical and linguistic significance. However, existing digital datasets are typically limited to isolated characters and lack the contextual and structural information essential for comprehensive analysis. We present the Oracle Bone Inscriptions Multi-modal Dataset (OBIMD), a large-scale, publicly available corpus that provides pixel-aligned rubbing and facsimile images, character-level annotations, and sentence-level transcriptions with corresponding reading sequences. OBIMD encompasses 10,077 oracle bone inscription images spanning five phases of the Shang Dynasty, featuring 93,652 annotated characters, 21,667 recorded missing-character positions, 21,941 sentence units, and 4,192 non-sentential elements. By integrating visual, structural, and linguistic modalities, OBIMD supports multi-modal learning and diverse tasks such as facsimile enhancement, character retrieval, and syntactic reconstruction. It constitutes a foundational resource for oracle bone inscription recognition and interpretation, enabling scalable and systematic analysis of ancient Chinese writing.
Title: OBIMD: A Multi-modal Dataset for Contextual Interpretation of Oracle Bone Inscriptions. Journal: Scientific Data.
Authors: Bang Li, Jing Yang, Yujie Liang, Xiaobin Hu, Zengmao Ding, Xu Peng, Shengwei Han, Peichao Qin, Donghao Luo, Taisong Jin, Feng Gao, Yongge Liu, Rongrong Ji
Pub Date: 2026-03-14 DOI: 10.1038/s41597-026-06967-0
Storing and retrieving wave spectra for boundary conditions in nested wave modeling is computationally demanding because each spectrum requires substantial storage. A recently developed method (Jiang et al., 2023) addresses this by representing two-dimensional wave spectra with a set of Reconstruction Parameters (RPs), enabling efficient long-term and large-scale wave spectrum storage. This study presents an RP dataset for the China-adjacent seas, derived from 165,590 grid points at 1/12° × 1/12° resolution and hourly intervals from 2000 to 2024, supporting the reconstruction of spectra with up to six spectral partitions. Validation against independent buoy and satellite observations shows strong agreement with wave parameters derived from the simulated spectra. Moreover, comparative analysis shows close consistency between characteristics obtained from the original simulated spectra and their reconstructed counterparts, with reconstruction errors smaller than the inherent uncertainties of the original numerical simulations. Additional nested modeling experiments further demonstrate the dataset's utility for wave hindcasting and forecasting in the China-adjacent seas.
Title: Wave spectrum Reconstruction Parameters for nested wave modeling in the China-adjacent seas from 2000 to 2024. Journal: Scientific Data.
Authors: Xingjie Jiang, Yongzeng Yang, Xunqiang Yin, Yuxuan Zha
Pub Date: 2026-03-14 DOI: 10.1038/s41597-026-07017-5
Pub Date: 2026-03-14 DOI: 10.1038/s41597-026-06938-5
Zhe Han, Charlie Budd, Gongyu Zhang, Huanyu Tian, Christos Bergeles, Tom Vercauteren
Localisation of surgical tools constitutes a foundational building block for computer-assisted interventional technologies. Work in this field typically focuses on training deep learning models to perform segmentation tasks, and the performance of such learning-based approaches is limited by the availability of diverse annotated data. We argue that skeletal pose annotation is a more efficient approach for surgical tools, striking a balance between semantic richness and ease of annotation and thus allowing the pool of annotated data to grow faster. To encourage adoption of this annotation style, we present ROBUST-MIPS, a combined tool pose and tool instance segmentation dataset derived from the existing ROBUST-MIS dataset. Our enriched dataset facilitates the joint study of these two annotation styles and allows head-to-head comparison on various downstream tasks. To demonstrate the adequacy of pose annotations for surgical tool localisation, we set up a simple benchmark using popular pose estimation methods and observe high-quality results. To ease adoption, we release our benchmark models and custom tool pose annotation software together with the dataset.
Title: ROBUST-MIPS: A Combined Skeletal Pose and Instance Segmentation Dataset for Laparoscopic Surgical Instruments. Journal: Scientific Data.
Pub Date: 2026-03-14 DOI: 10.1038/s41597-026-07041-5
Priyanka Ghosh, Kirti Saluja, Arpan Banerjee
Salient sounds in the environment automatically capture our attention, shifting focus away from ongoing goal-directed tasks. Studies of cognitive flexibility can employ such paradigms to examine how the brain reorients attention back to the ongoing goal, an ability notably impaired in neurodevelopmental and clinical populations. This dataset captures attentional reorientation to real-world distractors, featuring 60 naturalistic salient sounds (e.g., ambulance siren, dog bark) presented during goal-directed auditory discrimination tasks involving pure tones, frequency-modulated sweeps, and speech syllables. We openly release novel behavioral and preprocessed electroencephalography (EEG) data from twenty-seven healthy human volunteers performing goal-directed auditory tasks validated across three spectrotemporally distinct acoustic contexts, along with all task stimulus files. Behavioral data confirmed that distractors significantly modulated task performance in all three auditory tasks, and EEG spectral analyses revealed significant power changes linked to auditory distractors. To support accurate source-level analyses, we also provide each participant's structural MRI (3.0 T), 3D head-shape digitization files, and computed forward models.
Title: A human EEG dataset to study cognitive flexibility during auditory discrimination under real-world distractors. Journal: Scientific Data.
The acidity and buffering capacity of inland waters are essential for biogeochemical processes and impose significant constraints on the distribution of freshwater species. Although many measurements exist worldwide, their distribution is biased toward well-studied regions, and a global assessment of gradients and their spatial distribution is lacking. In the PHALK dataset, we compile alkalinity and pH values for continental surface waters worldwide, collating chemical data from 18 source databases and 55 scientific publications. A quality-control filter yielded high-quality alkalinity and pH datasets comprising 50,916 and 107,896 sites, respectively. Based on the collated dataset and a random forest model, pH and alkalinity in surface waters were modeled worldwide at the basin scale (HydroBASINS v1 sub-basin level 12: 1,034,083 drainage basins) using 23 variables describing basin geological and hydrological characteristics.
Each extrapolated value is accompanied by two uncertainty indicators: environmental differentiation, based on the similarity of the basin's environmental conditions to those of basins with measured data, and upscaling confidence, based on the variation in the random forest's internal bootstrap.
Title: Global basin-scale mapping of pH and alkalinity in inland waters. Journal: Scientific Data.
Authors: Meritxell Batalla, Jordi Martínez-Artero, Jordi Catalan
Pub Date: 2026-03-14 DOI: 10.1038/s41597-026-07028-2
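The two uncertainty indicators can be approximated as follows. This is a sketch under stated assumptions, not the PHALK definitions: upscaling confidence is represented by the spread of per-tree predictions (with scikit-learn these can be collected from a fitted forest's `estimators_` list), and environmental differentiation is stood in for by the distance from each basin to its most similar measured basin in z-scored predictor space. Both function names and the distance choice are hypothetical.

```python
import numpy as np


def upscaling_spread(per_tree_preds):
    """Mean and spread of individual-tree predictions.

    per_tree_preds: array (n_trees, n_basins). With a fitted scikit-learn
    forest `rf`, it could be built as
    np.stack([tree.predict(X) for tree in rf.estimators_]).
    A larger spread means lower upscaling confidence.
    """
    return per_tree_preds.mean(axis=0), per_tree_preds.std(axis=0)


def env_distance(X_target, X_measured):
    """Hypothetical environmental-differentiation proxy: Euclidean distance
    from each target basin to the nearest measured basin, after z-scoring
    every predictor with the measured basins' mean and standard deviation."""
    mu = X_measured.mean(axis=0)
    sd = X_measured.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant predictors
    zt = (X_target - mu) / sd
    zm = (X_measured - mu) / sd
    # Pairwise distances, shape (n_target, n_measured); keep the minimum per target.
    d = np.linalg.norm(zt[:, None, :] - zm[None, :, :], axis=2)
    return d.min(axis=1)


mean, spread = upscaling_spread(np.array([[7.0, 8.0], [7.0, 9.0]]))
dist = env_distance(np.array([[0.0, 0.0], [10.0, 10.0]]), np.array([[0.0, 0.0], [2.0, 2.0]]))
print(mean, spread, dist)
```

The pairwise-distance matrix grows with the product of target and measured basins, so at the scale of roughly a million basins a spatial index or chunked computation would be needed in practice.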
Pub Date: 2026-03-13 DOI: 10.1038/s41597-026-06703-8
Benjamin Tilbury, Miguel Arevalillo-Herráez, Naeem Ramzan
We present a new dataset comprising radar, electrocardiography (ECG), respiration, and inertial measurement recordings from 23 individuals performing a series of simulated harmful behaviors. The dataset covers a range of actions across various levels of agitation and is especially well suited to health-monitoring research in high-risk clinical settings, such as inpatient psychiatric units. Its design prioritizes unrestricted, naturalistic behavior capture, reflecting real-world scenarios and supporting a wide range of applications. Although the dataset was initially designed for patient monitoring, the accompanying ECG and respiration recordings extend its potential uses to localization and non-contact vital sign measurement.
Title: A multimodal dataset of harmful simulated behaviours in high-risk clinical settings using radar. Journal: Scientific Data.
Pub Date: 2026-03-13 DOI: 10.1038/s41597-026-07031-7
Chong Li, Ke Lu, Li-Wen Su, Peng Zhou, Guo-Ji Lin, Jia-Qi Liang, Ya-Qin Gong, Jian Jin, Wen-Rong Xu
Osteoporotic fractures (OPF) and hypertension frequently co-occur in older adults, yet comprehensive datasets integrating clinical, pharmacological, and longitudinal outcome data remain scarce. We describe a longitudinal dataset derived from the Osteoporotic Fracture Registration System at Kunshan Hospital, Jiangsu University, including patients aged ≥ 50 years hospitalized for OPF between 2017 and 2024. A total of 4,782 patients were initially registered. After applying predefined eligibility criteria, 4,325 patients were included in the final analytical cohort. The dataset integrates demographic, clinical, and pharmacologic variables with long-term mortality and refracture outcomes through deterministic linkage with regional health and mortality registries. Longitudinal antihypertensive prescription records (n = 42,367) were linked via the Kunshan Municipal Health Data Integration Platform, enabling detailed characterization of medication exposure patterns over time. Technical validation, including survival analysis, propensity score methods, and risk prediction modeling, was conducted to assess internal consistency and illustrate potential applications.
This structured and de-identified dataset provides a quality-checked resource to support future research in osteoporosis, cardiovascular comorbidity, multimorbidity, and real-world comparative effectiveness studies.
Title: A longitudinal dataset of hypertensive osteoporotic fracture patients: treatments and long-term outcomes. Journal: Scientific Data.
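The technical validation for this dataset includes survival analysis. As a minimal illustration of how mortality or refracture curves could be estimated from such a cohort, the sketch below implements the standard Kaplan-Meier product-limit estimator. It assumes each patient contributes a follow-up time and an event flag (1 = event observed, 0 = censored); this input layout is hypothetical, and the sketch ignores competing risks (for example, death precluding refracture), which the authors' analysis may handle differently.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up durations (e.g., months since index fracture)
    events: 1 if the outcome was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients (events and censorings) sharing this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Everyone observed at t leaves the risk set afterwards.
        n_at_risk -= removed
    return curve


curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(curve)
```

Patients censored at the same time as an event are counted in that event's risk set, which follows the usual Kaplan-Meier convention.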
Pub Date: 2026-03-13 DOI: 10.1038/s41597-026-06954-5
P D Madan Kumar, K Ranganathan, C Lavanya, S Rajeshwari, Anwesh Nayak, Ramesh Kestur, Raghuram Bharadwaj Diddigi, Sushree S Behera
This study introduces a SMARTphone-based, expert-annotated dataset of Oral Mucosa images (SMART-OM), collected to facilitate the development of Artificial Intelligence and Machine Learning (AI/ML) technologies for automated diagnosis of Oral Cancer (OC) and Oral Potentially Malignant Disorders (OPMD). The dataset consists of 2,469 images from 331 subjects in four classes: healthy/normal, variations from normal, OPMD, and OC. The images were captured with Android and iOS smartphone cameras under real-world clinical conditions in visible light. Each image was annotated by expert dental surgeons using the open-source VGG Image Annotator. Detailed patient metadata, including clinical diagnosis, age, sex, and lifestyle-based risk indicators such as smoking, smokeless tobacco use, alcohol consumption, and areca nut chewing, were recorded via a customized Jotform. Data collection and handling adhered to the ethical guidelines of the Declaration of Helsinki and its amendments for research involving human subjects, and informed consent was obtained from each subject.
The SMART-OM dataset is intended to advance research and development of AI/ML algorithms for automated oral lesion detection.
Title: A Smartphone-based Comprehensive Dataset of Annotated Oral Cavity Images for Enhanced Oral Disease Diagnosis. Journal: Scientific Data.
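The SMART-OM images are annotated with the VGG Image Annotator (VIA). Assuming the annotations are exported in VIA 2.x JSON format (one entry per image, each with a `regions` list holding `shape_attributes` and `region_attributes`), the sketch below flattens them into per-region bounding boxes suitable for training a detector. The attribute key `lesion` in the example is hypothetical; the released files may use a different schema.

```python
import json


def via_regions(via_json: str):
    """Flatten a VIA 2.x project/export JSON into (filename, attributes, bbox).

    bbox is (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    project = json.loads(via_json)
    out = []
    for entry in project.values():
        for region in entry.get("regions", []):
            shape = region["shape_attributes"]
            if shape["name"] == "polygon":
                xs, ys = shape["all_points_x"], shape["all_points_y"]
                bbox = (min(xs), min(ys), max(xs), max(ys))
            elif shape["name"] == "rect":
                bbox = (shape["x"], shape["y"],
                        shape["x"] + shape["width"], shape["y"] + shape["height"])
            else:
                continue  # circles, ellipses, and points are skipped in this sketch
            out.append((entry["filename"], region.get("region_attributes", {}), bbox))
    return out


sample = json.dumps({
    "img1.jpg123": {
        "filename": "img1.jpg", "size": 123, "file_attributes": {},
        "regions": [
            {"shape_attributes": {"name": "polygon", "all_points_x": [10, 40, 25], "all_points_y": [5, 5, 30]},
             "region_attributes": {"lesion": "OPMD"}},
            {"shape_attributes": {"name": "rect", "x": 1, "y": 2, "width": 3, "height": 4},
             "region_attributes": {"lesion": "OC"}},
        ],
    }
})
print(via_regions(sample))
```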