Background: As data-driven medical research advances, vast amounts of medical data are being collected, giving researchers access to important information. However, issues such as heterogeneity, complexity, and incompleteness of datasets limit their practical use. Errors and missing data negatively affect artificial intelligence-based predictive models, undermining the reliability of clinical decision-making. Thus, it is important to develop a quality management process (QMP) for clinical data.
Objective: This study aimed to develop a rules-based QMP to address errors and impute missing values in real-world data, establishing high-quality data for clinical research.
Methods: We used clinical data from 6491 patients with colorectal cancer (CRC) collected at Gachon University Gil Medical Center between 2010 and 2022, leveraging the clinical library established within the Korea Clinical Data Use Network for Research Excellence. First, we conducted a literature review on the prognostic prediction of CRC to assess whether the data met our research purposes, comparing selected variables with real-world data. A labeling process was then implemented to extract key variables, which facilitated the creation of an automatic staging library. This library, combined with a rule-based process, allowed for systematic analysis and evaluation.
Results: In the literature, the tumor, node, metastasis (TNM) stage was identified as an important prognostic factor for CRC, but it was not selected through feature selection in the real-world data. After applying the QMP, rates of missing data were reduced from 75.3% to 35.7% for TNM stage and from 24.3% to 18.5% for Surveillance, Epidemiology, and End Results (SEER) stage across 6491 cases, confirming the system's effectiveness. Variable importance analysis through feature selection revealed that TNM stage and detailed code variables, which were previously unselected, were included in the improved model.
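The reduction in missingness reported above can be illustrated with a minimal sketch. The records, field names, and the single fill rule below are hypothetical stand-ins, not the study's rules or its staging library; the sketch only shows how a missing-data rate might be computed before and after a rule-based fill.

```python
from typing import Optional

# Hypothetical records: 'stage' is the recorded overall stage,
# 't', 'n', 'm' are the component codes (all names are illustrative).
records = [
    {"t": "T3", "n": "N0", "m": "M0", "stage": None},
    {"t": "T2", "n": "N1", "m": "M1", "stage": None},
    {"t": None, "n": None, "m": None, "stage": None},
    {"t": "T1", "n": "N0", "m": "M0", "stage": "I"},
]

def missing_rate(rows, field):
    """Fraction of rows whose field is None."""
    return sum(r[field] is None for r in rows) / len(rows)

def fill_stage(row) -> Optional[str]:
    """Illustrative rule only: distant metastasis (M1) implies stage IV.
    A real staging library would encode the full staging tables."""
    if row["stage"] is not None:
        return row["stage"]
    if row["m"] == "M1":
        return "IV"
    return None  # leave the value missing when no rule applies

before = missing_rate(records, "stage")
cleaned = [{**r, "stage": fill_stage(r)} for r in records]
after = missing_rate(cleaned, "stage")
```

Values that no rule can derive stay missing, which is one reason a rule-based process may lower the missing rate without eliminating it.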
Conclusions: In sum, we developed a rules-based QMP to address errors and impute missing values in Korea Clinical Data Use Network for Research Excellence data, enhancing data quality. The applicability of the process to real-world datasets highlights its potential for broader use in clinical studies and cancer research.
Background: Integrated health data are foundational for secondary use, research, and policymaking. However, data quality issues, such as missing values and inconsistencies, are common due to the heterogeneity of health data sources. Existing frameworks often use static, 1-time assessments, which limit their ability to address quality issues across evolving data pipelines.
Objective: This study evaluates the AIDAVA (artificial intelligence-powered data curation and validation) data quality framework, which introduces dynamic, life cycle-based validation of health data using knowledge graph technologies and SHACL (Shapes Constraint Language)-based rules. The framework is assessed for its ability to detect and manage data quality issues, specifically completeness and consistency, during integration.
Methods: Using the MIMIC-III (Medical Information Mart for Intensive Care-III) dataset, we simulated real-world data quality challenges by introducing structured noise, including missing values and logical inconsistencies. The data were transformed into source knowledge graphs and integrated into a unified personal health knowledge graph. SHACL validation rules were applied iteratively during the integration process, and data quality was assessed under varying noise levels and integration orders.
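As a rough illustration of the two quality dimensions, the sketch below scores completeness and consistency on a few made-up records. It is a plain-Python stand-in for rule-based validation, not SHACL and not the AIDAVA implementation; the field names, the required-field list, and the single consistency rule are assumptions.

```python
from datetime import date

# Hypothetical admission records; field names are illustrative,
# not MIMIC-III column names.
admissions = [
    {"id": 1, "admit": date(2020, 1, 1), "discharge": date(2020, 1, 5), "diagnosis": "I10"},
    {"id": 2, "admit": date(2020, 2, 3), "discharge": date(2020, 2, 1), "diagnosis": "E11"},  # inconsistent dates
    {"id": 3, "admit": date(2020, 3, 1), "discharge": None, "diagnosis": None},  # incomplete
]
REQUIRED = ("admit", "discharge", "diagnosis")

def completeness(rows):
    """Share of required fields that are populated."""
    filled = sum(r[f] is not None for r in rows for f in REQUIRED)
    return filled / (len(rows) * len(REQUIRED))

def consistency(rows):
    """Share of checkable rows where discharge is not before admission.
    Rows missing either date cannot be checked and are excluded."""
    checkable = [r for r in rows if r["admit"] and r["discharge"]]
    if not checkable:
        return None
    ok = sum(r["admit"] <= r["discharge"] for r in checkable)
    return ok / len(checkable)

completeness_score = completeness(admissions)
consistency_score = consistency(admissions)
```

In this sketch the consistency score is computed only over rows with both dates present, so a consistency value should be read alongside the completeness value for the same data.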
Results: The AIDAVA framework effectively detected completeness and consistency issues across all scenarios. Completeness was shown to influence the interpretability of consistency scores, and domain-specific attributes (eg, diagnoses and procedures) were more sensitive to integration order and data gaps.
Conclusions: AIDAVA supports dynamic, rule-based validation throughout the data life cycle. By addressing both dimension-specific vulnerabilities and cross-dimensional effects, it lays the groundwork for scalable, high-quality health data integration. Future work should explore deployment in live clinical settings and expand to additional quality dimensions.
Background: Although an increasing number of bedside medical devices are equipped with wireless connections for reliable notifications, many nonnetworked devices remain effective at detecting abnormal patient conditions and alerting medical staff through auditory alarms. Staff members, however, can miss these notifications, especially when in distant areas or other private rooms. At the same time, the signal-to-noise ratio of alarm systems for medical devices in the neonatal intensive care unit is 0 dB or higher. A feasible system for automatic sound identification with high accuracy is needed to prevent alarm sounds from being missed by the staff.
Objective: The purpose of this study was to design a method for classifying multiple alarm sounds collected with a monaural microphone in a noisy environment.
Methods: Features of 7 alarm sounds were extracted using a mel filter bank and incorporated into a classifier using convolutional and recurrent neural networks. To estimate its clinical usefulness, the classifier was evaluated with mixtures of up to 7 alarm sounds and hospital ward noise.
Results: The proposed convolutional recurrent neural network model was evaluated using a simulation dataset of 7 alarm sounds mixed with hospital ward noise. At a signal-to-noise ratio of 0 dB, the best-performing model (convolutional neural network 3+bidirectional gated recurrent unit) achieved an event-based F1-score of 0.967, with a precision of 0.944 and a recall of 0.991. When the venous foot pump class was excluded, the classwise recall of the classifier ranged from 0.990 to 1.000.
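The relation between the three summary figures can be checked with the standard definition of the F1-score as the harmonic mean of precision and recall. This is only a sketch of that formula; the event-matching procedure that produces the event-based precision and recall is not shown.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Reported precision and recall at a signal-to-noise ratio of 0 dB.
f1 = f1_score(0.944, 0.991)
```

With the reported precision and recall, this evaluates to about 0.967, in agreement with the reported F1-score.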
Conclusions: The proposed classifier was found to be highly accurate in detecting alarm sounds. Although the performance of the proposed classifier in a clinical environment can be improved, the classifier could be incorporated into an alarm sound detection system. The classifier, combined with network connectivity, could improve the notification of abnormal status detected by unconnected medical devices.
Background: Predicting serious hematological adverse events (SHAEs) from poly(adenosine diphosphate-ribose) polymerase inhibitors (PARPis) would allow us to prioritize patients with ovarian cancer at higher risk for more intensive care, ultimately lowering morbidity and preventing premature termination of medication.
Objective: This study aimed to explore the risk factors for SHAEs in patients with ovarian cancer receiving PARPi treatment and develop a risk prediction model for such events.
Methods: Prospective clinical data were collected on patients with ovarian cancer who received PARPi treatment at the Guangxi Medical University Affiliated Tumor Hospital from December 2018 to August 2024. They were divided into a SHAE group and a no-SHAE group based on the occurrence of SHAEs. Variable differences were screened using the chi-square test or Fisher exact test. Multivariate logistic regression was used to determine independent factors influencing SHAEs in patients with ovarian cancer. A predictive model for serious blood-related complications in ovarian cancer treatment was developed from identified independent risk factors using the R software. The model's clinical utility was assessed through decision curve analysis (net benefit), calibration (calibration curve), and discrimination (receiver operating characteristic curve).
Results: A total of 70 patients with ovarian cancer receiving PARPi treatment were included in this study. Of these 70 patients, 16 (23%) experienced SHAEs, with decreases in red blood cell (RBC) count and hemoglobin levels being the most common. Multiple logistic regression analysis identified 4 independent predictors of PARPi-associated SHAEs in patients with ovarian cancer: lymph node metastasis (odds ratio [OR] 6.733, 95% CI 1.197-37.873; P=.03), creatinine clearance rate of ≤60 mL per minute (OR 23.722, 95% CI 3.121-180.303; P=.002), RBC count of ≤3.3×10¹² per liter (OR 4.847, 95% CI 1.020-23.041; P=.047), and combination therapy with vascular endothelial growth factor inhibitors (OR 6.749, 95% CI 1.313-34.689; P=.02). The internal validation yielded an area under the curve of 0.874 (95% CI 0.793-0.955), indicating moderate clinical utility and accuracy for the risk prediction model incorporating these predictors.
Conclusions: Lymph node metastasis, creatinine clearance rate of ≤60 mL per minute, RBC count of ≤3.3×10¹² per liter, and combination therapy with vascular endothelial growth factor inhibitors are independent risk factors for PARPi SHAEs in patients with ovarian cancer. The risk prediction model established based on these factors demonstrated moderate predictive value.
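Odds ratios from logistic regression are exponentiated coefficients, and their confidence intervals are typically symmetric on the log scale. The sketch below assumes a Wald interval of the form exp(beta ± 1.96 × SE), which the abstract does not state, and recovers the implied coefficient and standard error from a reported OR and CI. The function name is hypothetical.

```python
import math

def log_scale_summary(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover the coefficient and standard error implied by an OR and its
    95% CI, assuming a Wald interval exp(beta +/- z * se)."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    # Under the Wald assumption, the geometric mean of the CI bounds
    # should reproduce the point estimate.
    midpoint = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)
    return beta, se, midpoint

# Reported values for lymph node metastasis.
beta, se, midpoint = log_scale_summary(6.733, 1.197, 37.873)
```

For lymph node metastasis, the geometric mean of the interval bounds comes out close to the reported OR of 6.733, which is consistent with a Wald-type interval. The wide intervals are what one would expect with 70 patients and 16 events.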
Background: Delayed extubation after general anesthesia increases complications and can lead to longer hospital stays and higher mortality. Current risk assessments often rely on subjective judgment or simple tools, whereas machine learning offers potential for real-time evaluation, though research is limited and typically uses single-algorithm models.
Objective: The aims of this study were to identify risk factors for delayed extubation after general anesthesia in the study sample and to construct a risk prediction model for delayed extubation in this population.
Methods: Data from 4779 patients admitted to the postanesthesia care unit between September 2023 and May 2024 were used to develop prediction models for delayed extubation using k-nearest neighbor, decision tree, extreme gradient boosting, random forest, a light gradient boosting machine, and an artificial neural network. Model performance was assessed by calculating the area under the receiver operating characteristic curve, sensitivity, specificity, accuracy, F1-score, and Brier score. Calibration performance was evaluated using calibration curves generated with 100-bin quantile calibration and Loess smoothing to provide bias-corrected and smoothed visual assessment. Additionally, the Hosmer-Lemeshow goodness-of-fit test was performed to quantitatively evaluate calibration, with P values >.05 indicating good calibration.
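Several of the listed metrics have simple definitions that can be sketched directly. The labels and predicted probabilities below are made up, and the 0.5 decision threshold is an assumption; the abstract does not state the threshold used.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Sensitivity and specificity at a given probability threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted = p >= threshold
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Toy data: 1 = delayed extubation, 0 = no delay (illustrative only).
y_true = [1, 0, 0, 1, 0]
y_prob = [0.8, 0.2, 0.6, 0.4, 0.1]

brier = brier_score(y_true, y_prob)
sens, spec = sensitivity_specificity(y_true, y_prob)
```

A lower Brier score indicates predicted probabilities closer to the observed outcomes, although with a rare outcome even an uninformative model can score low, so it is usually read together with discrimination measures.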
Results: Among the 6 models evaluated, the extreme gradient boosting model demonstrated the best performance, with an area under the receiver operating characteristic curve of 0.750 (95% CI 0.703-0.796), a sensitivity of 0.734 (95% CI 0.635-0.827), and a specificity of 0.647 (95% CI 0.623-0.673). Model calibration was acceptable, with a Brier score of 0.0505 and a nonsignificant Hosmer-Lemeshow goodness-of-fit test (χ²₆=7.3; P=.287). Shapley additive explanations were used to rank feature importance.
Conclusions: These machine learning models enable early identification of delayed extubation risk, supporting personalized clinical decisions and optimizing postanesthesia care unit resource allocation.
Background: The diagnosis of sleep disorders presents a challenging landscape, characterized by the complex nature of their assessment and the often divergent views between objective clinical assessment and subjective patient experience. This study explores the interplay between these perspectives, focusing on the variability of individual perceptions of sleep quality and latency.
Objective: Our primary goal was to investigate the alignment, or lack thereof, between subjective experiences and objective measures in the assessment of sleep disorders.
Methods: We developed an aspect-based sentiment analysis method for clinical narratives. Using large language models (Falcon 40B and Mixtral 8X7B), we identified entity groups for 3 aspects related to sleep behavior (daytime sleepiness, sleep quality, and fatigue). We then assigned sentiment values between 0 and 1 to phrases referring to these aspects using a BERT-BiLSTM-based approach (accuracy 78%) and a fine-tuned GPT-2 sentiment classifier (accuracy 87%).
Results: In a cohort of 100 patients with complete subjective (Karolinska Sleepiness Scale [KSS]) and objective (Multiple Sleep Latency Test [MSLT]) assessments, approximately 15% exhibited notable discrepancies between perceived and measured levels of daytime sleepiness. A paired-sample t test comparing KSS scores to MSLT latencies approached statistical significance (t₉₉=2.456; P=.06), suggesting a potential misalignment between subjective reports and physiological markers. In contrast, the comparison using text-derived sentiment scores revealed a statistically significant divergence (t₉₉=2.324; P=.047), indicating that clinical narratives may more reliably capture discrepancies in sleepiness perception. These results underscore the importance of integrating multiple subjective sources, with an emphasis on narrative free text, in the assessment of domains such as fatigue and daytime sleepiness, where standardized measures may not fully reflect the patient's lived experience.
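A paired-sample t statistic of the kind reported above can be computed from the per-patient differences. The scores below are made up and assumed to be on a common scale; the abstract does not state how the subjective and objective measures were made comparable, so this is only a sketch of the statistic itself.

```python
import math
import statistics

def paired_t(a, b):
    """Paired-sample t statistic and degrees of freedom for equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical rescaled scores for 4 patients.
subjective = [5, 6, 7, 8]
objective = [4, 6, 5, 7]

t_stat, dof = paired_t(subjective, objective)
```

The P value would then be read from a t distribution with the returned degrees of freedom; that step is omitted here because it needs a statistics library beyond the Python standard library.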
Conclusions: Our method has potential in uncovering critical insights into patient self-perception versus clinical evaluations, which enables clinicians to identify patients requiring objective verification of self-reported symptoms.
Background: Extracting genetic phenotype mentions from clinical reports and normalizing them to standardized concepts within the Human Phenotype Ontology (HPO) are essential for consistent interpretation and representation of genetic conditions. This is particularly important in fields such as dysmorphology and plays a key role in advancing personalized health care. However, modern clinical named entity recognition methods face challenges in accurately identifying discontinuous mentions (ie, entity spans that are interrupted by unrelated words), which can be found in these clinical reports.
Objective: This study aims to develop a system that can accurately extract and normalize genetic phenotypes, specifically from physical examination reports related to dysmorphology assessment. These mentions appear in both continuous and discontinuous lexical forms, with a focus on addressing challenging discontinuous entity spans.
Methods: We introduce DiscHPO, a 2-phase pipeline consisting of a sequence-to-sequence named entity recognition model for span extraction, and an entity normalizer that uses a sentence transformer biencoder for candidate generation and a cross-encoder reranker for selecting the best candidate as the normalized concept. This system was tested as part of our participation in Track 3 of the BioCreative VIII shared task.
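The normalization stage described above follows a retrieve-then-rerank pattern, which the sketch below illustrates. It is not the DiscHPO implementation: a bag-of-words cosine similarity stands in for the biencoder, a character-trigram overlap stands in for the cross-encoder, and the concept list is a hypothetical handful of labels rather than the ontology. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical mini-vocabulary of concept labels (a real system would
# index ontology terms and their synonyms).
CONCEPTS = ["low-set ears", "posteriorly rotated ears", "wide nasal bridge", "short nose"]

def embed(text):
    """Stand-in for a biencoder: bag-of-words counts."""
    return Counter(text.lower().replace("-", " ").split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def generate_candidates(mention, k=2):
    """Stage 1: keep the k concepts most similar to the mention."""
    query = embed(mention)
    ranked = sorted(CONCEPTS, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:k]

def rerank_score(mention, concept):
    """Stand-in for a cross-encoder: Jaccard overlap of character trigrams."""
    def grams(s):
        s = s.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)}
    a, b = grams(mention), grams(concept)
    return len(a & b) / len(a | b) if a | b else 0.0

def normalize(mention):
    """Stage 2: pick the best candidate according to the reranker."""
    candidates = generate_candidates(mention)
    return max(candidates, key=lambda c: rerank_score(mention, c))

# In "ears are low-set and posteriorly rotated", the discontinuous spans
# can be assembled into two mentions before normalization.
first = normalize("ears low-set")
second = normalize("ears posteriorly rotated")
```

Splitting the work this way keeps the expensive pairwise scoring limited to a short candidate list, which is the usual motivation for pairing a biencoder with a cross-encoder.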
Results: For overall performance on the test set, the top-performing model for entity normalization achieved an F1-score of 0.723, while the best span extraction model reached an F1-score of 0.665. Both scores surpassed those of 2 baseline models using the same dataset, indicating superior efficacy in handling both continuous and discontinuous spans. On the validation set, we were able to demonstrate our system's ability to recognize these mentions, with the model achieving an F1-score of 0.631 for exact match on discontinuous spans only.
Conclusions: The findings suggest that exact extraction of entity spans may not always be necessary for successful normalization. Partial mention matches can be sufficient as long as they capture the essential concept information, supporting the system's utility in clinical downstream tasks.

