Background: Data linkage in pharmacoepidemiological research is commonly employed to ascertain exposures and outcomes or to obtain additional information on confounding variables. However, to protect patient confidentiality, unique patient identifiers are not provided, which makes data linkage across multiple sources challenging. The Saudi Real-World Evidence Network (SRWEN) aggregates electronic health records from various hospitals, which may require robust linkage techniques.
Objective: We aimed to evaluate and compare the performance of deterministic, probabilistic, and machine learning (ML) approaches for linking deidentified data of patients with multiple sclerosis (MS) from the SRWEN and Ministry of National Guard Health Affairs electronic health record systems.
Methods: A simulation-based validation framework was applied before linking real-world data sources. Deterministic linkage was based on predefined rules, whereas probabilistic linkage used similarity score-based matching. For ML, both similarity score-based and classification approaches were applied using neural networks, logistic regression, and random forest models. The performance of each approach was assessed using confusion matrices, focusing on sensitivity, positive predictive value, F1 score, and computational efficiency.
Results: The study included linked data of 2247 patients with MS from 2016 to 2023. The deterministic approach resulted in an average F1 score of 97.2% in the simulation and demonstrated varying match rates in real-world linkage: 1046/2247 (46.6%) to 1946/2247 (86.6%). This linkage was computationally efficient, with run times of <1 second per rule. The probabilistic approach provided an average F1 score of 93.9% in the simulation, with real-world match rates ranging from 1472/2247 (65.5%) to 2144/2247 (95.4%) and processing times ranging from approximately 0.1 to 5 seconds per rule. ML approaches achieved high performance (F1 score reached 99.8%) but were computationally expensive. Processing times ranged from approximately 13 to 16,936 seconds for the classification-based approaches and from approximately 13 to 7467 seconds for the similarity score-based approaches. Real-world match rates from ML models were highly variable depending on the method used; the similarity score-based approach identified 789/2247 (35.1%) matched pairs, whereas the classification-based approach identified 2014/2247 (89.6%).
Conclusions: Probabilistic linkage offers high linkage capacity by recovering matches missed by deterministic methods and proved to be both flexible and efficient, particularly in real-world scenarios where unique identifiers are lacking. This method achieved a good balance between recall and precision, enabling better integration of various data sources that could be useful in MS research.
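As a rough illustration of the contrast drawn above between rule-based and similarity score-based linkage, the following minimal sketch links two toy tables and scores each result with sensitivity, positive predictive value, and F1. Everything in it is an assumption made for illustration: the records are fabricated, the field names (`birth_year`, `sex`, `city`) are hypothetical and not SRWEN variables, the 0.9 threshold is arbitrary, and the averaged `difflib` string similarity is a simple stand-in rather than the scoring used in the study.

```python
from difflib import SequenceMatcher

# Fabricated toy records; field names are hypothetical, not SRWEN variables.
source_a = [
    {"id": "a1", "birth_year": "1985", "sex": "F", "city": "Riyadh"},
    {"id": "a2", "birth_year": "1990", "sex": "M", "city": "Jeddah"},
    {"id": "a3", "birth_year": "1978", "sex": "F", "city": "Dammam"},
]
source_b = [
    {"id": "b1", "birth_year": "1985", "sex": "F", "city": "Riyadh"},
    {"id": "b2", "birth_year": "1990", "sex": "M", "city": "Jedda"},  # spelling variant
    {"id": "b3", "birth_year": "1960", "sex": "M", "city": "Abha"},
]
true_pairs = {("a1", "b1"), ("a2", "b2")}  # known only because the data are simulated
FIELDS = ("birth_year", "sex", "city")


def deterministic_link(a_rows, b_rows, fields=FIELDS):
    """Rule-based linkage: a pair matches only if every field agrees exactly."""
    return {
        (a["id"], b["id"])
        for a in a_rows
        for b in b_rows
        if all(a[f] == b[f] for f in fields)
    }


def similarity(a, b, fields=FIELDS):
    """Mean per-field string similarity in [0, 1] (an illustrative score)."""
    return sum(SequenceMatcher(None, a[f], b[f]).ratio() for f in fields) / len(fields)


def score_link(a_rows, b_rows, threshold=0.9):
    """Similarity score-based linkage: keep pairs at or above an assumed threshold."""
    return {
        (a["id"], b["id"])
        for a in a_rows
        for b in b_rows
        if similarity(a, b) >= threshold
    }


def evaluate(predicted, truth):
    """Sensitivity, positive predictive value, and F1 from pair-level counts."""
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * sensitivity * ppv / (sensitivity + ppv) if sensitivity + ppv else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "f1": f1}


det_pairs = deterministic_link(source_a, source_b)
sim_pairs = score_link(source_a, source_b)
print("deterministic:", evaluate(det_pairs, true_pairs))
print("similarity score:", evaluate(sim_pairs, true_pairs))
```

In this toy setup the exact-match rule misses the pair with the spelling variant, whereas the score-based rule recovers it. This mirrors, in miniature, how score-based linkage can recover matches that rigid rules miss.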
Background: The Cox proportional hazards (CPH) model is a common choice for analyzing time-to-treatment interruptions in patients on antiretroviral therapy (ART), valued for its straightforward interpretability and flexibility in handling time-dependent covariates. Machine learning (ML) models have increasingly been adapted for handling temporal data, with added advantages of handling complex, nonlinear relationships and large datasets, and providing clear practical interpretations.
Objective: This study aims to compare the predictive performance of the traditional CPH model and ML models in predicting treatment interruptions among patients on ART, while also providing both global and individual-level explanations to support personalized, data-driven interventions for improving treatment retention.
Methods: Using data from 621,115 patients who started ART in Kenya between 2017 and 2023, we compared the performance of the CPH model with that of the following ML models in predicting first and multiple treatment interruptions: gradient boosting machine, extreme gradient boosting, regularized generalized linear models (Ridge, Lasso, and Elastic-Net), and recursive partitioning. A model-agnostic explainable surrogate technique was applied to interpret the best-performing model's predictions globally, using variable importance and partial dependence profiles, and at the individual level, using breakdown additive explanations, Shapley Additive Explanations, and ceteris paribus profiles.
Results: The recursive partitioning model achieved the best performance with a predictive concordance index score of 0.81 for first treatment interruptions and 0.89 for multiple interruptions, outperforming the CPH model, which scored 0.78 and 0.87 for the same scenarios, respectively. Recursive partitioning's performance can be attributed to its ability to model nonlinear relationships and automatically detect complex interactions. The global model-agnostic explanations aligned closely with the interpretations offered by hazard ratios in the CPH model, while offering additional insights into the impact of specific features on the model's predictions. The breakdown additive and Shapley Additive Explanations explainers demonstrated how different variables contribute to the predicted risk at the individual patient level. The ceteris paribus profiles further explored the time-varying model to illustrate how changes in a patient's covariates over time could impact their predicted risk of treatment interruption.
Conclusions: Our results highlight the superior predictive performance of ML models and their ability to provide patient-specific risk predictions and insights that can support targeted interventions to reduce treatment interruptions in ART care.
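The concordance index reported above measures how often a model ranks patients correctly by risk. The sketch below shows one common way to compute it for right-censored data (Harrell's C). The abstract does not state which estimator was used, so this is an assumption, and the follow-up times, event flags, and risk scores are invented for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the patient with the
    earlier observed event also has the higher predicted risk.

    A pair (i, j) is comparable when patient i had the event and did so
    strictly before patient j's follow-up time. Tied risk scores count as 0.5.
    Pairs with tied times are ignored in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented example: months to interruption, event flag (1 = interruption
# observed, 0 = censored), and a model's predicted risk score.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.7, 0.3, 0.1]
c_index = concordance_index(times, events, risk)
print(f"C-index: {c_index:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 0.78 to 0.89 values above should be read.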
Background: Multiple instance learning (MIL) is widely used for slide-level classification in digital pathology without requiring expert annotations. However, although even partial expert annotations offer valuable supervision, few studies have effectively leveraged this information within MIL frameworks.
Objective: This study aims to develop and evaluate a ranking-aware MIL framework, called rank induction, that effectively incorporates partial expert annotations to improve slide-level classification performance under realistic annotation constraints.
Methods: We developed rank induction, a MIL approach that incorporates expert annotations using a pairwise rank loss inspired by RankNet. The method encourages the model to assign higher attention scores to annotated regions than to unannotated ones, guiding it to focus on diagnostically relevant patches. We evaluated rank induction on 2 public datasets (Camelyon16 and DigestPath2019) and an in-house dataset (Seegene Medical Foundation-stomach; SMF-stomach) and tested its robustness under 3 real-world conditions: low-data regimes, coarse within-slide annotations, and sparse slide-level annotations.
Results: Rank induction outperformed existing methodologies, achieving an area under the receiver operating characteristic curve (AUROC) of 0.839 on Camelyon16, 0.995 on DigestPath2019, and 0.875 on SMF-stomach. It remained robust under low-data conditions, maintaining an AUROC of 0.761 with only 60.2% (130/216) of the training data. When using coarse annotations (with 2240-pixel padding), performance slightly declined to 0.823. Remarkably, annotating just 20% (18/89) of the slides was enough to reach near-saturated performance (AUROC of 0.806, vs 0.839 with full annotations).
Conclusions: Incorporating expert annotations through ranking-based supervision improves MIL-based classification. Rank induction remains robust even with limited, coarse, or sparsely available annotations, demonstrating its practicality in real-world scenarios.
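The idea of pushing attention toward annotated regions with a pairwise rank loss can be sketched as follows. This is a minimal RankNet-style formulation under stated assumptions: every annotated patch is paired with every unannotated patch in the same slide, the target is that the annotated patch scores higher, and the loss is averaged over pairs. The abstract does not give the exact loss, pairing scheme, or weighting, so the function name and details here are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_rank_loss(attention_scores, annotated_mask):
    """RankNet-style loss over (annotated, unannotated) patch pairs.

    For each pair the loss is log(1 + exp(-(s_annotated - s_unannotated))),
    which is small when the annotated patch has the higher attention score.
    Returns 0.0 when a slide has no annotated or no unannotated patches.
    """
    annotated = [s for s, m in zip(attention_scores, annotated_mask) if m]
    unannotated = [s for s, m in zip(attention_scores, annotated_mask) if not m]
    if not annotated or not unannotated:
        return 0.0
    total = sum(softplus(-(sa - su)) for sa in annotated for su in unannotated)
    return total / (len(annotated) * len(unannotated))


# Invented attention scores for four patches; the first two are annotated.
mask = [True, True, False, False]
loss_good = pairwise_rank_loss([2.0, 1.5, -1.0, -0.5], mask)  # annotated ranked higher
loss_bad = pairwise_rank_loss([-1.0, -0.5, 2.0, 1.5], mask)   # annotated ranked lower
print(loss_good, loss_bad)
```

In a training loop, a term like this would typically be added to the slide-level classification loss, so slides without annotations still contribute through the classification term alone.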
This research letter summarizes early lessons from 4 enterprise implementations of artificial intelligence-enabled customer relationship management platforms in health care and describes governance practices associated with improvements in affordability, adherence, and access at the program level.
Background: Rib fractures are present in 10%-15% of thoracic trauma cases but are often missed on chest radiographs, delaying diagnosis and treatment. Artificial intelligence (AI) may improve detection and triage in emergency settings.
Objective: This study aims to evaluate diagnostic accuracy, processing speed, and technical feasibility of an artificial intelligence-assisted rib fracture detection system using prospectively collected data within a real-world, high-volume emergency department workflow.
Methods: We conducted an observational feasibility study with prospective data collection of a faster region-based convolutional neural network-based AI model deployed in the emergency department to analyze 23,251 real-world chest radiographs (22,946 anteroposterior; 305 oblique) from April 1 to July 2, 2023. This study was approved by the Institutional Review Board of MacKay Memorial Hospital (IRB No. 20MMHIS483e). AI operated passively, without influencing clinical decision-making. The reference standard was the final report issued by board-certified radiologists. A subset of discordant cases underwent post hoc computed tomography review for exploratory analysis.
Results: AI achieved 74.5% sensitivity (95% CI 70.8%-78.0%), 93.3% specificity (95% CI 93.0%-93.7%), 24.2% positive predictive value, and 99.2% negative predictive value. Median inference time was 10.6 seconds versus 3.3 hours for radiologist reports (paired Wilcoxon signed-rank test W=112,987.5; P<.001). The analysis revealed peak imaging demand between 08:00 and 16:00 and on Thursday-Saturday evenings. A 14-day graphics processing unit outage underscored the importance of infrastructure resilience.
Conclusions: The AI system demonstrated strong technical feasibility for real-time rib fracture detection in a high-volume emergency department setting, with rapid inference and stable performance during prospective deployment. Although the system showed high negative predictive value, the observed false-positive and false-negative rates indicate that it should be considered a supportive screening tool rather than a stand-alone diagnostic solution or a replacement for clinical judgment. These findings support further clinician-in-the-loop studies to evaluate clinical feasibility, workflow integration, and impact on diagnostic decision-making. However, interpretation is limited by reliance on radiology reports as the reference standard and the system's passive, non-interventional deployment.
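The gap between the high negative predictive value and the low positive predictive value reported above follows from Bayes' rule when the target condition is uncommon. The sketch below makes that relationship explicit. The abstract does not report fracture prevalence, so the prevalence used here is back-calculated from the rounded sensitivity, specificity, and positive predictive value and should be read as an illustrative assumption (it comes out at a little under 3%), not as a study result.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


def implied_prevalence(sensitivity, specificity, ppv):
    """Solve PPV = sens*p / (sens*p + (1 - spec)*(1 - p)) for prevalence p."""
    false_positive_rate = 1 - specificity
    return (ppv * false_positive_rate) / (
        sensitivity * (1 - ppv) + ppv * false_positive_rate
    )


# Rounded values from the Results; the prevalence is derived, not reported.
sens, spec, reported_ppv = 0.745, 0.933, 0.242
prev = implied_prevalence(sens, spec, reported_ppv)
ppv, npv = predictive_values(sens, spec, prev)
print(f"implied prevalence: {prev:.3f}, PPV: {ppv:.3f}, NPV: {npv:.3f}")
```

Under this assumed prevalence, false positives from the large fracture-free group outnumber true positives. That is consistent with the conclusion that the system is better suited to supportive screening than to stand-alone diagnosis.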
We used the free artificial intelligence (AI) tool Google NotebookLM, powered by the large language model Gemini 2.0, to construct a medical decision-making aid for diagnosing and managing airway diseases and subsequently evaluated its functionality and performance in a clinical workflow. After feeding this tool with relevant published clinical guidelines for these diseases, we evaluated the feasibility of the system regarding its behavior, ability, and potential, and we created simulated cases and used the system to solve associated medical problems. The test and simulation questions were designed by a pulmonologist, and the appropriateness (focusing on accuracy and completeness) of AI responses was judged by 3 pulmonologists independently. The system was then deployed in an emergency department (ED) setting, where it was tested by medical staff (n=20) to assess how it affected the process of clinical consultation. Test opinions were collected through a questionnaire. Most (56/84, 67%) of the specialists' ratings regarding AI responses were above average. The interrater reliability was moderate for accuracy (intraclass correlation coefficient=0.612; P<.001) and good for completeness (intraclass correlation coefficient=0.773; P<.001). When deployed in the ED, this system could respond with reasonable answers and enhance the literacy of personnel about these diseases. The potential to save the time spent in consultation did not reach statistical significance (Kolmogorov-Smirnov [K-S] D=0.223, P=.24) across all participants, but it indicated a favorable outcome when we analyzed only physicians' responses. We concluded that this system is customizable, cost efficient, and accessible to clinicians and allied health care professionals who treat airway diseases, without requiring any computer coding experience. It provides convincing guideline-based recommendations, increases the staff's medical literacy, and potentially saves physicians' time spent on consultation. This system warrants further evaluation in other medical disciplines and health care environments.
Background: A rapidly aging population has led to an increase in the number of patients with chronic diseases and polypharmacy. Although investigations on the appropriate number of drugs for older patients have been conducted, there is a shortage of studies on polypharmacy criteria in older inpatients from multiple institutions.
Objective: The aim of this study was to examine the patterns of polypharmacy and determine the criteria for the number of drugs defining polypharmacy in the geriatric inpatient population.
Methods: Electronic health records of 4 medical institutions for patients aged 65 years and older hospitalized between January 1, 2012, and December 31, 2020, were analyzed for the study. The maximum number of drugs prescribed was obtained for each patient and, along with a literature review, was used to determine the appropriate polypharmacy level for our population.
Results: We suggest a 4-level polypharmacy category system consisting of nonpolypharmacy, polypharmacy, major polypharmacy, and excessive polypharmacy based on a review of international guidelines and polypharmacy literature. Application of this system to our study population showed that the major polypharmacy category (use of 10-19 concurrent drugs) was an appropriate threshold for polypharmacy in hospitalized patients versus the traditional threshold of 5 or more concurrent drugs. The tendency of our study population to have a higher disease and drug count supports this threshold. Frequently prescribed therapeutic subgroups in this category were antibacterials for systemic use, anesthetics, and cardiac therapy.
Conclusions: This study proposes a polypharmacy categorization system for older inpatients, which differs from the common definition of the concomitant prescription of 5 or more drugs. The older population tends to have severe conditions including those requiring major surgeries; therefore, a drug count corresponding to the definition of major polypharmacy is appropriate.
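The 4-level category system described above can be expressed as a small lookup on each patient's maximum concurrent drug count. Only two cut points come from the abstract: 10-19 drugs for major polypharmacy and the traditional threshold of 5 or more drugs. The remaining boundaries in this sketch (fewer than 5 for nonpolypharmacy, 5-9 for polypharmacy, 20 or more for excessive polypharmacy) are assumptions inferred from those two values, and the patient data are invented.

```python
def polypharmacy_category(max_concurrent_drugs):
    """Map a patient's maximum concurrent drug count to a category.

    Only the 10-19 range (major polypharmacy) and the traditional >=5
    threshold are taken from the abstract; other boundaries are assumed.
    """
    if max_concurrent_drugs < 0:
        raise ValueError("drug count cannot be negative")
    if max_concurrent_drugs < 5:       # assumed boundary
        return "nonpolypharmacy"
    if max_concurrent_drugs < 10:      # assumed range 5-9
        return "polypharmacy"
    if max_concurrent_drugs < 20:      # 10-19, stated in the abstract
        return "major polypharmacy"
    return "excessive polypharmacy"    # assumed: 20 or more


# Invented daily concurrent drug counts per hospitalized patient.
daily_counts = {
    "patient_1": [3, 4, 2],
    "patient_2": [6, 12, 9],
    "patient_3": [18, 22, 15],
}
categories = {
    patient: polypharmacy_category(max(counts))
    for patient, counts in daily_counts.items()
}
print(categories)
```

Taking the maximum over the stay mirrors the Methods, in which the maximum number of drugs prescribed was obtained for each patient.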

