Background: Diagnostic errors and administrative burdens, including medical coding, remain major challenges in health care. Large language models (LLMs) have the potential to alleviate these problems, but their adoption has been limited by concerns regarding reliability, transparency, and clinical safety.
Objective: This study introduces and evaluates 2 LLM-based frameworks, implemented within the Rhazes Clinician platform, designed to address these challenges: generation-assisted retrieval-augmented generation (GARAG) for automated evidence-based treatment planning and generation-assisted vector search (GAVS) for automated medical coding.
Methods: GARAG was evaluated on 21 clinical test cases created by medically qualified authors. Each case was executed 3 times independently, and outputs were assessed using 4 criteria: correctness of references, absence of duplication, adherence to formatting, and clinical appropriateness of the generated management plan. GAVS was evaluated on 958 randomly selected admissions from the Medical Information Mart for Intensive Care (MIMIC)-IV database, in which billed International Classification of Diseases, Tenth Revision (ICD-10) codes served as the ground truth. Two approaches were compared: a direct GPT-4.1 baseline prompted to predict ICD-10 codes without constraints and GAVS, in which GPT-4.1 generated diagnostic entities that were each mapped onto the top 10 matching ICD-10 codes through vector search.
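The GAVS mapping step, in which each generated diagnostic entity is matched to its top 10 ICD-10 codes by vector search, can be illustrated with a minimal sketch. The abstract does not specify the embedding model, similarity measure, or index, so cosine similarity is assumed here, and the `top_k_codes` helper and the 3-dimensional vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_codes(entity_vec, code_vecs, k=10):
    """Return the k codes whose vectors are most similar to entity_vec.

    Hypothetical helper: a brute-force search over a dict of
    code -> embedding, standing in for a real vector index.
    """
    ranked = sorted(
        code_vecs,
        key=lambda code: cosine_similarity(entity_vec, code_vecs[code]),
        reverse=True,
    )
    return ranked[:k]

# Toy, made-up embeddings (assumption: real embeddings are much larger).
code_vecs = {
    "I10": [0.9, 0.1, 0.0],
    "E11.9": [0.1, 0.9, 0.0],
    "J18.9": [0.0, 0.1, 0.9],
}
entity_vec = [0.8, 0.2, 0.1]  # hypothetical embedding of one generated entity
candidates = top_k_codes(entity_vec, code_vecs, k=2)
```

Constraining the output to codes returned by the search means every candidate is a valid ICD-10 code, which an unconstrained prompt cannot guarantee.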
Results: For GARAG, 62 of the 63 outputs (98.4%) satisfied all evaluation criteria; the only exception was a minor ordering inconsistency in one repetition of case 14. For GAVS, the 958 admissions contained 8576 assigned ICD-10 subcategory codes (1610 unique). The vanilla LLM produced 131,329 candidate codes, whereas GAVS produced 136,920. At the subcategory level, the vanilla LLM achieved 17.95% average recall (15.86% weighted), while GAVS achieved 20.63% (18.62% weighted), a statistically significant improvement (P<.001). At the category level, performance converged (32.60% vs 32.58% average weighted recall; P=.99).
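The average and weighted recall figures above can be made concrete with a small sketch. The abstract does not define the weighting, so this example assumes that weighted recall weights each admission by its number of ground-truth codes (equivalent to pooling all codes), and that the category level corresponds to the first 3 characters of a code. The admissions and predictions are invented.

```python
def to_category(code):
    """Assumed category mapping: first three characters ('E11.9' -> 'E11')."""
    return code.replace(".", "")[:3]

def recall_summary(truth_by_admission, predicted_by_admission, level="subcategory"):
    """Return (average recall, weighted recall) across admissions.

    average:  unweighted mean of per-admission recall
    weighted: per-admission recall weighted by the number of ground-truth
              codes (an assumed definition, equal to pooled recall)
    """
    per_admission = []
    hits_total = 0
    truth_total = 0
    for admission, truth in truth_by_admission.items():
        predicted = predicted_by_admission.get(admission, [])
        if level == "category":
            truth_set = {to_category(c) for c in truth}
            pred_set = {to_category(c) for c in predicted}
        else:
            truth_set, pred_set = set(truth), set(predicted)
        if not truth_set:
            continue  # skip admissions without ground-truth codes
        hits = len(truth_set & pred_set)
        per_admission.append(hits / len(truth_set))
        hits_total += hits
        truth_total += len(truth_set)
    if not per_admission:
        raise ValueError("no admissions with ground-truth codes")
    return sum(per_admission) / len(per_admission), hits_total / truth_total

# Invented toy data, not taken from MIMIC-IV.
truth = {"a1": ["E11.9", "I10"], "a2": ["J18.9", "J18.1", "I10", "N17.9"]}
predicted = {"a1": ["E11.9", "I10", "Z79.4"], "a2": ["J18.0", "I10"]}

avg_sub, weighted_sub = recall_summary(truth, predicted, "subcategory")
avg_cat, weighted_cat = recall_summary(truth, predicted, "category")
```

In this toy case the prediction `J18.0` misses both billed `J18` subcategory codes but matches at the category level, which is the kind of near miss that makes category-level recall higher than subcategory-level recall.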
Conclusions: GARAG demonstrated a workflow that grounds management plans in diagnosis-specific, peer-reviewed guideline evidence, preserving fine-grained clinical detail during retrieval. GAVS significantly improved fine-grained diagnostic coding recall compared with a direct LLM baseline. Together, these frameworks illustrate how LLM-based methods can enhance clinical decision support and medical coding. Both were subsequently integrated into Rhazes Clinician, a clinician-facing web application that orchestrates LLM agents to call specialized tools, providing a single interface for physician use. Further independent validation and large-scale studies are required to confirm generalizability and assess their impact on patient outcomes.
Background: Adapting physical activity monitors to detect gait events (ie, at initial and final contact) has the potential to build a more personalized approach to gait rehabilitation after stroke. Meeting laboratory standards for detecting these events in impaired populations is challenging, without resorting to a multisensor solution. The Teager-Kaiser energy operator (TKEO) estimates the instantaneous energy of a signal; its enhanced sensitivity has successfully detected gait events from the acceleration signals of individuals with impaired mobility, but has not been applied to stroke.
Objective: This study aimed to test the criterion validity of TKEO gait event detection (and derived spatiotemporal metrics) using data from thigh-mounted physical activity monitors compared with concurrent 3D motion capture in chronic survivors of stroke.
Methods: Participants with a history of stroke (n=13; mean age 59, SD 14 years; mean time since stroke 1.5, SD 0.5 years; mean walking speed 0.93, SD 0.38 m/s) performed two 10-m walks at their comfortable speed while wearing 2 ActivPAL 4+ (AP4) sensors (one on the anterior of each thigh) and LED cluster markers on the pelvis and ankles, which were tracked by a motion capture system. The TKEO signal processing technique was then used to extract gait events (initial and final contact) and calculate stance durations, which were compared with motion capture data.
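For reference, the discrete TKEO of a signal x is ψ[n] = x[n]² − x[n−1]·x[n+1], and a stance duration is the time from an initial contact to the next final contact of the same leg. The sketch below shows only these two steps. The abstract does not describe how events are located in the TKEO output, so no peak-picking or thresholding rule is included, and the event times are invented.

```python
def tkeo(signal):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbours, so the
    output is two samples shorter than the input.
    """
    return [
        signal[n] ** 2 - signal[n - 1] * signal[n + 1]
        for n in range(1, len(signal) - 1)
    ]

def stance_durations(initial_contacts, final_contacts):
    """Pair each initial contact with the next final contact of the same leg.

    Both arguments are event times in seconds; the pairing rule is an
    assumption made for this illustration.
    """
    durations = []
    for ic in initial_contacts:
        later = [fc for fc in final_contacts if fc > ic]
        if later:
            durations.append(min(later) - ic)
    return durations

# A sampled sinusoid has constant Teager-Kaiser energy.
energy = tkeo([0, 1, 0, -1, 0])

# Invented event times (seconds) for one leg.
stances = stance_durations([0.0, 1.25], [0.75, 2.0])
```

Because the operator combines amplitude and frequency in a single value, abrupt changes such as foot contact stand out more sharply than in the raw acceleration signal.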
Results: There was very good agreement between the AP4 and motion capture data for stance duration (AP4 0.85s, motion capture system 0.88s, 95% CI of difference -0.07 to 0.13, intraclass correlation coefficient [3,1]=0.79).
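The agreement statistic reported above, ICC(3,1), is the two-way mixed-effects, single-measure, consistency form of the intraclass correlation coefficient. A minimal sketch of its computation from a participants-by-methods table follows; the stance durations in the example are invented and are not study data.

```python
def icc_3_1(table):
    """ICC(3,1) = (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error).

    table: one row per participant, one column per measurement method.
    Assumes at least 2 rows, 2 columns, and some between-row variation.
    """
    n = len(table)        # participants
    k = len(table[0])     # methods
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Invented stance durations in seconds: [wearable sensor, motion capture].
example = [[0.82, 0.85], [0.91, 0.90], [0.78, 0.84], [0.88, 0.92]]
icc = icc_3_1(example)
```

Because this form measures consistency rather than absolute agreement, a constant offset between the two methods does not lower the coefficient, so the mean difference and its confidence interval are reported alongside it.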
Conclusions: The TKEO method for gait event detection using AP4 data provides stance time durations that are comparable with laboratory-based systems in a population with chronic stroke. Providing accurate stance time durations from wearable sensors could extend gait training out of clinical environments. Limitations include ecological and external validity. Future work should confirm findings with a larger sample of participants with a history of stroke.
[This corrects the article DOI: 10.2196/67481.]
Background: Photoplethysmography (PPG) signals captured by wearable devices can provide vascular age information and support pervasive and long-term monitoring of personal health condition.
Objective: In this study, we aimed to estimate brachial-ankle pulse wave velocity (baPWV) from wrist PPG and electrocardiography (ECG) signals recorded by a smartwatch.
Methods: A total of 914 wrist PPG and ECG sequences and 278 baPWV measurements were collected via the smartwatch from 80 men and 82 women with mean ages of 63.4 (SD 13.4) and 64.3 (SD 11.6) years, respectively. Feature extraction and weighted pulse decomposition were applied to identify morphological characteristics of blood volume change and component waves in the preprocessed PPG and ECG signals. A systematic feature combination strategy was applied. A hierarchical regression method was used, based on random forest for classification and extreme gradient boosting (XGBoost) for regression: the data were first classified into subdivisions, and a separate regression model was then constructed for each subdivision with an overlapping zone.
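One way to read "subdivisions with an overlapping zone" is that training samples near the boundary between 2 subdivisions are used to fit both regression models. The sketch below illustrates that reading only. The boundary value, the symmetric placement of the overlap, and the `split_with_overlap` helper are assumptions, and the classifier and regressors themselves are omitted.

```python
import math

def split_with_overlap(targets, boundary, overlap):
    """Return (low_idx, high_idx): training indices for two subdivisions.

    Assumption: the overlapping zone is centred on the boundary, so any
    sample within overlap / 2 of it is used to train both regressors.
    """
    half = overlap / 2
    low_idx = [i for i, y in enumerate(targets) if y <= boundary + half]
    high_idx = [i for i, y in enumerate(targets) if y >= boundary - half]
    return low_idx, high_idx

def rmse(y_true, y_pred):
    """Root-mean-square error between paired values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Invented baPWV values in cm/s; the 1600 cm/s boundary is hypothetical.
bapwv = [1200, 1500, 1700, 2100]
low_idx, high_idx = split_with_overlap(bapwv, boundary=1600, overlap=400)

# Invented predictions for two samples, to show the error metric.
error = rmse([1500, 1700], [1400, 1800])
```

Sharing the boundary samples is intended to soften the effect of a misclassification near the boundary, since either regressor has seen data from that range.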
Results: By using 914 sets of wrist PPG and ECG signals for baPWV estimation, the hierarchical regression model with 2 subdivisions and an overlapping zone of 400 cm per second achieved root-mean-square error of 145.0 cm per second and 141.4 cm per second for 24 men and 26 women, respectively, which is better than the general XGBoost regression model and the multivariable regression model (all P<.001).
Conclusions: We demonstrated for the first time that baPWV could be reliably estimated from wrist PPG and ECG signals measured by a wearable device. Whether our algorithm can be applied clinically needs further verification.
Background: Implantable medical devices (IMDs), such as pacemakers, increasingly communicate wirelessly with external devices. To secure this wireless communication channel, a pairing process is needed to bootstrap a secret key between the devices. Previous work has proposed pairing approaches that often adopt a "seamless" design and render the pairing process imperceptible to patients. This lack of user perception can significantly compromise security and pose threats to patients.
Objective: This study aimed to explore the use of highly perceptible vibrations for pairing with IMDs and to propose a novel technique that leverages the natural randomness in human motor behavior as a shared source of entropy for pairing, potentially deployable to current IMD products.
Methods: A proof of concept was developed to demonstrate the proposed technique. A wearable prototype was built to simulate an individual acting as an IMD patient (real patients were not involved to avoid potential risks), and signal processing algorithms were devised to use accelerometer readings for facilitating secure pairing with an IMD. The technique was thoroughly evaluated in terms of accuracy, security, and usability through a lab study involving 24 participants.
Results: Our proposed pairing technique achieves high pairing accuracy, with a zero false acceptance rate (indicating low risks from adversaries) and a false rejection rate of only 0.6% (1/192; suggesting that legitimate users will likely experience very few failures). Our approach also offers robust security, which passes the National Institute of Standards and Technology statistical tests (with all P values >.01). Moreover, our technique has high usability, evidenced by an average System Usability Scale questionnaire score of 73.6 (surpassing the standard benchmark of 68 for "good usability") and insights gathered from the interviews. Furthermore, the entire pairing process can be efficiently completed within 5 seconds.
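The National Institute of Standards and Technology suite (SP 800-22) comprises many statistical tests. As an illustration of how one of them yields a P value, the sketch below implements the simplest, the frequency (monobit) test, in which a sequence is considered random at the 1% level when P≥.01. The abstract does not state which tests or bit lengths were used, and the example bits are not study data.

```python
import math

def monobit_p_value(bits):
    """P value of the NIST SP 800-22 frequency (monobit) test.

    bits: sequence of 0/1 values. Each bit is mapped to +1 or -1, the
    values are summed, and the normalised absolute sum is compared with
    a half-normal distribution via the complementary error function.
    """
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

balanced = monobit_p_value([1, 0] * 50)   # equal numbers of ones and zeros
biased = monobit_p_value([1] * 100)       # all ones: clearly non-random

passes = balanced >= 0.01
fails = biased < 0.01
```

A key that passes this test has roughly balanced ones and zeros; the other tests in the suite probe further properties, such as runs and periodicity, that this check alone cannot detect.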
Conclusions: Vibration can be used to realize secure, usable, and deployable pairing in the context of IMDs. Our method also exhibits advantages over previous approaches, for example, lenient requirements on the sensing capabilities of IMDs and the synchronization between the IMD and the external device.
Background: Accurately assessing pain severity is essential for effective pain treatment and desirable patient outcomes. In clinical settings, pain intensity assessment relies on self-reporting methods, which are subjective to individuals and impractical for noncommunicative or critically ill patients. Previous studies have attempted to measure pain objectively using physiological responses to an external pain stimulus, assuming that the participant is free of internal body pain. However, this approach does not reflect the situation in a clinical setting, where a patient subjected to an external pain stimulus may already be experiencing internal body pain.
Objective: This study investigates the hypothesis that an individual's physiological response to external pain varies in the presence of preexisting pain.
Methods: We recruited 39 healthy participants aged 22-37 years (23 female and 16 male participants). Two physiological signals, electrodermal activity and electromyography, were recorded while participants were subjected to a combination of preexisting heat pain and cold pain stimuli. Feature engineering methods were applied to extract time-series features, and statistical analysis using ANOVA was conducted to assess significance.
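The analysis pipeline described above, time-series feature extraction followed by ANOVA, can be sketched as follows. The specific features (mean, SD, mean absolute successive difference, and root mean square of successive differences) are examples of the feature families involved and are not the authors' exact feature set; all numbers are invented.

```python
import math

def time_series_features(x):
    """A few illustrative temporal and successive-difference features."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    diffs = [b - a for a, b in zip(x, x[1:])]
    return {
        "mean": mean,
        "sd": sd,
        "mean_abs_diff": sum(abs(d) for d in diffs) / len(diffs),
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
    }

def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA across groups of feature values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Invented signal segment and invented feature values for 3 conditions.
features = time_series_features([1, 3, 2, 4])
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

The P value is obtained by comparing the F statistic with the F distribution on (k−1, N−k) degrees of freedom; in practice a statistics library such as SciPy (`scipy.stats.f_oneway`) handles this step.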
Results: We found that preexisting pain influences the body's physiological responses to an external pain stimulus. Several features, particularly those related to temporal statistics, successive differences, and distributions, showed statistically significant variation across preexisting pain conditions (P<.05), depending on the feature and stimulus.
Conclusions: Our findings suggest that preexisting pain alters the body's physiological response to new pain stimuli, highlighting the importance of considering pain history in objective pain assessment models.
Background: The use of acoustic biomarkers derived from speech signals is a promising non-invasive technique for diagnosing type 2 diabetes mellitus (T2DM). Despite its potential, there remains a critical gap in knowledge regarding the optimal number of voice recordings and recording schedule necessary to achieve effective diagnostic accuracy.
Objective: This study aimed to determine the optimal number of voice samples and the ideal recording schedule (frequency and timing) required to maintain T2DM diagnostic efficacy while reducing patient burden.
Methods: We analyzed voice recordings from 78 adults (22 women), including 39 individuals diagnosed with T2DM. Participants had a mean (SD) age of 45.26 (10.63) years and mean (SD) BMI of 28.07 (4.59) kg/m². In total, 5035 voice recordings were collected, with a mean (SD) of 4.91 (1.45) recordings per day; higher adherence was observed among women (5.13 [1.38] vs 4.82 [1.46] in men). We evaluated the diagnostic accuracy of a previously developed voice-based model under different recording conditions. Segmented linear regression analysis was used to assess model accuracy across varying numbers of voice recordings, and the Kendall tau correlation was used to measure the relationship between recording settings and accuracy. A significance threshold of P<.05 was applied.
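The Kendall tau correlation used above is a rank-based measure built from concordant and discordant pairs. The sketch below computes the tau-a variant, which applies no correction for ties; the abstract does not state which variant was used, and the data are invented.

```python
def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    A pair (i, j) is concordant when x and y order it the same way and
    discordant when they order it oppositely; tied pairs count as neither.
    """
    n = len(x)
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Invented example: a recording setting versus a ranked accuracy outcome.
setting = [1, 2, 3, 4]
outcome = [1, 3, 2, 4]
tau = kendall_tau_a(setting, outcome)
```

Here 5 of the 6 pairs are concordant and 1 is discordant, so tau is (5 − 1)/6, or about 0.67. Because only the ordering of values matters, the measure suits settings such as recording day or count, where a linear relationship with accuracy cannot be assumed.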
Results: Including up to 6 voice recordings notably improved model accuracy for T2DM compared with using only 1 recording, with accuracy increasing from 59.61% to 65.02% for men and from 65.55% to 69.43% for women. Additionally, the day on which voice recordings were collected did not significantly affect model accuracy (P>.05). However, restricting recordings to a single day was associated with higher accuracy, with accuracy of 73.95% for women and 85.48% for men when all recordings were from the first and second days.
Conclusions: This study underscores the optimal voice recording settings to reduce patient burden while maintaining diagnostic efficacy.

