Objectives: This work evaluated algorithmic bias in biomarker classification using electronic pathology reports from female breast cancer cases. Bias was assessed across 5 subgroup variables: cancer registry, race, Hispanic ethnicity, age at diagnosis, and socioeconomic status.
Materials and methods: We utilized 594 875 electronic pathology reports from 178 121 tumors diagnosed in Kentucky, Louisiana, New Jersey, New Mexico, Seattle, and Utah to train 2 deep-learning algorithms to classify breast cancer patients using their biomarker test results. We used balanced error rate (BER), demographic parity (DP), equalized odds (EOD), and equal opportunity (EOP) to assess bias.
Results: We found differences in predictive accuracy between registries, with the highest accuracy in the registry that contributed the most data (Seattle Registry, BER ratios for all registries >1.25). BER showed no significant algorithmic bias in extracting biomarkers (estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2) for race, Hispanic ethnicity, age at diagnosis, or socioeconomic subgroups (BER ratio <1.25). DP, EOD, and EOP likewise showed no significant differences.
Discussion: We observed significant differences in BER by registry, but no significant bias using the DP, EOD, and EOP metrics for socio-demographic or racial categories. This highlights the importance of employing a diverse set of metrics for a comprehensive evaluation of model fairness.
Conclusion: A thorough evaluation of algorithmic biases that may affect equality in clinical care is a critical step before deploying algorithms in the real world. We found little evidence of algorithmic bias in our biomarker classification tool. Artificial intelligence tools to expedite information extraction from clinical records could accelerate clinical trial matching and improve care.
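Below is a minimal sketch, not the authors' code, of how the fairness metrics named in this abstract (BER, demographic parity, equal opportunity, equalized odds) can be computed per subgroup from binary labels and predictions; the example data and the >1.25 BER-ratio rule are illustrative assumptions based on the abstract.

```python
import numpy as np

def balanced_error_rate(y_true, y_pred):
    """BER = mean of the false-negative rate and the false-positive rate."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fnr = np.mean(y_pred[y_true == 1] == 0) if np.any(y_true == 1) else 0.0
    fpr = np.mean(y_pred[y_true == 0] == 1) if np.any(y_true == 0) else 0.0
    return 0.5 * (fnr + fpr)

def subgroup_metrics(y_true, y_pred, groups):
    """Per-group BER, positive-prediction rate (demographic parity),
    TPR (equal opportunity), and FPR (with TPR, equalized odds)."""
    out = {}
    for g in np.unique(groups):
        m = groups == g
        yt, yp = y_true[m], y_pred[m]
        out[g] = {
            "BER": balanced_error_rate(yt, yp),
            "pos_rate": yp.mean(),
            "TPR": yp[yt == 1].mean() if np.any(yt == 1) else np.nan,
            "FPR": yp[yt == 0].mean() if np.any(yt == 0) else np.nan,
        }
    return out

# Hypothetical example: BER ratio of each subgroup relative to the best-performing
# subgroup, mirroring the BER-ratio threshold described in the abstract.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
groups = np.array(["A", "A", "A", "B", "B", "B", "B", "A"])
per_group = subgroup_metrics(y_true, y_pred, groups)
best_ber = min(v["BER"] for v in per_group.values())
ber_ratios = {g: v["BER"] / best_ber if best_ber > 0 else np.nan
              for g, v in per_group.items()}
print(per_group, ber_ratios)
```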
Objectives: The objective of the study was to determine, after medication review, the patient risk score threshold that would distinguish between stays with prescriptions triggering pharmacist intervention (PI) and stays with prescriptions not triggering PI.
Materials and methods: This retrospective, observational study was conducted within the clinical pharmacy team. The patient risk score was adapted from a Canadian score and integrated into the clinical decision support system (CDSS). For each hospital stay, the score was calculated at the beginning of hospitalization, and we retrospectively determined whether a medication review and a PI had been conducted. The optimal patient risk score threshold was then determined to help pharmacists optimize medication review.
Results: During the study, 973 (56.7%) medication reviews were performed and 248 (25.5%) led to a PI. After analyzing the sensitivity, specificity, and positive predictive value of different thresholds, a threshold of 4 was deemed discriminating enough to identify hospital stays likely to lead to a PI following a medication review. At this threshold, 600 hospital stays would have been detected (33.3% of which led to a PI), and 5.0% of stays with a medication review that triggered a PI would not have been detected.
Discussion and conclusion: Integrating a patient risk score into a CDSS can help clinical pharmacists target hospital stays likely to trigger a PI. However, an optimal threshold is difficult to determine. Constructing and using such a score in practice should be organized with the local clinical pharmacy team, in order to understand the tool's limitations and maximize its use in detecting at-risk drug prescriptions.
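The threshold analysis described in this abstract amounts to screening candidate cut-offs by sensitivity, specificity, and positive predictive value. The sketch below illustrates that calculation on hypothetical data; the variable names and simulated scores are assumptions, not the study's records.

```python
import numpy as np

def threshold_performance(scores, triggered_pi, threshold):
    """Flag stays with score >= threshold and compare against whether a PI was triggered."""
    scores = np.asarray(scores)
    triggered_pi = np.asarray(triggered_pi, dtype=bool)
    flagged = scores >= threshold
    tp = np.sum(flagged & triggered_pi)
    fp = np.sum(flagged & ~triggered_pi)
    fn = np.sum(~flagged & triggered_pi)
    tn = np.sum(~flagged & ~triggered_pi)
    return {
        "threshold": threshold,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        "n_flagged": int(flagged.sum()),
    }

# Hypothetical stays: integer risk scores and whether the review led to a PI.
rng = np.random.default_rng(0)
scores = rng.integers(0, 10, size=500)
triggered_pi = rng.random(500) < (scores / 20)  # higher scores more likely to trigger a PI
for t in range(2, 7):
    print(threshold_performance(scores, triggered_pi, t))
```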
Objective: Electronic patient portals (PP) allow for targeted and efficient research recruitment. We assessed pre- and postnatal women's preferences for recruitment methods, focusing on PP.
Materials and methods: We conducted 4 in-person focus groups with new and expecting mothers. Participants reported demographics, health status, and comfort with technology including PP. We used descriptive statistics to characterize quantitative data and a quasi-deductive approach to analyze qualitative data.
Results: Participants (n = 32) had an average age of 31.9 years and were mostly White (65.6%), married (90.6%), and educated to a 4-year degree or higher (71.9%). Although they preferred PP for research recruitment over other methods (eg, in-person, physical mail), participants identified potential barriers, including high message frequency, messages feeling like spam, and concerns about confidentiality. Participants suggested solutions, including enhancing autonomy through opt-in methods; integrating their healthcare provider's feedback; sending personal and relevant messages; and assuring them that their PP data are confidential.
Discussion: PPs are a promising recruitment method for pre- and postnatal women including for maternal-child health studies. To ensure engagement with the method, researchers must respond to known patient concerns and incorporate their feedback into future efforts.
Conclusion: Although PP were generally viewed as an acceptable recruitment method, researchers should be mindful of barriers that may limit their reach and effectiveness.
Objectives: Data-sharing policies are rapidly evolving toward increased data sharing. However, participants' perspectives are not well understood and could have an adverse impact on participation in research. We evaluated participants' preferences for sharing specific types of data with specific groups, and strategies to enhance trust in data-sharing practices.
Materials and methods: In March 2023, we conducted a nationally representative online survey with 610 US adults and used logistic regression models to assess sociodemographic differences in their willingness to share different types of data.
Results: Our findings highlight notable racial disparities in willingness to share research data with external entities, especially health policy and public health organizations. Black participants were significantly less likely to share most health data with public health organizations, including mental health (odds ratio [OR]: 0.543, 95% CI, 0.323-0.895) and sexual health/fertility information (OR: 0.404, 95% CI, 0.228-0.691), compared to White participants. Moreover, 63% of participants expressed that their trust in researchers would improve if given control over the data recipients.
Discussion: Participants exhibit reluctance to share specific types of personal research data, emphasizing strong preferences regarding external data access. This highlights the need for a critical reassessment of current data-sharing policies to align with participant concerns.
Conclusion: It is imperative for data-sharing policies to integrate diverse patient viewpoints to mitigate the risk of distrust and the potential unintended consequence of lower research participation among racial and ethnic minority participants.
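The odds ratios reported in this abstract come from logistic regression models of willingness to share data. The sketch below shows one way such ORs and 95% CIs could be derived with statsmodels; the data frame, variable names, and covariates are hypothetical assumptions, not the survey's actual analysis code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical survey frame: binary willingness outcome plus sociodemographic covariates.
rng = np.random.default_rng(1)
n = 610
df = pd.DataFrame({
    "willing_to_share": rng.integers(0, 2, n),
    "race": rng.choice(["White", "Black", "Other"], n),
    "age": rng.integers(18, 80, n),
    "female": rng.integers(0, 2, n),
})

# Logistic regression with White as the reference race category.
model = smf.logit(
    "willing_to_share ~ C(race, Treatment(reference='White')) + age + female", data=df
).fit(disp=False)

# Exponentiate coefficients and confidence bounds to report ORs with 95% CIs.
ors = pd.DataFrame({
    "OR": np.exp(model.params),
    "CI_low": np.exp(model.conf_int()[0]),
    "CI_high": np.exp(model.conf_int()[1]),
})
print(ors.round(3))
```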
Objectives: Observational data have been actively used to estimate treatment effects, driven by the growing availability of electronic health records (EHRs). However, EHRs typically consist of longitudinal records, often introducing time-dependent confounding that hinders unbiased estimation of treatment effects. Inverse probability of treatment weighting (IPTW) is a widely used propensity score method because it provides unbiased treatment effect estimation and its derivation is straightforward. In this study, we aim to utilize IPTW to estimate treatment effects in the presence of time-dependent confounding using claims records.
Materials and methods: Previous studies have utilized propensity score methods with features derived from claims records through feature processing, which generally requires domain knowledge and additional resources to extract information to accurately estimate propensity scores. Deep learning, particularly using deep sequence models such as recurrent neural networks and Transformers, has demonstrated good performance in modeling EHRs for various downstream tasks. We propose that these deep sequence models can provide accurate IPTW estimation of treatment effects by directly estimating propensity scores from claims records without the need for feature processing.
Results: Comprehensive evaluations on synthetic and semi-synthetic datasets demonstrate that IPTW treatment effect estimation using deep sequence models consistently outperforms baseline approaches, including logistic regression and multilayer perceptrons, combined with feature processing.
Discussion: Our findings demonstrate that deep sequence models consistently outperform traditional approaches in estimating treatment effects, particularly under time-dependent confounding. Moreover, Transformer-based models offer interpretability by assigning higher attention weights to relevant confounders, even when prior domain knowledge is limited.
Conclusion: Deep sequence models enable accurate treatment effect estimation through IPTW without the need for feature processing.
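For readers unfamiliar with IPTW, the sketch below shows the weighting step this abstract relies on: propensity scores (which the study estimates directly from claims sequences with deep sequence models) are converted into stabilized inverse-probability weights and used in a weighted difference of outcomes. This is a generic illustration with simulated data, not the authors' implementation.

```python
import numpy as np

def iptw_ate(treatment, outcome, propensity, stabilize=True, clip=(0.01, 0.99)):
    """Inverse-probability-of-treatment-weighted estimate of the average treatment effect."""
    t = np.asarray(treatment, dtype=float)
    y = np.asarray(outcome, dtype=float)
    e = np.clip(np.asarray(propensity, dtype=float), *clip)  # avoid extreme weights
    w = t / e + (1 - t) / (1 - e)
    if stabilize:
        # Stabilized weights: P(T=1)/e for treated, P(T=0)/(1-e) for controls.
        w = np.where(t == 1, w * t.mean(), w * (1 - t.mean()))
    treated_mean = np.sum(w * t * y) / np.sum(w * t)
    control_mean = np.sum(w * (1 - t) * y) / np.sum(w * (1 - t))
    return treated_mean - control_mean

# Hypothetical example with known propensities; the true treatment effect is 1.0.
rng = np.random.default_rng(2)
e = rng.uniform(0.2, 0.8, 5000)
t = rng.binomial(1, e)
y = 1.0 * t + 2 * e + rng.normal(0, 1, 5000)  # outcome confounded through e
print(iptw_ate(t, y, e))  # approximately 1.0
```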
Objectives: There is limited knowledge on how providers and patients in the emergency department (ED) use electronic health records (EHRs) to facilitate the diagnostic process. While EHRs can support diagnostic decision-making, EHR features that are not user-centered may increase the likelihood of diagnostic error. We aimed to identify how EHRs facilitate or impede the diagnostic process in the ED and to identify opportunities to reduce diagnostic errors and improve care quality.
Materials and methods: We conducted semistructured interviews with 10 physicians, 15 nurses, and 8 patients across 4 EDs. Data were analyzed using a hybrid thematic analysis approach, which blends deductive (ie, using multiple conceptual frameworks) and inductive coding strategies. A team of 4 coders performed coding.
Results: We identified 4 themes, 3 at the care team level and 1 at the patient level. At the care team level, the benefits of the EHR in the diagnostic process included (1) customizing features to facilitate diagnostic workup and (2) aiding in communication. However, (3) EHR-driven protocols were found to potentially burden the care process, and reliance on asynchronous communication could impede team dynamics. At the patient level, we found that (4) patient portals facilitated meaningful patient engagement through timely delivery of results.
Discussion: While EHRs can improve the diagnostic process, they can also impair communication and increase workload. Electronic health record design should leverage provider-created tools to improve usability and enhance diagnostic safety.
Conclusions: Our findings have important implications for health information technology design and policy. Further work should assess optimal ways to release patient results via the EHR portal.
[This corrects the article DOI: 10.1093/jamiaopen/ooaf007.].
Objectives: Recent advances in deep learning show significant potential in analyzing continuously monitored electronic health record (EHR) data for clinical outcome prediction. We aim to develop a Transformer-based, Encounter-level Clinical Outcome (TECO) model to predict mortality in the intensive care unit (ICU) using inpatient EHR data.
Materials and methods: The TECO model was developed using multiple baseline and time-dependent clinical variables from 2579 hospitalized COVID-19 patients to predict ICU mortality and was validated externally in an acute respiratory distress syndrome cohort (n = 2799) and a sepsis cohort (n = 6622) from the Medical Information Mart for Intensive Care IV (MIMIC-IV). Model performance was evaluated based on the area under the receiver operating characteristic curve (AUC) and compared with the Epic Deterioration Index (EDI), random forest (RF), and extreme gradient boosting (XGBoost).
Results: In the COVID-19 development dataset, TECO achieved higher AUC (0.89-0.97) across various time intervals compared to EDI (0.86-0.95), RF (0.87-0.96), and XGBoost (0.88-0.96). In the 2 MIMIC testing datasets (EDI not available), TECO yielded higher AUC (0.65-0.77) than RF (0.59-0.75) and XGBoost (0.59-0.74). In addition, TECO was able to identify clinically interpretable features that were correlated with the outcome.
Discussion: The TECO model outperformed proprietary metrics and conventional machine learning models in predicting ICU mortality among patients with COVID-19, widespread inflammation, respiratory illness, and other organ failures.
Conclusion: The TECO model demonstrates a strong capability for predicting ICU mortality using continuous monitoring data. While further validation is needed, TECO has the potential to serve as a powerful early warning tool across various diseases in inpatient settings.
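As a rough illustration of the kind of model this abstract describes, the sketch below combines static baseline variables with a time series of clinical measurements in a small Transformer encoder that outputs a mortality logit. The architecture, dimensions, and layer choices are assumptions for illustration only and are not the published TECO design.

```python
import torch
import torch.nn as nn

class EncounterTransformer(nn.Module):
    def __init__(self, n_time_feats=16, n_baseline_feats=8, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.time_proj = nn.Linear(n_time_feats, d_model)          # embed each time step
        self.baseline_proj = nn.Linear(n_baseline_feats, d_model)  # embed static variables
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)                          # mortality logit

    def forward(self, time_series, baseline):
        # time_series: (batch, steps, n_time_feats); baseline: (batch, n_baseline_feats)
        tokens = torch.cat([self.baseline_proj(baseline).unsqueeze(1),
                            self.time_proj(time_series)], dim=1)
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0])  # use the baseline token as an encounter summary

model = EncounterTransformer()
logits = model(torch.randn(4, 24, 16), torch.randn(4, 8))  # 4 encounters, 24 time steps
print(torch.sigmoid(logits).shape)  # (4, 1) predicted mortality probabilities
```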
Objectives: The use of large language models (LLMs) is growing for both clinicians and patients. While researchers and clinicians have explored LLMs to manage patient portal messages and reduce burnout, there is less documentation about how patients use these tools to understand clinical notes and inform decision-making. This proof-of-concept study examined the reliability and accuracy of LLMs in responding to patient queries based on an open visit note.
Materials and methods: In a cross-sectional proof-of-concept study, 3 commercially available LLMs (ChatGPT 4o, Claude 3 Opus, Gemini 1.5) were evaluated using 4 distinct prompt series (Standard, Randomized, Persona, and Randomized Persona) with multiple questions, designed by patients, in response to a single neuro-oncology progress note. LLM responses were scored by the note author (neuro-oncologist) and a patient who receives care from the note author, using an 8-criterion rubric that assessed Accuracy, Relevance, Clarity, Actionability, Empathy/Tone, Completeness, Evidence, and Consistency. Descriptive statistics were used to summarize the performance of each LLM across all prompts.
Results: Overall, the Standard and Persona-based prompt series yielded the best results across all criteria, regardless of LLM. ChatGPT 4o using Persona-based prompts scored highest in all categories. All LLMs scored low on the use of Evidence.
Discussion: This proof-of-concept study highlighted the potential for LLMs to assist patients in interpreting open notes. The most effective LLM responses were achieved by applying Persona-style prompts to a patient's question.
Conclusion: Optimizing LLMs for patient-driven queries, along with patient education and counseling around the use of LLMs, has the potential to enhance patients' use and understanding of their health information.
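The descriptive summary this abstract mentions reduces to aggregating rubric scores by LLM, prompt series, and criterion. The sketch below shows one way to do that with pandas; the scores and row layout are hypothetical, not the study's data.

```python
import pandas as pd

# Long-format ratings: one row per (LLM, prompt series, criterion) rubric score.
ratings = pd.DataFrame([
    {"llm": "ChatGPT 4o", "series": "Persona", "criterion": "Accuracy", "score": 5},
    {"llm": "ChatGPT 4o", "series": "Persona", "criterion": "Evidence", "score": 2},
    {"llm": "Claude 3 Opus", "series": "Standard", "criterion": "Accuracy", "score": 4},
    {"llm": "Gemini 1.5", "series": "Randomized", "criterion": "Evidence", "score": 1},
])

# Descriptive statistics per LLM x prompt series x criterion.
summary = (ratings
           .groupby(["llm", "series", "criterion"])["score"]
           .agg(["mean", "std", "count"])
           .reset_index())
print(summary)
```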

