In the rapidly advancing landscape of artificial intelligence (AI) within integrative health care (IHC), the issue of data ownership has become pivotal. This study explores the intricate dynamics of data ownership in the context of IHC and the AI era, presenting the novel Collaborative Healthcare Data Ownership (CHDO) framework. The analysis delves into the multifaceted nature of data ownership, involving patients, providers, researchers, and AI developers, and addresses challenges such as ambiguous consent, attribution of insights, and international inconsistencies. Examining various ownership models, including privatization and communization postulates, as well as distributed access control, data trusts, and blockchain technology, the study assesses their potential and limitations. The proposed CHDO framework emphasizes shared ownership, defined access and control, and transparent governance, providing a promising avenue for responsible and collaborative AI integration in IHC. This comprehensive analysis offers valuable insights into the complex landscape of data ownership in IHC and the AI era, potentially paving the way for ethical and sustainable advancements in data-driven health care.
Background: Transformer-based language models have shown great potential to revolutionize health care by advancing clinical decision support, patient interaction, and disease prediction. However, despite their rapid development, the implementation of transformer-based language models in health care settings remains limited. This is partly due to the lack of a comprehensive review, which hinders a systematic understanding of their applications and limitations. Without clear guidelines and consolidated information, both researchers and physicians face difficulties in using these models effectively, resulting in inefficient research efforts and slow integration into clinical workflows.
Objective: This scoping review addresses this gap by examining studies on medical transformer-based language models and categorizing them into 6 tasks: dialogue generation, question answering, summarization, text classification, sentiment analysis, and named entity recognition.
Methods: We conducted a scoping review following the Cochrane scoping review protocol. A comprehensive literature search was performed across databases, including Google Scholar and PubMed, covering publications from January 2017 to September 2024. Studies involving transformer-derived models in medical tasks were included. Data were categorized into 6 key tasks.
Results: Our key findings revealed both advancements and critical challenges in applying transformer-based models to health care tasks. For example, models like MedPIR involving dialogue generation show promise but face privacy and ethical concerns, while question-answering models like BioBERT improve accuracy but struggle with the complexity of medical terminology. The BioBERTSum summarization model aids clinicians by condensing medical texts but needs better handling of long sequences.
Conclusions: This review attempted to provide a consolidated understanding of the role of transformer-based language models in health care and to guide future research directions. By addressing current challenges and exploring the potential for real-world applications, we envision significant improvements in health care informatics. Addressing the identified challenges and implementing proposed solutions can enable transformer-based language models to significantly improve health care delivery and patient outcomes. Our review provides valuable insights for future research and practical applications, setting the stage for transformative advancements in medical informatics.
Background: Artificial intelligence (AI) has shown great promise in assisting medical diagnosis, but its application in renal pathology remains limited.
Objective: We evaluated the performance of an advanced AI language model, Claude 3 Opus (Anthropic), in generating diagnostic descriptions for renal pathological images.
Methods: We carefully curated a dataset of 100 representative renal pathological images from the Diagnostic Atlas of Renal Pathology (3rd edition). The image selection aimed to cover a wide spectrum of common renal diseases, ensuring a balanced and comprehensive dataset. Claude 3 Opus generated diagnostic descriptions for each image, which were scored by 2 pathologists on clinical relevance, accuracy, fluency, completeness, and overall value.
Results: Claude 3 Opus achieved a high mean score in language fluency (3.86) but lower scores in clinical relevance (1.75), accuracy (1.55), completeness (2.01), and overall value (1.75). Performance varied across disease types. Interrater agreement was substantial for relevance (κ=0.627) and overall value (κ=0.589) and moderate for accuracy (κ=0.485) and completeness (κ=0.458).
Conclusions: Claude 3 Opus shows potential in generating fluent renal pathology descriptions but needs improvement in accuracy and clinical value. The AI's performance varied across disease types. Addressing the limitations of single-source data and incorporating comparative analyses with other AI approaches are essential steps for future research. Further optimization and validation are needed for clinical applications.
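The interrater agreement values reported above are kappa statistics. As a minimal sketch of how such a value can be computed for 2 raters, the following assumes unweighted Cohen's kappa; the abstract does not state which kappa variant was used. The function name `cohen_kappa` and the example scores are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same score by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal counts, summed over categories.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical scores for ten images (assumed ordinal scale); illustration only.
rater_1 = [1, 2, 2, 1, 3, 1, 2, 4, 1, 2]
rater_2 = [1, 2, 1, 1, 3, 2, 2, 4, 1, 3]
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

For ordinal scores such as these, a weighted kappa that gives partial credit for near misses may be a more natural choice; the unweighted form is shown only because it is the simplest to verify by hand.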
Background: Data from multiple organizations are crucial for advancing learning health systems. However, ethical, legal, and social concerns may restrict the use of standard statistical methods that rely on pooling data. Although distributed algorithms offer alternatives, they may not always be suitable for health frameworks.
Objective: This study aims to support researchers and data custodians in three ways: (1) providing a concise overview of the literature on statistical inference methods for horizontally partitioned data, (2) describing the methods applicable to generalized linear models (GLMs) and assessing their underlying distributional assumptions, and (3) adapting existing methods to make them fully usable in health settings.
Methods: A scoping review methodology was used for the literature mapping, from which methods presenting a methodological framework for GLM analyses with horizontally partitioned data were identified and assessed from the perspective of applicability in health settings. Statistical theory was used to adapt methods and derive the properties of the resulting estimators.
Results: From the review, 41 articles were selected and 6 approaches were extracted to conduct standard GLM-based statistical analysis. However, these approaches assumed evenly and identically distributed data across nodes. Consequently, statistical procedures were derived to accommodate uneven node sample sizes and heterogeneous data distributions across nodes. Workflows and detailed algorithms were developed to highlight information sharing requirements and operational complexity.
Conclusions: This study contributes to the field of health analytics in 3 ways: it provides an overview of the methods that can be used with horizontally partitioned data, adapts these methods to the context of heterogeneous health data, and clarifies the workflows and quantities exchanged by the methods discussed. Further analysis of the confidentiality preserved by these methods is needed to fully understand the risk associated with the sharing of summary statistics.
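To illustrate what "sharing summary statistics" can look like for a GLM on horizontally partitioned data, the following is a minimal sketch of one well-known strategy: each node computes the score vector and information matrix for a logistic regression at the current coefficients, and a coordinator sums them and takes a Newton step. This is an assumption made for illustration and is not necessarily one of the 6 approaches identified in the review. All function names and the toy data are hypothetical. The sketch assumes a single coefficient vector shared by all nodes; because the summaries are sums over rows, uneven node sample sizes do not affect the result.

```python
import math


def local_summaries(X, y, beta):
    """Run at one node: score vector and information matrix of a logistic GLM.

    Only these aggregates leave the node, not the patient-level rows.
    """
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for row, target in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        mu = 1.0 / (1.0 + math.exp(-eta))
        w = mu * (1.0 - mu)
        for j in range(p):
            score[j] += (target - mu) * row[j]
            for k in range(p):
                info[j][k] += w * row[j] * row[k]
    return score, info


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_distributed_logistic(nodes, n_params, iterations=25, tol=1e-10):
    """Coordinator loop: sum the node summaries, take a Newton step, repeat."""
    beta = [0.0] * n_params
    for _ in range(iterations):
        total_score = [0.0] * n_params
        total_info = [[0.0] * n_params for _ in range(n_params)]
        for X, y in nodes:
            score, info = local_summaries(X, y, beta)
            for j in range(n_params):
                total_score[j] += score[j]
                for k in range(n_params):
                    total_info[j][k] += info[j][k]
        step = solve(total_info, total_score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical toy data: two nodes of uneven size, columns are [intercept, covariate].
node_a = ([[1, 0.5], [1, 1.5], [1, 2.5], [1, 3.5], [1, 1.0], [1, 3.0]], [0, 0, 1, 1, 1, 0])
node_b = ([[1, 0.0], [1, 4.0], [1, 2.0]], [0, 1, 1])
beta = fit_distributed_logistic([node_a, node_b], n_params=2)
print(beta)
```

In this sketch, the quantities exchanged per iteration are one vector and one matrix per node, which is the kind of information-sharing requirement the workflows in the study are meant to make explicit. Whether such summaries leak patient-level information is a separate question that the sketch does not address.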
Interoperability has been designed to improve the quality and efficiency of health care. It allows the Centers for Medicare and Medicaid Services to collect data on quality measures as a part of the Meaningful Use program. Covered providers who fail to provide data have lower rates of reimbursement. Unintended consequences also arise at each step of the data collection process: (1) providers are not reimbursed for the extra time required to generate data; (2) patients do not have control over when and how their data are provided to or used by the government; and (3) large datasets increase the chances of an accidental data breach or intentional hacker attack. After detailing the issues, we describe several solutions, including an appropriate data use review board, which is designed to oversee certain aspects of the process and ensure accountability and transparency.
Background: Collecting the medical history during a first outpatient consultation plays an important role in making a diagnosis. However, it is a time-consuming process, and time is scarce in today's health care environment. Computer-assisted history taking (CAHT) systems allow patients to share their medical history electronically before their visit. Although multiple advantages of CAHT have been demonstrated, adoption in everyday medical practice remains low, which has been attributed to various barriers.
Objective: This study aimed to implement a CAHT questionnaire for orthopedic patients in preparation for their first outpatient consultation and analyze its completion rate and added value.
Methods: A multicenter implementation study was conducted in which all patients who were referred to the orthopedic department were invited to self-complete the CAHT questionnaire. The primary outcome of the study was the completion rate of the questionnaire. Secondary outcomes included patient and physician satisfaction. These were assessed via surveys and semistructured interviews.
Results: In total, 5321 patients were invited, and 4932 (92.7%) fully completed the CAHT questionnaire between April 2022 and July 2022. On average, participants (n=224) rated the ease of completing the questionnaire at 8.0 (SD 1.9; 0-10 scale) and their satisfaction with the consultation at 8.0 (SD 1.7; 0-10 scale). Satisfaction with the outpatient consultation was higher in cases where the given answers were used by the orthopedic surgeon during this consultation (median 8.3, IQR 8.0-9.1 vs median 8.0, IQR 7.0-8.5; P<.001). Physicians (n=15) scored the average added value as 7.8 (SD 1.7; 0-10 scale) and unanimously recognized increased efficiency, better patient engagement, and better medical record completeness. Implementing the patient's answers into the electronic health record was deemed necessary.
Conclusions: In this study, we have shown that previously recognized barriers to implementing and adopting CAHT can now be effectively overcome. We demonstrated that almost all patients completed the CAHT questionnaire. This resulted in reported improvements in both the efficiency and personalization of outpatient consultations. Given the pressing need for personalized health care delivery in today's time-constrained medical environment, we recommend implementing CAHT systems in routine medical practice.
Background: Artificial intelligence (AI) is rapidly being adopted to build products and aid in the decision-making process across industries. However, AI systems have been shown to exhibit and even amplify biases, causing a growing concern among people worldwide. Thus, investigating methods of measuring and mitigating bias within these AI-powered tools is necessary.
Objective: In natural language processing applications, the word embedding association test (WEAT) is a popular method of measuring bias in input embeddings, a common place to measure bias in AI. However, certain limitations of the WEAT have been identified (ie, its nonrobust measure of bias and its reliance on predefined and limited groups of words or sentences), which may lead to inadequate measurements and evaluations of bias. Thus, this study takes a new approach to modifying this popular measure of bias, with a focus on making it more robust and applicable in other domains.
Methods: In this study, we introduce the SD-WEAT, which is a modified version of the WEAT that uses the SD of multiple permutations of the WEATs to calculate bias in input embeddings. With the SD-WEAT, we evaluated the biases and stability of several language embedding models, including Global Vectors for Word Representation (GloVe), Word2Vec, and bidirectional encoder representations from transformers (BERT).
Results: This method produces results comparable to those of the WEAT, with strong correlations between the methods' bias scores or effect sizes (r=0.786) and P values (r=0.776), while addressing some of its largest limitations. More specifically, the SD-WEAT is more accessible because it removes the need to predefine attribute groups. Because it measures bias over multiple runs rather than one, it also reduces the impact of outliers and sample size. Furthermore, the SD-WEAT was found to be more consistent and reliable than its predecessor.
Conclusions: Thus, the SD-WEAT shows promise for robustly measuring bias in the input embeddings fed to AI language models.
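The idea of taking the SD over many WEAT runs can be sketched as follows. The WEAT effect size used here is the standard one (difference in mean association between 2 target sets, divided by the SD of the associations over all target words). The resampling step is an assumption based only on the abstract: a single pool of attribute words is randomly split into 2 groups on each run, so no attribute groups need to be predefined, and the SD of the resulting effect sizes is reported. The published SD-WEAT procedure may differ in its details. The function names and the 3-dimensional toy vectors are hypothetical.

```python
import math
import random
import statistics


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w, A, B, emb):
    """s(w, A, B): mean cosine similarity to A minus mean cosine similarity to B."""
    return (statistics.mean(cosine(emb[w], emb[a]) for a in A)
            - statistics.mean(cosine(emb[w], emb[b]) for b in B))


def weat_effect_size(X, Y, A, B, emb):
    """Standard WEAT effect size for target sets X, Y and attribute sets A, B."""
    s_x = [association(x, A, B, emb) for x in X]
    s_y = [association(y, A, B, emb) for y in Y]
    return (statistics.mean(s_x) - statistics.mean(s_y)) / statistics.stdev(s_x + s_y)


def sd_weat(X, Y, attribute_pool, emb, runs=200, seed=0):
    """Assumed procedure: SD of WEAT effect sizes over random splits of one attribute pool."""
    rng = random.Random(seed)
    half = len(attribute_pool) // 2
    effects = []
    for _ in range(runs):
        shuffled = rng.sample(attribute_pool, len(attribute_pool))
        effects.append(weat_effect_size(X, Y, shuffled[:half], shuffled[half:], emb))
    return statistics.stdev(effects)


# Made-up 3-dimensional vectors for illustration; not taken from GloVe, Word2Vec, or BERT.
emb = {
    "doctor": [0.9, 0.1, 0.2], "engineer": [0.8, 0.2, 0.1],
    "nurse": [0.1, 0.9, 0.3], "teacher": [0.2, 0.8, 0.2],
    "he": [1.0, 0.0, 0.1], "him": [0.9, 0.1, 0.0],
    "she": [0.0, 1.0, 0.1], "her": [0.1, 0.9, 0.0],
}
targets_x = ["doctor", "engineer"]
targets_y = ["nurse", "teacher"]
pool = ["he", "him", "she", "her"]
score = sd_weat(targets_x, targets_y, pool, emb)
print(score)
```

Under this reading, a larger SD indicates that the target sets react strongly to how the attribute words happen to be grouped, which is taken as a sign of bias in the embeddings; fixing the seed keeps the estimate reproducible.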
Background: Accurate history taking is essential for diagnosis, treatment, and patient care, yet miscommunications and time constraints often lead to incomplete information. Consequently, there has been a pressing need to establish a system whereby the questionnaire is duly completed before the medical appointment, entered into the electronic health record (EHR), and stored in a structured format within a database.
Objective: This study aimed to develop and evaluate a streamlined electronic questionnaire system, BEST-Survey (Bundang Hospital Electronic System for Total Care-Survey), integrated with the EHR, to enhance history taking and data management for patients with pediatric headaches.
Methods: An electronic questionnaire system was developed at Seoul National University Bundang Hospital, allowing patients to complete previsit questionnaires on a tablet PC. The information is automatically integrated into the EHR and stored in a structured database for further analysis. A retrospective analysis compared clinical information acquired from patients aged <18 years visiting the pediatric neurology outpatient clinic for headaches, before and after implementing the BEST-Survey system. The study included 365 patients before and 452 patients after system implementation. Answer rates and positive rates of key headache characteristics were compared between the 2 groups to evaluate the system's clinical utility.
Results: Implementation of the BEST-Survey system significantly increased the mean data acquisition rate from 54.6% to 99.3% (P<.001). Essential clinical features such as onset, location, duration, severity, nature, and frequency were obtained in over 98.7% (>446/452) of patients after implementation, compared with a range of 53.7% (196/365) to 85.2% (311/365) before implementation. The electronic system facilitated comprehensive data collection, enabling detailed analysis of headache characteristics in the patient population. Most patients (280/452, 61.9%) reported headache onset less than 1 year prior, with the temporal region being the most common pain location (261/703, 37.1%). Over half (232/452, 51.3%) experienced headaches lasting less than 2 hours, with nausea and vomiting as the most commonly associated symptoms (231/1036, 22.3%).
Conclusions: The BEST-Survey system markedly improved the completeness and accuracy of essential history items for patients with pediatric headaches. The system also streamlined data extraction and analysis for clinical and research purposes. While the electronic questionnaire cannot replace physician-led history taking, it serves as a valuable adjunctive tool to enhance patient care.