Background: The rapidly increasing availability of medical data in electronic health records (EHRs) may contribute to the concept of learning health systems, allowing for better personalized care. Type 2 diabetes mellitus was chosen as the use case in this study.
Objective: This study aims to explore the applicability of a recently developed patient similarity-based analytics approach using EHR data as a candidate data-analytic decision support tool.
Methods: A previously published precision cohort analytics workflow was adapted for the Dutch primary care setting using EHR data from the Nivel Primary Care Database. The workflow consisted of extracting patient data from the Nivel Primary Care Database to retrospectively generate decision points for treatment change, training a similarity model, generating a precision cohort of the most similar patients, and analyzing treatment options. This analysis showed which treatment options led to better outcomes for the precision cohort in terms of clinical readouts for glycemic control.
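The workflow steps above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than the published workflow: the feature vectors, the treatment labels, the cohort size `k`, and the use of plain Euclidean distance in place of the trained similarity model are all assumptions. A more negative change in HbA1c is treated as the better outcome.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical records: (standardized features, treatment chosen at the
# decision point, change in HbA1c afterward). All values are illustrative.
example_records = [
    ([0.1, 0.2], "treatment_A", -0.8),
    ([0.0, 0.1], "treatment_B", -0.5),
    ([0.2, 0.3], "treatment_A", -0.6),
    ([2.0, 2.1], "treatment_C", -1.2),
    ([0.1, 0.0], "treatment_B", -0.3),
]


def distance(a, b):
    """Euclidean distance; an assumed stand-in for a trained similarity model."""
    return math.dist(a, b)


def precision_cohort(index_features, records, k):
    """Return the k records closest to the index patient (the precision cohort)."""
    return sorted(records, key=lambda r: distance(index_features, r[0]))[:k]


def treatment_outcomes(cohort):
    """Map each treatment option to (mean HbA1c change, number of patients)."""
    by_treatment = defaultdict(list)
    for _, treatment, delta in cohort:
        by_treatment[treatment].append(delta)
    return {t: (mean(v), len(v)) for t, v in by_treatment.items()}


cohort = precision_cohort([0.1, 0.1], example_records, k=4)
for treatment, (mean_change, n) in sorted(treatment_outcomes(cohort).items()):
    print(f"{treatment}: mean HbA1c change {mean_change:+.2f} (n={n})")
```

Here the dissimilar patient who received treatment_C falls outside the 4-patient cohort, and each remaining option is supported by only 2 patients, which is far too few for any outcome difference to reach statistical significance.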
Results: Data from 11,490 registered patients diagnosed with type 2 diabetes mellitus were extracted from the database. Treatment-specific filter cohorts of patient groups were generated, and the effect of past treatment choices in these cohorts was assessed separately for glycated hemoglobin and fasting glucose as clinical outcome variables. Precision cohorts were generated for several individual patients from the filter cohorts. Treatment option and outcome analyses were technically feasible but generally lacked the statistical power to show that the treatment options with better outcomes were significantly better.
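The power limitation can be made concrete with the standard normal-approximation sample size formula for comparing two mean outcomes, n per group = 2(z_{1-α/2} + z_{1-β})^2 σ^2 / δ^2. The sketch below assumes a two-sided test, an outcome standard deviation σ, and a clinically relevant difference δ; the values used (σ = 1.0 and δ = 0.5 HbA1c percentage points) are hypothetical and not taken from the study.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Patients needed per treatment arm to detect a mean difference `delta`
    with a two-sided z-test (normal approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma**2 / delta**2)


def approx_power(n, delta, sigma, alpha=0.05):
    """Approximate power with n patients per arm (ignores the negligible
    probability of rejecting in the wrong direction)."""
    se = sigma * math.sqrt(2 / n)
    return STD_NORMAL.cdf(delta / se - STD_NORMAL.inv_cdf(1 - alpha / 2))


print(n_per_group(delta=0.5, sigma=1.0))
print(round(approx_power(n=10, delta=0.5, sigma=1.0), 2))
```

Under these assumptions about 63 patients per arm are needed for 80% power, whereas 10 patients per arm gives only about 20% power, so a precision cohort split across several treatment options quickly becomes underpowered.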
Conclusions: The precision cohort analytics workflow was successfully adapted for the Dutch primary care setting, demonstrating its potential for use as a learning health system component. Although the approach proved technically feasible, data size limitations need to be overcome before its application for clinical decision support becomes realistic.
Machine learning (ML) approaches could expand the usefulness and application of implementation science methods in clinical medicine and public health settings. The aim of this viewpoint is to introduce a roadmap for applying ML techniques to address implementation science questions, such as predicting what will work best, for whom, under what circumstances, and with what predicted level of support, and determining what adaptation or deimplementation is needed and when. We describe how ML approaches could be used and discuss challenges that implementation scientists and methodologists will need to consider when using ML throughout the stages of implementation.
Background: Technological advancement has led to a rapid increase in tuberculosis (TB) medical data generated across different health care areas, including diagnosis. Prioritizing the adoption and acceptance of innovative diagnostic technology to reduce the spread of TB offers significant benefits to developing countries. Trained TB-detection rats are used in Tanzania and Ethiopia for operational research to complement other TB diagnostic tools. This technology has increased new TB case detection owing to its speed, cost-effectiveness, and sensitivity.
Objective: During the TB detection process, rats produce vast amounts of data, providing an opportunity to identify patterns that influence TB detection performance. This study aimed to use machine learning (ML) techniques to develop models that predict whether a rat will hit a sample (ie, indicate the presence of TB in it). The goal was to improve the diagnostic accuracy and performance of TB detection involving rats.
Methods: The APOPO (Anti-Persoonsmijnen Ontmijnende Product Ontwikkeling) Center in Morogoro provided data spanning 2012 to 2019 for this study, and 366,441 observations were used to build predictive models with ML techniques, including decision tree, random forest, naïve Bayes, support vector machine, and k-nearest neighbor, incorporating a variety of variables, such as the diagnostic results from partner health clinics obtained with methods endorsed by the World Health Organization (WHO).
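To illustrate one of the listed techniques, the sketch below implements k-nearest neighbor classification and hold-out accuracy in plain Python. The two features (a clinic test result coded 1/0 and a normalized sample attribute), the toy data, and k = 3 are hypothetical and are not the APOPO variables. A real comparison of the five model families would more typically rely on library implementations such as scikit-learn's `KNeighborsClassifier`, `SVC`, `RandomForestClassifier`, `DecisionTreeClassifier`, and `GaussianNB`.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict a label (1 = hit, 0 = no hit) for x by majority vote
    among its k nearest training samples (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: [clinic result (1 = TB-positive), normalized attribute]
train_X = [[1, 0.9], [1, 0.8], [1, 0.7], [0, 0.2], [0, 0.1], [0, 0.3]]
train_y = [1, 1, 1, 0, 0, 0]
test_X = [[1, 0.85], [0, 0.15], [1, 0.2], [0, 0.8]]
test_y = [1, 0, 0, 0]

preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print(preds, accuracy(test_y, preds))
```

In this toy run the third test sample is misclassified, giving a hold-out accuracy of 0.75. With binary labels and an odd k, the majority vote can never tie.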
Results: The support vector machine technique yielded the highest prediction accuracy (83.39%) among the ML techniques used. Furthermore, this study found that including variables indicating whether the sample contained TB increased the accuracy of the predictive model.
Conclusions: The inclusion of variables related to the diagnostic results of TB samples may improve the detection performance of the trained rats. The study results may be of importance to TB-detection rat trainers and TB decision-makers as the results may prompt them to take action to maintain the usefulness of the technology and increase the TB detection performance of trained rats.
[This corrects the article DOI: 10.2196/52782.].
Background: Health literacy (HL) is the ability to make informed decisions using health information. As the availability of health data and information increases through online clinic notes and patient portals, it is important to understand how HL relates to social determinants of health (SDoH) and what role informatics can play in mitigating disparities.
Objective: This systematic literature review aims to examine the role of HL in interactions with SDoH and to identify feasible HL-based interventions that address low patient understanding of health information to improve clinic note-sharing efficacy.
Methods: The review searched 2 databases, Scopus and PubMed, for English-language articles relating to HL and SDoH. We conducted a quantitative analysis of study characteristics and a qualitative synthesis to determine the roles of HL and the interventions.
Results: The included articles (n=43) were analyzed quantitatively and qualitatively for study characteristics, the role of HL, and interventions. Most articles (n=23) noted that HL was a result of SDoH, but other articles noted that it could also be a mediator of SDoH (n=6) or a modifiable SDoH itself (n=14).
Conclusions: The multivariable nature of HL indicates that it could form the basis for many interventions to combat poor patient understanding, including 4 interventions using informatics-based solutions. HL is a crucial, multidimensional skill in supporting patient understanding of health materials. Designing interventions aimed at improving HL or addressing poor HL in patients can help increase comprehension of health information, including the information contained in clinic notes shared with patients.
Background: Hypertension is the most prevalent risk factor for mortality globally. Uncontrolled hypertension is associated with excess morbidity and mortality, and nearly one-half of individuals with hypertension do not have the condition under control. Data from electronic health record (EHR) systems may be useful for community hypertension surveillance, filling a gap in local public health departments' community health assessments and supporting the public health data modernization initiatives currently underway. Identifying patients with hypertension requires computable phenotypes. These phenotypes leverage available data elements, such as vitals measurements and medications, to identify patients diagnosed with hypertension. However, there are multiple methodologies for creating a phenotype, and identifying the method that most accurately reflects real-world prevalence rates is needed to support data modernization initiatives.
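A computable phenotype is essentially a rule over structured EHR data elements. The sketch below contrasts two hypothetical phenotypes, one using only a diagnosis code and one that also counts antihypertensive prescriptions and repeated elevated blood pressure readings, and computes the prevalence each produces. The 140/90 mmHg cutoff, the two-reading rule, and the field names are illustrative assumptions, not the 6 phenotypes evaluated in the study.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    has_htn_dx: bool  # hypothetical: hypertension diagnosis code on record
    on_antihypertensive: bool  # hypothetical: antihypertensive prescription on record
    bp_readings: list = field(default_factory=list)  # (systolic, diastolic) in mmHg


def elevated_count(p, sys_cut=140, dia_cut=90):
    """Number of readings at or above the (assumed) hypertensive cutoff."""
    return sum(1 for s, d in p.bp_readings if s >= sys_cut or d >= dia_cut)


def phenotype_dx_only(p):
    """Flag a patient on diagnosis codes alone."""
    return p.has_htn_dx


def phenotype_dx_meds_vitals(p, min_elevated=2):
    """Flag on diagnosis, medication, or repeated elevated vitals."""
    return p.has_htn_dx or p.on_antihypertensive or elevated_count(p) >= min_elevated


def prevalence(patients, phenotype):
    """Share of patients flagged by the phenotype."""
    return sum(bool(phenotype(p)) for p in patients) / len(patients)


population = [
    Patient(True, True, [(150, 95)]),
    Patient(False, True),
    Patient(False, False, [(145, 85), (138, 92)]),
    Patient(False, False, [(150, 85), (120, 80)]),
    Patient(False, False, [(118, 76)]),
]
print(prevalence(population, phenotype_dx_only))
print(prevalence(population, phenotype_dx_meds_vitals))
```

On this toy population the two rules yield 20% and 60% prevalence, showing how strongly the choice of data elements drives the estimate.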
Objective: This study sought to assess the comparability of 6 different EHR-based hypertension prevalence estimates with estimates from a national survey. Each of the prevalence estimates was created using a different computable phenotype. The overarching goal was to identify which phenotypes most closely align with nationally accepted estimates.
Methods: Using the 6 different EHR-based computable phenotypes, we calculated hypertension prevalence estimates for Marion County, Indiana, for the period from 2014 to 2015. We extracted hypertension prevalence rates from the Behavioral Risk Factor Surveillance System (BRFSS) for the same period. We used the two 1-sided t tests (TOST) procedure to test equivalence between BRFSS- and EHR-based prevalence estimates. The TOST was performed at the overall level as well as stratified by age, gender, and race.
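The equivalence logic of TOST can be sketched for two independent prevalence estimates. Equivalence is declared when both one-sided null hypotheses (difference <= -Δ and difference >= +Δ) are rejected at level α, which is the same as the 100(1 - 2α)% CI of the difference lying entirely inside (-Δ, Δ); a 90% CI therefore corresponds to α = .05 and an 80% CI to α = .10. This sketch substitutes a normal approximation for the t distribution and ignores BRFSS survey weighting; the margin Δ and the example prevalences and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def tost_two_proportions(p1, n1, p2, n2, margin, alpha=0.05):
    """Two one-sided tests for equivalence of two independent proportions.

    Returns (p_value, ci_low, ci_high, equivalent), where the CI is the
    100*(1 - 2*alpha)% interval for p1 - p2 (normal approximation).
    """
    z = NormalDist()
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # One-sided test 1: H0 diff <= -margin vs H1 diff > -margin
    p_lower = 1 - z.cdf((diff + margin) / se)
    # One-sided test 2: H0 diff >= +margin vs H1 diff < +margin
    p_upper = z.cdf((diff - margin) / se)
    p_value = max(p_lower, p_upper)
    crit = z.inv_cdf(1 - alpha)
    return p_value, diff - crit * se, diff + crit * se, p_value < alpha


# Hypothetical EHR (31%, n=50,000) vs survey (30%, n=2,000) prevalence
for margin in (0.03, 0.02):
    p, lo, hi, eq = tost_two_proportions(0.31, 50000, 0.30, 2000, margin)
    print(f"margin={margin}: p={p:.3f}, 90% CI=({lo:.4f}, {hi:.4f}), equivalent={eq}")
```

With these made-up inputs the estimates are equivalent within a 3-point margin but not within a 2-point margin, because the upper CI bound (about 0.027) exceeds 0.02.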
Results: Using both 80% and 90% CIs, the TOST analysis showed that 2 computable phenotypes were roughly equivalent to BRFSS estimates. Performance varied across phenotypes and demographic groups. TOST with 80% CIs showed that the phenotypes had less variance than BRFSS estimates within subpopulations, particularly those defined by race. Overall, phenotypes that included vitals measurements showed less variance.
Conclusions: This study demonstrates that certain EHR-derived prevalence estimates may serve as rough substitutes for population-based survey estimates. These outcomes highlight the importance of critically assessing which data elements to include in EHR-based computable phenotypes. Using comprehensive data sources that contain complete clinical data and are representative of the population is crucial to producing robust estimates of chronic disease. As public health departments look toward data modernization activities, the EHR may help provide more timely, locally representative estimates of chronic disease.
Background: Social determinants of health (SDoH) have been described by the World Health Organization as the conditions in which individuals are born, live, work, and age. These conditions can be grouped into 3 interrelated levels known as macrolevel (societal), mesolevel (community), and microlevel (individual) determinants. The scope of SDoH extends beyond the biomedical level, and there remains a need to connect it with other areas, such as economics, public policy, and social factors.
Objective: Providing a computable artifact that can link health data to concepts involving the different levels of determinants may improve our understanding of the impact SDoH have on human populations. Modeling SDoH may help to reduce existing gaps in the literature through explicit links between the determinants and biological factors. This in turn can allow researchers and clinicians to make better sense of data and discover new knowledge through the use of semantic links.
Methods: An experimental ontology was developed to represent knowledge of the social and economic characteristics of SDoH. Information from 27 literature sources was analyzed to gather concepts, which were encoded using Web Ontology Language, version 2 (OWL2) and Protégé. Four evaluators independently reviewed the ontology axioms using natural language translation. The evaluation results and selected terms from the Basic Formal Ontology were used to create a revised ontology with a broad spectrum of knowledge concepts ranging from macrolevel to microlevel determinants.
Results: The literature search identified several topics of discussion for each determinant level. Publications on the macrolevel determinants centered on health policy, income inequality, welfare, and the environment. Articles relating to the mesolevel determinants discussed work, work conditions, psychosocial factors, socioeconomic position, outcomes, food, poverty, housing, and crime. Finally, sources found for the microlevel determinants examined gender, ethnicity, race, and behavior. Concepts were gathered from the literature and used to produce an ontology consisting of 383 classes, 109 object properties, and 748 logical axioms. A reasoning test revealed no inconsistent axioms.
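One kind of inconsistency a reasoner detects can be illustrated with a toy example. The sketch below encodes a small, hypothetical class hierarchy for the three determinant levels, declares the levels pairwise disjoint, and flags any individual whose inferred types (asserted types plus all superclasses) include a disjoint pair. The class names, disjointness axioms, and individuals are assumptions for illustration; this single-inheritance check is far narrower than OWL 2 reasoning and is no substitute for running a reasoner in Protégé.

```python
# Hypothetical toy hierarchy: child class -> parent class (single inheritance)
SUBCLASS_OF = {
    "MacrolevelDeterminant": "SocialDeterminant",
    "MesolevelDeterminant": "SocialDeterminant",
    "MicrolevelDeterminant": "SocialDeterminant",
    "HealthPolicy": "MacrolevelDeterminant",
    "IncomeInequality": "MacrolevelDeterminant",
    "Housing": "MesolevelDeterminant",
    "WorkConditions": "MesolevelDeterminant",
    "Gender": "MicrolevelDeterminant",
}

# Hypothetical disjointness axioms between the three levels
DISJOINT = {
    frozenset({"MacrolevelDeterminant", "MesolevelDeterminant"}),
    frozenset({"MacrolevelDeterminant", "MicrolevelDeterminant"}),
    frozenset({"MesolevelDeterminant", "MicrolevelDeterminant"}),
}


def ancestors(cls):
    """Return cls plus every superclass reachable through SUBCLASS_OF."""
    seen = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        seen.add(cls)
    return seen


def inconsistent_individuals(assertions):
    """Return individuals whose inferred types contain a disjoint pair."""
    bad = []
    for individual, types in assertions.items():
        inferred = set().union(*(ancestors(t) for t in types))
        if any(pair <= inferred for pair in DISJOINT):
            bad.append(individual)
    return bad


example_assertions = {
    "county_policy_2015": {"HealthPolicy"},
    "household_42": {"Housing"},
    "mislabeled_item": {"Housing", "Gender"},
}
print(inconsistent_individuals(example_assertions))
```

Only the individual asserted to be both a mesolevel and a microlevel determinant is flagged; an empty result corresponds to the kind of clean reasoning test reported above.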
Conclusions: This ontology models heterogeneous social and economic concepts to represent aspects of SDoH. The scope of SDoH is expansive, and although the ontology is broad, it is still in its early stages. To our knowledge, this ontology represents the first attempt to concentrate on knowledge concepts that are not covered by existing ontologies. Future work will include further expanding the ontology to link with other biomedical ontologies, including alignment for granular semantics.

