Background: Despite the importance of characterizing colonoscopy indication for quality monitoring and cancer screening program evaluation, there is no standard approach to documenting colonoscopy indication in medical records.
Methods: We applied two algorithms in three health care systems to assign colonoscopy indication to persons 50-89 years old who received a colonoscopy during 2010-2013. Both algorithms used standard procedure, diagnostic, and laboratory codes. One algorithm, the KPNC algorithm, used a hierarchical approach to classify exam indication as diagnostic, surveillance, or screening; the other, the SEARCH algorithm, used logistic regression to estimate the probability that a colonoscopy was performed for screening. The gold standard assessment of indication was medical records abstraction.
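A hierarchical, rule-based approach like the one described for the KPNC algorithm can be sketched as follows. The rule order and the inputs here are illustrative assumptions only, not the published algorithm's actual code sets: evidence of a diagnostic workup is checked first, then surveillance history, with screening as the default.

```python
# Illustrative sketch of a hierarchical indication classifier.
# The inputs and rule order are hypothetical, not the actual
# KPNC algorithm: diagnostic evidence takes precedence, then
# surveillance history, and screening is the default category.

def classify_indication(has_symptom_code, has_positive_fobt,
                        has_prior_polyp_history):
    """Assign a single indication using a fixed hierarchy."""
    if has_symptom_code or has_positive_fobt:
        return "diagnostic"      # symptoms or a positive test dominate
    if has_prior_polyp_history:
        return "surveillance"    # prior findings trigger surveillance
    return "screening"           # no other evidence -> screening

print(classify_indication(False, False, True))  # surveillance
```

The hierarchy matters: an exam with both a symptom code and a polyp history is classified as diagnostic, because the earlier rule wins.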
Results: There were 1,796 colonoscopy exams included in the analyses; age and racial/ethnic distributions of participants differed across health care systems. The KPNC algorithm's sensitivities and specificities for screening indication ranged from 0.78 to 0.82 and 0.78 to 0.91, respectively; sensitivities and specificities for diagnostic indication ranged from 0.78 to 0.89 and 0.74 to 0.82, respectively. The KPNC algorithm had poor sensitivities (0.11 to 0.67) and high specificities for surveillance exams. The area under the curve (AUC) of the SEARCH algorithm for screening indication ranged from 0.76 to 0.84 across health care systems. For screening indication, the KPNC algorithm obtained higher specificities than the SEARCH algorithm at the same sensitivity.
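The sensitivities and specificities above come from comparing algorithm-assigned labels against the gold-standard abstraction, one indication class at a time. A minimal sketch of that computation (the labels below are made up for illustration, not study data):

```python
def sensitivity_specificity(truth, pred, positive):
    """Sensitivity and specificity of `pred` against gold-standard
    `truth`, treating one indication class as the positive label."""
    tp = sum(t == positive and p == positive for t, p in zip(truth, pred))
    fn = sum(t == positive and p != positive for t, p in zip(truth, pred))
    tn = sum(t != positive and p != positive for t, p in zip(truth, pred))
    fp = sum(t != positive and p == positive for t, p in zip(truth, pred))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels, not study data:
truth = ["screening", "diagnostic", "screening", "surveillance"]
pred  = ["screening", "diagnostic", "diagnostic", "surveillance"]
sens, spec = sensitivity_specificity(truth, pred, "screening")
print(sens, spec)  # 0.5 1.0
```

For a probabilistic output like the SEARCH algorithm's, an AUC would instead be computed by sweeping a threshold over the predicted screening probabilities and tracing sensitivity against 1 − specificity.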
Conclusion: Despite standardized implementation of these indication algorithms across three health care systems, the capture of colonoscopy indication data was imperfect. We therefore recommend that standard, systematic documentation of colonoscopy indication be added to medical records to ensure efficient and accurate data capture.
The well-known hazards of repurposing data make data quality (DQ) assessment a vital step toward ensuring valid results, regardless of analytical methods. However, there is no systematic process for implementing DQ assessments for secondary uses of clinical data. This paper presents DataGauge, a systematic process for designing and implementing DQ assessments to evaluate repurposed data for a specific secondary use. DataGauge is composed of five steps: (1) define information needs, (2) develop a formal Data Needs Model (DNM), (3) use the DNM and DQ theory to develop goal-specific DQ assessment requirements, (4) extract DNM-specified data, and (5) evaluate the data against the DQ requirements. DataGauge's main contribution is integrating general DQ theory and DQ assessment methods into a systematic process. This process supports the integration and practical implementation of existing Electronic Health Record-specific DQ assessment guidelines. DataGauge also provides an initial theory-based guidance framework that ties the DNM to DQ testing methods for each DQ dimension to aid the design of DQ assessments. This framework can be augmented with existing DQ guidelines to enable systematic assessment. DataGauge sets the stage for future systematic DQ assessment research by defining an assessment process capable of adapting to a broad range of clinical datasets and secondary uses. It also opens new research directions, such as DQ theory integration, DQ requirements portability, DQ assessment tool development, and DQ assessment tool usability.
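The final evaluation step of such a process, checking extracted data against goal-specific DQ requirements, can be sketched as simple programmatic checks over common DQ dimensions such as completeness and plausibility. The field names, records, and thresholds below are hypothetical illustrations, not part of DataGauge itself:

```python
# Minimal sketch of rule-based DQ checks in the spirit of the
# evaluation step of a DataGauge-style process; the fields and
# plausibility ranges are illustrative assumptions.

def completeness(records, field):
    """Fraction of records with a non-missing value for `field`."""
    return sum(r.get(field) is not None for r in records) / len(records)

def plausibility(records, field, lo, hi):
    """Fraction of non-missing values that fall in a plausible range."""
    vals = [r[field] for r in records if r.get(field) is not None]
    return sum(lo <= v <= hi for v in vals) / len(vals)

records = [
    {"age": 72,  "hba1c": 6.1},
    {"age": 64,  "hba1c": None},   # missing lab value
    {"age": 190, "hba1c": 7.4},    # implausible age
]
print(completeness(records, "hba1c"))       # 2 of 3 records complete
print(plausibility(records, "age", 0, 120)) # 2 of 3 values plausible
```

In a full assessment, each check would be derived from the Data Needs Model and compared against a use-specific acceptance threshold rather than merely reported.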
Introduction: Health information generated by health care encounters, research enterprises, and public health is increasingly interoperable and shareable across uses and users. This paper examines the US public's willingness to be a part of multi-user health information networks and identifies factors associated with that willingness.
Methods: Using a probability-based sample (n = 890), we examined the univariable and multivariable relationships between willingness to participate in health information networks and demographic factors, trust, altruism, beliefs about the public's ethical obligation to participate in research, privacy, medical deception, and policy and governance using linear regression modeling.
Results: Willingness to be a part of a multi-user network that includes health care providers, mental health, social services, research, or quality improvement is low (7.4 to 26 percent, depending on the user). Using stepwise regression, we identified a model that explained 42.6 percent of the variability in willingness to participate and included nine statistically significant factors associated with the outcome: trust in the health system, confidence in policy, the belief that people have an obligation to participate in research, the belief that health researchers are accountable for conducting ethical research, the desire to give permission, education, concerns about insurance, privacy, and preference for notification.
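Stepwise regression of the kind described above builds a model by greedily adding the predictor that most improves fit until the gain is negligible. A minimal pure-Python sketch of forward selection by R² gain; the predictors, data, and stopping rule below are synthetic assumptions, and the study's actual selection criteria may differ:

```python
# Sketch of forward stepwise selection for a linear model; the
# variable names and data are synthetic, not the survey data.

def ols_r2(X, y):
    """R^2 of an intercept-plus-X least-squares fit (normal equations)."""
    n, cols = len(y), [[1.0] * len(y)] + X
    k = len(cols)
    A = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
         for i in range(k)]
    b = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    for i in range(k):                       # Gaussian elimination
        for j in range(i + 1, k):
            f = A[j][i] / A[i][i]
            A[j] = [a - f * ai for a, ai in zip(A[j], A[i])]
            b[j] -= f * b[i]
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):           # back-substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j]
                              for j in range(i + 1, k))) / A[i][i]
    yhat = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(n)]
    ybar = sum(y) / n
    ss_res = sum((yt - yh) ** 2 for yt, yh in zip(y, yhat))
    ss_tot = sum((yt - ybar) ** 2 for yt in y)
    return 1 - ss_res / ss_tot

def forward_stepwise(predictors, y, min_gain=0.01):
    """Greedily add the predictor that most improves R^2."""
    chosen, best = [], 0.0
    remaining = set(predictors)
    while remaining:
        r2, cand = max((ols_r2([predictors[c] for c in chosen + [cand]], y),
                        cand) for cand in remaining)
        if r2 - best < min_gain:
            break
        chosen.append(cand)
        best = r2
        remaining.remove(cand)
    return chosen, best

# Synthetic data: the outcome tracks "trust"; "privacy" adds little.
predictors = {"trust":   [1, 2, 3, 4, 5, 6],
              "privacy": [3, 1, 4, 1, 5, 9]}
y = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]
chosen, best = forward_stepwise(predictors, y)
print(chosen)  # ['trust']
```

Real analyses typically select on p-values or information criteria rather than raw R² gain, but the greedy add-one-at-a-time structure is the same.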
Discussion: Our results suggest willingness to be a part of multi-user data networks is low, but that attention to governance may increase willingness. Building trust to enable acceptance of multi-use data networks will require a commitment to aligning data access practices with the expectations of the people whose data is being used.