Standardised and Reproducible Phenotyping Using Distributed Analytics and Tools in the Data Analysis and Real World Interrogation Network (DARWIN EU)
Francesco Dernie, George Corby, Abigail Robinson, James Bezer, Nuria Mercade-Besora, Romain Griffier, Guillaume Verdy, Angela Leis, Juan Manuel Ramirez-Anguita, Miguel A Mayer, James T Brash, Sarah Seager, Rowan Parry, Annika Jodicke, Talita Duarte-Salles, Peter R Rijnbeek, Katia Verhamme, Alexandra Pacurariu, Daniel Morales, Luis Pinheiro, Daniel Prieto-Alhambra, Albert Prats-Uribe
Pharmacoepidemiology and Drug Safety, volume 33, issue 11, e70042. Journal Article. Published 2024-11-01. DOI: 10.1002/pds.70042 (https://doi.org/10.1002/pds.70042)
Abstract
Purpose: The generation of representative disease phenotypes is important for ensuring the reliability of the findings of observational studies. The aim of this manuscript is to outline a reproducible framework for reliable and traceable phenotype generation based on real-world data for use in the Data Analysis and Real-World Interrogation Network (DARWIN EU). We illustrate the use of this framework by generating phenotypes for two diseases: pancreatic cancer and systemic lupus erythematosus (SLE).
Methods: Phenotyping follows a 14-step process based on a standard operating procedure co-created by the DARWIN EU Coordination Centre in collaboration with the European Medicines Agency. A number of bespoke R packages were used to generate and review codelists for two phenotypes based on real-world data mapped to the OMOP Common Data Model.
Results: Codelists were generated for both pancreatic cancer and SLE, and cohorts were generated in six OMOP-mapped databases. Diagnostic checks showed that these cohorts had incidence and prevalence figures broadly similar to those in previously published literature, despite significant inter-database variability. Co-occurring symptoms, conditions, and medication use were in keeping with pre-specified clinical descriptions based on previous knowledge.
Conclusions: Our detailed phenotyping process makes use of bespoke tools and allows for comprehensive codelist generation and review, as well as large-scale exploration of the characteristics of the resulting cohorts. Wider use of structured and reproducible phenotyping methods will be important in ensuring the reliability of observational studies for regulatory purposes.
About the journal:
The aim of Pharmacoepidemiology and Drug Safety is to provide an international forum for the communication and evaluation of data, methods and opinion in the discipline of pharmacoepidemiology. The Journal publishes peer-reviewed reports of original research, invited reviews and a variety of guest editorials and commentaries embracing scientific, medical, statistical, legal and economic aspects of pharmacoepidemiology and post-marketing surveillance of drug safety. Appropriate material in these categories may also be considered for publication as a Brief Report.
Particular areas of interest include:
design, analysis, results, and interpretation of studies looking at the benefit or safety of specific pharmaceuticals, biologics, or medical devices, including studies in pharmacovigilance, postmarketing surveillance, pharmacoeconomics, patient safety, molecular pharmacoepidemiology, or any other study within the broad field of pharmacoepidemiology;
comparative effectiveness research relating to pharmaceuticals, biologics, and medical devices. Comparative effectiveness research is the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition, as these methods are truly used in the real world;
methodologic contributions of relevance to pharmacoepidemiology, whether original contributions, reviews of existing methods, or tutorials for how to apply the methods of pharmacoepidemiology;
assessments of harm versus benefit in drug therapy;
patterns of drug utilization;
relationships between pharmacoepidemiology and the formulation and interpretation of regulatory guidelines;
evaluations of risk management plans and programmes relating to pharmaceuticals, biologics and medical devices.