Alexander W Jung PhD , Peter C Holm MSc , Kumar Gaurav PhD , Jessica Xin Hjaltelin PhD , Davide Placido PhD , Prof Laust Hvas Mortensen PhD , Prof Ewan Birney PhD , Prof S⊘ren Brunak PhD , Prof Moritz Gerstung PhD
{"title":"基于国民健康数据的多癌症风险分层:一项回顾性建模和验证研究","authors":"Alexander W Jung PhD , Peter C Holm MSc , Kumar Gaurav PhD , Jessica Xin Hjaltelin PhD , Davide Placido PhD , Prof Laust Hvas Mortensen PhD , Prof Ewan Birney PhD , Prof S⊘ren Brunak PhD , Prof Moritz Gerstung PhD","doi":"10.1016/S2589-7500(24)00062-1","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><p>Health care is experiencing a drive towards digitisation, and many countries are implementing national health data resources. Although a range of cancer risk models exists, the utility on a population level for risk stratification across cancer types has not been fully explored. We aimed to close this gap by evaluating pan-cancer risk models built on electronic health records across the Danish population with validation in the UK Biobank.</p></div><div><h3>Methods</h3><p>In this retrospective modelling and validation study, data for model development and internal validation were derived from the following Danish health registries: the Central Person Registry, the Danish National Patient Registry, the death registry, the cancer registry, and full-text medical records from secondary care records in the capital region. The development data included adults aged 16–86 years without previous malignant cancers in the time period from Jan 1, 1995, to Dec 31, 2014. The internal validation period was from Jan 1, 2015, to April 10, 2018, and the data included all adults without a previous indication of cancer aged 16–75 years on Dec 31, 2014. The external validation cohort from the UK Biobank included all adults without a previous indication of cancer aged 50–75 years. We used time-dependent Bayesian Cox hazard models built on the combined medical history of Danish individuals. A set of 1392 covariates from available clinical disease trajectories, text-mined basic health factors, and family histories were used to train predictive models of 20 major cancer types. The models were validated on cancer incidence between 2015 and 2018 across Denmark and on individuals in the UK Biobank. The primary outcomes were discrimination and calibration performance.</p></div><div><h3>Findings</h3><p>From the Danish registries, we included 6 732 553 individuals covering 60 million hospital visits, 90 million diagnoses, and a total of 193 million life-years between Jan 1, 1978, and April 10, 2018. Danish registry data covering the period from Jan 1, 2015, to April 10, 2018, were used to internally validate risk models, containing a total of 4 248 491 individuals who remained at risk of a primary malignant cancer diagnosis and 67 401 cancer cases recorded. For the external validation, we evaluated the same time period in the UK Biobank covering 377 004 individuals with 11 486 cancer cases. The predictive performance of the models on Danish data showed good discrimination (concordance index 0·81 [SD 0·08], ranging from 0·66 [95% CI 0·65–0·67] for cervix uteri cancer to 0·91 [0·90–0·92] for liver cancer). Performance was similar on the UK Biobank in a direct transfer when controlling for shifts in the age distribution (concordance index 0·66 [SD 0·08], ranging from 0·55 [95% CI 0·44–0·66] for cervix uteri cancer to 0·78 [0·77–0·79] for lung cancer). Cancer risks were associated, in addition to heritable components, with a broad range of preceding diagnoses and health factors. The best overall performance was seen for cancers of the digestive system (oesophageal, stomach, colorectal, liver, and pancreatic) but also thyroid, kidney, and uterine cancers.</p></div><div><h3>Interpretation</h3><p>Data available in national electronic health databases can be used to approximate cancer risk factors and enable risk predictions in most cancer types. Model predictions generalise between the Danish and UK health-care systems. With the emergence of multi-cancer early detection tests, electronic health record-based risk models could supplement screening efforts.</p></div><div><h3>Funding</h3><p>Novo Nordisk Foundation and the Danish Innovation Foundation.</p></div>","PeriodicalId":48534,"journal":{"name":"Lancet Digital Health","volume":"6 6","pages":"Pages e396-e406"},"PeriodicalIF":23.8000,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2589750024000621/pdfft?md5=cd2cb89cd08e988338ffea88e7f08924&pid=1-s2.0-S2589750024000621-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Multi-cancer risk stratification based on national health data: a retrospective modelling and validation study\",\"authors\":\"Alexander W Jung PhD , Peter C Holm MSc , Kumar Gaurav PhD , Jessica Xin Hjaltelin PhD , Davide Placido PhD , Prof Laust Hvas Mortensen PhD , Prof Ewan Birney PhD , Prof S⊘ren Brunak PhD , Prof Moritz Gerstung PhD\",\"doi\":\"10.1016/S2589-7500(24)00062-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><p>Health care is experiencing a drive towards digitisation, and many countries are implementing national health data resources. Although a range of cancer risk models exists, the utility on a population level for risk stratification across cancer types has not been fully explored. We aimed to close this gap by evaluating pan-cancer risk models built on electronic health records across the Danish population with validation in the UK Biobank.</p></div><div><h3>Methods</h3><p>In this retrospective modelling and validation study, data for model development and internal validation were derived from the following Danish health registries: the Central Person Registry, the Danish National Patient Registry, the death registry, the cancer registry, and full-text medical records from secondary care records in the capital region. The development data included adults aged 16–86 years without previous malignant cancers in the time period from Jan 1, 1995, to Dec 31, 2014. The internal validation period was from Jan 1, 2015, to April 10, 2018, and the data included all adults without a previous indication of cancer aged 16–75 years on Dec 31, 2014. The external validation cohort from the UK Biobank included all adults without a previous indication of cancer aged 50–75 years. We used time-dependent Bayesian Cox hazard models built on the combined medical history of Danish individuals. A set of 1392 covariates from available clinical disease trajectories, text-mined basic health factors, and family histories were used to train predictive models of 20 major cancer types. The models were validated on cancer incidence between 2015 and 2018 across Denmark and on individuals in the UK Biobank. The primary outcomes were discrimination and calibration performance.</p></div><div><h3>Findings</h3><p>From the Danish registries, we included 6 732 553 individuals covering 60 million hospital visits, 90 million diagnoses, and a total of 193 million life-years between Jan 1, 1978, and April 10, 2018. Danish registry data covering the period from Jan 1, 2015, to April 10, 2018, were used to internally validate risk models, containing a total of 4 248 491 individuals who remained at risk of a primary malignant cancer diagnosis and 67 401 cancer cases recorded. For the external validation, we evaluated the same time period in the UK Biobank covering 377 004 individuals with 11 486 cancer cases. The predictive performance of the models on Danish data showed good discrimination (concordance index 0·81 [SD 0·08], ranging from 0·66 [95% CI 0·65–0·67] for cervix uteri cancer to 0·91 [0·90–0·92] for liver cancer). Performance was similar on the UK Biobank in a direct transfer when controlling for shifts in the age distribution (concordance index 0·66 [SD 0·08], ranging from 0·55 [95% CI 0·44–0·66] for cervix uteri cancer to 0·78 [0·77–0·79] for lung cancer). Cancer risks were associated, in addition to heritable components, with a broad range of preceding diagnoses and health factors. The best overall performance was seen for cancers of the digestive system (oesophageal, stomach, colorectal, liver, and pancreatic) but also thyroid, kidney, and uterine cancers.</p></div><div><h3>Interpretation</h3><p>Data available in national electronic health databases can be used to approximate cancer risk factors and enable risk predictions in most cancer types. Model predictions generalise between the Danish and UK health-care systems. With the emergence of multi-cancer early detection tests, electronic health record-based risk models could supplement screening efforts.</p></div><div><h3>Funding</h3><p>Novo Nordisk Foundation and the Danish Innovation Foundation.</p></div>\",\"PeriodicalId\":48534,\"journal\":{\"name\":\"Lancet Digital Health\",\"volume\":\"6 6\",\"pages\":\"Pages e396-e406\"},\"PeriodicalIF\":23.8000,\"publicationDate\":\"2024-05-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2589750024000621/pdfft?md5=cd2cb89cd08e988338ffea88e7f08924&pid=1-s2.0-S2589750024000621-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Lancet Digital Health\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2589750024000621\",\"RegionNum\":1,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MEDICAL INFORMATICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Lancet Digital Health","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2589750024000621","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
Multi-cancer risk stratification based on national health data: a retrospective modelling and validation study
Background
Health care is experiencing a drive towards digitisation, and many countries are implementing national health data resources. Although a range of cancer risk models exists, the utility on a population level for risk stratification across cancer types has not been fully explored. We aimed to close this gap by evaluating pan-cancer risk models built on electronic health records across the Danish population with validation in the UK Biobank.
Methods
In this retrospective modelling and validation study, data for model development and internal validation were derived from the following Danish health registries: the Central Person Registry, the Danish National Patient Registry, the death registry, the cancer registry, and full-text medical records from secondary care records in the capital region. The development data included adults aged 16–86 years without previous malignant cancers in the time period from Jan 1, 1995, to Dec 31, 2014. The internal validation period was from Jan 1, 2015, to April 10, 2018, and the data included all adults without a previous indication of cancer aged 16–75 years on Dec 31, 2014. The external validation cohort from the UK Biobank included all adults without a previous indication of cancer aged 50–75 years. We used time-dependent Bayesian Cox hazard models built on the combined medical history of Danish individuals. A set of 1392 covariates from available clinical disease trajectories, text-mined basic health factors, and family histories were used to train predictive models of 20 major cancer types. The models were validated on cancer incidence between 2015 and 2018 across Denmark and on individuals in the UK Biobank. The primary outcomes were discrimination and calibration performance.
Findings
From the Danish registries, we included 6 732 553 individuals covering 60 million hospital visits, 90 million diagnoses, and a total of 193 million life-years between Jan 1, 1978, and April 10, 2018. Danish registry data covering the period from Jan 1, 2015, to April 10, 2018, were used to internally validate risk models, containing a total of 4 248 491 individuals who remained at risk of a primary malignant cancer diagnosis and 67 401 cancer cases recorded. For the external validation, we evaluated the same time period in the UK Biobank covering 377 004 individuals with 11 486 cancer cases. The predictive performance of the models on Danish data showed good discrimination (concordance index 0·81 [SD 0·08], ranging from 0·66 [95% CI 0·65–0·67] for cervix uteri cancer to 0·91 [0·90–0·92] for liver cancer). Performance was similar on the UK Biobank in a direct transfer when controlling for shifts in the age distribution (concordance index 0·66 [SD 0·08], ranging from 0·55 [95% CI 0·44–0·66] for cervix uteri cancer to 0·78 [0·77–0·79] for lung cancer). Cancer risks were associated, in addition to heritable components, with a broad range of preceding diagnoses and health factors. The best overall performance was seen for cancers of the digestive system (oesophageal, stomach, colorectal, liver, and pancreatic) but also thyroid, kidney, and uterine cancers.
Interpretation
Data available in national electronic health databases can be used to approximate cancer risk factors and enable risk predictions in most cancer types. Model predictions generalise between the Danish and UK health-care systems. With the emergence of multi-cancer early detection tests, electronic health record-based risk models could supplement screening efforts.
Funding
Novo Nordisk Foundation and the Danish Innovation Foundation.
期刊介绍:
The Lancet Digital Health publishes important, innovative, and practice-changing research on any topic connected with digital technology in clinical medicine, public health, and global health.
The journal’s open access content crosses subject boundaries, building bridges between health professionals and researchers.By bringing together the most important advances in this multidisciplinary field,The Lancet Digital Health is the most prominent publishing venue in digital health.
We publish a range of content types including Articles,Review, Comment, and Correspondence, contributing to promoting digital technologies in health practice worldwide.