Pub Date: 2025-07-01 | DOI: 10.1016/j.landig.2025.100886
Amir Hadid PhD , Emily G McDonald MD , Qianggang Ding MEng , Christopher Phillipp BSc , Audrey Trottier BSc , Philippe C Dixon PhD , Oussama Jlassi MSc , Matthew P Cheng MD , Jesse Papenburg MD , Prof Michael Libman MD , Dennis Jensen PhD
Background
Presymptomatic or asymptomatic immune system signals and subclinical physiological changes might provide a more objective measure of early viral upper respiratory tract infections (VRTIs) compared with symptom-based detection. We aimed to use multimodal wearable sensors, host-response biomarkers, and machine learning to predict systemic inflammation following controlled exposure to a live attenuated influenza vaccine, without relying on symptoms.
Methods
The WE SENSE study is a single-centre (McGill University Health Center, Montreal, QC, Canada), prospective controlled trial that recruited healthy adults aged 18–59 years who had not received or were not planning to receive the seasonal influenza vaccine or any other vaccine during the study period. We excluded participants with any infectious symptoms within 7 days before screening. We collected physiological and activity data (eg, heart rate, breathing rate, and acceleration) through continuous monitoring with a smart ring (Oura ring Gen 2, Oura Oy, Finland), smart watch (Biobeat watch, Biobeat Technologies, Israel), and smart shirt (Astroskin–Hexoskin shirt, Hexoskin, Canada) along with high temporal resolution systemic inflammatory biomarker mapping over 12 days (7 days before inoculation and 5 days after). We frequently tested participants both before and after inoculation via PCR for respiratory pathogens, and monitored them via apps for symptoms and free-text annotations. Machine learning algorithms predicting systemic inflammatory surges were trained (35 participants), validated (ten participants), and tested (ten participants) using gradient-boosting techniques.
Findings
Between Dec 10, 2021, and Feb 28, 2022, we enrolled 56 participants, of whom 55 had available data; all 55 participants continuously wore the Oura ring, 54 participants wore the Astroskin–Hexoskin shirt, and 50 wore the Biobeat watch. 27 (49%) participants were female and 28 (51%) were male; 31 (56%) participants were White, eight (15%) were Asian, four (7%) were Black, two (4%) were Latino or Hispanic, and ten (18%) did not disclose. We used model 2, which included handpicked features from the Oura ring night-time data, as the candidate model because it was built on the lowest number of features (more practical). This model predicted inflammatory surges with receiver operating characteristic area under the curve (ROC-AUC) of 0·73 (95% CI 0·71–0·74) for real-time prediction and 0·89 (0·87–0·90) for a 24-h tolerance prediction window (24h-tol) using night-time data from the Oura ring. Incorporating both night-time and daytime data from the Astroskin–Hexoskin shirt yielded ROC-AUC values of 0·73 (0·71–0·75) for real-time and 0·91 (0·90–0·92) for 24h-tol along with improved precision (ie, specificity [0·83, 0·79–0·87] and F1 score [0·65, 0·58–0·71]). The model based on symptoms alone had lower performance, with ROC-AUC values of 0·66 (0·63–0·68) for real-time and 0·79 (0·77–0·82) for 24h-tol.
Interpretation
Systemic inflammatory biomarkers combined with physiological data from wearable biosensors provide rich and objective data for training machine learning algorithms to predict systemic inflammation from a low-grade influenza challenge. This approach outperformed symptom-based detection and has the potential to improve the detection of VRTIs such as influenza, and to shorten the time to detection, even in asymptomatic individuals.
Funding
Canadian Institutes of Health Research.
Title: Development of machine learning prediction models for systemic inflammatory response following controlled exposure to a live attenuated influenza vaccine in healthy adults using multimodal wearable biosensors in Canada: a single-centre, prospective controlled trial
Journal: Lancet Digital Health, volume 7, issue 7, Article 100886
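The Methods above report that models were trained on 35 participants, validated on ten, and tested on ten. The abstract does not say how participants were assigned to each set, so the following is only a minimal sketch of a participant-level split, assuming random assignment with a fixed seed. The function name `split_participants` and the identifiers `P01` to `P55` are hypothetical. The point it illustrates is that whole participants, not individual recordings, are assigned to a set, so no person contributes data to more than one set.

```python
import random


def split_participants(participant_ids, n_train=35, n_val=10, n_test=10, seed=0):
    """Assign whole participants to train/validation/test sets.

    Random assignment is an assumption; the abstract reports only the
    set sizes (35/10/10), not the assignment procedure.
    """
    ids = sorted(set(participant_ids))
    if len(ids) != n_train + n_val + n_test:
        raise ValueError("split sizes must sum to the number of participants")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(ids)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


# Hypothetical participant identifiers P01..P55 (55 participants had data)
participants = [f"P{i:02d}" for i in range(1, 56)]
splits = split_participants(participants)
```

Splitting by participant rather than by time window is a common way to avoid leakage when each person contributes many correlated measurements.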
Pub Date: 2025-07-01 | DOI: 10.1016/j.landig.2025.100897
The Lancet Digital Health
Title: Fixing cracks in the artificial intelligence drug development pipeline
Journal: Lancet Digital Health, volume 7, issue 7, Article 100897
Pub Date: 2025-07-01 | DOI: 10.1016/j.landig.2025.02.004
Christoph Sadée MRes , Stefano Testa MD , Thomas Barba MD PhD , Katherine Hartmann MD PhD , Maximilian Schuessler MS MPP , Alexander Thieme Dr med , Prof George M Church PhD , Prof Ifeoma Okoye MBBS FWACS , Prof Tina Hernandez-Boussard PhD , Prof Leroy Hood MD PhD , Prof Ilya Shmulevich PhD , Prof Ellen Kuhl PhD , Prof Olivier Gevaert PhD
The notion of medical digital twins is gaining popularity both within the scientific community and among the general public; however, much of the recent enthusiasm has occurred in the absence of a consensus on their fundamental make-up. Digital twins originate in the field of engineering, in which a constantly updating virtual copy enables analysis, simulation, and prediction of a real-world object or process. In this Health Policy paper, we evaluate this concept in the context of medicine and outline five key components of the medical digital twin: the patient, data connection, patient-in-silico, interface, and twin synchronisation. We consider how various enabling technologies in multimodal data, artificial intelligence, and mechanistic modelling will pave the way for clinical adoption and provide examples pertaining to oncology and diabetes. We highlight the role of data fusion and the potential of merging artificial intelligence and mechanistic modelling to address the limitations of either the AI or the mechanistic modelling approach used independently. In particular, we highlight how the digital twin concept can support the performance of large language models applied in medicine and its potential to address health-care challenges. We believe that this Health Policy paper will help to guide scientists, clinicians, and policy makers in creating medical digital twins in the future and translating this promising new paradigm from theory into clinical practice.
Title: Medical digital twins: enabling precision medicine and medical artificial intelligence
Journal: Lancet Digital Health, volume 7, issue 7, Article 100864
Pub Date: 2025-07-01 | DOI: 10.1016/j.landig.2025.100874
Thomas McAndrew PhD , Andrew A Lover PhD , Garrik Hoyt , Maimuna S Majumder PhD
Presidential actions on Jan 20, 2025, by President Donald Trump, including executive orders, have delayed access to or led to the removal of crucial public health data sources in the USA. The continuous collection and maintenance of health data support public health, safety, and security associated with diseases such as seasonal influenza. To show how public health data surveillance enhances public health practice, we analysed data from seven US Government-maintained sources associated with seasonal influenza. We fit two models that forecast the number of national incident influenza hospitalisations in the USA: (1) a data-rich model incorporating data from all seven Government data sources; and (2) a data-poor model built using a single Government hospitalisation data source, representing the minimal required information to produce a forecast of influenza hospitalisations. The data-rich model generated reliable forecasts useful for public health decision making, whereas the predictions using the data-poor model were highly uncertain, rendering them impractical. Thus, health data can serve as a transparent and standardised foundation to improve domestic and global health. Therefore, a plan should be developed to safeguard public health data as a public good.
Title: When data disappear: public health pays as US policy strays
Journal: Lancet Digital Health, volume 7, issue 7, Article 100874
Pub Date: 2025-07-01 | DOI: 10.1016/j.landig.2025.100883
Ramez Kouzy , Julian C Hong , Danielle S Bitterman
Title: One shot at trust: building credible evidence for medical artificial intelligence
Journal: Lancet Digital Health, volume 7, issue 7, Article 100883
Pub Date: 2025-07-01 | DOI: 10.1016/j.landig.2025.100894
Ming Liu MSc , Xin-Yao Yi MSc , Yun-Zhe Chen MPH , Mei-Nuo Li MPH , Yuan-Yuan Zhang MPH , Casper J P Zhang PhD , Jian Huang PhD , Prof Wai-Kit Ming MD
Background
Sexually transmitted infections (STIs) are a substantial public health concern. We aimed to evaluate the accuracy and applicability of deep learning algorithms in the early detection of STIs from skin lesions.
Methods
In this systematic review and meta-analysis, we searched PubMed, Institute of Electrical and Electronics Engineers Xplore, Web of Science, and Scopus for studies employing deep learning for classifying clinical skin lesion images of STIs published between Jan 1, 2010, and Dec 31, 2023. Studies that did not include clinical images were excluded. The primary outcome was diagnostic performance, assessed by pooled sensitivity and specificity. We conducted a meta-analysis of the studies providing contingency tables using a unified hierarchical model. We additionally assessed the quality of the studies using modified QUADAS-2 and CheckList for Evaluation of image-based AI Reports in Dermatology (CLEAR Derm) criteria. This study was registered with PROSPERO, CRD42024496966.
Findings
Among the 1946 studies identified, we included 101 in our review. The majority of the included studies focused on mpox (91 [88%] of 101 studies), followed by scabies (eight [8%] studies), herpes (four [4%] studies), syphilis (one [1%] study), and molluscum (one [1%] study). A meta-analysis of 55 studies showed that deep learning algorithms had a pooled sensitivity of 0·97 (95% CI 0·95–0·98) and a specificity of 0·99 (0·98–0·99) for mpox, and a sensitivity of 0·95 (0·90–0·98) and specificity of 0·97 (0·86–0·99) for scabies. The majority of studies (86 [85%] of 101 studies) utilised public datasets; traditional convolutional neural networks with backbone architectures such as ResNet and VGGNet were used in all studies. However, notable quality issues related to the data, technical descriptions of labelling methods and diagnostic label references, technical assessment for public evaluation of algorithms, benchmarking and bias assessments, application descriptions of use cases, and target conditions and potential impacts were identified in CLEAR Derm. Potential biases in performance evaluation metrics and applicability concerns in the data, deep learning algorithms, and performance evaluation metrics might impede the generalisability of these models to real-world clinical practice and STI screening across diverse populations.
Interpretation
Although deep learning shows potential for early detection of STIs, there are challenges to ensuring the generalisability of such algorithms due to limited heterogeneous data. Standardised, diverse skin lesion image datasets are crucial to ensure fair comparisons and reliable performance.
Funding
City University of Hong Kong.
Title: Early detection of sexually transmitted infections from skin lesions with deep learning: a systematic review and meta-analysis
Journal: Lancet Digital Health, volume 7, issue 7, Article 100894
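The primary outcome above is sensitivity and specificity derived from each study's contingency table. As a rough illustration of those two quantities for a single study, the sketch below computes them from one 2x2 table. It is not the unified hierarchical model the authors used for pooling, and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) for one 2x2 contingency table.

    tp/fn count diseased cases classified correctly/incorrectly;
    tn/fp count non-diseased cases classified correctly/incorrectly.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("table needs at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of true cases the model detects
    specificity = tn / (tn + fp)  # share of non-cases the model clears
    return sensitivity, specificity


# Hypothetical counts for a single study (not taken from the review)
sens, spec = sensitivity_specificity(tp=97, fp=1, fn=3, tn=99)
```

A hierarchical meta-analysis goes further than this per-study calculation by modelling between-study variation and the correlation between sensitivity and specificity, which simple averaging of these values would ignore.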
Pub Date: 2025-06-01 | DOI: 10.1016/j.landig.2025.03.002
Dennis Bontempi PhD , Osbert Zalay PhD , Danielle S Bitterman MD , Nicolai Birkbak PhD , Derek Shyr PhD , Fridolin Haugg MSc , Jack M Qian MD , Hannah Roberts MD , Subha Perni MD , Vasco Prudente MSc , Suraj Pai MSc , Andre Dekker PhD , Benjamin Haibe-Kains PhD , Christian Guthier PhD , Tracy Balboni MD , Laura Warren MD , Monica Krishan MD , Benjamin H Kann MD , Prof Charles Swanton MD , Prof Dirk De Ruysscher MD , Prof Hugo J W L Aerts PhD
Background
As humans age at different rates, physical appearance can yield insights into biological age and physiological health more reliably than chronological age. In medicine, however, appearance is incorporated into medical judgements in a subjective and non-standardised way. In this study, we aimed to develop and validate FaceAge, a deep learning system to estimate biological age from easily obtainable and low-cost face photographs.
Methods
FaceAge was trained on data from 58 851 presumed healthy individuals aged 60 years or older: 56 304 individuals from the IMDb–Wiki dataset (training) and 2547 from the UTKFace dataset (initial validation). Clinical utility was evaluated on data from 6196 patients with cancer diagnoses from two institutions in the Netherlands and the USA: the MAASTRO, Harvard Thoracic, and Harvard Palliative cohorts. FaceAge estimates in these cancer cohorts were compared with a non-cancerous reference cohort of 535 individuals. To assess the prognostic relevance of FaceAge, we performed Kaplan–Meier survival analysis and Cox modelling, adjusting for several clinical covariates. We also assessed the performance of FaceAge in patients with metastatic cancer receiving palliative treatment at the end of life by incorporating FaceAge into clinical prediction models. To evaluate whether FaceAge has the potential to be a biomarker for molecular ageing, we performed a gene-based analysis to assess its association with senescence genes.
Findings
FaceAge showed significant independent prognostic performance in various cancer types and stages. Looking older was correlated with worse overall survival (after adjusting for covariates, per-decade hazard ratio [HR] 1·151, p=0·013 in a pan-cancer cohort of n=4906; 1·148, p=0·011 in a thoracic cohort of n=573; and 1·117, p=0·021 in a palliative cohort of n=717). We found that, on average, patients with cancer looked older than their chronological age (mean increase of 4·79 years with respect to non-cancerous reference cohort, p<0·0001). We found that FaceAge can improve physicians’ survival predictions in patients with incurable cancer receiving palliative treatments (from area under the curve 0·74 [95% CI 0·70–0·78] to 0·8 [0·76–0·83]; p<0·0001), highlighting the clinical use of the algorithm to support end-of-life decision making. FaceAge was also significantly associated with molecular mechanisms of senescence through gene analysis, whereas age was not.
Interpretation
Our results suggest that a deep learning model can estimate biological age from face photographs and thereby enhance survival prediction in patients with cancer. Further research, including validation in larger cohorts, is needed to verify these findings in patients with cancer and to establish whether the findings extend to patients with other diseases. Subject to further testing and validation, approaches such as
Title: FaceAge, a deep learning system to estimate biological age from face photographs to improve prognostication: a model development and validation study
Journal: Lancet Digital Health, volume 7, issue 6, Article 100870
Pub Date : 2025-06-01 DOI: 10.1016/j.landig.2025.03.005
Shishir Rao DPhil, Yikuan Li DPhil, Mohammad Mamouei PhD, Gholamreza Salimi-Khorshidi DPhil, Malgorzata Wamil PhD, Milad Nazarzadeh DPhil, Christopher Yau DPhil, Gary S Collins PhD, Rod Jackson PhD, Andrew Vickers DPhil, Goodarz Danaei MD ScD, Kazem Rahimi DM FESC
Background
Although statistical models have been commonly used to identify patients at risk of cardiovascular disease for preventive therapy, these models tend to over-recommend therapy. Moreover, in populations with pre-existing diseases, the current approach is to indiscriminately treat all, as modelling in this context is currently inadequate. This study aimed to develop and validate the Transformer-based Risk assessment survival (TRisk) model, a novel deep learning model, for predicting 10-year risk of cardiovascular disease in both the primary prevention population and individuals with diabetes.
Methods
An open cohort of 3 million adults aged 25–84 years was identified using linked electronic health records from general practices across England from 1998 to 2015: 291 practices for model development and 98 practices for validation. TRisk was compared against the QRISK3 score and a deep learning derivation of it. Additional analyses compared discriminatory performance in other age groups, by sex, and across categories of socioeconomic status.
Findings
TRisk showed superior discrimination (C index in the primary prevention population 0·910; 95% CI 0·906–0·913). TRisk’s performance was less sensitive to population age range than that of the benchmark models, and TRisk also outperformed other models in analyses stratified by age, sex, or socioeconomic status. Overall, all models were well calibrated. In decision curve analyses, TRisk showed a greater net benefit than benchmark models across the range of relevant thresholds. At the widely recommended 10% risk threshold and the higher 15% threshold, TRisk reduced both the total number of patients classified at high risk (by 20·6% and 34·6%, respectively) and the number of false negatives as compared with recommended strategies. TRisk similarly outperformed other models in patients with diabetes. Compared with the widely recommended treat-all policy approach for patients with diabetes, TRisk at a 10% risk threshold would lead to deselection of 24·3% of individuals, with a small fraction of false negatives (0·2% of the cohort).
Interpretation
TRisk enabled a more targeted selection of individuals at risk of cardiovascular disease in both the primary prevention population and cohorts with diabetes, compared with benchmark approaches. Incorporation of TRisk into routine care could potentially reduce the number of treatment-eligible patients by approximately one-third while preventing at least as many events as with currently adopted approaches.
Funding
None.
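The decision curve analyses mentioned in the Findings above compare strategies by their net benefit at a chosen risk threshold. The following is a minimal sketch of the standard net-benefit formula, TP/n − (FP/n) × p_t/(1 − p_t), and is not the study's code: the function name `net_benefit` and the toy risks and outcomes are hypothetical and are not taken from the TRisk cohort. The treat-all strategy is modelled here as assigning everyone a risk of 1·0, so that every individual exceeds the threshold.

```python
from typing import Sequence


def net_benefit(risks: Sequence[float], events: Sequence[int], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk is >= threshold.

    risks     -- predicted event probabilities (hypothetical model output)
    events    -- observed outcomes, 1 = event occurred, 0 = no event
    threshold -- risk threshold p_t, strictly between 0 and 1
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    if len(risks) != len(events) or len(risks) == 0:
        raise ValueError("risks and events must be non-empty and of equal length")

    n = len(risks)
    # True positives: classified high risk and had the event.
    tp = sum(1 for r, e in zip(risks, events) if r >= threshold and e == 1)
    # False positives: classified high risk but had no event.
    fp = sum(1 for r, e in zip(risks, events) if r >= threshold and e == 0)
    # Odds of the threshold weight the harm of an unnecessary treatment.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


# Illustrative toy data only (assumed for this sketch, not from the study).
toy_risks = [0.02, 0.05, 0.08, 0.12, 0.20, 0.35, 0.04, 0.15, 0.09, 0.50]
toy_events = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1]

# Net benefit of the hypothetical model at the 10% threshold.
model_nb = net_benefit(toy_risks, toy_events, 0.10)
# Treat-all comparator: everyone is classified as high risk.
treat_all_nb = net_benefit([1.0] * len(toy_events), toy_events, 0.10)

print(f"model: {model_nb:.4f}, treat-all: {treat_all_nb:.4f}")
```

In this toy example the model treats fewer people without missing any events, so its net benefit at the 10% threshold is higher than that of treat-all; a full decision curve would repeat the calculation across a range of thresholds.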
Article title: Refined selection of individuals for preventive cardiovascular disease treatment with a transformer-based risk model (Lancet Digital Health, volume 7, issue 6, Article 100873; DOI 10.1016/j.landig.2025.03.005)
Pub Date : 2025-06-01 DOI: 10.1016/j.landig.2025.01.014
Prof Israel Júnior Borges do Nascimento MD ClinPath, Hebatullah Mohamed Abdulazeem MD MSc, Ishanka Weerasekara PhD, Prof Jodie Marquez PhD, Lenny T Vasanthan PhD, Genevieve Deeken MSc, Prof Rosemary Morgan PhD, Heang-Lee Tan MPH, Isabel Yordi Aguirre PhD, Lasse Østeengaard MSc, Indunil Kularathne BSc, Natasha Azzopardi-Muscat PhD, Prof Robin van Kessel PhD, Edson Zangiacomi Martinez PhD, Govin Permanand PhD, David Novillo-Ortiz PhD MLIS
We evaluated the effects of digital health technologies (DHTs) on women's health, empowerment, and gender equality, using the scoping review method. Following a search across five databases and grey literature, we analysed 80 studies published up to Aug 18, 2023. The thematic appraisal and quantitative analysis found that DHTs positively affect women's access to health-care services, self-care, and tailored self-monitoring enabling the acquisition of health-related interventions. Use of these technologies is beneficial across various medical fields, including gynaecology, endocrinology, and psychiatry. DHTs also improve women's empowerment and gender equality by facilitating skills acquisition, health education, and social interaction, while allowing cost-effective health services. Overall, DHTs contribute to better health outcomes for women and support the UN Sustainable Development Goals by improving access to health care and financial literacy.
Article title: Transforming women's health, empowerment, and gender equality with digital health: evidence-based policy and practice (Lancet Digital Health, volume 7, issue 6, Article 100858; DOI 10.1016/j.landig.2025.01.014)
Pub Date : 2025-06-01 DOI: 10.1016/j.landig.2025.03.001
Yuxuan Shi PhD, Zhen Li PhD, Li Wang PhD, Hong Wang PhD, Prof Xiaofeng Liu PhD, Dantong Gu MS, Xiao Chen MS, Xueli Liu PhD, Wentao Gong MS, Xiaowen Jiang MD, Wenquan Li MD, Yongdong Lin BS, Ke Liu MD, Deyan Luo MD, Tao Peng PhD, Xuemei Peng BS, Meimei Tong BS, Huizhen Zheng MD, Xuanchen Zhou MD, Jianrong Wu PhD, Prof Hongmeng Yu PhD
Background
Nasopharyngeal carcinoma is highly curable when diagnosed early. However, the nasopharynx’s obscure anatomical position and the similarity of local imaging manifestations with those of other nasopharyngeal diseases often lead to diagnostic challenges, resulting in delayed or missed diagnoses. Our aim was to develop a deep learning algorithm to enhance an otolaryngologist’s diagnostic capabilities by differentiating between nasopharyngeal carcinoma, benign hyperplasia, and normal nasopharynx during endoscopic examination.
Methods
In this national, multicentre, model development and validation study, we developed a Swin Transformer-based Nasopharyngeal Diagnostic (STND) system to identify nasopharyngeal carcinoma, benign hyperplasia, and normal nasopharynx. STND was developed with 27 362 nasopharyngeal endoscopic images (10 693 biopsy-proven nasopharyngeal carcinoma, 7073 biopsy-proven benign hyperplasia, and 9596 normal nasopharynx) sourced from eight prominent nasopharyngeal carcinoma centres (stage 1), and externally validated with 1885 prospectively acquired images from ten comprehensive hospitals with a high incidence of nasopharyngeal carcinoma (stage 2). Furthermore, we did a fully crossed, multireader, multicase study involving four expert otolaryngologists from four regional leading nasopharyngeal carcinoma centres, and 24 general otolaryngologists from 24 geographically diverse primary hospitals. This study included 400 images to evaluate the diagnostic capabilities of the experts and general otolaryngologists both with and without the aid of the STND system in a real-world environment.
Findings
Endoscopic images used in the internal study (Jan 1, 2017, to Jan 31, 2023) were from 15 521 individuals (9033 [58·2%] men and 6488 [41·8%] women; mean age 47·6 years [IQR 38·4–56·8]). Images from 945 participants (538 [56·9%] men and 407 [43·1%] women; mean age 45·2 years [IQR 35·2–55·2]) were used in the external validation. STND in the internal dataset discriminated normal nasopharynx images from abnormalities (benign hyperplasia and nasopharyngeal carcinoma) with an area under the curve (AUC) of 0·99 (95% CI 0·99–0·99) and malignant images (ie, nasopharyngeal carcinoma) from non-malignant images (ie, benign hyperplasia and normal nasopharynx) with an AUC of 0·99 (95% CI 0·98–0·99). In the external validation, the system had an AUC for the detection of nasopharyngeal carcinoma of 0·95 (95% CI 0·94–0·96), a sensitivity of 91·6% (95% CI 89·3–93·5), and a specificity of 86·1% (95% CI 84·1–87·9). In the multireader, multicase study, the artificial intelligence (AI)-assisted strategy enhanced otolaryngologists’ diagnostic accuracy by 7·9%, increasing from 83·4% (95% CI 80·1–86·7, without AI assistance) to 91·2% (95% CI 88·6–93·9, with AI assistance; p<0·0001) for primary care otolaryngologists. Reading time per image decreased with the aid of the AI model (mea
Article title: Artificial intelligence-assisted detection of nasopharyngeal carcinoma on endoscopic images: a national, multicentre, model development and validation study (Lancet Digital Health, volume 7, issue 6, Article 100869; DOI 10.1016/j.landig.2025.03.001)