AI-powered topic modeling: comparing LDA and BERTopic in analyzing opioid-related cardiovascular risks in women.

IF 2.8 4区 医学 Q2 MEDICINE, RESEARCH & EXPERIMENTAL Experimental Biology and Medicine Pub Date : 2025-02-28 eCollection Date: 2025-01-01 DOI:10.3389/ebm.2025.10389
Li Ma, Ru Chen, Weigong Ge, Paul Rogers, Beverly Lyn-Cook, Huixiao Hong, Weida Tong, Ningning Wu, Wen Zou
{"title":"AI-powered topic modeling: comparing LDA and BERTopic in analyzing opioid-related cardiovascular risks in women.","authors":"Li Ma, Ru Chen, Weigong Ge, Paul Rogers, Beverly Lyn-Cook, Huixiao Hong, Weida Tong, Ningning Wu, Wen Zou","doi":"10.3389/ebm.2025.10389","DOIUrl":null,"url":null,"abstract":"<p><p>Topic modeling is a crucial technique in natural language processing (NLP), enabling the extraction of latent themes from large text corpora. Traditional topic modeling, such as Latent Dirichlet Allocation (LDA), faces limitations in capturing the semantic relationships in the text document although it has been widely applied in text mining. BERTopic, created in 2022, leveraged advances in deep learning and can capture the contextual relationships between words. In this work, we integrated Artificial Intelligence (AI) modules to LDA and BERTopic and provided a comprehensive comparison on the analysis of prescription opioid-related cardiovascular risks in women. Opioid use can increase the risk of cardiovascular problems in women such as arrhythmia, hypotension etc. 1,837 abstracts were retrieved and downloaded from PubMed as of April 2024 using three Medical Subject Headings (MeSH) words: \"opioid,\" \"cardiovascular,\" and \"women.\" Machine Learning of Language Toolkit (MALLET) was employed for the implementation of LDA. BioBERT was used for document embedding in BERTopic. Eighteen was selected as the optimal topic number for MALLET and 23 for BERTopic. ChatGPT-4-Turbo was integrated to interpret and compare the results. The short descriptions created by ChatGPT for each topic from LDA and BERTopic were highly correlated, and the performance accuracies of LDA and BERTopic were similar as determined by expert manual reviews of the abstracts grouped by their predominant topics. The results of the t-SNE (t-distributed Stochastic Neighbor Embedding) plots showed that the clusters created from BERTopic were more compact and well-separated, representing improved coherence and distinctiveness between the topics. Our findings indicated that AI algorithms could augment both traditional and contemporary topic modeling techniques. In addition, BERTopic has the connection port for ChatGPT-4-Turbo or other large language models in its algorithm for automatic interpretation, while with LDA interpretation must be manually, and needs special procedures for data pre-processing and stop words exclusion. Therefore, while LDA remains valuable for large-scale text analysis with resource constraints, AI-assisted BERTopic offers significant advantages in providing the enhanced interpretability and the improved semantic coherence for extracting valuable insights from textual data.</p>","PeriodicalId":12163,"journal":{"name":"Experimental Biology and Medicine","volume":"250 ","pages":"10389"},"PeriodicalIF":2.8000,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11906279/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Experimental Biology and Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3389/ebm.2025.10389","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MEDICINE, RESEARCH & EXPERIMENTAL","Score":null,"Total":0}
引用次数: 0

Abstract

Topic modeling is a crucial technique in natural language processing (NLP), enabling the extraction of latent themes from large text corpora. Traditional topic modeling, such as Latent Dirichlet Allocation (LDA), faces limitations in capturing the semantic relationships in the text document although it has been widely applied in text mining. BERTopic, created in 2022, leveraged advances in deep learning and can capture the contextual relationships between words. In this work, we integrated Artificial Intelligence (AI) modules to LDA and BERTopic and provided a comprehensive comparison on the analysis of prescription opioid-related cardiovascular risks in women. Opioid use can increase the risk of cardiovascular problems in women such as arrhythmia, hypotension etc. 1,837 abstracts were retrieved and downloaded from PubMed as of April 2024 using three Medical Subject Headings (MeSH) words: "opioid," "cardiovascular," and "women." Machine Learning of Language Toolkit (MALLET) was employed for the implementation of LDA. BioBERT was used for document embedding in BERTopic. Eighteen was selected as the optimal topic number for MALLET and 23 for BERTopic. ChatGPT-4-Turbo was integrated to interpret and compare the results. The short descriptions created by ChatGPT for each topic from LDA and BERTopic were highly correlated, and the performance accuracies of LDA and BERTopic were similar as determined by expert manual reviews of the abstracts grouped by their predominant topics. The results of the t-SNE (t-distributed Stochastic Neighbor Embedding) plots showed that the clusters created from BERTopic were more compact and well-separated, representing improved coherence and distinctiveness between the topics. Our findings indicated that AI algorithms could augment both traditional and contemporary topic modeling techniques. In addition, BERTopic has the connection port for ChatGPT-4-Turbo or other large language models in its algorithm for automatic interpretation, while with LDA interpretation must be manually, and needs special procedures for data pre-processing and stop words exclusion. Therefore, while LDA remains valuable for large-scale text analysis with resource constraints, AI-assisted BERTopic offers significant advantages in providing the enhanced interpretability and the improved semantic coherence for extracting valuable insights from textual data.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
求助全文
约1分钟内获得全文 去求助
来源期刊
Experimental Biology and Medicine
Experimental Biology and Medicine 医学-医学:研究与实验
CiteScore
6.00
自引率
0.00%
发文量
157
审稿时长
1 months
期刊介绍: Experimental Biology and Medicine (EBM) is a global, peer-reviewed journal dedicated to the publication of multidisciplinary and interdisciplinary research in the biomedical sciences. EBM provides both research and review articles as well as meeting symposia and brief communications. Articles in EBM represent cutting edge research at the overlapping junctions of the biological, physical and engineering sciences that impact upon the health and welfare of the world''s population. Topics covered in EBM include: Anatomy/Pathology; Biochemistry and Molecular Biology; Bioimaging; Biomedical Engineering; Bionanoscience; Cell and Developmental Biology; Endocrinology and Nutrition; Environmental Health/Biomarkers/Precision Medicine; Genomics, Proteomics, and Bioinformatics; Immunology/Microbiology/Virology; Mechanisms of Aging; Neuroscience; Pharmacology and Toxicology; Physiology; Stem Cell Biology; Structural Biology; Systems Biology and Microphysiological Systems; and Translational Research.
期刊最新文献
AI-powered topic modeling: comparing LDA and BERTopic in analyzing opioid-related cardiovascular risks in women. Coenzyme Q10 alleviates neurological deficits in a mouse model of intracerebral hemorrhage by reducing inflammation and apoptosis. The effects of major abdominal surgery on skeletal muscle mitochondrial respiration in relation to systemic redox status and cardiopulmonary fitness. Artificial intelligence in the diagnosis of uveal melanoma: advances and applications. Clinical characteristics and prognosis of ALL in children with CDKN2A/B gene deletion.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1