
BMC Medical Informatics and Decision Making: Latest Articles

Deep learning-based multimodal fusion of the surface ECG and clinical features in prediction of atrial fibrillation recurrence following catheter ablation.
IF 3.3 | Zone 3 (Medicine) | Q2 MEDICAL INFORMATICS | Pub Date: 2024-08-08 | DOI: 10.1186/s12911-024-02616-x
Yue Qiu, Hongcheng Guo, Shixin Wang, Shu Yang, Xiafeng Peng, Dongqin Xiayao, Renjie Chen, Jian Yang, Jiaheng Liu, Mingfang Li, Zhoujun Li, Hongwu Chen, Minglong Chen

Background: Despite improvements in treatment strategies for atrial fibrillation (AF), a significant proportion of patients still experience recurrence after ablation. This study proposes a novel Transformer-based algorithm that uses surface electrocardiogram (ECG) signals and clinical features to predict AF recurrence.

Methods: Between October 2018 and December 2021, patients who underwent index radiofrequency ablation for AF and had at least one standard 10-second surface ECG recorded during sinus rhythm were enrolled. An end-to-end deep learning framework based on Transformer and a fusion module was used to predict AF recurrence from ECG and clinical features. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, accuracy, and F1-score.

Results: A total of 920 patients (median age 61 [IQR 14] years, 66.3% male) were included. After a median follow-up of 24 months, 253 patients (27.5%) experienced AF recurrence. The deep learning model using ECG signals alone identified AF recurrence with an AUROC of 0.769, sensitivity of 75.5%, specificity of 61.1%, F1 score of 55.6%, and overall accuracy of 65.2%. Combining ECG signals and clinical features increased the AUROC to 0.899, sensitivity to 81.1%, specificity to 81.7%, F1 score to 71.7%, and overall accuracy to 81.5%.

Conclusions: The Transformer algorithm demonstrated excellent performance in predicting AF recurrence. Integrating ECG and clinical features enhanced the models' performance and may help identify patients at low risk for AF recurrence after index ablation.
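
The sensitivity, specificity, accuracy, and F1-score reported in this abstract all derive from a confusion matrix. The following is a minimal Python sketch of those definitions. The function name and the counts are hypothetical and are not taken from the study; they only illustrate why, when the positive class is the minority, the F1-score can sit below both sensitivity and specificity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall on the positive class
    specificity = tn / (tn + fp)            # recall on the negative class
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not the study's data):
# 50 positives and 100 negatives, i.e. positives are the minority class.
metrics = classification_metrics(tp=40, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, sensitivity, specificity, and accuracy are all 0.8, while F1 is lower because precision is pulled down by false positives from the larger negative class.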

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11308714/pdf/
Citations: 0
A risk prediction model based on machine learning algorithm for parastomal hernia after permanent colostomy.
IF 3.3 | Zone 3 (Medicine) | Q2 MEDICAL INFORMATICS | Pub Date: 2024-08-08 | DOI: 10.1186/s12911-024-02627-8
Tian Dai, Manzhen Bao, Miao Zhang, Zonggui Wang, JingJing Tang, Zeyan Liu

Objective: To develop a machine learning-based risk prediction model for postoperative parastomal hernia (PSH) in colorectal cancer patients undergoing permanent colostomy, assisting nurses in identifying high-risk groups and devising preventive care strategies.

Methods: A case-control study was conducted on 495 colorectal cancer patients who underwent permanent colostomy at the Second Affiliated Hospital of Anhui Medical University from June 2017 to June 2023, with a 1-year follow-up period. Patients were categorized into PSH and non-PSH groups based on PSH occurrence within 1 year after surgery. Data were split into training (70%) and testing (30%) sets. Variable selection was performed using Least Absolute Shrinkage and Selection Operator (LASSO) regression, and binary classification models were built using Logistic Regression (LR), Support Vector Classification (SVC), K-Nearest Neighbors (KNN), Random Forest (RF), Light Gradient Boosting Machine (LGBM), and Extreme Gradient Boosting (XGBoost). The binary label was 1 for PSH occurrence and 0 otherwise. Parameters were optimized via 5-fold cross-validation. Model performance was evaluated using the area under the curve (AUC), specificity, sensitivity, accuracy, positive predictive value, negative predictive value, and F1-score. Clinical utility was evaluated using decision curve analysis (DCA), model explanation was enhanced using SHapley Additive exPlanations (SHAP), and model visualization was achieved using a nomogram.

Results: The incidence of PSH within 1 year was 29.1% (144 patients). Among the models tested, the RF model demonstrated the highest discrimination capability with an AUC of 0.888 (95% CI: 0.881-0.935), along with superior specificity, accuracy, sensitivity, and F1 score. It also showed the highest clinical net benefit on the DCA curve. SHAP analysis identified the top 10 influential variables associated with PSH risk: body mass index (BMI), operation duration, history and status of chronic obstructive pulmonary disease (COPD), prealbumin, tumor-node-metastasis (TNM) staging, stoma site, thickness of rectus abdominis muscle (TRAM), C-reactive protein (CRP), American Society of Anesthesiologists (ASA) physical status classification, and stoma diameter. The SHAP plots illustrated how these factors influence individual PSH outcomes. The nomogram was used for model visualization.

Conclusion: The Random Forest model demonstrated robust predictive performance and clinical relevance in forecasting colonic PSH. This model aids in early identification of high-risk patients and guides preventive care.
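
The decision curve analysis mentioned in this abstract compares models by their net benefit at a chosen risk threshold. The following is a minimal Python sketch of the standard net-benefit formula, NB = TP/n - (FP/n) * pt / (1 - pt), alongside the "treat all" reference strategy. The function names, labels, and predicted probabilities are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical labels (1 = event) and predicted risks, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.1, 0.2, 0.3, 0.4]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(f"model: {nb_model:.2f}, treat all: {nb_all:.2f}")
```

A full decision curve repeats this calculation over a range of thresholds and plots the model against the "treat all" and "treat none" (net benefit 0) strategies.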

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11308496/pdf/
Citations: 0
Deep learning ensemble approach with explainable AI for lung and colon cancer classification using advanced hyperparameter tuning.
IF 3.3 | Zone 3 (Medicine) | Q2 MEDICAL INFORMATICS | Pub Date: 2024-08-07 | DOI: 10.1186/s12911-024-02628-7
K Vanitha, Mahesh T R, S Sathea Sree, Suresh Guluwadi

Lung and colon cancers are leading contributors to cancer-related fatalities globally, distinguished by unique histopathological traits discernible through medical imaging. Effective classification of these cancers is critical for accurate diagnosis and treatment. This study addresses critical challenges in the diagnostic imaging of lung and colon cancers. Recognizing the limitations of existing diagnostic methods, which often suffer from overfitting and poor generalizability, our research introduces a novel deep learning framework that combines the Xception and MobileNet architectures. This ensemble model aims to enhance feature extraction, improve model robustness, and reduce overfitting. Our methodology involves training the hybrid model on a comprehensive dataset of histopathological images, followed by validation against a balanced test set. The results demonstrate a classification accuracy of 99.44%, with perfect precision and recall in identifying certain cancerous and non-cancerous tissues, marking a significant improvement over traditional approaches. The practical implications of these findings are profound. By integrating Gradient-weighted Class Activation Mapping (Grad-CAM), the model offers enhanced interpretability, allowing clinicians to visualize the diagnostic reasoning process. This transparency is vital for clinical acceptance and enables more personalized, accurate treatment planning. Our study not only pushes the boundaries of medical imaging technology but also sets the stage for future research aimed at expanding these techniques to other types of cancer diagnostics.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11304580/pdf/
Citations: 0
Improving the quality of Persian clinical text with a novel spelling correction system.
IF 3.3 | Zone 3 (Medicine) | Q2 MEDICAL INFORMATICS | Pub Date: 2024-08-05 | DOI: 10.1186/s12911-024-02613-0
Seyed Mohammad Sadegh Dashti, Seyedeh Fatemeh Dashti

Background: The accuracy of spelling in Electronic Health Records (EHRs) is a critical factor for efficient clinical care, research, and ensuring patient safety. The Persian language, with its abundant vocabulary and complex characteristics, poses unique challenges for real-word error correction. This research aimed to develop an innovative approach for detecting and correcting spelling errors in Persian clinical text.

Methods: Our strategy employs a state-of-the-art pre-trained model fine-tuned for spelling correction in the Persian clinical domain. This model is complemented by an innovative orthographic similarity matching algorithm, PERTO, which uses the visual similarity of characters to rank correction candidates.

Results: The evaluation of our approach demonstrated its robustness and precision in detecting and rectifying word errors in Persian clinical text. In terms of non-word error correction, our model achieved an F1-Score of 90.0% when the PERTO algorithm was employed. For real-word error detection, our model demonstrated its highest performance, achieving an F1-Score of 90.6%. Furthermore, the model reached its highest F1-Score of 91.5% for real-word error correction when the PERTO algorithm was employed.

Conclusions: Despite certain limitations, our method represents a substantial advancement in the field of spelling error detection and correction for Persian clinical text. By effectively addressing the unique challenges posed by the Persian language, our approach paves the way for more accurate and efficient clinical documentation, contributing to improved patient care and safety. Future research could explore its use in other areas of the Persian medical domain, enhancing its impact and utility.
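
The abstract does not describe how PERTO works internally, so the following Python sketch is not PERTO. It only illustrates the general idea of ranking correction candidates by visual similarity, using a weighted edit distance in which substituting look-alike letters costs less than substituting unrelated ones. The letter groups, the costs (0.5 and 1.0), and the function names are all assumptions made for illustration.

```python
# Assumed groups of Persian letters that share a base shape and differ
# mainly in their dots. Illustrative only; not taken from the paper.
SIMILAR_GROUPS = ["بپتث", "جچحخ", "سش", "رزژ"]
GROUP_OF = {ch: idx for idx, group in enumerate(SIMILAR_GROUPS) for ch in group}


def substitution_cost(a, b):
    """0 for identical letters, 0.5 for look-alikes, 1 otherwise (assumed costs)."""
    if a == b:
        return 0.0
    if a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b):
        return 0.5
    return 1.0


def weighted_distance(source, target):
    """Levenshtein distance with a shape-aware substitution cost."""
    prev = [float(j) for j in range(len(target) + 1)]
    for i, s_ch in enumerate(source, start=1):
        curr = [float(i)]
        for j, t_ch in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1.0,                                 # deletion
                curr[j - 1] + 1.0,                             # insertion
                prev[j - 1] + substitution_cost(s_ch, t_ch),   # substitution
            ))
        prev = curr
    return prev[-1]


def rank_candidates(word, candidates):
    """Order candidates from most to least visually similar to `word`."""
    return sorted(candidates, key=lambda cand: weighted_distance(word, cand))


ranked = rank_candidates("تاب", ["کتاب", "خاب", "باب"])
print(ranked)
```

In this toy example the candidate that differs only by a look-alike letter is ranked first, ahead of candidates that need an insertion or an unrelated substitution.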

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11299402/pdf/
Citations: 0
An improved data augmentation approach and its application in medical named entity recognition.
IF 3.3 | Zone 3 (Medicine) | Q2 MEDICAL INFORMATICS | Pub Date: 2024-08-05 | DOI: 10.1186/s12911-024-02624-x
Hongyu Chen, Li Dan, Yonghe Lu, Minghong Chen, Jinxia Zhang

Performing data augmentation in medical named entity recognition (NER) is crucial due to the unique challenges posed by this field. Medical data is characterized by high acquisition costs, specialized terminology, imbalanced distributions, and limited training resources. These factors make achieving high performance in medical NER particularly difficult. Data augmentation methods help to mitigate these issues by generating additional training samples, thus balancing data distribution, enriching the training dataset, and improving model generalization. This paper proposes two data augmentation methods, Contextual Random Replacement based on Word2Vec Augmentation (CRR) and Targeted Entity Random Replacement Augmentation (TER), aimed at addressing the scarcity and imbalance of data in the medical domain. When combined with a deep learning-based Chinese NER model, these methods can significantly enhance performance and recognition accuracy under limited resources. Experimental results demonstrate that both augmentation methods effectively improve the recognition capability of medical named entities. Specifically, the BERT-BiLSTM-CRF model achieved the highest F1 score of 83.587%, representing a 1.49% increase over the baseline model. This validates the importance and effectiveness of data augmentation in medical NER.
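
The abstract names the TER method but does not give its details, so the following Python sketch is only one plausible reading of entity-level random replacement: each BIO-tagged entity span is swapped for a randomly chosen entity of the same type, and the tags are regenerated to match. The function name, the English example sentence, and the entity pool are hypothetical (the paper works with Chinese text).

```python
import random


def replace_entities(tokens, tags, entity_pool, rng):
    """Swap each BIO-tagged entity span for a random same-type entity."""
    new_tokens, new_tags = [], []
    i = 0
    while i < len(tokens):
        if tags[i].startswith("B-"):
            etype = tags[i][2:]
            # Find the end of the current entity span.
            j = i + 1
            while j < len(tokens) and tags[j] == "I-" + etype:
                j += 1
            # Fall back to the original span if the pool has no such type.
            replacement = rng.choice(entity_pool.get(etype, [tokens[i:j]]))
            new_tokens.extend(replacement)
            new_tags.extend(["B-" + etype] + ["I-" + etype] * (len(replacement) - 1))
            i = j
        else:
            new_tokens.append(tokens[i])
            new_tags.append(tags[i])
            i += 1
    return new_tokens, new_tags


# Hypothetical sentence and entity pool, for illustration only.
tokens = ["patient", "has", "type", "2", "diabetes", "and", "takes", "metformin"]
tags = ["O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "O", "O", "B-DRUG"]
entity_pool = {
    "DISEASE": [["hypertension"], ["chronic", "kidney", "disease"]],
    "DRUG": [["insulin"], ["aspirin"]],
}

aug_tokens, aug_tags = replace_entities(tokens, tags, entity_pool, random.Random(0))
print(list(zip(aug_tokens, aug_tags)))
```

Because the non-entity context is left untouched and the tags are rebuilt from the replacement length, each augmented sentence stays a valid training example for a sequence-labeling model.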

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11302003/pdf/
Citations: 0
Unlocking treatment success: predicting atypical antipsychotic continuation in youth with mania.
IF 3.3 | Zone 3 (Medicine) | Q2 MEDICAL INFORMATICS | Pub Date: 2024-08-02 | DOI: 10.1186/s12911-024-02622-z
Xiangying Yang, Wenbo Huang, Li Liu, Lei Li, Song Qing, Na Huang, Jun Zeng, Kai Yang

Purpose: This study aimed to create and validate robust machine-learning-based prediction models for antipsychotic drug (risperidone) continuation in children and teenagers suffering from mania over one year and to discover potential variables for clinical treatment.

Method: The study population was drawn from the national claims database in China. A total of 4,532 patients aged 4-18 who began risperidone therapy for mania between September 2013 and October 2019 were identified. The data were randomly divided into two datasets: training (80%) and testing (20%). Five commonly used machine learning methods, together with the SuperLearner (SL) algorithm, were used to develop prediction models for the continuation of atypical antipsychotic therapy. Model performance was assessed using the area under the receiver operating characteristic curve (AUC) with a 95% confidence interval (CI).

Results: In terms of discrimination and robustness in predicting risperidone treatment continuation, the generalized linear model (GLM) performed the best (AUC: 0.823, 95% CI: 0.792-0.854, intercept near 0, slope close to 1.0). The SL model (AUC: 0.823, 95% CI: 0.791-0.853, intercept near 0, slope close to 1.0) performed comparably. Furthermore, the present findings emphasize the significance of several unique clinical and socioeconomic variables, such as the frequency of emergency room visits for nonmental health disorders.

Conclusions: The GLM and SL models provided accurate predictions regarding risperidone treatment continuation in children and adolescents with episodes of mania and hypomania. Consequently, applying prediction models in atypical antipsychotic medicine may aid in evidence-based decision-making.
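
The AUC used in this abstract can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The following is a minimal Python sketch of that pairwise definition. The function name, labels, and scores are hypothetical and are not taken from the study, and the sketch does not reproduce how the authors computed their confidence intervals.

```python
def auc_by_pairs(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # A correctly ordered pair counts 1, a tie counts 0.5, otherwise 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = continued treatment) and model scores.
y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.6, 0.6, 0.2, 0.4, 0.5]

auc_value = auc_by_pairs(y_true, y_score)
print(f"AUC = {auc_value:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; production code would normally use a rank-based implementation from a statistics library.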

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11295322/pdf/
Citations: 0
An ontology-based tool for modeling and documenting events in neurosurgery.
IF 3.3 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-07-31 DOI: 10.1186/s12911-024-02615-y
Patricia Romao, Stefanie Neuenschwander, Chantal Zbinden, Kathleen Seidel, Murat Sariyar

Background: Intraoperative neurophysiological monitoring (IOM) plays a pivotal role in enhancing patient safety during neurosurgical procedures. This vital technique involves the continuous measurement of evoked potentials to provide early warnings and ensure the preservation of critical neural structures. One of the primary challenges has been the effective documentation of IOM events with semantically enriched characterizations. This study aimed to address this challenge by developing an ontology-based tool.

Methods: We structured the development of the IOM Documentation Ontology (IOMDO) and the associated tool into three distinct phases. The initial phase focused on the ontology's creation, drawing from the OBO (Open Biological and Biomedical Ontology) principles. The subsequent phase involved agile software development, a flexible approach to encapsulate the diverse requirements and swiftly produce a prototype. The last phase entailed practical evaluation within real-world documentation settings. This crucial stage enabled us to gather firsthand insights, assessing the tool's functionality and efficacy. The observations made during this phase formed the basis for essential adjustments to ensure the tool's productive utilization.

Results: The core entities of the ontology revolve around central aspects of IOM, including measurements characterized by timestamp, type, values, and location. Concepts and terms of several ontologies were integrated into IOMDO, e.g., the Foundation Model of Anatomy (FMA), the Human Phenotype Ontology (HPO) and the ontology for surgical process models (OntoSPM) related to general surgical terms. The software tool developed for extending the ontology and the associated knowledge base was built with JavaFX for the user-friendly frontend and Apache Jena for the robust backend. The tool's evaluation involved test users who unanimously found the interface accessible and usable, even for those without extensive technical expertise.

Conclusions: Through the establishment of a structured and standardized framework for characterizing IOM events, our ontology-based tool holds the potential to enhance the quality of documentation, benefiting patient care by improving the foundation for informed decision-making. Furthermore, researchers can leverage the semantically enriched data to identify trends, patterns, and areas for surgical practice enhancement. To optimize documentation through ontology-based approaches, it's crucial to address potential modeling issues that are associated with the Ontology of Adverse Events.

Citations: 0
Predicting angiographic coronary artery disease using machine learning and high-frequency QRS.
IF 3.3 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-07-31 DOI: 10.1186/s12911-024-02620-1
Jiajia Zhang, Heng Zhang, Ting Wei, Pinfang Kang, Bi Tang, Hongju Wang

Aim: Exercise stress ECG is a common diagnostic test for stable coronary artery disease, but its sensitivity and specificity need to be further improved. In this paper, we construct a machine learning model for the prediction of angiographic coronary artery disease by HFQRS analysis of cycling exercise ECG.

Methods and results: This study prospectively included 140 inpatients and 59 healthy volunteers undergoing cycling exercise ECG. The CHD group (N=104) and non-CHD group (N=95) were defined using coronary angiography as the gold standard. Automated HFQRS analysis was performed in a blinded manner. Compared with the non-CHD group, the CHD group was predominantly male, with higher age and BMI and a higher prevalence of hypertension and diabetes (P < 0.001), higher lipid levels (P < 0.005), significantly longer QRS duration during exercise testing (P < 0.005), more positive leads (P < 0.001), and a greater proportion of significant changes in HFQRS (P < 0.001). Age, gender, hypertension, diabetes, and the HFQRS conclusion were selected by correlation analysis and multifactorial retrospective analysis, and were used to construct six machine learning models: XGBoost Classifier, Logistic Regression, LightGBM Classifier, RandomForest Classifier, Artificial Neural Network, and Support Vector Machine.

Conclusion: Male sex, older age, hypertension, diabetes mellitus, and a positive HFQRS conclusion on the exercise stress test suggested a high risk of CHD. Among the models compared, Logistic Regression performed best, and a nomogram for assessing the risk of CHD was further developed and validated.

Citations: 0
Joint extraction of Chinese medical entities and relations based on RoBERTa and single-module global pointer.
IF 3.3 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-07-31 DOI: 10.1186/s12911-024-02577-1
Dongmei Li, Yu Yang, Jinman Cui, Xianghao Meng, Jintao Qu, Zhuobin Jiang, Yufeng Zhao

Background: Most Chinese joint entity and relation extraction tasks in medicine involve numerous nested entities, overlapping relations, and other challenging extraction issues. In response to these problems, some traditional methods decompose the joint extraction task into multiple steps or multiple modules, which introduces local dependencies between them.

Methods: To alleviate this issue, we propose a joint extraction model of Chinese medical entities and relations based on RoBERTa and a single-module global pointer, namely RSGP, which formulates joint extraction as a global pointer linking problem. Considering the uniqueness of Chinese language structure, we introduce the RoBERTa-wwm pre-trained language model at the encoding layer to obtain a better embedding representation. Then, we represent the input sentence as a third-order tensor and score each position in the tensor to prepare for the subsequent decoding of triples. Finally, we design a novel single-module global pointer decoding approach to reduce the generation of redundant information. Specifically, we handle the decoding of single-character entities separately, improving the time and space performance of RSGP to some extent.

Results: To verify the effectiveness of our model in extracting Chinese medical entities and relations, we carry out experiments on the public CMeIE dataset. Experimental results show that RSGP performs significantly better on the joint extraction of Chinese medical entities and relations, and achieves state-of-the-art results compared with baseline models.

Conclusion: The proposed RSGP can effectively extract entities and relations from Chinese medical texts and help to structure those texts, providing high-quality data support for the construction of Chinese medical knowledge graphs.

Citations: 0
A machine learning approach to determine the risk factors for fall in multiple sclerosis.
IF 3.3 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-07-30 DOI: 10.1186/s12911-024-02621-0
Su Özgür, Meryem Koçaslan Toran, İsmail Toygar, Gizem Yağmur Yalçın, Mefkure Eraksoy

Background: Falls in multiple sclerosis can result in numerous problems, including injuries and functional loss. Therefore, determining the factors contributing to falls in people with Multiple Sclerosis (PwMS) is crucial. This study aims to investigate the contributing factors to falls in multiple sclerosis using a machine learning approach.

Methods: This cross-sectional study was conducted with 253 PwMS admitted to the outpatient clinic of a university hospital between February and August 2023. A sociodemographic data collection form, Fall Efficacy Scale (FES-I), Berg Balance Scale (BBS), Fatigue Severity Scale (FSS), Expanded Disability Status Scale (EDSS), Multiple Sclerosis Impact Scale (MSIS-29), and Timed 25 Foot Walk Test (T25-FW) were used for data collection. Gradient-boosting algorithms were employed to identify the important variables for falls in PwMS. The XGBoost algorithm emerged as the best-performing model in this study.

Results: Most of the participants (70.0%) were female, with a mean age of 40.44 ± 10.88 years. Among the participants, 40.7% reported a fall history in the last year. The area under the curve value of the model was 0.713. Risk factors of falls in PwMS included MSIS-29 (0.424), EDSS (0.406), marital status (0.297), education level (0.240), disease duration (0.185), age (0.130), family type (0.119), smoking (0.031), income level (0.031), and regular exercise habit (0.026).
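
The area under the curve reported above can be read as the probability that a randomly chosen participant with a fall history receives a higher model score than a randomly chosen participant without one. A minimal sketch of that pairwise definition follows; the function name `pairwise_auc` and the labels and scores are hypothetical illustrations, not the study's data or code.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive
    example has the higher score; tied pairs count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = reported a fall, 0 = no fall.
example_labels = [1, 0, 1, 0, 1, 0]
example_scores = [0.9, 0.4, 0.6, 0.7, 0.8, 0.2]
example_auc = pairwise_auc(example_labels, example_scores)
print(round(example_auc, 3))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation. The quadratic pairwise loop is chosen here for readability; it is adequate for small samples but not for large ones.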

Conclusions: In this study, smoking and regular exercise were the modifiable factors contributing to falls in PwMS. We recommend that clinicians facilitate the modification of these factors in PwMS. Age and disease duration were non-modifiable factors. These should be considered risk-increasing factors and used to identify PwMS at risk. Interventions aimed at reducing MSIS-29 and EDSS scores will help to prevent falls in PwMS. Education of individuals to increase knowledge and awareness is recommended. Financial support policies for those with low income will help to reduce the risk of falls.

Citations: 0