
Latest Publications in Methods of Information in Medicine

Clustering Breast Cancer Patients Based on Their Treatment Courses Using German Cancer Registry Data.
IF 1.8 | CAS Zone 4 (Medicine) | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-10 | DOI: 10.1055/a-2753-9631
Kolja Blohm, David Korfkamp, Florian Oesterling, Klaas Dählmann, Stefanie Schulze, Andreas Hein

Cancer registries collect extensive data on cancer patients, including diagnoses, treatments, and disease progression. These data offer valuable insights into cancer care but are challenging to analyze due to their complexity. Machine learning techniques, particularly clustering, enable the exploration of treatment data to uncover previously unknown patterns and relationships.

This work aimed to develop a method for clustering breast cancer patients in cancer registries based on their treatment courses, and to demonstrate the usefulness of clustering for gaining insights, improving data quality, and identifying clinically relevant patterns.

We developed a similarity measure adapted from the Levenshtein distance to compare treatment courses, incorporating cancer diagnosis, surgeries, radiotherapies, and systemic therapies. The method was evaluated on 17,822 breast cancer cases diagnosed in 2019 from the cancer registry of North Rhine-Westphalia. Evaluation involved two stages: first, domain experts reviewed the clustering results to assess clinical relevance and interpretability; second, an intercluster survival analysis was performed to identify clinically relevant differences between treatment patterns.

Expert evaluations confirmed that clustering produced clinically plausible groups while also uncovering unexpected treatment patterns and potential data inconsistencies. The survival analysis showed differences in survival between clusters in both prognostically favorable and unfavorable subgroups. These results demonstrate that treatment-course clustering can identify patient groups with differing survival outcomes. However, registry data incompleteness and unmeasured confounders may influence these findings.

Clustering treatment courses in cancer registries can reveal data quality issues, distinguish groups with different prognostic profiles, and support exploratory analyses of treatment patterns. While these findings are not intended to guide clinical decision making or evaluate treatment effectiveness, they can help generate hypotheses, identify unexpected care pathways, and support quality monitoring within cancer registries. Future work should focus on improving treatment data completeness, incorporating additional clinical variables, and refining clustering methods for broader applicability.
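The abstract describes the core algorithm only in prose; as a hedged illustration of the general idea, the sketch below encodes each treatment course as a toy sequence of event codes, computes pairwise Levenshtein distances, and clusters them hierarchically with scipy. The event codes, courses, and clustering parameters are invented; the authors' adapted similarity measure additionally incorporates diagnosis and therapy details.

```python
# Minimal sketch only: toy treatment courses as event-code sequences, classic
# Levenshtein distance, and average-linkage hierarchical clustering.
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def levenshtein(a, b):
    """Edit distance between two event sequences (insert/delete/substitute cost 1)."""
    dp = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    dp[:, 0] = np.arange(len(a) + 1)
    dp[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i, j] = min(dp[i - 1, j] + 1, dp[i, j - 1] + 1, dp[i - 1, j - 1] + cost)
    return dp[len(a), len(b)]


# Hypothetical courses: OP = surgery, RT = radiotherapy, ST = systemic therapy.
courses = [["OP", "RT"], ["OP", "RT", "ST"], ["ST", "OP", "RT"], ["ST"]]

n = len(courses)
dist = np.zeros((n, n))
for i, j in combinations(range(n), 2):          # pairwise distance matrix
    dist[i, j] = dist[j, i] = levenshtein(courses[i], courses[j])

# Condensed matrix -> average-linkage tree -> two clusters of patient courses.
labels = fcluster(linkage(squareform(dist), method="average"), t=2, criterion="maxclust")
print(labels)
```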

Citations: 0
Leveraging Electronic Health Record Data and Up-to-Date Clinical Guidelines for High-Accuracy Clinical Diabetes Drug and Dosage Recommendation.
IF 1.8 | CAS Zone 4 (Medicine) | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-10 | DOI: 10.1055/a-2707-2862
Jhing-Fa Wang, Ming-Jun Wei, Te-Ming Chiang, Tzu-Chun Yeh, Eric Cheng, Yuan-Teh Lee, Hong-I Chen

Existing drug recommendation systems lack integration with up-to-date clinical guidelines (the latest diabetes association standards of care and clinical guidelines that align with local government health care regulations) and lack high-precision drug interaction processing, explainability, and dynamic dosage adjustment. As a result, the recommendations generated by these systems are often inaccurate and do not align with local standards, greatly limiting their practicality.

This work aimed to develop a personalized drug recommendation and dosage optimization system, the Diabetes Drug Recommendation System (DDRs), integrating Fast Healthcare Interoperability Resources (FHIR)-standardized electronic health record (EHR) data and up-to-date clinical guidelines for accurate and practical recommendations.

We analyzed patients' EHR and International Classification of Diseases, tenth edition (ICD-10) codes and integrated them with a drug interaction database to reduce adverse reactions. ADA guidelines and Taiwan's National Health Insurance (NHI) chronic disease guidelines served as data sources. Bio-GPT and Retrieval-Augmented Generation (RAG) were used to build the clinical guideline database and ensure recommendations align with the latest standards, with references provided for interpretability. Finally, the optimal dosage was dynamically calculated by integrating patient disease progression trends from the EHR.

DDRs achieved superior drug recommendation accuracy (Precision-Recall Area Under the Curve = 0.7951, Jaccard = 0.5632, F1-score = 0.7158), with a low drug-drug interaction rate (4.73%) and dosage error (±6.21%). Faithfulness of recommendations reached 0.850. Field validation with three physicians showed that the system reduced literature review time by 30 to 40% and delivered clinically actionable recommendations.

DDRs is the first system to integrate EHR data, large language models (LLMs), RAG, ADA guidelines, and Taiwan NHI policies for diabetes treatment. The system demonstrates high accuracy, safety, and interpretability, offering practical decision support in routine clinical settings.
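As an illustration of the set-based scores reported above (Jaccard, F1), here is a minimal sketch for one hypothetical patient, comparing a recommended drug set against the drugs actually prescribed; the drug names are invented, and the study's actual evaluation aggregates such scores over the whole test cohort.

```python
# Per-patient set metrics for a drug recommendation: precision, recall, F1, Jaccard.
def set_metrics(recommended: set, prescribed: set) -> dict:
    tp = len(recommended & prescribed)                      # correctly recommended drugs
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(prescribed) if prescribed else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    jaccard = tp / len(recommended | prescribed) if (recommended | prescribed) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}


# Invented example: two recommended drugs, one of which matches the prescription.
print(set_metrics({"metformin", "empagliflozin"}, {"metformin", "sitagliptin"}))
```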

Citations: 0
Why Synthetic Discoveries are Not Only a Problem of Differentially Private Synthetic Data?
IF 1.3 | CAS Zone 4 (Medicine) | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-01 | Epub Date: 2025-04-15 | DOI: 10.1055/a-2540-8284
Heidelinde Dehaene, Alexander Decruyenaere, Christiaan Polet, Johan Decruyenaere, Paloma Rabaey, Thomas Demeester, Stijn Vansteelandt
Citations: 0
Cross-lingual Natural Language Processing on Limited Annotated Case/Radiology Reports in English and Japanese: Insights from the Real-MedNLP Workshop.
IF 1.3 | CAS Zone 4 (Medicine) | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-01 | Epub Date: 2024-08-29 | DOI: 10.1055/a-2405-2489
Shuntaro Yada, Yuta Nakamura, Shoko Wakamiya, Eiji Aramaki

Background:  Textual datasets (corpora) are crucial for the application of natural language processing (NLP) models. However, corpus creation in the medical field is challenging, primarily because of privacy issues with raw clinical data such as health records. Thus, the existing clinical corpora are generally small and scarce. Medical NLP (MedNLP) methodologies perform well with limited data availability.

Objectives:  We present the outcomes of the Real-MedNLP workshop, which was conducted using limited and parallel medical corpora. Real-MedNLP exhibits three distinct characteristics: (1) Limited annotated documents: the training data comprise only a small set (∼100) of case reports (CRs) and radiology reports (RRs) that have been annotated. (2) Bilingually parallel: the constructed corpora are parallel in Japanese and English. (3) Practical tasks: the workshop addresses fundamental tasks, such as named entity recognition (NER), as well as applied practical tasks.

Methods:  We propose three tasks: NER of ∼100 available documents (Task 1), NER based only on annotation guidelines for humans (Task 2), and clinical applications (Task 3) consisting of adverse drug effect (ADE) detection for CRs and identical case identification (CI) for RRs.

Results:  Nine teams participated in this study. The best systems achieved 0.65 and 0.89 F1-scores for CRs and RRs in Task 1, whereas the top scores in Task 2 decreased by 50 to 70%. In Task 3, ADE reports were detected with an F1-score of up to 0.64, and CI reached a binary accuracy of up to 0.96.

Conclusion:  Most systems adopt medical-domain-specific pretrained language models using data augmentation methods. Despite the challenge of limited corpus size in Tasks 1 and 2, recent approaches are promising because the partial match scores reached ∼0.8-0.9 F1-scores. Task 3 applications revealed that the different availabilities of external language resources affected the performance per language.
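For readers less familiar with the scoring used in Tasks 1 and 2, the sketch below computes exact-match span-level F1 over (start, end, label) entity tuples; the partial-match scores mentioned in the results relax the requirement that spans match exactly. The example spans are invented.

```python
# Exact-match span F1: an entity counts as correct only if start, end, and label all match.
def span_f1(gold: set, predicted: set) -> float:
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0


gold = {(0, 12, "disease"), (20, 28, "drug")}      # invented gold annotations
pred = {(0, 12, "disease"), (30, 35, "drug")}      # invented system output
print(round(span_f1(gold, pred), 2))               # 0.5
```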

Citations: 0
Deciphering Abbreviations in Malaysian Clinical Notes Using Machine Learning.
IF 1.3 | CAS Zone 4 (Medicine) | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-01 | Epub Date: 2025-01-22 | DOI: 10.1055/a-2521-4372
Ismat Mohd Sulaiman, Awang Bulgiba, Sameem Abdul Kareem, Abdul Aziz Latip

Objective:  This is the first Malaysian machine learning model to detect and disambiguate abbreviations in clinical notes. The model has been designed to be incorporated into MyHarmony, a natural language processing system, that extracts clinical information for health care management. The model utilizes word embedding to ensure feasibility of use, not in real-time but for secondary analysis, within the constraints of low-resource settings.

Methods:  A Malaysian clinical embedding, based on Word2Vec model, was developed using 29,895 electronic discharge summaries. The embedding was compared against conventional rule-based and FastText embedding on two tasks: abbreviation detection and abbreviation disambiguation. Machine learning classifiers were applied to assess performance.

Results:  The Malaysian clinical word embedding contained 7 million word tokens, 24,352 unique vocabularies, and 100 dimensions. For abbreviation detection, the Decision Tree classifier augmented with the Malaysian clinical embedding showed the best performance (F-score of 0.9519). For abbreviation disambiguation, the classifier with the Malaysian clinical embedding had the best performance for most of the abbreviations (F-score of 0.9903).

Conclusion:  Despite having a smaller vocabulary and dimension, our local clinical word embedding performed better than the larger nonclinical FastText embedding. Word embedding with simple machine learning algorithms can decipher abbreviations well. It also requires lower computational resources and is suitable for implementation in low-resource settings such as Malaysia. The integration of this model into MyHarmony will improve recognition of clinical terms, thus improving the information generated for monitoring Malaysian health care services and policymaking.
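A toy sketch of the pipeline described above, assuming a gensim Word2Vec embedding feeding a scikit-learn Decision Tree for abbreviation detection; the corpus, labels, and hyperparameters are invented for illustration and bear no relation to the 29,895-summary embedding used in the study.

```python
# Toy abbreviation detection: embed tokens with Word2Vec, classify with a Decision Tree.
from gensim.models import Word2Vec
from sklearn.tree import DecisionTreeClassifier

corpus = [                                         # invented tokenized note fragments
    ["pt", "admitted", "with", "sob", "and", "chest", "pain"],
    ["bp", "stable", "on", "discharge"],
    ["patient", "given", "iv", "antibiotics"],
]
abbrev = {"pt", "sob", "bp", "iv"}                 # hypothetical abbreviation annotations

model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=50)

tokens = sorted({tok for sent in corpus for tok in sent})
X = [model.wv[tok] for tok in tokens]              # one embedding vector per token
y = [int(tok in abbrev) for tok in tokens]         # 1 = abbreviation, 0 = full word

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(dict(zip(tokens, clf.predict(X))))           # token -> predicted abbreviation flag
```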

Citations: 0
TCMSF: A Construction Framework of Traditional Chinese Medicine Syndrome Ancient Book Knowledge Graph.
IF 1.3 | CAS Zone 4 (Medicine) | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-01 | Epub Date: 2025-04-17 | DOI: 10.1055/a-2590-6348
Ziling Zeng, Lin Tong, Bing Li, Wenjing Zong, Qikai Niu, Sihong Liu, Lei Zhang, Jialun Wang, Siqi Zhang, Siwei Tian, Jing'ai Wang, Wei Zhang, Huamin Zhang

Syndrome is a unique and crucial concept in traditional Chinese medicine (TCM). However, much of the syndrome knowledge lacks systematic organization and correlation, and current information technologies are unsuitable for ancient TCM texts.

We aimed to develop a knowledge graph that presents this knowledge in a more orderly, structured, and semantically oriented manner, providing a foundation for computer-aided diagnosis and treatment.

We developed a construction framework of TCM syndrome knowledge from ancient books, using a pretrained model and rules (TCMSF). We conducted fine-tuning training on the Enhanced Representation through Knowledge Integration (ERNIE) and Bidirectional Encoder Representations from Transformers (BERT) pretrained language models, as well as the chatGLM3-6b large language model, for named entity recognition (NER) tasks. Furthermore, we employed a progressive entity relationship extraction method based on dual pattern feature combination to extract and standardize entities and the relationships between entities in these books.

We selected Yin deficiency syndrome as a case study and constructed a model layer suitable for the expression of knowledge in these books. Compared with multiple NER methods, the combination of ERNIE and Conditional Random Fields (CRF) performed best. Using this combination, we completed the entity extraction for Yin deficiency syndrome, achieving an average F1 value of 0.77. The relationship extraction method we proposed reduces the number of incorrectly connected relationships compared with fully connected pattern layers. We successfully constructed a knowledge graph of ancient books on Yin deficiency syndrome, including over 120,000 entities and over 1.18 million relationships.

We developed TCMSF in line with the knowledge characteristics of ancient TCM books and improved the accuracy of knowledge graph construction.
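As a minimal illustration of the final graph-assembly step (after NER and relation extraction, which the paper performs with ERNIE and CRF), the sketch below stores (head, relation, tail) triples in a networkx multigraph; the triples are invented examples, not content extracted from the ancient books.

```python
# Assemble extracted triples into a directed multigraph and query one relation type.
import networkx as nx

triples = [                                                    # invented example triples
    ("Yin deficiency syndrome", "has_symptom", "night sweats"),
    ("Yin deficiency syndrome", "has_symptom", "dry mouth"),
    ("Yin deficiency syndrome", "treated_by", "Liu Wei Di Huang Wan"),
]

kg = nx.MultiDiGraph()
for head, relation, tail in triples:
    kg.add_edge(head, tail, relation=relation)

print(kg.number_of_nodes(), kg.number_of_edges())              # 4 3
print([t for _, t, d in kg.edges(data=True) if d["relation"] == "has_symptom"])
```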

Citations: 0
ISPO: An Integrated Ontology of Symptom Phenotypes for Semantic Integration of Traditional Chinese Medical Data.
IF 1.3 | CAS Zone 4 (Medicine) | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-01 | Epub Date: 2025-05-06 | DOI: 10.1055/a-2576-1847
Zixin Shu, Rui Hua, Dengying Yan, Chenxia Lu, Meng Ren, Hong Gao, Ning Xu, Jun Li, Hui Zhu, Jia Zhang, Dan Zhao, Chenyang Hui, Chu Liao, Junqiu Ye, Qi Hao, Xinyan Wang, Xiaodong Li, Baoyan Liu, Xiaji Zhou, Runshun Zhang, Min Xu, Xuezhong Zhou

Symptom phenotypes are crucial for diagnosing and treating various disease conditions. However, the diversity of symptom terminologies poses a significant challenge to the analysis and sharing of symptom-related medical data, particularly in the field of traditional Chinese medicine (TCM). This study aims to construct an Integrated Symptom Phenotype Ontology (ISPO) to support data mining of Chinese electronic medical records (EMRs) and real-world studies in the TCM field.

We manually annotated and extracted symptom terms from 21 classical TCM textbooks and 78,696 inpatient EMRs, and integrated them with five publicly available symptom-related biomedical vocabularies. Through a human-machine collaborative approach for terminology editing and ontology development, including term screening, semantic mapping, and concept classification, we constructed a high-quality symptom ontology that integrates both TCM and Western medical terminology.

ISPO provides 3,147 concepts, 23,475 terms, and 23,363 hierarchical relationships. Compared with international symptom-related ontologies such as the Symptom Ontology, ISPO offers significant improvements in the number of terms and synonymous relationships. Furthermore, evaluation across three independently curated clinical datasets demonstrated that ISPO achieved over 90% coverage of symptom terms, highlighting its strong clinical usability and completeness.

ISPO represents the first clinical ontology globally dedicated to the systematic representation of symptoms. It integrates symptom terminologies from historical and contemporary sources, encompassing both TCM and Western medicine, thereby enhancing semantic interoperability across heterogeneous medical data sources and clinical decision support systems in TCM.
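A minimal sketch of the coverage evaluation described above: dataset symptom terms are mapped to ontology concepts through a synonym index, and the mapped fraction is reported. The ontology entries and dataset terms are invented placeholders rather than ISPO content.

```python
# Map dataset terms to concepts via synonyms and report the coverage fraction.
ontology = {                                       # hypothetical concept -> synonym sets
    "fever": {"fever", "pyrexia", "发热"},
    "cough": {"cough", "咳嗽"},
}
synonym_to_concept = {syn: c for c, syns in ontology.items() for syn in syns}

dataset_terms = ["发热", "cough", "night sweats", "pyrexia"]    # invented dataset terms
mapped = [t for t in dataset_terms if t in synonym_to_concept]

print(f"coverage = {len(mapped) / len(dataset_terms):.0%}")     # coverage = 75%
```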

Citations: 0
Response to Letter by Dehaene et al. on Synthetic Discovery is not only a Problem of Differentially Private Synthetic Data.
IF 1.3 | CAS Zone 4 (Medicine) | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-01 | Epub Date: 2025-04-15 | DOI: 10.1055/a-2540-8346
Ileana Montoya Perez, Parisa Movahedi, Valtteri Nieminen, Antti Airola, Tapio Pahikkala
Citations: 0
Automated Information Extraction from Unstructured Hematopathology Reports to Support Response Assessment in Myeloproliferative Neoplasms.
IF 1.3 | CAS Zone 4 (Medicine) | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-01 | Epub Date: 2025-04-17 | DOI: 10.1055/a-2590-6456
Spencer Krichevsky, Evan T Sholle, Prakash M Adekkanattu, Sajjad Abedian, Madhu Ouseph, Elwood Taylor, Ghaith Abu-Zeinah, Diana Jaber, Claudia Sosner, Marika M Cusick, Niamh Savage, Richard T Silver, Joseph M Scandura, Thomas R Campion

Assessing treatment response in patients with myeloproliferative neoplasms is difficult because data components exist in unstructured bone marrow pathology (hematopathology) reports, which require specialized manual annotation and interpretation. Although natural language processing (NLP) has been successfully implemented for the extraction of features from solid tumor reports, little is known about its application to hematopathology.

An open-source NLP framework called Leo was implemented to parse document segments and extract concept phrases utilized for assessing responses in myeloproliferative neoplasms. A reference standard was generated through manual review of hematopathology notes.

Compared with the reference standard (n = 300 reports), our NLP method extracted features such as aspirate myeloblasts (F1 = 98%) and biopsy reticulin fibrosis (F1 = 93%) with high accuracy. However, other values, such as myeloblasts from the biopsy (F1 = 6%) and via flow cytometry (F1 = 8%), were affected by sparsity reflecting reporting conventions. The four features with the highest clinical importance were extracted with F1 scores exceeding 90%. Whereas manual annotation of 300 reports required 30 hours of staff effort, automated NLP required 3.5 hours of runtime for 34,301 reports.

To the best of our knowledge, this is among the first studies to demonstrate the application of NLP to hematopathology for clinical feature extraction. The approach may inform efforts at other institutions, and the code is available at https://github.com/wcmc-research-informatics/BmrExtractor.
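This is not the authors' Leo pipeline; purely as an illustration of the kind of concept-phrase extraction involved, the sketch below pulls an aspirate myeloblast percentage out of an invented report sentence with a single regular expression.

```python
# Rule-based extraction of one feature (aspirate myeloblast %) from free text.
import re

report = "Bone marrow aspirate: hypercellular marrow, myeloblasts 2% of nucleated cells."

match = re.search(r"myeloblasts?\s+(?:are\s+)?(<?\d+(?:\.\d+)?)\s*%", report, re.IGNORECASE)
if match:
    print("aspirate myeloblasts:", match.group(1) + "%")   # aspirate myeloblasts: 2%
```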

Citations: 0
Harnessing Advanced Machine Learning Techniques for Microscopic Vessel Segmentation in Pulmonary Fibrosis Using Novel Hierarchical Phase-Contrast Tomography Images.
IF 1.3 | CAS Zone 4 (Medicine) | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-01 | Epub Date: 2025-02-18 | DOI: 10.1055/a-2540-8166
Pardeep Vasudev, Mehran Azimbagirad, Shahab Aslani, Moucheng Xu, Yufei Wang, Robert Chapman, Hannah Coleman, Christopher Werlein, Claire Walsh, Peter Lee, Paul Tafforeau, Joseph Jacob

Background:  Fibrotic lung disease is a progressive illness that causes scarring and ultimately respiratory failure, with irreversible damage by the time it is diagnosed on computed tomography imaging. Recent research postulates a role for the lung vasculature in the pathogenesis of the disease. With the recent development of high-resolution hierarchical phase-contrast tomography (HiP-CT), we have the potential to understand and detect changes in the lungs long before conventional imaging. However, to gain quantitative insight into vascular changes, the vessels must first be segmented before further downstream analysis can be conducted. Aside from this, HiP-CT generates large-volume, high-resolution data that is time-consuming and expensive to label.

Objectives:  This project aims to qualitatively assess the latest machine learning methods for vessel segmentation in HiP-CT data to enable label propagation as the first step toward imaging biomarker discovery, with the goal of identifying early-stage interstitial lung disease amenable to treatment, before fibrosis begins.

Methods:  Semisupervised learning (SSL) has become a growing approach for tackling sparsely labeled datasets because it leverages unlabeled data. In this study, we compare two SSL methods, SegPL (based on pseudo-labeling) and MisMatch (based on consistency regularization), against the state-of-the-art supervised learning method nnU-Net, on vessel segmentation in sparsely labeled lung HiP-CT data.

Results:  In initial experiments, both MisMatch and SegPL showed promising performance on qualitative review. Compared with supervised learning, both MisMatch and SegPL showed better out-of-distribution performance within the same sample (vessels with different morphology and texture), although supervised learning provided more consistent segmentations for well-represented labels in the limited annotations.

Conclusion:  Further quantitative research is required to better assess the generalizability of these findings, though they show promising first steps toward leveraging this novel data to tackle fibrotic lung disease.
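As a conceptual sketch of the pseudo-labeling idea behind SegPL (not the authors' implementation), the PyTorch snippet below runs one combined supervised plus pseudo-labeled training step on random tensors standing in for HiP-CT patches; the toy network, confidence threshold, and loss weighting are arbitrary illustrative choices.

```python
# One pseudo-labeling training step: supervised loss on labelled patches plus a
# confidence-filtered loss on pseudo-labels predicted for unlabelled patches.
import torch
import torch.nn.functional as F

model = torch.nn.Conv3d(1, 2, kernel_size=3, padding=1)    # toy 2-class voxel classifier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

labelled = torch.randn(2, 1, 16, 16, 16)                   # stand-in image patches
masks = torch.randint(0, 2, (2, 16, 16, 16))               # stand-in vessel masks
unlabelled = torch.randn(2, 1, 16, 16, 16)

supervised = F.cross_entropy(model(labelled), masks)

with torch.no_grad():                                      # pseudo-labels from current model
    confidence, pseudo = model(unlabelled).softmax(dim=1).max(dim=1)
keep = confidence > 0.8                                    # trust only confident voxels
if keep.any():
    voxel_loss = F.cross_entropy(model(unlabelled), pseudo, reduction="none")
    unsupervised = voxel_loss[keep].mean()
else:
    unsupervised = torch.zeros(())

loss = supervised + 0.5 * unsupervised                     # weighted combination
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```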

Citations: 0