Dataset dependency of low-density lipoprotein-cholesterol estimation by machine learning.

Annals of Clinical Biochemistry, pages 396-405. IF 2.1; Zone 4 (Medicine); Q3 (Medical Laboratory Technology). Pub Date: 2023-11-01. Epub Date: 2023-06-05. DOI: 10.1177/00045632231180408
Ishida Hidekazu, Hiroki Nagasawa, Yasuko Yamamoto, Hiroki Doi, Midori Saito, Yuya Ishihara, Takashi Fujita, Mariko Ishida, Yohei Kato, Ryosuke Kikuchi, Hidetoshi Matsunami, Masao Takemura, Hiroyasu Ito, Kuniaki Saito
Citations: 0

Abstract


Objectives: We evaluated the applicability of a machine learning-based low-density lipoprotein-cholesterol (LDL-C) estimation method and the influence of the characteristics of the training datasets.

Methods: Three training datasets were used: health check-up participants at the Resource Center for Health Science (N = 2664), clinical patients at Gifu University Hospital (N = 7409), and clinical patients at Fujita Health University Hospital (N = 14,842). Nine different machine learning models were constructed through hyperparameter tuning and 10-fold cross-validation. A separate dataset of 3711 other clinical patients at Fujita Health University Hospital served as the test set for comparing and validating the models against the Friedewald formula and the Martin method.
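
For reference, the Friedewald formula named in the Methods estimates LDL-C from total cholesterol, HDL-C, and triglycerides, all in mg/dL. The sketch below is a minimal illustration, not the authors' code. The function names are hypothetical, and the 400 mg/dL triglyceride check reflects the conventional limit of the formula rather than anything stated in the abstract. The Martin method replaces the fixed divisor 5 with a factor taken from a lookup table; that table is not reproduced here, so the `factor` parameter is only a stand-in for it.

```python
def estimate_ldl_c(total_chol, hdl_c, triglycerides, factor=5.0):
    """Estimate LDL-C (mg/dL) as TC - HDL-C - TG / factor.

    factor=5.0 gives the Friedewald formula. Any other value is an
    illustrative stand-in for the Martin method, whose factor comes
    from a lookup table that is not included here.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return total_chol - hdl_c - triglycerides / factor


def friedewald_applicable(triglycerides, limit=400.0):
    """Return True when TG (mg/dL) is below the conventional limit."""
    return triglycerides < limit


if __name__ == "__main__":
    # Example values are made up for illustration only.
    tc, hdl, tg = 200.0, 50.0, 150.0
    if friedewald_applicable(tg):
        print(estimate_ldl_c(tc, hdl, tg))
```

With the made-up inputs above (TC 200, HDL-C 50, TG 150 mg/dL) the Friedewald estimate is 200 - 50 - 150/5 = 120 mg/dL.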

Results: The models trained on the health check-up dataset produced coefficients of determination that were equal to or inferior to those of the Martin method. In contrast, the coefficients of determination of several models trained on clinical patients exceeded those of the Martin method. The means of the differences and the convergences to the direct method were higher for the models trained on the clinical patients' dataset than for those trained on the health check-up participants' dataset. The models trained on the latter dataset tended to overestimate in the LDL-cholesterol classification of the 2019 ESC/EAS Guideline.
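
The two summary statistics in the Results can be sketched as follows. This is an assumption-laden illustration: the abstract does not say how the coefficient of determination was computed (it may, for instance, be the squared Pearson correlation), so the `1 - SS_res / SS_tot` definition and both function names are choices made here, with the directly measured LDL-C taken as the reference.

```python
def r_squared(measured, estimated):
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    if len(measured) != len(estimated) or len(measured) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values have zero variance")
    return 1.0 - ss_res / ss_tot


def mean_difference(measured, estimated):
    """Mean of (estimated - measured); a positive value means overestimation."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum(e - m for m, e in zip(measured, estimated)) / len(measured)


if __name__ == "__main__":
    # Made-up LDL-C values (mg/dL), for illustration only.
    direct = [100.0, 120.0, 140.0]
    model = [110.0, 130.0, 150.0]
    print(r_squared(direct, model), mean_difference(direct, model))
```

A model that is uniformly 10 mg/dL too high, as in the made-up example, shows a positive mean difference and a reduced coefficient of determination even though it tracks the direct method perfectly in rank order, which is why the two statistics are reported together.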

Conclusion: Although machine learning models provide a valuable method for LDL-C estimation, they should be trained on datasets with matched characteristics. The versatility of machine learning methods is another important consideration.

Source journal
Annals of Clinical Biochemistry
Subject area: Biochemistry, Genetics and Molecular Biology - Clinical Biochemistry
CiteScore: 5.20
Self-citation rate: 4.50%
Articles published: 61
About the journal: Annals of Clinical Biochemistry is the fully peer reviewed international journal of the Association for Clinical Biochemistry and Laboratory Medicine. Annals of Clinical Biochemistry accepts papers that contribute to knowledge in all fields of laboratory medicine, especially those pertaining to the understanding, diagnosis and treatment of human disease. It publishes papers on clinical biochemistry, clinical audit, metabolic medicine, immunology, genetics, biotechnology, haematology, microbiology, computing and management where they have both biochemical and clinical relevance. Papers describing evaluation or implementation of commercial reagent kits or the performance of new analysers require substantial original information. Unless of exceptional interest and novelty, studies dealing with the redox status in various diseases are not generally considered within the journal's scope. Studies documenting the association of single nucleotide polymorphisms (SNPs) with particular phenotypes will not normally be considered, given the greater strength of genome wide association studies (GWAS). Research undertaken in non-human animals will not be considered for publication in the Annals. Annals of Clinical Biochemistry is also the official journal of NVKC (de Nederlandse Vereniging voor Klinische Chemie) and JSCC (Japan Society of Clinical Chemistry).