Dataset dependency of low-density lipoprotein-cholesterol estimation by machine learning.

Annals of Clinical Biochemistry; IF 2.1; Zone 4 (Medicine); JCR Q3 (Medical Laboratory Technology); Pub Date: 2023-11-01; Epub Date: 2023-06-05; DOI: 10.1177/00045632231180408

Ishida Hidekazu, Hiroki Nagasawa, Yasuko Yamamoto, Hiroki Doi, Midori Saito, Yuya Ishihara, Takashi Fujita, Mariko Ishida, Yohei Kato, Ryosuke Kikuchi, Hidetoshi Matsunami, Masao Takemura, Hiroyasu Ito, Kuniaki Saito

Pages: 396-405. Publication type: Journal Article.

Citations: 0


Dataset dependency of low-density lipoprotein-cholesterol estimation by machine learning.

Objectives: We evaluated the applicability of a machine learning-based low-density lipoprotein-cholesterol (LDL-C) estimation method and the influence of the characteristics of the training datasets.

Methods: Three training datasets were used: health check-up participants at the Resource Center for Health Science (N = 2664), clinical patients at Gifu University Hospital (N = 7409), and clinical patients at Fujita Health University Hospital (N = 14,842). Nine different machine learning models were constructed through hyperparameter tuning and 10-fold cross-validation. A separate set of 3711 clinical patients at Fujita Health University Hospital served as the test set for comparing and validating the models against the Friedewald formula and the Martin method.
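For reference, the Friedewald formula used as a comparator estimates LDL-C from total cholesterol, HDL-C and triglycerides. The sketch below assumes mg/dL units and the conventional fixed TG/5 term; the function name and the 400 mg/dL triglyceride guard are illustrative choices, not details taken from the paper. The Martin method replaces the fixed divisor with a factor looked up from a table, which is not reproduced here.

```python
def friedewald_ldl_c(total_chol, hdl_c, triglycerides):
    """Estimate LDL-C (mg/dL) as TC - HDL-C - TG/5 (Friedewald formula)."""
    # Assumption: the formula is conventionally not applied at high triglycerides.
    if triglycerides >= 400:
        raise ValueError("Friedewald estimate is not applied for TG >= 400 mg/dL")
    return total_chol - hdl_c - triglycerides / 5.0


# Hypothetical lipid panel, values in mg/dL.
print(friedewald_ldl_c(total_chol=200, hdl_c=50, triglycerides=150))  # 120.0
```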

Results: The models trained on the health check-up dataset produced coefficients of determination that were equal to or inferior to those of the Martin method. In contrast, the coefficients of determination of several models trained on clinical patients exceeded those of the Martin method. The means of the differences and the convergences to the direct method were higher for the models trained on the clinical patients' dataset than for those trained on the health check-up participants' dataset. The models trained on the latter dataset tended to overestimate the LDL-cholesterol classification under the 2019 ESC/EAS Guideline.
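The two comparison metrics named above can be sketched as follows. This is a minimal illustration, assuming the coefficient of determination is computed as 1 - SS_res/SS_tot against the directly measured LDL-C and the mean difference is the average of (estimate - direct measurement); the abstract does not state the exact definitions used, and the sample values are hypothetical.

```python
def r_squared(measured, estimated):
    """Coefficient of determination of estimates against direct measurements."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def mean_difference(measured, estimated):
    """Average bias of the estimates; positive means overestimation."""
    return sum(e - m for m, e in zip(measured, estimated)) / len(measured)


# Hypothetical LDL-C values in mg/dL (not data from the study).
direct = [100, 120, 140]
model = [102, 118, 143]
print(r_squared(direct, model))        # close to 0.97875
print(mean_difference(direct, model))  # 1.0
```

A positive mean difference on the test set would correspond to the overestimation tendency described for the models trained on health check-up participants.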

Conclusion: Although machine learning models provide a valuable method for LDL-C estimation, they should be trained on datasets with matched characteristics. The versatility of machine learning methods is another important consideration.


Source journal

Annals of Clinical Biochemistry (Biochemistry, Genetics and Molecular Biology: Clinical Biochemistry)

CiteScore: 5.20

Self-citation rate: 4.50%

About the journal: Annals of Clinical Biochemistry is the fully peer reviewed international journal of the Association for Clinical Biochemistry and Laboratory Medicine. Annals of Clinical Biochemistry accepts papers that contribute to knowledge in all fields of laboratory medicine, especially those pertaining to the understanding, diagnosis and treatment of human disease. It publishes papers on clinical biochemistry, clinical audit, metabolic medicine, immunology, genetics, biotechnology, haematology, microbiology, computing and management where they have both biochemical and clinical relevance. Papers describing evaluation or implementation of commercial reagent kits or the performance of new analysers require substantial original information. Unless of exceptional interest and novelty, studies dealing with the redox status in various diseases are not generally considered within the journal's scope. Studies documenting the association of single nucleotide polymorphisms (SNPs) with particular phenotypes will not normally be considered, given the greater strength of genome wide association studies (GWAS). Research undertaken in non-human animals will not be considered for publication in the Annals. Annals of Clinical Biochemistry is also the official journal of NVKC (de Nederlandse Vereniging voor Klinische Chemie) and JSCC (Japan Society of Clinical Chemistry).