Interpretable machine learning analysis to identify risk factors for diabetes using the anonymous living census data of Japan.

Health and Technology (IF 3.1, Q2, Medical Informatics) | Pub Date: 2023-01-01 | Epub Date: 2023-01-26 | DOI: 10.1007/s12553-023-00730-w
Pei Jiang, Hiroyuki Suzuki, Takashi Obi
Health and Technology, vol. 13, no. 1, pp. 119-131. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9876749/pdf/

Abstract

Purpose: Diabetes mellitus causes a variety of problems in daily life. Despite the big data boom in our society, some risk factors for diabetes likely remain unidentified. To identify new risk factors for diabetes and to explore more efficient uses of big data, we analyzed the non-objective-oriented census data of the Japanese Citizen's Survey of Living Conditions using interpretable machine learning methods.

Methods: Seven interpretable machine learning methods were used to analyze the census data of Japanese citizens. First, logistic regression analysis was used to assess the risk factors for diabetes among 19 initially selected elements. Then, linear analysis, linear discriminant analysis, Hayashi's quantification method II, random forest, XGBoost, and SHAP were used to re-check the results and compare the contribution of each factor. Finally, the relationships among the factors were analyzed.
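The first step described above, ranking candidate factors with a logistic model, can be sketched as follows. This is a minimal illustration, not the authors' code: the counts are synthetic, the factor names `hypertension` and `placebo_factor` are hypothetical, and plain batch gradient ascent stands in for whatever statistical software the study used. The counts are constructed so that one factor carries an odds ratio of 4 and the other has no effect.

```python
import math

# Synthetic aggregated records (NOT from the paper or the census data):
# (hypertension, placebo_factor, diabetes cases, controls)
CELLS = [
    (0, 0, 10, 90),
    (1, 0, 40, 90),
    (0, 1, 10, 90),
    (1, 1, 40, 90),
]


def expand(cells):
    """Turn cell counts into individual rows: ([intercept, x1, x2], y)."""
    rows = []
    for x1, x2, cases, controls in cells:
        rows += [([1.0, float(x1), float(x2)], 1)] * cases
        rows += [([1.0, float(x1), float(x2)], 0)] * controls
    return rows


def fit_logistic(rows, lr=1.0, steps=1000):
    """Fit logistic regression by batch gradient ascent on the mean log-likelihood."""
    k = len(rows[0][0])
    n = len(rows)
    beta = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for x, y in rows:
            z = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of diabetes
            for j in range(k):
                grad[j] += (y - p) * x[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


beta = fit_logistic(expand(CELLS))

# exp(coefficient) is the odds ratio associated with each binary factor.
odds_ratios = {
    "hypertension": math.exp(beta[1]),
    "placebo_factor": math.exp(beta[2]),
}
for name, value in sorted(odds_ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name}: odds ratio = {value:.2f}")
```

Because the synthetic counts fit a main-effects model exactly, the fitted odds ratios should land close to 4 for `hypertension` and close to 1 for `placebo_factor`. On real survey data the same ranking would also need confidence intervals and the cross-checks with the other six methods that the abstract describes.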

Results: Four factors were identified as risk factors for diabetes mellitus for the first time: the number of family members, insurance type, public pension type, and health awareness level. Another 11 risk factors were reconfirmed in this analysis. Notably, in some interpretable models, insurance type and health awareness level contributed more to diabetes than hypertension, hyperlipidemia, and stress. Years of work was also identified as a risk factor for diabetes, because it has a high correlation coefficient with age, which is itself a risk factor.
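The point about work years and age can be illustrated with a simple correlation check. The sketch below assumes that the "coefficient" in question is a correlation coefficient such as Pearson's r; the records are synthetic, the variable names are hypothetical, and the 0.8 cut-off is an arbitrary assumption rather than a value from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Synthetic illustrative records (NOT from the census data).
age = [25, 32, 41, 47, 53, 58, 64, 70]
factors = {
    "work_years": [3, 9, 18, 22, 30, 33, 40, 41],
    "family_members": [3, 1, 4, 2, 4, 1, 3, 2],
}

THRESHOLD = 0.8  # assumed cut-off for "highly correlated with age"

correlations = {name: pearson_r(age, values) for name, values in factors.items()}
tracks_age = [name for name, r in correlations.items() if abs(r) > THRESHOLD]

for name, r in correlations.items():
    print(f"{name}: r with age = {r:+.2f}")
print("Factors that may only reflect age:", tracks_age)
```

In this toy data `work_years` rises almost in lockstep with `age`, so it is flagged, while `family_members` is not. A factor flagged this way may appear as a risk factor mainly because it acts as a proxy for age.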

Conclusions: New risk factors for diabetes mellitus were identified from Japan's non-objective-oriented anonymous census data using interpretable machine learning models. The newly identified risk factors suggest possible new policies for preventing diabetes. Moreover, our analysis demonstrates that big data can yield useful knowledge in today's society. Our study also paves the way for identifying more risk factors and for using big data more efficiently.

Source journal
Health and Technology (Medical Informatics)
CiteScore: 7.10
Self-citation rate: 0.00%
Articles published: 83
About the journal: Health and Technology is the first truly cross-disciplinary journal on issues related to health technologies addressing all professions relating to health, care and health technology. The journal constitutes an information platform connecting medical technology and informatics with the needs of care, health care professionals and patients. Thus, medical physicists and biomedical/clinical engineers are encouraged to write articles not only for their colleagues, but directed to all other groups of readers as well, and vice versa. By its nature, the journal presents and discusses hot subjects including but not limited to patient safety, patient empowerment, disease surveillance and management, e-health and issues concerning data security, privacy, reliability and management, data mining and knowledge exchange as well as health prevention. The journal also addresses the medical, financial, social, educational and safety aspects of health technologies as well as health technology assessment and management, including issues such as security, efficacy, cost in comparison to the benefit, as well as social, legal and ethical implications. This journal is a communicative source for the health work force (physicians, nurses, medical physicists, clinical engineers, biomedical engineers, hospital engineers, etc.), the ministries of health, hospital management, self-employed doctors, health care providers and regulatory agencies, the medical technology industry, patients' associations, universities (biomedical and clinical engineering, medical physics, medical informatics, biology, medicine and public health as well as health economics programs), research institutes and professional, scientific and technical organizations. Health and Technology is jointly published by Springer and the IUPESM (International Union for Physical and Engineering Sciences in Medicine) in cooperation with the World Health Organization.