Healthy nutrition and weight management for a positive pregnancy experience in the antenatal period: Comparison of responses from artificial intelligence models on nutrition during pregnancy
Emine Karacan
International Journal of Medical Informatics, Volume 193, Article 105663 (published 2024-11-07)
DOI: 10.1016/j.ijmedinf.2024.105663
Background
As artificial intelligence (AI)-supported applications become integral to web-based information seeking, it is crucial to assess their impact on healthy nutrition and weight management during the antenatal period.
Objective
This study evaluated both the quality and the semantic similarity of responses generated by AI models to the most frequently asked questions about healthy nutrition and weight management during the antenatal period, judged against existing clinical knowledge.
Methods
In this study, a cross-sectional assessment design was used to evaluate responses from three AI models (GPT-4, MedicalGPT, and Med-PaLM). We submitted the most frequently asked questions about nutrition during pregnancy, obtained from the American College of Obstetricians and Gynecologists (ACOG), to each model in a single new session on October 21, 2023, with no prior conversation. Immediately afterward, the models were instructed to generate responses to these questions. The responses were evaluated using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) scale. Additionally, to assess the semantic similarity between ACOG's answers to 31 pregnancy nutrition-related frequently asked questions and the AI models' responses, we computed cosine similarity using both WORD2VEC and BioLORD-2023.
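The cosine-similarity comparison described in the Methods can be illustrated with a minimal sketch. The abstract does not say how whole answers were turned into vectors, so this sketch assumes one common approach for word-level models: averaging the word vectors of each text. The `TOY_VECTORS` table, its 3-dimensional values, the example sentences, and the helper names are all hypothetical stand-ins for a trained WORD2VEC model and real ACOG and model answers. A sentence-level model such as BioLORD-2023 would embed each answer directly, but the final cosine step is the same.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical 3-dimensional word vectors standing in for a trained
# WORD2VEC model; real models use hundreds of dimensions.
TOY_VECTORS: Dict[str, List[float]] = {
    "eat": [0.9, 0.1, 0.0],
    "fish": [0.2, 0.8, 0.1],
    "salmon": [0.25, 0.75, 0.15],
    "twice": [0.0, 0.2, 0.8],
    "weekly": [0.1, 0.1, 0.9],
}


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return u.v / (|u| |v|), or 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_vector(text: str, vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of known tokens; out-of-vocabulary tokens are skipped."""
    dim = len(next(iter(vectors.values())))
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        return [0.0] * dim
    return [sum(vectors[t][i] for t in tokens) / len(tokens) for i in range(dim)]


reference = "eat fish twice weekly"    # hypothetical stand-in for an ACOG answer
candidate = "eat salmon twice weekly"  # hypothetical stand-in for a model response
score = cosine_similarity(
    sentence_vector(reference, TOY_VECTORS),
    sentence_vector(candidate, TOY_VECTORS),
)
print(round(score, 3))
```

Scores near 1 indicate that the two texts point in nearly the same direction in embedding space. Because averaging discards word order, a score computed this way reflects shared vocabulary more than clinical correctness, which is one reason a quality scale such as GRADE is used alongside it.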
Results
Med-PaLM outperformed GPT-4 and MedicalGPT in response quality (mean quality score = 3.93), demonstrating superior clinical accuracy over both GPT-4 (p = 0.016) and MedicalGPT (p = 0.001). GPT-4 also produced higher-quality responses than MedicalGPT (p = 0.027).
For the ACOG–Med-PaLM pair, semantic similarity was higher with WORD2VEC (0.92) than with BioLORD-2023 (0.81), a difference of +0.11. For the ACOG–MedicalGPT and ACOG–GPT-4 pairs, the scores were nearly identical across the two embedding models, each differing by only −0.01. Overall, WORD2VEC yielded a slightly higher mean similarity (0.82) than BioLORD-2023 (0.79), a difference of +0.03.
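Because the mean of the per-pair differences equals the difference of the means, the three pairwise gaps above should reproduce the overall gap. Taking $\Delta$ as the WORD2VEC score minus the BioLORD-2023 score for each ACOG–model pair, and treating the reported two-decimal values as exact (they are presumably rounded), the check is:

```latex
\bar{\Delta}
  = \frac{(+0.11) + (-0.01) + (-0.01)}{3}
  = \frac{0.09}{3}
  = +0.03
  = 0.82 - 0.79
```

The reported figures are therefore internally consistent, and most of the overall WORD2VEC advantage comes from the ACOG–Med-PaLM pair.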
Conclusions
Although Med-PaLM performed best, performance varied across the AI models, so further evidence-based research is needed to improve the integration of AI into healthcare.
Journal introduction:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods, as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.