Reproducibility and quality of hypertrophy-related training plans generated by GPT-4 and Google Gemini as evaluated by coaching experts.

Biology of Sport (IF 6.4; Zone 2, Medicine; JCR Q1, Sport Sciences). Pub Date: 2025-04-01. Epub Date: 2024-12-18. DOI: 10.5114/biolsport.2025.145911
Tim Havers, Lukas Masur, Eduard Isenmann, Stephan Geisler, Christoph Zinner, Billy Sperlich, Peter Düking
Biology of Sport, volume 42, issue 2, pages 289-329. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11963122/pdf/ Article link: https://doi.org/10.5114/biolsport.2025.145911
Citations: 0

Abstract


Large Language Models (LLMs) are increasingly used in various domains, including the generation of training plans. However, the reproducibility and quality of training plans produced by different LLMs have not been studied extensively. This study aims to i) compare the quality of muscle hypertrophy-related resistance training (RT) plans generated by Google Gemini (GG) and GPT-4, and ii) assess the reproducibility of the RT plans when the same prompts are provided multiple times concomitantly. Two distinct prompts were used, one providing little information about the training plan requirements and the other providing detailed information. Each prompt was entered into GG and GPT-4 by two different individuals, yielding eight RT plans. These plans were evaluated by 12 coaching experts on a 5-point Likert scale, using quality criteria derived from the literature. The expert ratings indicated a high degree of reproducibility when the same prompt was provided multiple times to a given LLM, with 27 out of 28 items showing no differences (p > 0.05). Overall, GPT-4 was rated higher on several aspects of the RT quality criteria (p = 0.000-0.043). Additionally, prompts with higher information density produced RT plans that were rated higher in quality than prompts with little information (p = 0.000-0.037). Our findings show that RT plans can be generated reproducibly, with the same quality, when the same prompts are used. Furthermore, quality improves with more detailed input, and GPT-4 outperformed GG in generating higher-quality plans. These results suggest that detailed information input is crucial for LLM performance.
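
As a rough illustration of the design described in the abstract, the eight RT plans follow from crossing two LLMs, two prompt types, and two individuals entering the prompts. The sketch below is not taken from the paper; the label strings, the names `operator_A` and `operator_B`, and the function `enumerate_plans` are hypothetical and serve only to make the 2 x 2 x 2 arithmetic explicit.

```python
from itertools import product

# Factors named in the abstract; the label strings are illustrative assumptions.
MODELS = ("GPT-4", "Google Gemini")
PROMPTS = ("little_information", "detailed_information")
OPERATORS = ("operator_A", "operator_B")  # hypothetical names for the two individuals


def enumerate_plans():
    """Return one (model, prompt, operator) tuple per generated RT plan."""
    return list(product(MODELS, PROMPTS, OPERATORS))


plans = enumerate_plans()
# 2 models x 2 prompts x 2 operators gives the number of plans in the design.
print(len(plans))
for model, prompt, operator in plans:
    print(model, prompt, operator)
```

Under this reading, the reproducibility comparison pairs the two plans that share a model and a prompt but differ in operator, while the quality comparisons contrast models and prompt types.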

Source journal
Biology of Sport (Biology - Sport Sciences)
CiteScore: 8.20
Self-citation rate: 12.50%
Articles published: 113
Review time: >12 weeks
About the journal: Biology of Sport is the official journal of the Institute of Sport in Warsaw, Poland, published since 1984. Biology of Sport is an international, peer-reviewed scientific journal, published quarterly in both paper and electronic format. The journal publishes articles on the basic and applied sciences in sport: sports and exercise physiology, sports immunology and medicine, sports genetics, training and testing, and pharmacology, as well as other biological aspects related to sport. Priority is given to interdisciplinary papers.
Latest articles from the journal
The influence of functional kinematic asymmetry on maximum speed performance in repeated sprints.
The overload loop: a distinct reoxygenation pattern above the second ventilatory threshold revealed by a new analytical method.
Training characteristics of male and female WorldTour professional road cyclists before the competitive phase.
Return of match running performance following muscle strain injuries of varying severity in professional football.
Training with an elastic bench press device provides comparable adaptations to conventional resistance training in trained men.