Background
Large language models (LLMs) are increasingly used in health care but remain vulnerable to medical misinformation. We aimed to evaluate how often these models accept or reject fabricated medical content, and how framing that content with common logical fallacies changes their responses.
Methods
In this cross-sectional benchmarking analysis, we probed 20 LLMs with more than 3·4 million prompts, all containing health misinformation drawn from three sources: public-forum and social-media dialogues, real hospital discharge notes into which we inserted a single false recommendation, and 300 physician-validated simulated vignettes. Logical fallacies (common patterns of flawed reasoning such as appeals to authority, popularity, or emotion) were used to test how rhetorical framing influences model behaviour. Each prompt was posed once in a neutral base form and ten further times, each reframed with one of ten named logical fallacies. For every run we logged susceptibility (the model accepts the false claim) and fallacy detection (the model flags the rhetoric).
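As a concrete illustration of this protocol, the sketch below shows one way such a benchmarking loop could be organised. Every name in it (FALLACIES, query_model, frame_with_fallacy, accepts_claim, flags_fallacy) is a hypothetical placeholder, not the study's actual pipeline; only four of the ten fallacies are named in this abstract, so only those four are listed.

```python
# Illustrative sketch of the benchmarking loop described in the Methods.
# All identifiers here are hypothetical stand-ins, not the authors' code.

FALLACIES = [
    "appeal to authority", "appeal to popularity", "appeal to emotion",
    "slippery slope",  # ...plus six further named fallacies in the full study
]

def query_model(model: str, prompt: str) -> str:
    """Placeholder for a call to one of the 20 LLMs under test."""
    raise NotImplementedError

def frame_with_fallacy(prompt: str, fallacy: str) -> str:
    """Placeholder: rewrite the misinformation prompt in the given
    rhetorical style (e.g. attributing the false claim to an expert)."""
    raise NotImplementedError

def accepts_claim(response: str) -> bool:
    """Placeholder judge: does the response endorse the false claim?"""
    raise NotImplementedError

def flags_fallacy(response: str) -> bool:
    """Placeholder judge: does the response flag the rhetoric?"""
    raise NotImplementedError

def run_benchmark(models: list[str], base_prompts: list[str]) -> list[dict]:
    """Pose every prompt once in base form and once per fallacy framing,
    logging susceptibility and fallacy detection for each run."""
    log = []
    for model in models:
        for base in base_prompts:
            variants = [("base", base)] + [
                (f, frame_with_fallacy(base, f)) for f in FALLACIES
            ]
            for framing, prompt in variants:
                response = query_model(model, prompt)
                log.append({
                    "model": model,
                    "framing": framing,
                    "susceptible": accepts_claim(response),
                    "detected_fallacy": flags_fallacy(response),
                })
    return log
```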
Findings
Across all models and corpora, LLMs were susceptible to fabricated data in 50 108 (31·7%) of 158 000 base prompts. Eight of the ten fallacy framings significantly reduced that rate or left it unchanged, led by appeal to popularity (susceptibility 11·9%; difference –19·8 percentage points; p<0·0001); only the slippery-slope framing (33·9%; difference 2·2 percentage points; p<0·0001) and the appeal-to-authority framing (34·6%; difference 2·9 percentage points; p<0·0001) increased it. Real hospital notes (with inserted fabricated content) produced the highest base-prompt susceptibility (46 108 [46·1%] of 100 000), whereas social-media misinformation produced lower base-prompt susceptibility (2479 [8·9%] of 28 000). Performance varied by model: GPT models were the least susceptible and the most accurate at fallacy detection, whereas others, such as Gemma-3-4B-it, showed 63·6% (5023 of 7900) susceptibility.
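For readers who want to trace the arithmetic, the proportions and percentage-point differences above follow directly from the reported counts; the short check below simply reproduces them, rounding to one decimal place as in the text.

```python
# Reproduce the headline proportions from the counts reported above.
base_rate = 50_108 / 158_000                  # 0.3171 -> 31.7% base susceptibility
popularity_diff = (0.119 - base_rate) * 100   # -19.8 percentage points
notes_rate = 46_108 / 100_000                 # 0.4611 -> 46.1% on hospital notes
social_rate = 2_479 / 28_000                  # 0.0885 -> 8.9% on social media
gemma_rate = 5_023 / 7_900                    # 0.6358 -> 63.6% for Gemma-3-4B-it

print(f"base {base_rate:.1%}; popularity diff {popularity_diff:.1f} pp; "
      f"notes {notes_rate:.1%}; social {social_rate:.1%}; Gemma {gemma_rate:.1%}")
```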
Interpretation
These results show that LLMs still absorb harmful medical fabrications, especially when they are phrased in authoritative clinical prose, yet, counter-intuitively, become less vulnerable when the same claims are wrapped in most logical-fallacy framings. Improving safety therefore appears to depend less on model scale and more on fact-grounding and context-aware guardrails.
Funding
Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai and the National Institutes of Health Office of Research Infrastructure Programs.