This study evaluated the strengths and limitations of retrieval-augmented generative (RAG) artificial intelligence (AI) for natural language querying of biologics immunogenicity data. Package inserts for drugs approved under biologics license applications (BLAs) were retrieved from DailyMed (https://dailymed.nlm.nih.gov/dailymed/). The RAG system integrated natural language processing, retrieval, and large language model (LLM) components. ChatGPT, Gemini, DeepSeek, and Llama were queried with five clinical pharmacology-focused questions on factors influencing anti-drug antibody (ADA) incidence and tolerability, including the effects of target protein, administration route, and citrate excipients. Outputs were assessed for relevance, faithfulness, and domain-specific accuracy. The dataset included 663 biologics, of which 206 (31.1%) were monoclonal antibodies. The RAG system retrieved relevant contexts for all queries, but several contexts contained inaccuracies related to the presence of non-antibody protein drugs. All four LLMs generated coherent summaries and identified determinants of ADA incidence, such as drug type, assay methods, and concomitant therapy. All four models reported that injection-site pain occurred with some protein therapeutics containing citrate excipients and that evidence for a direct causal role of citrate was mixed. Comparative evaluation showed that LLM outputs were generally relevant and faithful to the source text, with variation in the level of detail and comprehensiveness across models. Domain-specific evaluations indicated that responses accurately identified trends in immunogenicity and highlighted knowledge gaps. Although RAG-based systems can retrieve and synthesize immunogenicity assessments from multiple source documents, significant limitations were noted in this use case. RAG performance can be limited by the effectiveness of the retriever, which warrants refinement.
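The retrieve-then-generate pattern described above can be illustrated with a minimal sketch. The study's actual retriever, embedding model, and prompts are not specified here, so everything below is an assumption: bag-of-words cosine similarity stands in for the retrieval component, the passages are invented placeholders rather than DailyMed text, and the function names (`retrieve`, `build_prompt`) are hypothetical. The LLM call itself is omitted; the sketch stops at the prompt that would be sent to a model.

```python
import math
import re
from collections import Counter

# Invented placeholder passages (NOT real package-insert text).
PASSAGES = [
    "Drug A (hypothetical): anti-drug antibodies were detected in some "
    "patients; incidence depends on assay sensitivity.",
    "Drug B (hypothetical): injection-site pain was reported; the "
    "formulation contains citrate buffer.",
    "Drug C (hypothetical): store refrigerated and protect from light.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, k=2):
    """Return up to k passages with non-zero similarity, best first."""
    q_vec = Counter(tokenize(query))
    scored = [
        (cosine(q_vec, Counter(tokenize(p))), i) for i, p in enumerate(passages)
    ]
    # Sort by descending score; ties keep the original passage order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [passages[i] for score, i in scored[:k] if score > 0]


def build_prompt(query, contexts):
    """Assemble a prompt that asks the model to answer from context only."""
    numbered = "\n".join(f"[{n}] {c}" for n, c in enumerate(contexts, start=1))
    return (
        "Answer using only the numbered context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    question = "Does citrate affect injection-site pain?"
    print(build_prompt(question, retrieve(question, PASSAGES)))
```

Because the generation step only sees what `retrieve` returns, any irrelevant or inaccurate passage selected at this stage carries through to the model's answer, which is consistent with the observation that retriever effectiveness can limit overall RAG performance.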