Pub Date : 2025-02-13 DOI: 10.1101/2025.02.07.25321883
Rachel E Solnick, Tatiana Gonzalez-Argoti, Laurie J Bauman, Christine Tagliaferri Rael, Joanne E Mantell, Yvonne Calderon, Ethan Cowan, Susie Hoffman
HIV pre-exposure prophylaxis (PrEP) is underutilized in the United States. Emergency Departments (EDs) can be strategic locations for initiating PrEP; however, knowledge concerning patients' receptivity to ED PrEP programs is limited. This study explores ED patients' perspectives on PrEP service delivery and their preferences for implementation. Semi-structured qualitative interviews were conducted with 15 potentially PrEP-eligible ED patients to examine their receptiveness to PrEP services, preferences for delivery methods, and logistical considerations. Most participants were open to learning about PrEP in the ED, provided it did not delay care, occur during distress, or compromise privacy. Universal PrEP education was viewed as reducing stigma and increasing awareness, while targeted screening was seen as efficient. Participants strongly preferred receiving information in person rather than via videos or pamphlets. Concerns included ensuring ED staff expertise and maintaining privacy during PrEP-related discussions. Regarding same-day PrEP versus prescriptions or referrals, opinions varied, with participants valuing flexibility and linkage to care. This first qualitative study of ED patients' perspectives on PrEP services highlights general receptiveness, with key concerns about privacy, expertise, and wait times. Patient-centered approaches, including integrating PrEP services into ED workflows, offering flexible initiation options, and providing privacy, can support the feasibility of ED-based PrEP programs.
Title : Emergency Department Patients' Perspectives on Being Offered HIV Pre-Exposure Prophylaxis (PrEP) Services in an Urban ED.
Journal : medRxiv : the preprint server for health sciences
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11844573/pdf/
Pub Date : 2025-02-13 DOI: 10.1101/2025.01.21.25320888
Diogo A P Nunes, Dan Furrer, Sara Berger, Guillermo Cecchi, Joana Ferreira-Gomes, Fani Neto, David Martins de Matos, A Vania Apkarian, Paulo Branco
Placebo analgesia in chronic pain is a widely studied clinical phenomenon in which expectations about the effectiveness of a treatment can produce substantial pain relief even when the treatment agent is inert. While placebos offer an opportunity for non-pharmacological treatment of chronic pain, not everyone shows an analgesic response. Prior research has identified biopsychosocial factors that determine how likely an individual is to respond to a placebo, yet the generalizability and ecological validity of those studies have been limited because they could not account for dynamic personal and treatment effects, which are well known to play a role. Here, we assessed the potential of fine-tuned large language models (LLMs) to predict placebo responders in chronic low-back pain using contextual features extracted from patient interviews in which patients speak about their lifestyle, pain, and treatment history. We re-analyzed data from two clinical trials in which individuals completed open-ended interviews, and used these interviews to develop a predictive model of placebo response. Semantic features extracted with LLMs predicted placebo responders with a classification accuracy of 74% in unseen data and 70% in an independent validation cohort. Further, LLMs eliminated the need to pre-select search terms or use dictionary approaches, enabling a fully data-driven analysis. The LLM method also provided interpretable insights into the psychosocial factors underlying placebo responses, highlighting nuanced linguistic patterns linked to responder status that tap into semantic dimensions such as "anxiety," "resignation," and "hope." These findings extend prior research by integrating state-of-the-art NLP techniques to address the limited interpretability and context sensitivity of standard methods such as bag-of-words and dictionary-based approaches.
This method highlights the role of language models in linking language and psychological states, paving the way for a deeper yet quantitative exploration of biopsychosocial phenomena and of how they relate to treatment outcomes, including placebo.
Title : Advancing the prediction and understanding of placebo responses in chronic back pain using large language models.
Journal : medRxiv : the preprint server for health sciences
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11838926/pdf/
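The abstract does not describe the classifier in detail. As a rough illustration of the general idea (text-derived feature vectors in, responder label out, accuracy measured on held-out data), here is a minimal nearest-centroid sketch. It assumes each interview has already been turned into a fixed-length vector; the toy vectors and labels are hypothetical stand-ins for LLM-derived features, and nearest-centroid with cosine similarity is a simpler swapped-in technique, not the authors' fine-tuned LLM pipeline.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def fit(train):
    """Build one centroid per label from (vector, label) pairs."""
    groups = {}
    for vec, label in train:
        groups.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in groups.items()}


def predict(model, vec):
    """Assign the label whose centroid is most cosine-similar to vec."""
    return max(model, key=lambda label: cosine(vec, model[label]))


def accuracy(model, data):
    """Fraction of (vector, label) pairs classified correctly."""
    return sum(predict(model, vec) == label for vec, label in data) / len(data)


# Hypothetical 3-dimensional "interview embeddings"; real LLM features are far larger.
train = [
    ([1.0, 0.1, 0.0], "responder"),
    ([0.9, 0.2, 0.1], "responder"),
    ([0.1, 1.0, 0.2], "non-responder"),
    ([0.0, 0.9, 0.3], "non-responder"),
]
held_out = [([0.8, 0.1, 0.0], "responder"), ([0.2, 0.8, 0.1], "non-responder")]
model = fit(train)
print(accuracy(model, held_out))
```

Checking the model on an independent cohort, as the study did, would mean calling `accuracy` with vectors from that cohort while keeping the same fitted centroids.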
Pub Date : 2025-02-13 DOI: 10.1101/2025.02.12.25322145
Ranjana M S Gigi, Mandisa M Mdingi, Lukas Bütikofer, Chibuzor M Babalola, Jeffrey D Klausner, Andrew Medina-Marino, Christina A Muzny, Christopher M Taylor, Janneke H H M van de Wijgert, Remco P H Peters, Nicola Low
Background: Same-day testing and treatment of curable sexually transmitted infections (STIs) is a strategy to reduce infection duration and onward transmission. South African primary healthcare facilities often lack sufficient waiting spaces. This study aimed to assess the proportion of pregnant women who waited for on-site STI test results, and the factors influencing that decision, before and after the installation of clinic-based waiting rooms.
Methods: We conducted an observational quality improvement study at 5 public primary healthcare facilities in South Africa from March 2021 to May 2023. The intervention was the installation of a waiting room in two clinics. Three clinics were used as comparators: two already had a waiting room in an existing building and one had access to a shared waiting area. The outcome was the percentage of women who waited for their STI test results. We conducted univariable and multivariable analyses and report marginal risk differences (with 95% confidence intervals, CI) of the proportions of women who waited for results. A subset of women answered structured questions about factors influencing their decision to wait for results.
Results: We analysed data from 624 women across the 5 facilities. Overall, 36% (95% CI 31, 40) waited for their test results (range 7% to 89%). In the two intervention clinics, 17% (95% CI 11, 24) waited for results before the introduction of a waiting room and 10% (95% CI 5, 18) after (crude absolute difference -7%, 95% CI -16, +3; adjusted difference -6%, 95% CI -17, +5). Throughout the study period, the percentage of pregnant women waiting for STI test results was higher in the 2 clinics that always had a dedicated waiting room than in the 2 clinics where a waiting room was installed or in the 1 clinic that only had access to a shared waiting area. Most women reported before testing that they did not intend to wait and that none of the suggested factors would change their decision.
Conclusions: Introduction of a waiting room did not increase the proportion of women who waited for their results in this observational study. Future studies should investigate infrastructure, individual and test-based factors that affect same-day STI testing and treatment.
Title : Does a waiting room increase same-day treatment for sexually transmitted infections among pregnant women? A quality improvement study at South African primary healthcare facilities.
Journal : medRxiv : the preprint server for health sciences
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11844596/pdf/
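The crude absolute difference reported in this study compares two proportions (women who waited before versus after a waiting room was installed). A minimal sketch of that calculation is below, assuming a simple Wald interval for the difference of two independent proportions; the authors' exact interval method and their adjusted (marginal) estimate from a multivariable model are not reproduced, and the counts in the example are hypothetical, not study data.

```python
import math


def risk_difference(events_before, n_before, events_after, n_after, z=1.96):
    """Crude risk difference (after minus before) with a Wald confidence interval.

    Returns (difference, lower, upper) as proportions; multiply by 100 for percent.
    """
    p_before = events_before / n_before
    p_after = events_after / n_after
    diff = p_after - p_before
    se = math.sqrt(p_before * (1 - p_before) / n_before + p_after * (1 - p_after) / n_after)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts: 20 of 120 women waited before, 10 of 100 after.
diff, lower, upper = risk_difference(20, 120, 10, 100)
print(f"{diff:+.1%} (95% CI {lower:+.1%}, {upper:+.1%})")
```

A Wald interval is the simplest choice; with small counts or proportions near 0 or 1, score-based intervals are generally better behaved.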
Pub Date : 2025-02-13 DOI: 10.1101/2025.02.11.25322066
Jose Victor Zambrana, Ian A Mellis, Abigail Shotwell, Hannah E Maier, Yara Saborio, Carlos Barillas, Roger Lopez, Gerald Vasquez, Miguel Plazaola, Nery Sanchez, Sergio Ojeda, Isabel Gilbertson, Guillermina Kuan, Qian Wang, Lihong Liu, Angel Balmaseda, David D Ho, Aubree Gordon
Background: Vaccination and prior infection elicit neutralizing antibodies targeting SARS-CoV-2, yet the quantitative relationship between serum antibodies and infection risk against viral variants remains uncertain, particularly in underrepresented regions.
Methods: In 345 household contacts from a SARS-CoV-2 household cohort study, we investigated how pre-exposure serum neutralizing antibody levels, measured against a panel of SARS-CoV-2 pseudoviruses (Omicron BA.1, Omicron BA.2, and ancestral D614G), and Spike-binding antibody levels correlated with protection against symptomatic BA.1 or BA.2 SARS-CoV-2 infection and against overall infection.
Results: A four-fold increase in homotypic-neutralizing titers (e.g., BA.1-neutralizing titers against a BA.1 exposure) was correlated with protection from symptomatic infection (BA.1 protection: 28% [95% CI 12-42%]; BA.2 protection: 43% [20-62%]). Ancestral-neutralizing titers were also correlated with protection from either variant, but only at higher average levels than homotypic titers. Mediation analyses revealed that homotypic and D614G-neutralizing antibodies mediated protection from infection and from symptomatic infection conferred by both prior infection and vaccination.
Conclusions: These findings underscore the importance of monitoring variant-specific antibody responses and highlight that antibodies targeting circulating strains may be more predictive of protection from infection. Nevertheless, ancestral-strain-neutralizing antibodies remain relevant as a correlate of protection. Our study emphasizes the need for continued efforts to assess antibody correlates of protection.
Funding: We acknowledge funding from the U.S. N.I.H., the Open Philanthropy Project, and the Bill & Melinda Gates Foundation.
Title : Variant-specific antibody correlates of protection against SARS-CoV-2 Omicron symptomatic and overall infections.
Journal : medRxiv : the preprint server for health sciences
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11844606/pdf/
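The protection estimates in this study are tied to a four-fold change in titer. If one assumes that the risk ratio (hazard or odds ratio) is log-linear in titer, as many correlate-of-protection analyses do by modelling log titer, a protection estimate per k-fold increase can be re-expressed per m-fold increase. The sketch below performs that conversion; the log-linear assumption is ours, since the abstract does not state the model form.

```python
import math


def rescale_protection(protection, from_fold, to_fold):
    """Re-express protection per `from_fold` titer increase as protection per `to_fold` increase.

    Assumes the risk ratio is log-linear in titer, so
    ratio(to_fold) = ratio(from_fold) ** (log(to_fold) / log(from_fold)),
    where protection = 1 - ratio. The transform is monotone, so it can also be
    applied to confidence-interval endpoints.
    """
    ratio = 1.0 - protection
    return 1.0 - ratio ** (math.log(to_fold) / math.log(from_fold))


# Per-doubling equivalents of the four-fold point estimates reported for BA.1 and BA.2.
for variant, per_four_fold in [("BA.1", 0.28), ("BA.2", 0.43)]:
    print(variant, round(rescale_protection(per_four_fold, 4, 2), 3))
```

Under this assumption, 28% per four-fold increase corresponds to roughly 15% per doubling for BA.1, and 43% to about 24.5% per doubling for BA.2.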
Pub Date : 2025-02-12 DOI: 10.1101/2025.02.10.25321755
Stacy G Guzman, Sarah M Ruggiero, Shiva Ganesan, Colin A Ellis, Alicia G Harrison, Katie R Sullivan, Zornitza Stark, Natasha J Brown, Sajel L Kana, Anabelle Tuttle, Jair Tenorio, Pablo Lapunzina, Julián Nevado, Marie T McDonald, Courtney Jensen, Patricia G Wheeler, Lila Stange, Jennifer Morrison, Boris Keren, Solveig Heide, Meg W Keating, Kameryn M Butler, Mike A Lyons, Shailly Jain, Mehdi Yeganeh, Michelle L Thompson, Molly Schroeder, Hoanh Nguyen, Jorge Granadillo, Kari M Johnston, Chaya N Murali, Katie Bosanko, T Andrew Burrow, Syreeta Morgan, Deborah J Watson, Hakon Hakonarson, Ingo Helbig
Disease-causing variants in synaptic function genes are a common cause of neurodevelopmental disorders and epilepsy. Here, we describe 14 individuals with de novo disruptive variants in BSN, which encodes the presynaptic protein Bassoon. To expand the phenotypic spectrum, we identified 15 additional individuals with protein-truncating variants (PTVs) from large biobanks. Clinical features were standardized using the Human Phenotype Ontology (HPO) across all 29 individuals, revealing common clinical characteristics including epilepsy (13/29, 45%), febrile seizures (7/29, 24%), generalized tonic-clonic seizures (5/29, 17%), and focal-onset seizures (3/29, 10%). Behavioral phenotypes were present in almost half of all individuals (14/29, 48%), including ADHD (7/29, 24%) and autistic behavior (5/29, 17%). Additional common features included developmental delay (11/29, 38%), obesity (10/29, 34%), and delayed speech (8/29, 28%). Milder features were common in adults with BSN PTVs, indicating phenotypic variability that extends to individuals without obvious neurodevelopmental features (7/29, 24%). To detect gene-specific signatures, we performed an association analysis in a cohort of 14,895 individuals with neurodevelopmental disorders (NDDs). A total of 66 clinical features were associated with BSN, including febrile seizures (p = 1.26e-06) and behavioral disinhibition (p = 3.39e-17). Furthermore, individuals carrying BSN variants were phenotypically more similar than expected by chance (p = 0.00014), exceeding the phenotypic relatedness observed in 179 of 256 NDD-related conditions. In summary, by integrating information from community-based gene matching and large data repositories through computational phenotyping approaches, we identify BSN variants as the cause of a new class of synaptic disorder with a broad phenotypic range across the age spectrum.
Title : Variants in BSN, encoding the presynaptic protein Bassoon, result in a novel neurodevelopmental disorder with a broad phenotypic range.
Journal : medRxiv : the preprint server for health sciences
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11844618/pdf/
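The BSN abstract reports that carriers were phenotypically more similar than expected by chance, but it does not say which similarity measure or null model was used. One common way to frame such a test is a permutation test: compare the mean pairwise similarity of the carriers' phenotype term sets with that of random same-size groups drawn from the wider cohort. The sketch below uses plain Jaccard similarity on sets of term labels as a simple stand-in (ontology-aware measures that exploit the HPO hierarchy are often used in practice), and the term labels in the example are hypothetical.

```python
import itertools
import random


def jaccard(a, b):
    """Jaccard similarity of two sets of phenotype terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_similarity(profiles):
    """Mean Jaccard similarity over all pairs (needs at least two profiles)."""
    pairs = list(itertools.combinations(profiles, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def similarity_permutation_test(group, cohort, n_perm=10_000, seed=0):
    """Empirical p-value that `group` is more similar than random same-size cohort subsets."""
    rng = random.Random(seed)
    observed = mean_pairwise_similarity(group)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(cohort, len(group))
        if mean_pairwise_similarity(sample) >= observed:
            hits += 1
    # The +1 terms keep the empirical p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: three carriers sharing two made-up terms, plus 30
# unrelated cohort members with one distinct term each.
carriers = [frozenset({"febrile_seizures", "adhd"})] * 3
cohort = carriers + [frozenset({f"term_{i}"}) for i in range(30)]
print(similarity_permutation_test(carriers, cohort, n_perm=2000, seed=1))
```

The smallest attainable p-value is 1 / (n_perm + 1), so a value on the order of the reported 0.00014 requires at least several thousand permutations.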
Pub Date : 2025-02-12 DOI: 10.1101/2025.02.09.25321494
L A Machado-Paula, J Romanowska, R T Lie, L Hovey, B Doolittle, W Awotoye, L Dunlay, X J Xie, E Zeng, A Butali, M L Marazita, J C Murray, L M Moreno-Uribe, A L Petrin
Objectives: The etiology of nonsyndromic orofacial clefts (OFCs) involves multiple genetic and environmental factors; over 60 risk loci have been identified, but they account for only a minority of the estimated risk. Epigenetic factors such as differential DNA methylation (DNAm) are also associated with OFC risk, can alter risk for different cleft types, and can modify OFC penetrance. DNAm is the covalent addition of a methyl (CH3) group to the nucleotide cytosine, which can change the expression of the targeted gene. DNAm can be affected by environmental influences and by genetic variation via methylation quantitative trait loci (meQTLs). We hypothesize that aberrant DNAm and the resulting alterations in gene expression play a key role in the etiology of OFCs, and that certain common genetic variants that affect OFC risk do so by influencing DNAm.
Methods: We used genotype from 10 cleft-associated SNPs and genome-wide DNA methylation data (Illumina 450K array) for 409 cases with OFCs and 456 controls and identified 23 cleft-associated meQTLs. We then used an independent cohort of 362 cleft-discordant sib pairs for replication. We used methylation-specific qPCR to measure methylation levels of each CpG site and combined genotypic and methylation data for an interaction analysis of each SNP-CpG pair using the R package MatrixeQTL in a linear model. We also performed a Paired T-test to analyze differences in DNA methylation between each member of the sibling pairs.
Conclusions: Our results confirm previous evidence that some of the common non-coding variants detected through GWAS studies can influence the risk of OFCs via epigenetic mechanisms, such as DNAm, which can ultimately affect and regulate gene expression. Given the large prevalence of non-coding SNPs in most OFCs genome wide association studies, our findings can potentially address major knowledge gaps, like missing heritability, reduced penetrance, and variable expressivity associated with OFCs phenotypes.
{"title":"Genetic-epigenetic interactions (meQTLs) in orofacial clefts etiology.","authors":"L A Machado-Paula, J Romanowska, R T Lie, L Hovey, B Doolittle, W Awotoye, L Dunlay, X J Xie, E Zeng, A Butali, M L Marazita, J C Murray, L M Moreno-Uribe, A L Petrin","doi":"10.1101/2025.02.09.25321494","DOIUrl":"https://doi.org/10.1101/2025.02.09.25321494","url":null,"abstract":"<p><strong>Objectives: </strong>Nonsyndromic orofacial clefts (OFCs) etiology involves multiple genetic and environmental factors with over 60 identified risk loci; however, they account for only a minority of the estimated risk. Epigenetic factors such as differential DNA methylation (DNAm) are also associated with OFCs risk and can alter risk for different cleft types and modify OFCs penetrance. DNAm is a covalent addition of a methyl (CH3) group to the nucleotide cytosine that can lead to changes in expression of the targeted gene. DNAm can be affected by environmental influences and genetic variation via methylation quantitative loci (meQTLs). We hypothesize that aberrant DNAm and the resulting alterations in gene expression play a key role in the etiology of OFCs, and that certain common genetic variants that affect OFCs risk do so by influencing DNAm.</p><p><strong>Methods: </strong>We used genotype from 10 cleft-associated SNPs and genome-wide DNA methylation data (Illumina 450K array) for 409 cases with OFCs and 456 controls and identified 23 cleft-associated meQTLs. We then used an independent cohort of 362 cleft-discordant sib pairs for replication. We used methylation-specific qPCR to measure methylation levels of each CpG site and combined genotypic and methylation data for an interaction analysis of each SNP-CpG pair using the R package MatrixeQTL in a linear model. 
We also performed a Paired T-test to analyze differences in DNA methylation between each member of the sibling pairs.</p><p><strong>Results: </strong>We replicated 9 meQTLs, showing interactions between rs13041247 ( <i>MAFB</i> ) - cg18347630 ( <i>PLCG1</i> ) (P=0.04); rs227731 ( <i>NOG</i> ) - cg08592707 <i>(PPM1E)</i> (P=0.01); rs227731 ( <i>NOG</i> ) - cg10303698 ( <i>CUEDC1</i> ) (P=0.001); rs3758249 ( <i>FOXE1</i> ) - cg20308679 ( <i>FRZB</i> ) (P=0.04); rs8001641 ( <i>SPRY2</i> ) - cg19191560 ( <i>LGR4</i> ) (P=0.04); rs987525(8q24) - cg16561172( <i>MYC</i> ) (P=0.00000963); rs7590268( <i>THADA</i> ) - cg06873343 ( <i>TTYH3</i> ) (P=0.04); rs7078160 ( <i>VAX1</i> ) - cg09487139 (P=0.05); rs560426 ( <i>ABCA4/ARHGAP29</i> ) - cg25196715 ( <i>ABCA4/ARHGAP29</i> ) (P=0,03). Paired T-test showed significant differences for cg06873343 ( <i>TTYH3</i> ) (P=0.04); cg17103269 ( <i>LPIN3</i> ) (P=0.002), and cg19191560 ( <i>LGR4</i> ) (P=0.05).</p><p><strong>Conclusions: </strong>Our results confirm previous evidence that some of the common non-coding variants detected through GWAS studies can influence the risk of OFCs via epigenetic mechanisms, such as DNAm, which can ultimately affect and regulate gene expression. 
Given the large prevalence of non-coding SNPs in most OFCs genome wide association studies, our findings can potentially address major knowledge gaps, like missing heritability, reduced penetrance, and variable expressivity associated with OFCs phenotypes.</p>","PeriodicalId":94281,"journal":{"name":"medRxiv : the preprint server for health sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11844571/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143485200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
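The meQTL work above combines a per SNP-CpG linear model with a within-pair comparison in cleft-discordant sibships. The sketch below illustrates both calculations in plain Python under stated assumptions: methylation is regressed on additive allele dosage (0, 1, 2) with no covariates, loosely in the spirit of a Matrix eQTL linear model, and each sib pair is split into an affected and an unaffected member. The function names and toy values are hypothetical, and p-values (which require the t distribution) are left out.

```python
import statistics
from math import sqrt


def meqtl_slope(dosages, methylation):
    """OLS slope of methylation on additive allele dosage (0, 1 or 2).

    Hypothetical helper: one SNP-CpG pair, no covariates.
    """
    mx = statistics.mean(dosages)
    my = statistics.mean(methylation)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, methylation))
    sxx = sum((x - mx) ** 2 for x in dosages)
    return sxy / sxx


def paired_t(affected, unaffected):
    """Paired t statistic and degrees of freedom for within-pair differences.

    Hypothetical helper: position i in both lists is the same sibling pair.
    """
    diffs = [a - u for a, u in zip(affected, unaffected)]
    n = len(diffs)
    se = statistics.stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1


# Toy values, not study data.
slope = meqtl_slope([0, 1, 2], [0.20, 0.30, 0.40])
t_stat, df = paired_t([0.50, 0.60, 0.70, 0.80], [0.40, 0.55, 0.60, 0.70])
print(round(slope, 3), round(t_stat, 2), df)
```

In a real analysis the slope would be tested against its standard error (Matrix eQTL reports a t statistic and p-value for each SNP-CpG pair), and covariates would typically be added to the model.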
Pub Date: 2025-02-12; DOI: 10.1101/2025.02.11.25322053
Emil Jørsboe, Phil Kubitz, Julius Honecker, Andrea Flaccus, Dagmar Mvondo, Matthias Raggi, Torben Hansen, Hans Hauner, Matthias Blüher, Philip D Charles, Cecilia M Lindgren, Christoffer Nellåker, Melina Claussnitzer
Fat distribution and the macrostructure of white adipose tissue are important factors in predicting obesity-associated diseases, but the cellular microstructure of white adipose tissue has been less explored. To investigate the relationship between adipocyte size and obesity-related traits, and their underlying disease-driving genetic associations, we performed the largest study to date of automatic adipocyte phenotyping linking histological measurements and genetics. We introduce deep-learning-based methods for scalable and accurate semantic segmentation of subcutaneous and visceral adipose tissue histology samples (N=2,667) across 5 independent cohorts, comprising 9,000 whole-slide images and over 27 million adipocytes. Mean adipocyte size estimates were validated against Glastonbury et al. 2020. We show that adipocyte hypertrophy correlates with an adverse metabolic profile: increased levels of leptin, fasting plasma glucose, glycated hemoglobin, and triglycerides, and decreased levels of adiponectin and HDL cholesterol. We performed the largest GWAS (N = 2066 subcutaneous, N = 1878 visceral) and subsequent meta-analysis of mean adipocyte area, and find two genome-wide significant loci (rs73184721, rs200047724) associated with increased 95%-quantile adipocyte size in visceral and subcutaneous adipose tissue, respectively. Stratifying by sex, in females we find two genome-wide significant loci: one variant (rs140503338) is associated with increased mean adipocyte size in subcutaneous adipose tissue, and the other (rs11656704) with decreased 95%-quantile adipocyte size in visceral adipose tissue.
{"title":"Deep Learning Derived Adipocyte Size Reveals Adipocyte Hypertrophy is under Genetic Control.","authors":"Emil Jørsboe, Phil Kubitz, Julius Honecker, Andrea Flaccus, Dagmar Mvondo, Matthias Raggi, Torben Hansen, Hans Hauner, Matthias Blüher, Philip D Charles, Cecilia M Lindgren, Christoffer Nellåker, Melina Claussnitzer","doi":"10.1101/2025.02.11.25322053","DOIUrl":"10.1101/2025.02.11.25322053","url":null,"abstract":"<p><p>Fat distribution and macro structure of white adipose tissue are important factors in predicting obesity-associated diseases, but cellular microstructure of white adipose tissue has been less explored. To investigate the relationship between adipocyte size and obesity-related traits, and their underlying disease-driving genetic associations, we performed the largest study of automatic adipocyte phenotyping linking histological measurements and genetics to date. We introduce deep learning based methods for scalable and accurate semantic segmentation of subcutaneous and visceral adipose tissue histology samples (N=2,667) across 5 independent cohorts, including data from 9,000 whole slide images, with over 27 million adipocytes. Estimates of mean size of adipocytes were validated against Glastonbury et al. 2020. We show that adipocyte hypertrophy correlates with an adverse metabolic profile with increased levels of leptin, fasting plasma glucose, glycated hemoglobin and triglycerides, and decreased levels of adiponectin and HDL cholesterol. We performed the largest GWAS (N<sub>Subcutaneous</sub> = 2066, N<sub>Visceral</sub> = 1878) and subsequent meta-analysis of mean adipocyte area, and find two genome-wide significant loci (rs73184721, rs200047724) associated with increased 95%-quantile adipocyte size in respectively visceral and subcutaneous adipose tissue. 
Stratifying by sex, in females we find two genome-wide significant loci, with one variant (rs140503338) associated with increased mean adipocyte size in subcutaneous adipose tissue, and the other (rs11656704) is associated with decreased 95%-quantile adipocyte size in visceral adipose tissue.</p>","PeriodicalId":94281,"journal":{"name":"medRxiv : the preprint server for health sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11844614/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143484913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
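The GWAS phenotypes above are per-sample summaries of segmented cell areas: the mean adipocyte area and the 95%-quantile area. A minimal sketch of that summarization step follows, assuming each sample has already been reduced to a list of adipocyte areas (the unit, µm², is an assumption). The quantile uses linear interpolation between order statistics; the source does not state which quantile definition was used, so this is one reasonable choice rather than the authors' method. Function names are hypothetical.

```python
import math
import statistics


def quantile_linear(values, q):
    """q-quantile by linear interpolation between sorted values (0 <= q <= 1).

    The interpolation rule is an assumption for this sketch.
    """
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def summarize_sample(areas_um2):
    """Per-sample phenotypes: mean area and 95%-quantile area (hypothetical helper)."""
    return {
        "mean_area": statistics.fmean(areas_um2),
        "q95_area": quantile_linear(areas_um2, 0.95),
    }


# Toy areas for one sample, not study data.
example = summarize_sample([100, 200, 300, 400, 500])
print(example)
```

Because the 95%-quantile tracks the upper tail of the size distribution, it can shift even when the mean changes little, which is why the two phenotypes can give different association results.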
Pub Date: 2025-02-12; DOI: 10.1101/2025.02.10.25321991
Colin Xu, Thomas T Kim, Irving Kirsch, Martin Plöderl, Jay D Amsterdam, H Edmund Pigott
Background: The Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial was designed to guide the selection of the best next-step treatment for depressed patients who did not remit during their first and/or subsequent antidepressant trials, with up to four trials per patient. Our prior research documented protocol violations that inflated STAR*D's reported cumulative remission rate by 91.4%. Until now, no similar reanalysis of the step-2 drug-switch trial had been done.
Methods: We reanalyzed the patient-level dataset of STAR*D's drug-switch therapies, with fidelity to the original research protocol and related publications, to determine whether the results differed in clinically relevant ways from the original publication.
Results: While our reanalysis largely comported with STAR*D's published finding of no significant differences between drug-switch treatments, we found the following discrepancies: lower-than-reported step-2 remission rates, ranging from 16.2% to 19.3% (versus 17.6% to 24.8%); a significant increase in treatment-emergent suicidal ideation during step-2 drug-switch therapy (11.2% to 15.0%) compared with step-1 citalopram treatment (9.0%); four times as many severe suicidal behaviors reported by the treating clinicians as published suicide-related serious adverse events (16 versus 4); and a sustained remission rate of only 3.1% to 8.4%.
Conclusion: Compared with the original publication, our reanalysis found lower remission rates and greater suicidal risk than reported. These results add to the discrepancies found in our prior reanalysis and to the finding that switching antidepressants is not well supported by the evidence.
{"title":"Restoring STAR*D: A Reanalysis of Drug-Switch Therapy After Failed SSRI Treatment Using Patient-Level Data with Fidelity to the Original STAR*D Research Protocol.","authors":"Colin Xu, Thomas T Kim, Irving Kirsch, Martin Plöderl, Jay D Amsterdam, H Edmund Pigott","doi":"10.1101/2025.02.10.25321991","DOIUrl":"10.1101/2025.02.10.25321991","url":null,"abstract":"<p><strong>Background: </strong>The <i>Sequenced Treatment Alternatives to Relieve Depression</i> (STAR*D) trial was designed to give guidance in selecting the best next-step treatment for depressed patients who did not remit during their first, and/or subsequent, antidepressant trial, with up to four trials per patient. Our prior research documented protocol violations which inflated STAR*D's reported cumulative remission rate by 91.4%. A similar reanalysis of the step-2 drug-switch trial has not been done until now.</p><p><strong>Methods: </strong>We reanalyzed the patient-level dataset of STAR*D's drug-switch treatment therapies-<i>with fidelity to the original research protocol and related publications</i>-to determine whether there were clinically-relevant differences in results compared to the original publication.</p><p><strong>Results: </strong>While our reanalysis largely comported with STAR*D's published findings of no significant differences between drug-switch treatments, we found the following discrepancies: Lower than reported step-2 remission rates ranging from 16.2 to 19.3% (versus 17.6 to 24.8%); A significant increase in treatment-emergent suicidal ideation during the step-2 drug-switch therapies ranging from 11.2 to 15.0% compared to step-1 citalopram treatment (9.0%); A four times greater number of severe suicidal behaviors reported by the treating clinicians compared to the published suicide-related Serious Adverse Events (16 versus 4); and A sustained remission rate of only 3.1 to 8.4%.</p><p><strong>Conclusion: </strong>Compared to the original publication, our reanalysis found 
lower remission rates and more suicidal risk than reported. This adds to the discrepancies found in our prior reanalysis and also to the finding that switching antidepressants is not well supported by the evidence.</p>","PeriodicalId":94281,"journal":{"name":"medRxiv : the preprint server for health sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11844611/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143485404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
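Two kinds of comparison recur in the STAR*D results above: a fold difference (16 clinician-reported severe suicidal behaviors versus 4 published suicide-related serious adverse events) and a percentage by which a reported figure is inflated over a reanalyzed one. The sketch below makes both explicit. It assumes that "inflated by X%" is measured relative to the reanalyzed value; the abstract does not state the denominator, and the helper names are hypothetical.

```python
def fold_difference(larger, smaller):
    """How many times larger one count is than another."""
    return larger / smaller


def relative_inflation_pct(reported, reanalyzed):
    """Percent by which a reported figure exceeds a reanalyzed one.

    Assumption: the reanalyzed value is the denominator.
    """
    return 100.0 * (reported - reanalyzed) / reanalyzed


# 16 clinician-reported severe suicidal behaviors vs 4 published suicide-related SAEs.
print(fold_difference(16, 4))
# Under this definition, a 91.4% inflation means reported = 1.914 x reanalyzed.
print(round(relative_inflation_pct(1.914, 1.0), 1))
```

With the opposite convention (reported value as denominator) the same pair of numbers would give a smaller percentage, so the denominator should be stated whenever such figures are compared.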
Pub Date: 2025-02-12; DOI: 10.1101/2025.02.09.25321973
Julia Guillebaud, Janin Nouhin, Vibol Hul, Thavry Hoem, Oudamdaniel Yanneth, Mala Sim, Limmey Khun, Y Phalla, Sreymom Ken, Leakhena Pum, Reaksa Lim, Channa Meng, Kimtuo Chhel, Sithun Nuon, Sreyleak Hoem, Kunthy Nguon, Malen Chan, Sowath Ly, Erik A Karlsson, Jean-Marc Reynes, Anavaj Sakunthabhai, Philippe Dussart, Veasna Duong
Background: Rodent-borne viruses, including hantaviruses, arenaviruses, and rodent hepatitis E virus (HEV-C), pose significant threats to human health, causing severe diseases such as hepatitis, respiratory illness, and hemorrhagic fevers. In Cambodia, data on these viruses remain limited, and their burden on human health is unknown. This study investigated the presence of these viruses in rodents and assessed potential human exposure across diverse environmental and socio-economic contexts in Cambodia.
Methods: The study was conducted in urban, semi-urban, and rural areas of Cambodia during the rainy season (2020) and the dry season (2022). Rodents were screened for arenavirus, hantavirus, and HEV-C using RT-PCR. Human serum samples from the same sites were tested for IgG antibodies using ELISA. Factors associated with virus spillover into humans were analyzed.
Findings: Among 750 rodents, 9.7% carried at least one virus: 5.2% arenavirus, 3.3% hantavirus, and 1.9% HEV-C. Infection rates were highest at urban interfaces (14.5%), followed by semi-urban (11.9%) and rural (2.1%) interfaces. Arenavirus was more prevalent during the rainy season, while hantavirus and HEV-C prevalence remained consistent across seasons. Seroprevalence in humans was 12.7% for arenavirus, 10.0% for hantavirus, and 24.2% for HEV. Higher arenavirus seroprevalence was associated with urban residency and lower educational level. Hantavirus seroprevalence was associated with urban residency, a history of acute hepatitis, and living in flood-prone areas. HEV seroprevalence was associated with urban residency and a history of medical conditions, and increased with age.
Interpretation: Our findings highlight the need for rodent control, improved market infrastructure, enhanced waste management, and public awareness of hygiene practices and zoonotic risks, especially in urban and high-risk areas.
{"title":"Burden of rodent-borne viruses in rodents and zoonotic risk in human in Cambodia: a descriptive and observational study.","authors":"Julia Guillebaud, Janin Nouhin, Vibol Hul, Thavry Hoem, Oudamdaniel Yanneth, Mala Sim, Limmey Khun, Y Phalla, Sreymom Ken, Leakhena Pum, Reaksa Lim, Channa Meng, Kimtuo Chhel, Sithun Nuon, Sreyleak Hoem, Kunthy Nguon, Malen Chan, Sowath Ly, Erik A Karlsson, Jean-Marc Reynes, Anavaj Sakunthabhai, Philippe Dussart, Veasna Duong","doi":"10.1101/2025.02.09.25321973","DOIUrl":"10.1101/2025.02.09.25321973","url":null,"abstract":"<p><strong>Background: </strong>Rodent-borne viruses, including hantaviruses, arenaviruses, and rodent hepatitis virus (HEV-C), pose significant health threats to humans, causing severe diseases such as hepatitis, respiratory illness, and hemorrhagic fevers. In Cambodia, data on these viruses remain limited, and their burden on human health is unknown. This study investigated the presence of these viruses in rodents and assessed potential human exposure across diverse environmental and socio-economic contexts in Cambodia.</p><p><strong>Methods: </strong>The study was conducted in urban, semi-urban, and rural areas of Cambodia during the rainy (2020) and dry seasons (2022). Rodents were screened for arenavirus, hantavirus, and HEV-C using RT-PCR. Human serum samples from the same site were tested for IgG antibodies using ELISA. Factors associated with virus spillover into humans were analyzed.</p><p><strong>Findings: </strong>Among 750 rodents, 9.7% carried at least one virus: 5.2% arenavirus, 3.3% hantavirus, and 1.9% HEV-C. Infection rates were highest in urban (14.5%), followed by semi-urban (11.9%) and rural (2.1%) interfaces. Arenavirus was more prevalent during the rainy season, while hantavirus and HEV-C remained consistent across seasons. Seroprevalence in humans was 12.7% for arenavirus, 10.0% for hantavirus, and 24.2% for HEV. 
Higher arenavirus seroprevalence was associated with urban residency and lower education level. Hantavirus seroprevalence was associated with urban residency, acute hepatitis history, and flood-prone living areas. HEV seroprevalence increased with urban residency, increasing age, and medical condition history.</p><p><strong>Interpretation: </strong>Our findings highlight the need for rodent control, improved market infrastructure, enhanced waste management, and public awareness of hygiene practices and zoonotic risks, especially in urban and high-risk areas.</p>","PeriodicalId":94281,"journal":{"name":"medRxiv : the preprint server for health sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11844583/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143485136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
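The rodent percentages above can be turned back into approximate counts as a consistency check. This sketch assumes every percentage shares the denominator of 750 rodents and is rounded to one decimal place, and lists every integer count compatible with each reported figure. The function name is hypothetical.

```python
def counts_consistent_with(pct, n, decimals=1):
    """Integer counts k in 0..n whose share 100*k/n rounds to pct."""
    half_step = 0.5 * 10 ** (-decimals)
    return [k for k in range(n + 1) if abs(100 * k / n - pct) <= half_step + 1e-9]


N_RODENTS = 750
any_virus = counts_consistent_with(9.7, N_RODENTS)
per_virus = {
    name: counts_consistent_with(pct, N_RODENTS)
    for name, pct in [("arenavirus", 5.2), ("hantavirus", 3.3), ("HEV-C", 1.9)]
}

# Each figure maps to exactly one count here, so the sums are well defined.
assert len(any_virus) == 1 and all(len(c) == 1 for c in per_virus.values())
extra_detections = sum(c[0] for c in per_virus.values()) - any_virus[0]
print(any_virus, per_virus, extra_detections)
```

Under these assumptions each percentage corresponds to a single count (73 rodents with any virus; 39, 25, and 14 for arenavirus, hantavirus, and HEV-C). The per-virus counts sum to 78, five more than the 73 rodents carrying at least one virus, which implies that some rodents carried more than one virus.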
Pub Date: 2025-02-12; DOI: 10.1101/2025.02.09.25321620
Jacob Berkowitz, Davy Weissenbacher, Apoorva Srinivasan, Nadine A Friedrich, Jose Miguel Acitores Cortina, Sophia Kivelson, Graciela Gonzalez Hernandez, Nicholas P Tatonetti
Large language models (LLMs) integrate knowledge from diverse sources into a single set of internal weights. However, these representations are difficult to interpret, which complicates our understanding of what the models learn. Sparse autoencoders (SAEs) decompose LLM embeddings into sparse, monosemantic features that both provide insight into the model's comprehension and simplify downstream machine learning tasks. Such features are especially valuable in biomedical applications, where explainability is critical. Here, we evaluate the use of Gemma Scope SAEs to identify how LLMs store known facts about adverse drug reactions (ADRs). We transform hidden-state embeddings of drug names from Gemma2-9b-it into interpretable features, train a linear classifier on these features to predict ADR likelihood, and evaluate it against an established benchmark. These features yield strong predictive performance, with AUC-ROC of 0.957 for acute kidney injury, 0.902 for acute liver injury, 0.954 for acute myocardial infarction, and 0.963 for gastrointestinal bleeding. Notably, performance does not differ significantly (p > 0.05) between simple linear classifiers built on SAE outputs and neural networks trained on the raw embeddings, suggesting that little information is lost in the SAE reconstruction. SAE-derived representations therefore appear to retain the essential information from the LLM while reducing model complexity, paving the way for more transparent, compute-efficient strategies. We believe this approach can help synthesize the biomedical knowledge that models learn during training and can support downstream applications, such as expanding reference sets for pharmacovigilance.
{"title":"Probing Large Language Model Hidden States for Adverse Drug Reaction Knowledge.","authors":"Jacob Berkowitz, Davy Weissenbacher, Apoorva Srinivasan, Nadine A Friedrich, Jose Miguel Acitores Cortina, Sophia Kivelson, Graciela Gonzalez Hernandez, Nicholas P Tatonetti","doi":"10.1101/2025.02.09.25321620","DOIUrl":"10.1101/2025.02.09.25321620","url":null,"abstract":"<p><p>Large language models (LLMs) integrate knowledge from diverse sources into a single set of internal weights. However, these representations are difficult to interpret, complicating our understanding of the models' learning capabilities. Sparse autoencoders (SAEs) linearize LLM embeddings, creating monosemantic features that both provide insight into the model's comprehension and simplify downstream machine learning tasks. These features are especially important in biomedical applications where explainability is critical. Here, we evaluate the use of Gemma Scope SAEs to identify how LLMs store known facts involving adverse drug reactions (ADRs). We transform hidden-state embeddings of drug names from Gemma2-9b-it into interpretable features and train a linear classifier on these features to classify ADR likelihood, evaluating against an established benchmark. These embeddings provide strong predictive performance, giving AUC-ROC of 0.957 for identifying acute kidney injury, 0.902 for acute liver injury, 0.954 for acute myocardial infarction, and 0.963 for gastrointestinal bleeds. Notably, there are no significant differences (p > 0.05) in performance between the simple linear classifiers built on SAE outputs and neural networks trained on the raw embeddings, suggesting that the information lost in reconstruction is minimal. This finding suggests that SAE-derived representations retain the essential information from the LLM while reducing model complexity, paving the way for more transparent, compute-efficient strategies. 
We believe that this approach can help synthesize the biomedical knowledge our models learn in training and be used for downstream applications, such as expanding reference sets for pharmacovigilance.</p>","PeriodicalId":94281,"journal":{"name":"medRxiv : the preprint server for health sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11844579/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143485374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
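The probing pipeline above has three steps: encode a hidden-state embedding into SAE features, score those features with a linear classifier, and evaluate the scores with AUC-ROC. Gemma Scope SAEs use a JumpReLU activation, in which a feature is kept only if its pre-activation exceeds a learned per-feature threshold. The pure-Python sketch below uses toy weights; the row-per-feature weight layout, the function names, and the fixed classifier weights are illustrative assumptions (in practice the classifier weights would be fit, for example by logistic regression), and it does not reproduce the authors' exact setup.

```python
def jumprelu_encode(x, w_enc, b_enc, theta):
    """Encode one embedding x with a JumpReLU SAE encoder.

    Assumed layout: w_enc holds one weight row per SAE feature. A feature keeps
    its pre-activation only if it exceeds that feature's threshold theta.
    """
    feats = []
    for row, b, t in zip(w_enc, b_enc, theta):
        pre = sum(w * xi for w, xi in zip(row, x)) + b
        feats.append(pre if pre > t else 0.0)
    return feats


def linear_score(features, weights, bias):
    """Linear classifier score (a logit) over SAE features."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def auc_roc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: a 2-dimensional "embedding" and a 3-feature SAE.
feats = jumprelu_encode([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                        [0.0, 0.0, 0.0], [1.5, 1.5, 1.5])
score = linear_score(feats, [0.5, 0.5, 0.5], -1.0)
auc = auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(feats, score, auc)
```

The pairwise AUC equals the area under the ROC curve and keeps the sketch dependency-free; for large benchmarks a sort-based rank computation avoids the quadratic loop.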