Background: Mobile health apps often require the collection of identifiable information. Consequently, users face a significant risk of privacy breaches when these data are misused or inadequately stored and secured. These issues are especially concerning for users of reproductive health apps in the United States, as protection of sensitive user information is affected by shifting governmental regulations, such as the overturning of Roe v Wade and varying state-level abortion laws. Few studies have analyzed the data privacy policies of these apps or considered the safety issues associated with a lack of user transparency and protection.
Objective: This study aimed to evaluate popular reproductive health apps, assess their individual privacy policies, analyze federal and state data privacy laws governing these apps in the United States and the European Union (EU), and recommend best practices for users and app developers to ensure user data safety.
Methods: In total, 4 popular reproductive health apps (Clue, Flo, Period Tracker by GP Apps, and Stardust), identified from multiple web sources, were selected through convenience sampling. This selection ensured equal representation of apps based in the United States and the EU, facilitating a comparative analysis of data safety practices under differing privacy laws. A qualitative content analysis of the apps and a review of the literature on data use policies, governmental data privacy regulations, and best practices for mobile app data privacy were conducted between January 2023 and July 2023. The apps were downloaded and systematically evaluated using the Transparency, Health Content, Excellent Technical Content, Security/Privacy, Usability, Subjective (THESIS) evaluation tool to assess their privacy and security practices.
Results: The overall privacy and security scores for the EU-based apps, Clue and Flo, were both 3.5 of 5. In contrast, the US-based apps, Period Tracker by GP Apps and Stardust, received scores of 2 and 4.5, respectively. Major concerns regarding privacy and data security primarily involved the apps' use of IP address tracking and the involvement of third parties for advertising and marketing purposes, as well as the potential misuse of data.
Conclusions: Currently, user expectations for data privacy in reproductive health apps are not being met. Even with stricter privacy policies, particularly state-specific adaptations, apps must be transparent about data storage and third-party sharing, even when data are shared only for marketing or analytical purposes. Given the sensitivity of reproductive health data and recent state restrictions on abortion, apps should minimize data collection, exceed encryption and anonymization standards, and reduce IP address tracking to better protect users.
Background: Molecular tumor boards (MTBs) require intensive manual investigation to generate optimal treatment recommendations for patients. Large language models (LLMs) can catalyze MTB recommendations, decrease human error, improve accessibility to care, and enhance the efficiency of precision oncology.
Objective: In this study, we aimed to investigate the efficacy of LLM-generated treatment recommendations for MTB patients. We specifically investigated the LLMs' ability to generate evidence-based treatment recommendations with PubMed references.
Methods: We built a retrieval augmented generation pipeline using PubMed data. We prompted the resulting LLM to generate treatment recommendations with PubMed references using a test set of patients from an MTB conference at a large comprehensive cancer center at a tertiary care institution. Members of the MTB manually assessed the relevancy and correctness of the generated responses.
Results: A total of 75% of the referenced articles were properly cited from PubMed, 17% were hallucinations, and the remaining 8% were not properly cited from PubMed. Clinician-generated LLM queries achieved higher accuracy in clinician evaluation than automated queries, with clinicians labeling 25% of LLM responses as equal to their own recommendations and 37.5% as alternative plausible treatments.
Conclusions: This study demonstrates how retrieval augmented generation-enhanced LLMs can be a powerful tool for accelerating MTB conferences, as LLMs are sometimes capable of producing treatment recommendations equal to those of clinicians. However, further investigation is required to achieve stable results with zero hallucinations. LLMs represent a scalable solution to the time-intensive process of MTB investigations. However, their performance demonstrates that they must be used under close clinician supervision and cannot yet fully automate the MTB pipeline.
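The retrieval augmented generation pattern described in the Methods can be sketched in miniature as follows. The lexical-overlap scorer, toy PubMed-style corpus, and prompt wording are illustrative assumptions only, not the study's actual implementation (which would use a proper retriever over real PubMed records):

```python
# Minimal RAG sketch: retrieve the most relevant PubMed-style abstracts
# for a patient query, then assemble a grounded prompt for an LLM so it
# can cite real PMIDs instead of hallucinating references.
# The corpus and all names below are toy/illustrative.

def score(query, doc):
    """Crude lexical-overlap score standing in for a vector retriever."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def retrieve(query, corpus, k=2):
    """Return the top-k (pmid, abstract) pairs by overlap score."""
    ranked = sorted(corpus.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    return ranked[:k]

def build_prompt(query, hits):
    """Ground the generation request in the retrieved references."""
    context = "\n".join(f"[PMID {pmid}] {text}" for pmid, text in hits)
    return (
        "Using only the references below, recommend a treatment and cite PMIDs.\n"
        f"{context}\nPatient: {query}"
    )

# Toy corpus of PMID -> abstract text (fabricated for illustration).
corpus = {
    "111": "EGFR mutant lung adenocarcinoma responds to osimertinib therapy",
    "222": "BRAF V600E melanoma treated with dabrafenib plus trametinib",
    "333": "Checkpoint inhibition in MSI-high colorectal cancer",
}
query = "lung adenocarcinoma with EGFR mutation"
hits = retrieve(query, corpus)
prompt = build_prompt(query, hits)
```

Because the prompt is restricted to the retrieved context, any cited PMID can be checked against the retrieval set, which is how the hallucinated-reference rate reported above could be measured.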
Background: While the COVID-19 pandemic has prompted massive discussion of available medications on social media, traditional studies have focused only on limited aspects, such as public opinions, and have suffered from reporting biases, inefficiency, and long collection times.
Objective: Harnessing drug-related data posted on social media in real-time can offer insights into how the pandemic impacts drug use and monitor misinformation. This study aimed to develop a natural language processing (NLP) pipeline tailored for the analysis of social media discourse on COVID-19-related drugs.
Methods: This study constructed a full pipeline for COVID-19-related drug tweet analysis, using pretrained language model-based NLP techniques as the backbone. The pipeline is architecturally composed of 4 core modules: named entity recognition and normalization, to identify medical entities in relevant tweets and standardize them to uniform medication names for time trend analysis; targeted sentiment analysis, to reveal the sentiment polarities associated with the entities; topic modeling, to understand the underlying themes discussed by the population; and drug network analysis, to uncover potential adverse drug reactions (ADRs) and drug-drug interactions (DDIs). The pipeline was deployed to analyze tweets related to the COVID-19 pandemic and drug therapies between February 1, 2020, and April 30, 2022.
Results: From a dataset comprising 169,659,956 COVID-19-related tweets from 103,682,686 users, our named entity recognition model identified 2,124,757 relevant tweets from 1,800,372 unique users and the top 5 most-discussed drugs: ivermectin, hydroxychloroquine, remdesivir, zinc, and vitamin D. Time trend analysis revealed that the public focused mostly on repurposed drugs (ie, hydroxychloroquine and ivermectin) and least on remdesivir, the only officially approved drug among the 5. Sentiment analysis of the top 5 most-discussed drugs revealed that public perception was predominantly shaped by celebrity endorsements, media hot spots, and governmental directives rather than empirical evidence of drug efficacy. Topic analysis obtained 15 general topics across the drug-related tweets, with "clinical treatment effects of drugs" and "physical symptoms" emerging as the most frequently discussed topics. Co-occurrence matrices and complex network analysis further identified emerging patterns of DDIs and ADRs that could be critical for public health surveillance, such as better safeguarding public safety in medicine use.
Conclusions: This study shows that an NLP-based pipeline can be a robust tool for large-scale public health monitoring and can offer valuable supplementary data for traditional epidemiological studies concerning DDI and ADR. The framework presented here aspires to serve as a cornerstone for future social media-based public health analytics.
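As an illustration of the co-occurrence step underlying the drug network analysis described above, here is a minimal sketch. The drug lexicon and example tweets are toy data, and the real pipeline would feed the NER module's normalized entities into this step rather than relying on naive substring matching:

```python
from collections import Counter
from itertools import combinations

# Hypothetical sketch of a drug co-occurrence matrix: count drug pairs
# mentioned in the same tweet; frequently co-mentioned pairs become
# candidates for DDI/ADR review. Lexicon and tweets are illustrative.

LEXICON = {"ivermectin", "hydroxychloroquine", "remdesivir", "zinc", "vitamin d"}

def drugs_in(tweet):
    """Return the lexicon drugs mentioned in a tweet, sorted for stable pairing."""
    text = tweet.lower()
    return sorted(d for d in LEXICON if d in text)

def cooccurrence(tweets):
    """Count co-mentions of each unordered drug pair across all tweets."""
    pairs = Counter()
    for t in tweets:
        for a, b in combinations(drugs_in(t), 2):
            pairs[(a, b)] += 1
    return pairs

tweets = [
    "Taking zinc and vitamin D daily",
    "ivermectin vs hydroxychloroquine debate again",
    "zinc plus vitamin D combo",
]
pairs = cooccurrence(tweets)
```

At scale, the resulting pair counts form the adjacency weights of the drug network on which the complex network analysis operates.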
Background: Chronic obstructive pulmonary disease (COPD) ranks among the leading causes of global mortality, and COVID-19 has intensified its challenges. Beyond the evident physical effects, the long-term psychological effects of COVID-19 are not fully understood.
Objective: This study aims to unveil the long-term psychological trends and patterns in populations with COPD throughout the COVID-19 pandemic and beyond via large-scale Twitter mining.
Methods: A 2-stage deep learning framework was designed in this study. The first stage involved a data retrieval procedure to identify COPD and non-COPD users and to collect their daily tweets. In the second stage, a data mining procedure leveraged various deep learning algorithms to extract demographic characteristics, hashtags, topics, and sentiments from the collected tweets. Based on these data, multiple analytical methods, namely, odds ratio (OR), difference-in-difference, and emotion pattern methods, were used to examine the psychological effects.
Results: A cohort of 15,347 COPD users was identified from our collected Twitter database of over 2.5 billion tweets spanning January 2020 to June 2023. Attentiveness toward COPD was significantly affected by gender, age, and occupation: it was lower in females (OR 0.91, 95% CI 0.87-0.94; P<.001) than in males, higher in adults aged 40 years and older (OR 7.23, 95% CI 6.95-7.52; P<.001) than in those younger than 40 years, and higher in individuals with lower socioeconomic status (OR 1.66, 95% CI 1.60-1.72; P<.001) than in those with higher socioeconomic status. Across the study duration, COPD users showed decreasing concern about COVID-19 and increasing health-related concerns. After the middle phase of COVID-19 (July 2021), a distinct decrease in sentiment among COPD users contrasted sharply with the upward trend among non-COPD users. Notably, in the post-COVID era (June 2023), COPD users showed reduced levels of joy and trust and increased levels of fear compared with their levels in the middle phase of COVID-19. Moreover, males, older adults, and individuals with lower socioeconomic status showed heightened fear compared with their counterparts.
Conclusions: Our data analysis results suggest that populations with COPD experienced heightened mental stress in the post-COVID era. This underscores the importance of developing tailored interventions and support systems that account for diverse population characteristics.
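The odds ratios and 95% CIs reported in the Results follow the standard 2x2 contingency table calculation, sketched below. The counts are made up for illustration; the study's underlying cell counts are not given in the abstract:

```python
import math

# Sketch of an odds ratio with a Wald 95% CI from a 2x2 table.
# a/b: group 1 with/without the outcome; c/d: group 2 with/without.
# All counts below are illustrative, not the study's data.

def odds_ratio_ci(a, b, c, d, z=1.96):
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Example: 40/60 attentive vs not in group 1, 30/70 in group 2.
or_, lo, hi = odds_ratio_ci(40, 60, 30, 70)
```

An OR above 1 with a CI excluding 1 indicates significantly higher attentiveness in the first group, which is how the gender, age, and socioeconomic comparisons above are read.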
Background: With suicide rates in the United States at an all-time high, individuals experiencing suicidal ideation are increasingly turning to large language models (LLMs) for guidance and support.
Objective: The objective of this study was to assess the competency of 3 widely used LLMs to distinguish appropriate versus inappropriate responses when engaging individuals who exhibit suicidal ideation.
Methods: This observational, cross-sectional study evaluated responses to the revised Suicidal Ideation Response Inventory (SIRI-2) generated by ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro. Data collection and analyses were conducted in July 2024. A common training module for mental health professionals, the SIRI-2 provides 24 hypothetical scenarios in which a patient exhibits depressive symptoms and suicidal ideation, each followed by 2 clinician responses. Clinician responses were scored from -3 (highly inappropriate) to +3 (highly appropriate). All 3 LLMs were provided with a standardized set of instructions to rate the clinician responses. We compared LLM ratings to those of expert suicidologists, conducting linear regression analyses and converting LLM ratings to z scores to identify outliers (z score >1.96 or <-1.96; P<.05). Furthermore, we compared final SIRI-2 scores to those produced by health professionals in prior studies.
Results: All 3 LLMs rated responses as more appropriate than ratings provided by expert suicidologists. The item-level mean difference was 0.86 for ChatGPT (95% CI 0.61-1.12; P<.001), 0.61 for Claude (95% CI 0.41-0.81; P<.001), and 0.73 for Gemini (95% CI 0.35-1.11; P<.001). In terms of z scores, 19% (9 of 48) of ChatGPT responses were outliers when compared to expert suicidologists. Similarly, 11% (5 of 48) of Claude responses were outliers compared to expert suicidologists. Additionally, 36% (17 of 48) of Gemini responses were outliers compared to expert suicidologists. ChatGPT produced a final SIRI-2 score of 45.7, roughly equivalent to master's level counselors in prior studies. Claude produced an SIRI-2 score of 36.7, exceeding prior performance of mental health professionals after suicide intervention skills training. Gemini produced a final SIRI-2 score of 54.5, equivalent to untrained K-12 school staff.
Conclusions: Current versions of 3 major LLMs demonstrated an upward bias in their evaluations of appropriate responses to suicidal ideation; however, 2 of the 3 models performed equivalently to or better than mental health professionals.
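The z score outlier rule described in the Methods can be sketched as follows; the rating values are invented for illustration and are not the study's data:

```python
import statistics

# Sketch of the outlier rule: standardize each LLM item rating against
# the expert distribution and flag |z| > 1.96 (two-tailed P < .05).
# All ratings below are illustrative, not the study's data.

def outliers(llm_scores, expert_scores, cutoff=1.96):
    mu = statistics.mean(expert_scores)
    sd = statistics.stdev(expert_scores)  # sample SD of expert ratings
    flagged = []
    for i, s in enumerate(llm_scores):
        z = (s - mu) / sd
        if abs(z) > cutoff:
            flagged.append((i, round(z, 2)))
    return flagged

# Toy example: expert ratings cluster near 1; two LLM ratings fall outside.
flagged = outliers([1.5, 3.0, -2.0], [0, 1, 2, 1, 0, 2])
```

Flagged items are those where a model's appropriateness rating departs significantly from the expert consensus, which is how the per-model outlier percentages above were obtained.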