Journal of Medical Internet Research: Latest Articles

Effectiveness of Telehealth Versus In-Person Informed Consent: Randomized Study of Comprehension and Decision-Making.
IF 5.8 | Zone 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-03-05 | DOI: 10.2196/63473
Saif Khairat, Paige Ottmar, Prabal Chourasia, Jihad Obeid

Background: Obtaining informed consent (IC) is vital for ethically and effectively recruiting participants in research projects. However, traditional in-person IC approaches encounter notable obstacles, such as geographic barriers, transportation expenses, and literacy challenges, which can lead to delays in enrollment and increased costs. Telehealth, especially teleconsent, offers a potential way to overcome these obstacles by facilitating the IC process in a digital setting. Nonetheless, there are concerns about whether teleconsent can achieve levels of understanding and involvement that are equivalent to those of in-person IC meetings.

Objective: This study aims to evaluate comprehension and decision-making in participants undergoing teleconsent versus traditional in-person IC. We used validated assessments to determine whether teleconsent is a viable alternative that maintains participants' understanding and decision-making abilities.

Methods: A randomized comparative study design was used, recruiting potential participants for a parent study assessing patient experiences with patient portals. Participants were randomly assigned to 2 groups: teleconsent and in-person consent. The teleconsent group used Doxy.me software, which allowed real-time interaction between researchers and participants while they reviewed and electronically signed the IC documents. Recruitment involved using an institutional web-based platform to identify interested individuals, who were then contacted to assess eligibility and gather demographic information. The Decision-Making Control Instrument (DMCI) survey was used to assess perceived voluntariness, trust, and decision self-efficacy. The Quality of Informed Consent (QuIC) was used to measure comprehension of the consent form. The validated Short Assessment of Health Literacy-English (SAHL-E) tool was used to measure participants' health literacy levels.

Results: A total of 64 participants were enrolled in the study, with 32 (50%) in the teleconsent group and 32 (50%) in the in-person group. Of the 64 participants, 54 (84.4%) were female, 44 (68.7%) were aged 18-34 years, 50 (78.1%) were White, and 31 (48.4%) had a bachelor's degree. The mean SAHL-E scores differed between the teleconsent and in-person groups (16.72, SD 1.88 vs 17.38, SD 0.95; P=.03). No significant differences were found between the average scores at baseline and follow-up for QuIC part A (P=.29), QuIC part B (P=.25), and DMCI (P=.38) within the teleconsent and in-person groups. Additionally, there were no significant differences in QuIC or DMCI between subgroups based on age, sex, and ethnicity.

Conclusions: This study assessed the effectiveness of IC processes through telehealth compared to traditional in-person visits. Findings indicate that telehealth offers similar participant understanding and engagement wh

Journal of Medical Internet Research, volume 27, e63473.
Citations: 0
Exploration of Reproductive Health Apps' Data Privacy Policies and the Risks Posed to Users: Qualitative Content Analysis.
IF 5.8 | Zone 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-03-05 | DOI: 10.2196/51517
Nina Zadushlivy, Rizwana Biviji, Karmen S Williams

Background: Mobile health apps often require the collection of identifiable information. Consequently, users are at significant risk of privacy breaches when the data are misused or not adequately stored and secured. These issues are especially concerning for users of reproductive health apps in the United States, as protection of sensitive user information is affected by shifting governmental regulations such as the overruling of Roe v Wade and varying state-level abortion laws. Few studies have analyzed the data privacy policies of these apps and considered the safety issues associated with a lack of user transparency and protection.

Objective: This study aimed to evaluate popular reproductive health apps, assess their individual privacy policies, analyze federal and state data privacy laws governing these apps in the United States and the European Union (EU), and recommend best practices for users and app developers to ensure user data safety.

Methods: In total, 4 popular reproductive health apps (Clue, Flo, Period Tracker by GP Apps, and Stardust), identified from multiple web sources, were selected through convenience sampling. This selection ensured equal representation of apps based in the United States and the EU, facilitating a comparative analysis of data safety practices under differing privacy laws. A qualitative content analysis of the apps and a review of the literature on data use policies, governmental data privacy regulations, and best practices for mobile app data privacy were conducted between January 2023 and July 2023. The apps were downloaded and systematically evaluated using the Transparency, Health Content, Excellent Technical Content, Security/Privacy, Usability, Subjective (THESIS) evaluation tool to assess their privacy and security practices.

Results: The overall privacy and security scores for the EU-based apps, Clue and Flo, were both 3.5 of 5. In contrast, the US-based apps, Period Tracker by GP Apps and Stardust, received scores of 2 and 4.5, respectively. Major concerns regarding privacy and data security primarily involved the apps' use of IP address tracking and the involvement of third parties for advertising and marketing purposes, as well as the potential misuse of data.

Conclusions: Currently, user expectations for data privacy in reproductive health apps are not being met. Despite stricter privacy policies, particularly with state-specific adaptations, apps must be transparent about data storage and third-party sharing even if just for marketing or analytical purposes. Given the sensitivity of reproductive health data and recent state restrictions on abortion, apps should minimize data collection, exceed encryption and anonymization standards, and reduce IP address tracking to better protect users.

Journal of Medical Internet Research, volume 27, e51517.
Citations: 0
Retrieval Augmented Therapy Suggestion for Molecular Tumor Boards: Algorithmic Development and Validation Study.
IF 5.8 | Zone 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-03-05 | DOI: 10.2196/64364
Eliza Berman, Holly Sundberg Malek, Michael Bitzer, Nisar Malek, Carsten Eickhoff

Background: Molecular tumor boards (MTBs) require intensive manual investigation to generate optimal treatment recommendations for patients. Large language models (LLMs) can catalyze MTB recommendations, decrease human error, improve accessibility to care, and enhance the efficiency of precision oncology.

Objective: In this study, we aimed to investigate the efficacy of LLM-generated treatments for MTB patients. We specifically investigated the LLMs' ability to generate evidence-based treatment recommendations using PubMed references.

Methods: We built a retrieval augmented generation pipeline using PubMed data. We prompted the resulting LLM to generate treatment recommendations with PubMed references using a test set of patients from an MTB conference at a large comprehensive cancer center at a tertiary care institution. Members of the MTB manually assessed the relevancy and correctness of the generated responses.
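
The retrieval-then-prompt idea described in the Methods can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not specify the retriever, the model, or the prompt, so everything here is an assumption made for illustration. The corpus entries are made-up placeholder snippets with hypothetical IDs (not real PubMed records), retrieval is a simple bag-of-words cosine similarity, and the helper names `retrieve`, `build_prompt`, and `unsupported_citations` are hypothetical.

```python
import math
import re
from collections import Counter

# Made-up placeholder snippets standing in for retrieved literature.
# They are NOT real PubMed records; the IDs are hypothetical.
CORPUS = {
    "DOC-1": "BRAF V600E mutation in melanoma and response to BRAF inhibitor therapy",
    "DOC-2": "HER2 amplification in breast cancer and response to antibody therapy",
    "DOC-3": "Dietary fiber intake and cardiovascular risk in adults",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, corpus, k=2):
    """Return up to k document IDs with nonzero similarity, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id)
              for doc_id, text in corpus.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(case_summary, corpus, k=2):
    """Assemble a prompt that restricts the model to the retrieved snippets."""
    doc_ids = retrieve(case_summary, corpus, k)
    context = "\n".join(f"[{d}] {corpus[d]}" for d in doc_ids)
    prompt = (
        "Use only the references below and cite them by ID.\n"
        f"{context}\n"
        f"Patient summary: {case_summary}\n"
        "Suggested treatment options:"
    )
    return prompt, doc_ids


def unsupported_citations(answer, allowed_ids):
    """List cited IDs in the answer that were never retrieved."""
    cited = re.findall(r"\[([A-Z]+-\d+)\]", answer)
    return [c for c in cited if c not in allowed_ids]


prompt, ids = build_prompt("melanoma with BRAF V600E mutation", CORPUS)
print(prompt)
print(unsupported_citations("Option A [DOC-1]; see also [DOC-9].", ids))
```

The last helper hints at why reference checking matters in such a pipeline: any cited ID that was never retrieved can be flagged automatically for review, although a manual assessment of relevance and correctness, as performed by the MTB members, would still be needed.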

Results: A total of 75% of the referenced articles were properly cited from PubMed, 17% were hallucinations, and the remainder were not properly cited from PubMed. Clinician-generated LLM queries achieved higher accuracy in clinician evaluation than automated queries, with clinicians labeling 25% of LLM responses as equal to their recommendations and 37.5% as alternative plausible treatments.

Conclusions: This study demonstrates how retrieval augmented generation-enhanced LLMs can be a powerful tool in accelerating MTB conferences, as LLMs are sometimes capable of achieving clinician-equal treatment recommendations. However, further investigation is required to achieve stable results with zero hallucinations. LLMs signify a scalable solution to the time-intensive process of MTB investigations. However, LLM performance demonstrates that they must be used with heavy clinician supervision, and cannot yet fully automate the MTB pipeline.

Journal of Medical Internet Research, volume 27, e64364.
Citations: 0
Publishing Identifiable Patient Photographs in the Digital Age: Focus Group Study of Patients, Doctors, and Medical Students.
IF 5.8 | Zone 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-03-05 | DOI: 10.2196/59970
Marija Roguljić, Dina Šimunović, Ivan Buljan, Marija Franka Žuljević, Antonela Turić, Ana Marušić

Background: The publication of patient photographs in scientific journals continues to pose challenges regarding privacy and confidentiality, despite existing ethical guidelines. Recent studies indicate that key stakeholders, including health care professionals and patients, lack sufficient awareness of the ethical considerations surrounding patient photographs, particularly in the context of digital scientific publishing.

Objective: This qualitative study aimed to explore how different stakeholders (patients, medical students, and doctors) understand the challenges of patient privacy and confidentiality in scientific publications. Additionally, it sought to identify key areas for future research, particularly in the context of online, open-access articles.

Methods: Because of COVID-19 restrictions, we conducted 4 focus groups online: 1 with patients, 2 with final-year medical students, and 1 with head and neck physicians and dentists who regularly handle patient photographs. Participants were invited via email, and those who accepted took part in discussions lasting approximately 1 hour. All discussions were recorded and transcribed for analysis. All 4 focus groups were asked the same set of questions, covering the following topics: (1) consent for publishing patient photographs, (2) information on guidelines and standards for consent to publish patient photographs, (3) the importance of informed consent for various purposes, (4) methods for deidentifying patient photographs, and (5) the use of patient photographs in online, open-access publishing.

Results: Three key themes emerged from the focus group discussions: (1) no definitive resources or practical recommendations are available, (2) online publishing of patient images makes them more open to misuse, and (3) anonymization techniques have limitations. All stakeholder groups expressed a lack of knowledge about online publishing in general and concerns about the fate of patient photographs in the digital environment after publication. They emphasized the need for increased awareness among all relevant stakeholders and more stringent procedures for obtaining informed patient consent before publishing photographs. While they recognized the usefulness of image anonymization techniques in protecting patient identity, they were also aware that current methods remain insufficient to ensure complete anonymity.

Conclusions: This qualitative study highlights that publishing patient photographs in open-access scientific journals is an important, serious, and largely unexplored issue, with all stakeholders still uncertain about the best ways to protect patient privacy. Clinicians, publishers, and journal editors should not only implement best practices to ensure fully informed patient consent for publishing identifiable photographs but also develop technical and governance safeguards. Future quantitative stu

Journal of Medical Internet Research, volume 27, e59970.
Citations: 0
Association of Screen Content With Early Development Among Preschoolers in Shanghai: 7-Day Monitoring Study With Auto Intelligent Technology.
IF 5.8 | Zone 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-03-05 | DOI: 10.2196/65343
Hao Chen, Yi Sun, Sha Luo, Yingyan Ma, Chenshu Li, Yingcheng Xiao, Yimeng Zhang, Senlin Lin, Yingnan Jia

Background: It is unclear how exposure to different types of screen content is associated with early development among preschool children.

Objective: This study aims to precisely evaluate the screen exposure time across different content types and to explore the associations with the Ages and Stages Questionnaire, Third Edition (ASQ-3) score and 5 capacity domains in children aged 34.5-66 months.

Methods: This monitoring study used intelligent technology to collect data on the 7-day screen time and the time spent viewing each content type. The participants were 2 groups of kindergarten children in Shanghai. The data were collected between March 2023 and July 2023. Screen exposure data (total daily time and time for each type of content) were collected from children aged between 34.5 and 66 months. A self-designed questionnaire and the Healthy Screen Viewing for Children intelligent technology app were used to assess screen exposure to all media and tablets. The ASQ-3 was used to assess early development in children aged 34.5-66 months.

Results: In the 535-child sample, linear regression analysis indicated that both screen time of more than 60 minutes and exposure to smartphones and tablets were negatively associated with ASQ-3 score. Among 365 participants with data collected by the Healthy Screen Viewing for Children app, median regression showed that the median total ASQ-3 score was negatively associated with screen time for noneducational content (β=-.055; 95% CI -0.148 to -0.006; P=.03), screen time for both educational and noneducational content (β=-.042; 95% CI -0.081 to -0.007; P=.001), and fast-paced content (β=-.034; 95% CI -0.062 to -0.011; P=.049). The median gross motor score was negatively associated with screen time for parental guidance-13-rated content (β=-.015; 95% CI -0.022 to 0.009; P=.03), educational and noneducational content (β=-.018; 95% CI -0.038 to -0.001; P=.02), and static content (β=-.022; 95% CI -0.050 to 0.007; P=.02). This study also revealed that the median fine motor score was negatively associated with screen time for guidance-rated content (β=-.032; 95% CI -0.057 to -0.003; P=.006), parental guidance (PG)-rated content (β=-.020; 95% CI -0.036 to -0.007; P=.004), noneducational content (β=-.026; 95% CI -0.067 to -0.003; P=.01), both educational and noneducational content (β=-.020; 95% CI -0.034 to -0.001; P<.001), fast-paced content (β=-.022; 95% CI -0.033 to -0.014; P<.001), static content (β=-.034; 95% CI -0.050 to 0.018; P<.001), animated content (β=-.038; 95% CI -0.069 to -0.001; P=.004), and screen use during the daytime (β=-.026; 95% CI -0.043 to 0.005; P=.005).

Conclusions: The results indicated that the time spent viewing noneducational, static, fast-paced, and animated content was negatively associated with early development among preschool children. Limiting screen time in relevant aspect

Journal of Medical Internet Research, volume 27, e65343.
Citations: 0
Augmenting Insufficiently Accruing Oncology Clinical Trials Using Generative Models: Validation Study.
IF 5.8 Zone 2 Medicine Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-05 DOI: 10.2196/66821
Samer El Kababji, Nicholas Mitsakakis, Elizabeth Jonker, Ana-Alicia Beltran-Bless, Gregory Pond, Lisa Vandermeer, Dhenuka Radhakrishnan, Lucy Mosquera, Alexander Paterson, Lois Shepherd, Bingshu Chen, William Barlow, Julie Gralow, Marie-France Savard, Christian Fesl, Dominik Hlauschek, Marija Balic, Gabriel Rinnerthaler, Richard Greil, Michael Gnant, Mark Clemons, Khaled El Emam

Background: Insufficient patient accrual is a major challenge in clinical trials and can result in underpowered studies, as well as exposing study participants to toxicity and additional costs, with limited scientific benefit. Real-world data can provide external controls, but insufficient accrual affects all arms of a study, not just controls. Studies that used generative models to simulate more patients were limited in the accrual scenarios considered, replicability criteria, number of generative models, and number of clinical trials evaluated.

Objective: This study aimed to perform a comprehensive evaluation of the extent to which generative models can be used to simulate additional patients to compensate for insufficient accrual in clinical trials.

Methods: We performed a retrospective analysis using 10 datasets from 9 fully accrued, completed, and published cancer trials. For each trial, we removed the latest recruited patients (from 10% to 50%), trained a generative model on the remaining patients, and simulated additional patients to replace the removed ones using the generative model to augment the available data. We then replicated the published analysis on this augmented dataset to determine if the findings remained the same. Four different generative models were evaluated: sequential synthesis with decision trees, Bayesian network, generative adversarial network, and a variational autoencoder. These generative models were compared to sampling with replacement (ie, bootstrap) as a simple alternative. Replication of the published analyses used 4 metrics: decision agreement, estimate agreement, standardized difference, and CI overlap.

Results: Sequential synthesis performed well on the 4 replication metrics for the removal of up to 40% of the last recruited patients (decision agreement: 88% to 100% across datasets, estimate agreement: 100%, cannot reject standardized difference null hypothesis: 100%, and CI overlap: 0.8-0.92). Sampling with replacement was the next most effective approach, with decision agreement varying from 78% to 89% across all datasets. There was no evidence of a monotonic relationship in the estimated effect size with recruitment order across these studies. This suggests that patients recruited earlier in a trial were not systematically different than those recruited later, at least partially explaining why generative models trained on early data can effectively simulate patients recruited later in a trial. The fidelity of the generated data relative to the training data on the Hellinger distance was high in all cases.

Conclusions: For an oncology study with insufficient accrual with as few as 60% of target recruitment, sequential synthesis can enable the simulation of the full dataset had the study continued accruing patients and can be an alternative to drawing conclusions from an underpowered study. These results
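
The sampling-with-replacement baseline described in the Methods above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `augment_by_resampling`, the assumption that patients arrive as a list ordered by recruitment date, and the fixed random seed are all hypothetical choices made for illustration.

```python
import random


def augment_by_resampling(patients, removed_fraction, seed=0):
    """Drop the latest-recruited fraction, then refill by sampling with replacement.

    `patients` is assumed (hypothetically) to be a list ordered from the
    earliest to the latest recruited patient.
    """
    if not 0 < removed_fraction < 1:
        raise ValueError("removed_fraction must be strictly between 0 and 1")
    n_total = len(patients)
    n_removed = int(round(n_total * removed_fraction))
    kept = patients[: n_total - n_removed]
    if not kept:
        raise ValueError("no patients left to resample from")
    rng = random.Random(seed)
    # Bootstrap step: each simulated patient is a copy of a randomly chosen kept one.
    simulated = [rng.choice(kept) for _ in range(n_removed)]
    return kept + simulated
```

The returned list has the same size as the original trial, so a published analysis could in principle be rerun on it and compared with the original result.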
Citations: 0
Characterizing Public Sentiments and Drug Interactions in the COVID-19 Pandemic Using Social Media: Natural Language Processing and Network Analysis.
IF 5.8 Zone 2 Medicine Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-05 DOI: 10.2196/63755
Wanxin Li, Yining Hua, Peilin Zhou, Li Zhou, Xin Xu, Jie Yang

Background: While the COVID-19 pandemic has induced massive discussion of available medications on social media, traditional studies focused only on limited aspects, such as public opinions, and endured reporting biases, inefficiency, and long collection times.

Objective: Harnessing drug-related data posted on social media in real-time can offer insights into how the pandemic impacts drug use and monitor misinformation. This study aimed to develop a natural language processing (NLP) pipeline tailored for the analysis of social media discourse on COVID-19-related drugs.

Methods: This study constructed a full pipeline for COVID-19-related drug tweet analysis, using pretrained language model-based NLP techniques as the backbone. The pipeline is composed of 4 core modules: named entity recognition and normalization to identify medical entities from relevant tweets and standardize them to uniform medication names for time trend analysis, target sentiment analysis to reveal sentiment polarities associated with the entities, topic modeling to understand underlying themes discussed by the population, and drug network analysis to uncover potential adverse drug reactions (ADR) and drug-drug interactions (DDI). The pipeline was deployed to analyze tweets related to the COVID-19 pandemic and drug therapies between February 1, 2020, and April 30, 2022.

Results: From a dataset comprising 169,659,956 COVID-19-related tweets from 103,682,686 users, our named entity recognition model identified 2,124,757 relevant tweets sourced from 1,800,372 unique users, and the top 5 most-discussed drugs: ivermectin, hydroxychloroquine, remdesivir, zinc, and vitamin D. Time trend analysis revealed that the public focused mostly on repurposed drugs (ie, hydroxychloroquine and ivermectin), and least on remdesivir, the only officially approved drug among the 5. Sentiment analysis of the top 5 most-discussed drugs revealed that public perception was predominantly shaped by celebrity endorsements, media hot spots, and governmental directives rather than empirical evidence of drug efficacy. Topic analysis obtained 15 general topics of overall drug-related tweets, with "clinical treatment effects of drugs" and "physical symptoms" emerging as the most frequently discussed topics. Co-occurrence matrices and complex network analysis further identified emerging patterns of DDI and ADR that could be critical for public health surveillance, such as better safeguarding the safe use of medicines.

Conclusions: This study shows that an NLP-based pipeline can be a robust tool for large-scale public health monitoring and can offer valuable supplementary data for traditional epidemiological studies concerning DDI and ADR. The framework presented here aspires to serve as a cornerstone for future social media-based public health analytics.
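
The co-occurrence counting mentioned in the Results can be sketched as follows. This is only an illustration under stated assumptions, not the study's pipeline: it assumes each tweet has already been reduced to a list of normalized drug names (the output of the named entity recognition and normalization step), and the function name `drug_cooccurrence` and the sample tweets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def drug_cooccurrence(tweets):
    """Count how often each unordered pair of drugs is mentioned in the same tweet.

    `tweets` is assumed to be an iterable of lists of normalized drug names.
    """
    counts = Counter()
    for drugs in tweets:
        # Deduplicate and sort so each pair has one canonical (a, b) form.
        for pair in combinations(sorted(set(drugs)), 2):
            counts[pair] += 1
    return counts


# Made-up example input, not data from the study.
example = [
    ["ivermectin", "zinc"],
    ["zinc", "ivermectin", "vitamin d"],
    ["remdesivir"],
]
pair_counts = drug_cooccurrence(example)
```

The resulting pair counts can serve as edge weights of a drug network, which is one plausible starting point for the kind of network analysis the abstract describes.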

Citations: 0
Exploring Psychological Trends in Populations With Chronic Obstructive Pulmonary Disease During COVID-19 and Beyond: Large-Scale Longitudinal Twitter Mining Study.
IF 5.8 Zone 2 Medicine Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-05 DOI: 10.2196/54543
Chunyan Zhang, Ting Wang, Caixia Dong, Duwei Dai, Linyun Zhou, Zongfang Li, Songhua Xu

Background: Chronic obstructive pulmonary disease (COPD) ranks among the leading causes of global mortality, and COVID-19 has intensified its challenges. Beyond the evident physical effects, the long-term psychological effects of COVID-19 are not fully understood.

Objective: This study aims to unveil the long-term psychological trends and patterns in populations with COPD throughout the COVID-19 pandemic and beyond via large-scale Twitter mining.

Methods: A 2-stage deep learning framework was designed in this study. The first stage involved a data retrieval procedure to identify COPD and non-COPD users and to collect their daily tweets. In the second stage, a data mining procedure leveraged various deep learning algorithms to extract demographic characteristics, hashtags, topics, and sentiments from the collected tweets. Based on these data, multiple analytical methods, namely, odds ratio (OR), difference-in-difference, and emotion pattern methods, were used to examine the psychological effects.

Results: A cohort of 15,347 COPD users was identified from the data that we collected in the Twitter database, comprising over 2.5 billion tweets, spanning from January 2020 to June 2023. The attentiveness toward COPD was significantly affected by gender, age, and occupation; it was lower in females (OR 0.91, 95% CI 0.87-0.94; P<.001) than in males, higher in adults aged 40 years and older (OR 7.23, 95% CI 6.95-7.52; P<.001) than in those younger than 40 years, and higher in individuals with lower socioeconomic status (OR 1.66, 95% CI 1.60-1.72; P<.001) than in those with higher socioeconomic status. Across the study duration, COPD users showed decreasing concerns for COVID-19 and increasing health-related concerns. After the middle phase of COVID-19 (July 2021), a distinct decrease in sentiments among COPD users contrasted sharply with the upward trend among non-COPD users. Notably, in the post-COVID era (June 2023), COPD users showed reduced levels of joy and trust and increased levels of fear compared to their levels of joy and trust in the middle phase of COVID-19. Moreover, males, older adults, and individuals with lower socioeconomic status showed heightened fear compared to their counterparts.

Conclusions: Our data analysis results suggest that populations with COPD experienced heightened mental stress in the post-COVID era. This underscores the importance of developing tailored interventions and support systems that account for diverse population characteristics.
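
For readers unfamiliar with how figures such as "OR 0.91, 95% CI 0.87-0.94" are typically obtained, the sketch below computes an odds ratio with a Wald-type 95% CI from a 2x2 table. The abstract does not state how the authors estimated their intervals, so this standard textbook formula is an assumption, and the function name `odds_ratio_ci` and the cell labels are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a, b: group 1 with / without the outcome
    c, d: group 2 with / without the outcome
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for the Wald interval.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts for illustration only.
or_value, ci_low, ci_high = odds_ratio_ci(10, 20, 20, 10)
```

An interval that lies entirely below 1 (or entirely above 1) corresponds to a statistically significant association at the 5% level under this approximation.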

Citations: 0
Competency of Large Language Models in Evaluating Appropriate Responses to Suicidal Ideation: Comparative Study.
IF 5.8 Zone 2 Medicine Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-05 DOI: 10.2196/67891
Ryan K McBain, Jonathan H Cantor, Li Ang Zhang, Olesya Baker, Fang Zhang, Alyssa Halbisen, Aaron Kofner, Joshua Breslau, Bradley Stein, Ateev Mehrotra, Hao Yu

Background: With suicide rates in the United States at an all-time high, individuals experiencing suicidal ideation are increasingly turning to large language models (LLMs) for guidance and support.

Objective: The objective of this study was to assess the competency of 3 widely used LLMs to distinguish appropriate versus inappropriate responses when engaging individuals who exhibit suicidal ideation.

Methods: This observational, cross-sectional study evaluated responses to the revised Suicidal Ideation Response Inventory (SIRI-2) generated by ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro. Data collection and analyses were conducted in July 2024. A common training module for mental health professionals, SIRI-2 provides 24 hypothetical scenarios in which a patient exhibits depressive symptoms and suicidal ideation, followed by two clinician responses. Clinician responses were scored from -3 (highly inappropriate) to +3 (highly appropriate). All 3 LLMs were provided with a standardized set of instructions to rate clinician responses. We compared LLM responses to those of expert suicidologists, conducting linear regression analyses and converting LLM responses to z scores to identify outliers (z score>1.96 or <-1.96; P<0.05). Furthermore, we compared final SIRI-2 scores to those produced by health professionals in prior studies.

Results: All 3 LLMs rated responses as more appropriate than ratings provided by expert suicidologists. The item-level mean difference was 0.86 for ChatGPT (95% CI 0.61-1.12; P<.001), 0.61 for Claude (95% CI 0.41-0.81; P<.001), and 0.73 for Gemini (95% CI 0.35-1.11; P<.001). In terms of z scores, 19% (9 of 48) of ChatGPT responses were outliers when compared to expert suicidologists. Similarly, 11% (5 of 48) of Claude responses were outliers compared to expert suicidologists. Additionally, 36% (17 of 48) of Gemini responses were outliers compared to expert suicidologists. ChatGPT produced a final SIRI-2 score of 45.7, roughly equivalent to master's level counselors in prior studies. Claude produced an SIRI-2 score of 36.7, exceeding prior performance of mental health professionals after suicide intervention skills training. Gemini produced a final SIRI-2 score of 54.5, equivalent to untrained K-12 school staff.

Conclusions: Current versions of 3 major LLMs demonstrated an upward bias in their evaluations of appropriate responses to suicidal ideation; however, 2 of the 3 models performed equivalent to or exceeded the performance of mental health professionals.
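
The outlier rule in the Methods (z score above 1.96 or below -1.96) can be sketched as below. The abstract does not spell out how the z scores were standardized; this example assumes each LLM rating is compared with the expert mean and SD for the same item, and the function names and the sample numbers are hypothetical.

```python
def z_score(llm_rating, expert_mean, expert_sd):
    """Standardize one LLM rating against the expert distribution for that item."""
    return (llm_rating - expert_mean) / expert_sd


def flag_outliers(llm_ratings, expert_means, expert_sds, threshold=1.96):
    """Return True for each item whose |z| exceeds the threshold."""
    flags = []
    for rating, mean, sd in zip(llm_ratings, expert_means, expert_sds):
        flags.append(abs(z_score(rating, mean, sd)) > threshold)
    return flags


# Made-up ratings on the -3 to +3 scale, for illustration only.
flags = flag_outliers([2.0, 0.5], [0.0, 0.0], [1.0, 1.0])
```

The share of flagged items is then simply `sum(flags) / len(flags)`, which is the form in which the abstract reports outlier rates (for example, 9 of 48).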

Citations: 0
Natural Language Processing Technologies for Public Health in Africa: Scoping Review.
IF 5.8 Zone 2 Medicine Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-05 DOI: 10.2196/68720
Songbo Hu, Abigail Oppong, Ebele Mogo, Charlotte Collins, Giulia Occhini, Anna Barford, Anna Korhonen

Background: Natural language processing (NLP) has the potential to promote public health. However, applying these technologies in African health systems faces challenges, including limited digital and computational resources to support the continent's diverse languages and needs.

Objective: This scoping review maps the evidence on NLP technologies for public health in Africa, addressing the following research questions: (1) What public health needs are being addressed by NLP technologies in Africa, and what unmet needs remain? (2) What factors influence the availability of public health NLP technologies across African countries and languages? (3) What stages of deployment have these technologies reached, and to what extent have they been integrated into health systems? (4) What measurable impact have these technologies had on public health outcomes, where such data are available? (5) What recommendations have been proposed to enhance the quality, cost, and accessibility of health-related NLP technologies in Africa?

Methods: This scoping review includes academic studies published between January 1, 2013, and October 3, 2024. A systematic search was conducted across databases, including MEDLINE via PubMed, ACL Anthology, Scopus, IEEE Xplore, and ACM Digital Library, supplemented by gray literature searches. Data were extracted and the NLP technology functions were mapped to the World Health Organization's list of essential public health functions and the United Nations' sustainable development goals (SDGs). The extracted data were analyzed to identify trends, gaps, and areas for future research. This scoping review follows the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) reporting guidelines, and its protocol is publicly available.

Results: Of 2186 citations screened, 54 studies were included. While existing NLP technologies support a subset of essential public health functions and SDGs, language coverage remains uneven, with limited support for widely spoken African languages, such as Kiswahili, Yoruba, Igbo, and Zulu, and no support for most of Africa's >2000 languages. Most technologies are in prototyping phases, with only one fully deployed chatbot addressing vaccine hesitancy. Evidence of measurable impact is limited, with 15% (8/54) of studies attempting health-related evaluations and 4% (2/54) demonstrating positive public health outcomes, including improved participants' mood and increased vaccine intentions. Recommendations include expanding language coverage, targeting local health needs, enhancing trust, integrating solutions into health systems, and adopting participatory design approaches. The gray literature reveals industry- and nongovernmental organizations-led projects focused on deployable NLP applications. However, these projects tend to support only a few major languages and specif
Natural Language Processing Technologies for Public Health in Africa: Scoping Review. Songbo Hu, Abigail Oppong, Ebele Mogo, Charlotte Collins, Giulia Occhini, Anna Barford, Anna Korhonen. Journal of Medical Internet Research, vol. 27, e68720. IF 5.8. Pub Date: 2025-03-05. DOI: 10.2196/68720 (https://doi.org/10.2196/68720)
Citations: 0